Python, one of the reasons ChatGPT is pretty good at maths

One key reason ChatGPT is good at handling mathematical tasks and data analysis is its integration with Python, especially in its premium versions. Python not only powers the solutions provided but also allows users to see how those solutions are generated. In this post, I show how knowing this can make your use of ChatGPT more reliable and powerful.

If you’re curious about the process behind the analysis, you can enable the option to “Always show code when using data analyst” in your ChatGPT settings. This feature, as illustrated in the image below, allows you to monitor the Python code being executed, offering a deeper understanding of the operations involved.

The settings

Let me share an example to demonstrate how this works. I recently uploaded a dataset containing 42,000 reviews of Disneyland Parks from Kaggle.com and asked ChatGPT to categorize the responses into three groups based on their rating scores.

Kaggle.com data
The task for ChatGPT

Before presenting any results, ChatGPT first displays the Python code it uses. The process begins by importing the pandas library, which is essential for handling the Excel file I uploaded. Then, ChatGPT loads the Disney data into a DataFrame (pandas' table-like structure) and uses .head() to display the first few rows of the dataset.

Python Code
Python Code
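
For readers who want to try the step themselves, here is a minimal sketch of what loading and previewing the data might look like. It is not the exact code ChatGPT produced: the file name in the comment and the column names ("Rating", "Review_Text") are assumptions, and a small in-memory table stands in for the uploaded file so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the uploaded file. In a real session the call
# would look more like: reviews = pd.read_excel("disneyland_reviews.xlsx")
# (file name assumed; column names below are assumed too).
reviews = pd.DataFrame(
    {
        "Rating": [5, 4, 3, 2, 1, 5],
        "Review_Text": [
            "Magical day for the whole family",
            "Great rides, a bit crowded",
            "It was okay, nothing special",
            "Long queues and pricey food",
            "Very disappointing visit",
            "Loved the fireworks",
        ],
    }
)

# .head() returns the first five rows by default
print(reviews.head())
```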

Next, ChatGPT creates a new column named “Sentiment” and assigns each review a sentiment category (positive, negative, or neutral).

Python Code
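
A rough sketch of this step, assuming the rating lives in a column called "Rating" and that ratings of 4 and 5 count as positive, 3 as neutral, and 1 and 2 as negative. The helper name rating_to_sentiment is made up for illustration; ChatGPT's own code may be organised differently.

```python
import pandas as pd


def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label (thresholds are assumed)."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"


# Small illustrative table; the real dataset has about 42,000 rows
reviews = pd.DataFrame({"Rating": [5, 4, 3, 2, 1, 5]})

# Create the new column by applying the mapping to every rating
reviews["Sentiment"] = reviews["Rating"].apply(rating_to_sentiment)
print(reviews)
```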

After processing the data, ChatGPT summarises the number of reviews in each sentiment category. If I had not enabled the “show code” option, I would have only seen the final summary without insight into how it was derived.

The results
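
Counting the reviews per category is typically a one-liner in pandas. The sketch below uses value_counts() on a tiny made-up table; the column name "Sentiment" comes from the step above, while the data is illustrative only.

```python
import pandas as pd

# Illustrative data only, not the real Disneyland reviews
reviews = pd.DataFrame(
    {
        "Sentiment": [
            "positive",
            "positive",
            "neutral",
            "negative",
            "negative",
            "positive",
        ]
    }
)

# value_counts() tallies how often each label appears, most frequent first
sentiment_counts = reviews["Sentiment"].value_counts()
print(sentiment_counts)
```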

For another example, I asked ChatGPT to calculate the mean rating for each of the three sentiment groups. The Python code behind this operation is straightforward. The results make sense: the mean rating for positive reviews falls between 4 and 5, for negative reviews between 1 and 2, and for neutral reviews it is exactly 3, which is what you would expect if ratings of 4 and 5 were labelled positive, 1 and 2 negative, and 3 neutral.

Python Code
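
One common way to compute a mean per group in pandas is groupby() followed by mean(). The sketch below assumes the columns are named "Rating" and "Sentiment" and uses a handful of made-up rows, so the numbers it prints are not the ones from the real dataset.

```python
import pandas as pd

# Illustrative data only; column names are assumed
reviews = pd.DataFrame(
    {
        "Rating": [5, 4, 3, 2, 1, 5],
        "Sentiment": [
            "positive",
            "positive",
            "neutral",
            "negative",
            "negative",
            "positive",
        ],
    }
)

# Group the rows by sentiment label, then average the ratings in each group
mean_ratings = reviews.groupby("Sentiment")["Rating"].mean()
print(mean_ratings)
```

Seeing this code makes the sanity check easy: a positive mean outside the 4 to 5 range would signal that the grouping went wrong.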

Observing the Python code used in these analyses can provide valuable insights into the process, allowing you to better understand and trust the results.

In this post, the examples shown involve simple operations. However, if we had chosen to cluster the open-ended text responses, we would have seen Python code importing more advanced tools such as TfidfVectorizer, KMeans, and PCA from scikit-learn (sklearn), as well as pyplot from matplotlib (conventionally imported as plt). Additionally, if I requested an alternative method (such as Latent Class Analysis instead of K-Means), ChatGPT would adjust and load the appropriate libraries.
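
To give a feel for what such clustering code could look like, here is a small sketch using scikit-learn. The review texts, the choice of two clusters, and the variable names are all assumptions made for illustration; plotting with matplotlib is left out to keep the example short.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up review texts standing in for the open-ended responses
review_texts = [
    "Long queues and expensive food",
    "The queues were long and the food was expensive",
    "Fireworks and parade were magical",
    "Magical fireworks, wonderful parade",
    "Expensive tickets, long queues everywhere",
    "Wonderful parade and magical castle fireworks",
]

# Turn each text into a TF-IDF weighted word vector
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(review_texts)

# Group the texts into two clusters (the number of clusters is an assumption)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(tfidf_matrix)

# Reduce the vectors to two dimensions, e.g. as input for a scatter plot
coords = PCA(n_components=2).fit_transform(tfidf_matrix.toarray())

for text, label in zip(review_texts, cluster_labels):
    print(label, text)
```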

Because the calculations are run as Python code rather than estimated by the language model, ChatGPT is likely to provide valid and reliable results, as long as it correctly interprets your instructions and data.