I just wrapped up a chapter on AI in data analysis for my upcoming book, AI in Academic Research, which is set to be published in the next few weeks. Since this is such a crucial topic for researchers, I wanted to share some insights with you here.
AI has tremendously facilitated the process of data analysis. If you’ve ever spent hours—or even days—cleaning, structuring, and making sense of your data, you know how tedious this process can be. Luckily, AI can now do this job much more efficiently and quickly.
There are now powerful AI tools designed specifically for data analysis, and in my book I cover a wide variety of them, showing exactly how they work and what you can do with each one.
I also talked about AI chatbots and the different ways you can leverage their power in doing data analysis. This post summarizes some of the insights regarding the use of AI in data analysis and more specifically AI chatbots.
Using AI for Data Analysis
If you ask me about my favourite AI tools, I would not hesitate to answer: AI chatbots, and more specifically ChatGPT-4 and Claude Sonnet (both of which are available for a fee of $20 per month).
These chatbots can make your data analysis much easier. They can help you clean and structure your data, generate summaries, create visualizations like charts and graphs, and even identify trends and patterns.
But before we dive into how to use them, let’s address something critical—ethics and data security.
A Word of Caution: Data Privacy and Ethics
It goes without saying that to get the most from these AI chatbots you will need to upload your data to them.
However, before you upload any research data to an AI chatbot (or any AI tool for that matter), you need to think about privacy. Some platforms use the data you provide to improve their models, which could compromise sensitive information.
Luckily, both ChatGPT-4 and Claude Sonnet allow you to opt out of data training. In Claude Sonnet, this is the default setting. For ChatGPT-4, you need to disable it manually by:
- Logging into your ChatGPT account
- Clicking your profile picture (top right)
- Going to Settings → Data Controls
- Turning off “Improve the model for everyone”
Even with these safeguards in place, you should always take extra precautions. Never upload identifiable or sensitive data—anonymize it first. And always use AI tools from reputable providers with strong privacy policies (I cover a carefully curated list of these in my book).
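To make the "anonymize it first" step concrete, here is a minimal sketch that swaps a name column for salted-hash IDs before a file leaves your machine. The sample rows, the `pseudonymize` helper, and the salt value are all made up for illustration. Note that this is pseudonymization only: other columns (age, occupation, and so on) may still identify someone in combination, so treat it as a starting point rather than a guarantee.

```python
import csv
import hashlib
import io

# Hypothetical sample data; in practice you would read your own CSV file.
RAW_CSV = """Name,Age,Occupation
Alice Smith,34,Teacher
Bob Jones,45,Engineer
Alice Smith,34,Teacher
"""


def pseudonymize(rows, column, salt):
    """Return copies of `rows` with `column` replaced by a salted-hash ID."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        digest = hashlib.sha256((salt + row[column]).encode("utf-8")).hexdigest()
        row[column] = "P_" + digest[:10]  # short, stable pseudonym
        cleaned.append(row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
# The salt is an assumption: keep it private and never upload it with the data.
safe_rows = pseudonymize(rows, column="Name", salt="keep-this-secret")

for row in safe_rows:
    print(row)
```

The same person always maps to the same ID, so counts and groupings still work after the names are gone.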
Getting Started: Uploading and Analyzing Your Data
To use an AI chatbot for data analysis, the first thing you need is a file containing your data, preferably a CSV, although other formats work as well. Both ChatGPT-4 and Claude Sonnet allow you to upload files directly.
If you’ve never used AI chatbots for data analysis before, I highly recommend starting with a sample dataset to practice before working on your actual research data. Platforms like Kaggle and Data.gov offer publicly available datasets in various formats—just be sure to check their usage policies before downloading anything.
An even simpler approach is to generate hypothetical data using Claude or ChatGPT. You can do this with a prompt like:
“Generate a sample dataset with 200 rows and five columns: ‘Name’ (random names), ‘Age’ (18-65), ‘Occupation’ (various jobs), ‘Monthly Income’ ($2,000-$10,000), and ‘Education Level’ (High School, Bachelor’s, Master’s, PhD). Format it as a CSV table.”
This allows you to experiment with AI’s data analysis capabilities in a risk-free way, helping you build the skills and confidence to apply these tools to real research.
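If you would rather build a practice file locally than ask a chatbot for one, the following is a rough sketch that produces the same five columns using only Python's standard library. The name, occupation, and education lists are invented placeholders, and `make_rows` and `to_csv` are hypothetical helper names.

```python
import csv
import io
import random

# Invented placeholder values, used only to fill the practice dataset.
NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Morgan", "Casey"]
OCCUPATIONS = ["Teacher", "Engineer", "Nurse", "Designer", "Analyst"]
EDUCATION = ["High School", "Bachelor's", "Master's", "PhD"]


def make_rows(n, seed=0):
    """Build n random rows; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [
        {
            "Name": rng.choice(NAMES),
            "Age": rng.randint(18, 65),
            "Occupation": rng.choice(OCCUPATIONS),
            "Monthly Income": rng.randint(2000, 10000),
            "Education Level": rng.choice(EDUCATION),
        }
        for _ in range(n)
    ]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_rows(200)
csv_text = to_csv(rows)
print(csv_text.splitlines()[0])  # the header row
```

Writing `csv_text` to a file gives you something you can upload and experiment with freely, since none of it describes real people.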
Practicing with hypothetical data also helps you understand the strengths and limitations of AI-driven analysis. You’ll get a sense of how well AI cleans and structures data, summarizes key insights, generates visualizations, and identifies trends so that when it comes time to analyze your actual research data, you’re fully prepared.
So, before diving in, take some time to experiment, explore, and refine your approach with sample data. It will make a huge difference in how effectively you leverage AI for your research.
Here’s a hypothetical dataset generated with ChatGPT-4o to illustrate the use cases below.
Once you are ready, go ahead and upload your research data and start your analysis. Examples of analytic tasks you can do include:
1. Clean and Structure Your Data
Before diving into analysis, you need to clean up your data—because let’s be honest, raw data is often messy. Missing values, inconsistencies, formatting issues… all of these can throw off your results. The good news? AI can take a lot of that grunt work off your plate. It can quickly spot gaps, fix errors, and structure your dataset so it’s ready for analysis.
Here are examples of prompts to use to clean and structure your data:
- “I have uploaded a CSV file containing my research data. Can you check for missing values and suggest how to handle them?”
- “Can you standardize the date format in my dataset to YYYY-MM-DD?”
- “Please identify and correct any inconsistencies in the column [column name].”
- “Find duplicate entries in my dataset and suggest whether I should remove or merge them.”
- “Can you restructure this dataset so that it’s formatted properly for analysis?”
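It can be useful to spot-check what the chatbot reports. The sketch below runs three of the checks from the prompts above (missing values, date standardization, duplicates) on a tiny made-up table. The column names and the two accepted date formats are assumptions; in particular, reading `05/02/2024` as day/month/year is a guess you would need to confirm for your own data.

```python
import csv
import io
from datetime import datetime

# Tiny invented table with one missing score, one odd date, one duplicate row.
RAW_CSV = """id,date,score
1,2024-01-05,10
2,05/02/2024,
3,2024-03-10,7
1,2024-01-05,10
"""

# Assumed input formats; the second one treats values as day/month/year.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or the original text if unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 1. Count empty cells per column.
missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# 2. Rewrite every date in place as YYYY-MM-DD.
for r in rows:
    r["date"] = standardize_date(r["date"])

# 3. Collect rows that repeat an earlier row exactly.
seen = set()
duplicates = []
for r in rows:
    key = tuple(r.values())
    if key in seen:
        duplicates.append(r)
    seen.add(key)

print("missing:", missing)
print("duplicates:", duplicates)
```

If the numbers here disagree with what the chatbot told you, that is a signal to look more closely before trusting either result.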
2. Generate Quick Summaries
Once you’ve cleaned and structured your data, it’s time to make sense of it. Instead of scrolling through endless rows and columns, let AI do the heavy lifting. You can ask it to provide a concise report with key insights from your data, summarize trends, and even highlight outliers that might be worth a closer look. This is definitely a great way to get AI to tell you what’s going on in your dataset without the headache of manual number-crunching.
Here are examples of prompts to use to generate quick summaries:
- “Summarize the key insights from my dataset in plain language.”
- “What are the top five most frequent values in the column [column name]?”
- “Can you calculate the average, median, and standard deviation for [column name]?”
- “Identify any significant outliers in my dataset and explain their possible impact.”
- “Give me a quick summary of the distribution of values in [column name].”
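For the numeric prompts above, a quick local cross-check is easy to run. This is a minimal sketch using Python's `statistics` module and `collections.Counter`; the ages and occupations are invented values standing in for two columns of your dataset.

```python
import statistics
from collections import Counter

# Invented values standing in for an 'Age' and an 'Occupation' column.
ages = [23, 35, 35, 41, 52, 29, 35, 60]
occupations = [
    "Teacher", "Nurse", "Teacher", "Engineer",
    "Teacher", "Nurse", "Analyst", "Engineer",
]

summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
}

# Up to five most frequent values, each paired with its count.
top_values = Counter(occupations).most_common(5)

print(summary)
print(top_values)
```

Note that `statistics.stdev` computes the sample standard deviation; if the chatbot reports the population version (`statistics.pstdev`), the two figures will differ slightly.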
3. Create Visualizations
One of the best things about AI chatbots like ChatGPT and Claude is that they bring everything into one platform: you can clean, structure, analyze, and even visualize your data without switching tools. And it's not just basic charts: these AI tools can generate interactive visualizations, meaning you can tweak them, refine them, and get exactly what you need.
You can generate bar charts, line graphs, scatter plots, heatmaps, and many more. You can either explicitly instruct the chatbot to generate a specific chart or let it analyze your data and suggest the best visualization for the insights you're looking for.
Here are examples of prompts to use to create different visualizations:
- “Create a bar chart showing the distribution of [column name].”
- “Generate a line graph comparing [column 1] and [column 2] over time.”
- “Can you visualize the correlation between [column A] and [column B] using a scatter plot?”
- “Create a pie chart that shows the percentage breakdown of categories in [column name].”
- “Generate a heatmap to display the correlation between all numerical columns in my dataset.”
Here is an example of a bar chart I generated using ChatGPT-4o based on the hypothetical data I shared above. The chart shows the distribution of product categories in the dataset. It gives a clear picture of how frequently each category (Electronics, Clothing, and Groceries) appears in the data.
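A generated chart is only as good as the counts behind it, so it is worth confirming the bar heights yourself. The sketch below tallies categories and prints a rough text-only bar chart; the category list is invented for illustration and `text_bar_chart` is a hypothetical helper, not part of any charting library.

```python
from collections import Counter

# Invented category values standing in for a 'Category' column.
categories = [
    "Electronics", "Clothing", "Groceries",
    "Electronics", "Groceries", "Electronics",
]


def text_bar_chart(values):
    """Return one line per category, most frequent first, as label | bar count."""
    counts = Counter(values)
    width = max(len(label) for label in counts)  # pad labels to equal width
    return [
        f"{label.ljust(width)} | {'#' * count} {count}"
        for label, count in counts.most_common()
    ]


chart_lines = text_bar_chart(categories)
for line in chart_lines:
    print(line)
```

If the counts printed here do not match the bars in the chatbot's chart, ask it to explain how it grouped the values.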
4. Identify Trends and Correlations
Another key way AI can assist you is by spotting patterns and relationships in your data, things you might not notice just by looking at the numbers. If you need to analyze how two variables interact, uncover hidden correlations, or track emerging trends, AI can run the analysis in seconds.
Here are examples of prompts to use in this regard:
- “Analyze my dataset and identify any notable trends over time.”
- “Are there any strong correlations between variables in my dataset? If so, which ones?”
- “Can you run a regression analysis to determine if [column X] predicts [column Y]?”
- “Find seasonal trends in my time-series data and summarize the key insights.”
- “Detect any patterns in customer behavior based on [column name].”
Here is an example of a correlation heatmap I generated using ChatGPT-4o based on the hypothetical data I shared above. The heatmap visualizes the relationships between the numerical variables in the dataset, such as Sales, Customers, and Revenue. It helps identify strong positive or negative correlations, showing how these factors interact.
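When a chatbot reports a correlation or a regression result, you can reproduce the core numbers by hand. Here is a minimal sketch of the Pearson correlation coefficient and a simple least-squares line, written out explicitly so each step is visible. The customer and revenue figures are invented, and `pearson` and `fit_line` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Invented figures standing in for 'Customers' and 'Revenue' columns.
customers = [10, 20, 30, 40, 50]
revenue = [120, 210, 290, 405, 500]

r = pearson(customers, revenue)
slope, intercept = fit_line(customers, revenue)
print(f"r = {r:.3f}, revenue = {slope:.2f} * customers + {intercept:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship, but it says nothing about causation; deciding what the relationship means is still your job.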
What AI Can’t Do for You
I get it; I feel the same way. We're incredibly lucky to live in a time where AI can take over the tedious, time-consuming parts of research, freeing us up for more meaningful, creative work. But let's not get carried away: AI is not the all-powerful tool some make it out to be. It has serious limitations, and I go into these in depth in my book AI in Academic Research.
For starters, keep in mind that AI doesn't truly understand language the way we do, no matter how polished its outputs are. To an AI model, language isn't about meaning; it's just a sequence of tokens, a statistical pattern of words. It can generate responses that sound intelligent, but it doesn't think or reason like a human, and it sometimes hallucinates, producing inaccurate output. That's why:
- It can misinterpret context: AI lacks real-world understanding, so it may draw conclusions that don’t actually make sense.
- It can’t verify facts: AI doesn’t “know” things; it predicts words based on patterns in its training data. That means it can confidently give you incorrect or misleading information.
- It won’t replace your critical thinking: AI can analyze data, but it can’t decide what findings are meaningful or how they fit within your research. That’s on you.
- It struggles with nuance and judgment: If your research involves complex interpretation or ethical considerations, AI won’t be able to navigate those subtleties.
So yes, AI is a powerful assistant, but it's not a replacement for your expertise. Leverage its power to speed up repetitive tasks, but always double-check its outputs, apply your own analysis, and trust your own judgment. At the end of the day, research is about insight, interpretation, and pushing the boundaries of knowledge. That part? AI can't do for you.
Final Thoughts
That was a quick overview of how you can put AI chatbots to work for your data analysis. In my book, AI in Academic Research, I dive deeper into these strategies, share more practical tips, and introduce several other AI tools that can support not just data analysis, but the entire research process—from reviewing literature and writing papers to visualizing data and beyond.
Stay tuned for the book release! If you’re an academic researcher, this is a resource you won’t want to miss.