Yesterday, while exploring the literature on visual communication in research (a new field for me), I decided to test ChatGPT Deep Search. Curious to see what it could discover, I used ChatGPT 4.0 to craft a detailed prompt.
Those of you who follow my blog might remember that I wrote about ChatGPT Deep Search when it first became available to Plus users, highlighting several of its limitations for academic researchers.

With those caveats in mind, I decided to enlist ChatGPT Deep Search’s help—but with an experimental twist. I planned to use identical prompts to generate two separate reports at different times. I ran the first prompt yesterday and received a detailed report, which you can access through this link.
After 24 hours, I opened a fresh chat window, ran the same prompt again, and generated another detailed report, which you can check out [here].
The idea for this experiment came to me as I was thinking: what if two researchers, completely unaware of each other, were working on the same topic and both relied on ChatGPT Deep Search to assist with their literature review? Would they receive the same references? If so, would the tool generate nearly identical reports? Or would there be noticeable differences? These questions made me curious, so I decided to put the tool to the test.
Based on my previous experience with ChatGPT Deep Search, which I shared in an earlier post, I had a few hunches about how it might respond to my experiment. But I wasn't entirely sure. The results, however, confirmed my suspicions.
Let’s start with the sources. The first report took six minutes to generate and included 27 sources, while the second report—also generated in six minutes—had a slightly higher count of 32 sources.
While there were some differences in the sources used, what stood out was the uneven distribution of citations within the text. Even more interestingly, the most frequently referenced sources were almost identical in both reports (e.g., Rougier & Droettboom, 2014).
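If you want to run this kind of comparison yourself, the bookkeeping is simple. Here is a minimal Python sketch; the citation keys below are placeholders standing in for the actual reference lists I pulled from the two reports. It finds the sources both reports share and shows how heavily each report leans on its favorites:

```python
from collections import Counter

# Hypothetical citation keys standing in for each report's in-text
# references; the real lists came from the two Deep Search reports.
report_1 = ["Rougier2014", "Divecha2023", "Rougier2014", "Tufte2001"]
report_2 = ["Rougier2014", "Bobek2016", "Rougier2014", "Tufte2001"]

# Sources that appear in both reports, regardless of how often.
print("Shared sources:", set(report_1) & set(report_2))

# How unevenly each report distributes its citations.
for name, refs in [("Report 1", report_1), ("Report 2", report_2)]:
    print(name, "most cited:", Counter(refs).most_common(3))
```

Running this kind of tally on my two reports is what surfaced the pattern above: somewhat different source pools at the edges, but nearly the same heavily cited core.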
This aligns with something I mentioned in my first post about ChatGPT Deep Search: it pulls only from a limited pool of academic knowledge, specifically content that isn't paywalled. As a result, it misses many high-quality academic studies locked behind publisher paywalls.
To be fair, this isn’t a flaw of Deep Search itself but rather a reflection of how academic knowledge is locked behind financial barriers—a much bigger issue in our capitalist world (but that’s a discussion for another day).
Writing Style
The tone of both reports is distinctly academic, and their overall structure is strikingly similar. Take a look at these opening sentences from the introduction:
Report 1
“Visual elements such as graphs, charts, diagrams, and infographics play a crucial role in enhancing comprehension and retention of research findings. They allow complex data and concepts to be understood at a glance by presenting patterns and relationships that would be difficult to discern from text alone” (Divecha & Karande, 2023).
Report 2
“Visual elements like graphs, charts, and diagrams can dramatically improve how well your audience understands and remembers your research findings. Instead of wading through dense text, readers can grasp complex data ‘at a glance’ through visuals that highlight patterns or trends” (Bobek & Tversky, 2016).
While the phrasing is slightly different, the core idea and structure are nearly identical. But here’s the real issue: the sources are completely different. In Report 1, the claim is attributed to Divecha & Karande (2023), while in Report 2, it’s Bobek & Tversky (2016).
This raises an important concern—how is ChatGPT Deep Search selecting and attributing sources? Is it genuinely pulling references that best support the argument, or is it swapping citations while keeping the structure intact? This inconsistency in sourcing is something researchers relying on AI-generated literature reviews need to be cautious about.
I should note that both papers are genuinely relevant to the idea expressed in that paragraph. What intrigued me, though, was why ChatGPT Deep Search would assign different sources to nearly identical sentences. If the underlying concept remains the same, why swap citations instead of consistently referencing the same study?
This pattern extends beyond just the introduction—several sections of both reports follow a similar structure. What Deep Search seems to have done is paraphrase the content while maintaining the same overall organization, swapping in different references along the way.
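If you'd rather quantify that impression than eyeball it, a rough first pass is character-level string similarity. Here's a sketch using Python's standard-library difflib on the two opening sentences quoted above. Keep in mind that a surface measure like this understates paraphrase similarity, since it only sees shared character runs, not shared meaning:

```python
from difflib import SequenceMatcher

opening_1 = ("Visual elements such as graphs, charts, diagrams, and "
             "infographics play a crucial role in enhancing comprehension "
             "and retention of research findings.")
opening_2 = ("Visual elements like graphs, charts, and diagrams can "
             "dramatically improve how well your audience understands and "
             "remembers your research findings.")

# ratio() returns 0.0 for no overlap and 1.0 for identical strings.
score = SequenceMatcher(None, opening_1, opening_2).ratio()
print(f"Surface similarity: {score:.2f}")
```

For sentence pairs that share this much wording and word order, even a crude measure like this lands well above what you'd expect from two independently written introductions.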
Of course, this isn’t a formal study, and my findings should definitely be taken with a grain of salt. But they do raise some interesting questions about how AI tools like Deep Search retrieve and attribute academic sources—and how researchers should approach these outputs with a critical eye.
Now, here's the downside this comparison reveals.
If two researchers are working on the same topic and both rely on Deep Search, they’ll likely end up with very similar reports. This reinforces something I’ve always said about generative AI: I personally don’t use these tools to generate factual or data-driven reports.
If you’re a researcher—and I know many of you here are—you need to do the reading yourself. You cannot write meaningfully about a topic if you’re not engaging with the material firsthand. No AI tool can replace that process. Even if Deep Search were capable of producing a flawless, well-referenced report, so what? You wouldn’t actually learn from it. You’d be like someone giving a lecture on a subject they barely understand themselves.
I understand this is just the first version of Deep Search, and OpenAI might refine it into something even more advanced—perhaps something that mimics the work of an experienced scholar.
But no matter how sophisticated it gets, it should never be seen as a shortcut for academic research. Research isn’t just about producing a paper; it’s about your intellectual growth, deep reflection, and engaging with ideas before contributing to the field. That part—you simply have to do yourself.
Let me be clear: if you approach AI with what I call the “shortcut mindset,” you’re already starting off on the wrong foot. No matter how “intelligent” AI becomes, it should always be a co-thinker, a partner in cognitive tasks—never a replacement.
Now, onto the good part!
If you’re diving into a new research topic, Deep Search can be a great jumpstart—a springboard to get you going. At the very least, it will surface a few solid research papers in its references, and from there, you can apply the snowballing method—checking the references within those papers to uncover even more relevant studies. The deeper you go, the more you’ll start recognizing which works are frequently cited, signaling the seminal research in the field.
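As an illustration of backward snowballing, here's a short sketch against Semantic Scholar's public Graph API (my choice of tool here, not something Deep Search uses; any citation database with an API would do). It starts from what I believe is the Rougier et al. (2014) paper both reports leaned on and lists the works it cites, sorted by citation count, since heavily cited entries are good candidates for the seminal works you're trying to surface:

```python
import requests

# Backward snowballing: start from one paper the Deep Search reports
# surfaced and pull the works it cites. Semantic Scholar accepts DOIs;
# the one below should resolve to Rougier et al. (2014).
PAPER_ID = "DOI:10.1371/journal.pcbi.1003833"
URL = f"https://api.semanticscholar.org/graph/v1/paper/{PAPER_ID}/references"

resp = requests.get(URL, params={"fields": "title,year,citationCount",
                                 "limit": 100})
resp.raise_for_status()

# Heavily cited references are candidates for the field's seminal works.
cited = [item["citedPaper"] for item in resp.json().get("data", [])]
cited.sort(key=lambda p: p.get("citationCount") or 0, reverse=True)
for paper in cited[:10]:
    print(paper.get("citationCount"), "|", paper.get("year"),
          "|", paper.get("title"))
```

Repeat the same call on whatever the first round surfaces, and the papers that keep reappearing across rounds are usually the ones worth reading first.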
I was going to say that Deep Search might also help structure your research, but honestly, ChatGPT 4.0 and 4.5 can already do that—probably faster and more effectively.
One final note
I know my writing sometimes comes across as if I’m against AI, but that couldn’t be further from the truth. I genuinely believe we are incredibly fortunate as researchers and academics to live in this era. AI allows us to push the boundaries of our cognitive abilities without the same linguistic constraints that once held us back. But the key is using it thoughtfully and responsibly—not as a crutch, but as a co-thinker that amplifies, rather than replaces, our intellectual efforts.