What makes PhD-level AI (o1) different from GPT?

What Makes PhD-Level AI (o1) Different

While OpenAI hasn’t officially confirmed every technique, here are some that likely contribute to this huge performance improvement:

Chain of Thought (CoT): Rather than providing an answer immediately, the model is guided to think step-by-step natively, emulating a human-like reasoning process.
Reflection: The model is likely capable of evaluating its own responses, reflecting on areas for improvement or errors before delivering the final output.
Monte Carlo Tree Search (MCTS): MCTS is a decision-making tool that uses a tree structure to explore different solution paths. It simulates various options, branching out like a tree, and selects the most promising path based on past successes. This approach is especially effective for tasks requiring strategic thinking, like games or planning.
Source: Wikipedia
Reinforcement Learning (RL): The insights from the “thinking” processes are combined with reinforcement signals. For example, if the model’s thinking highlights a logical error and a human feedback signal confirms it, the model receives a stronger learning signal, reinforcing the need to avoid similar errors in the future.

So, the new species (AI models) that we create are learning how to “think”.

System 2 Thinking: A New Paradigm for AI

Previous models often provided quick answers without “thinking”. While a “think step by step” prompt was effective, as was pointed out in the beginner-level tips for using AI in medical education, it was a workaround rather than a solution.

PhD-level AI models now embody System 2 Thinking—a concept from psychology referring to slower, deliberate, logical processing, similar to how humans solve complex problems. This native “thinking” capability means the model can better evaluate context, plan answers, and “reason” deeply. It’s no longer just parroting back text; it’s actually solving problems more like a human.

The Trade-Offs: Speed and Cost

There is, however, a price for this advanced reasoning power. These System 2 models are significantly slower and roughly six times more expensive to run than GPT-4o. As such, it’s essential to use them strategically. While they excel in high-stakes tasks requiring nuanced reasoning, they’re overkill for simpler, everyday queries.

However, thanks to Moore's Law—the trend where computing power doubles about every two years while costs decrease (and some even predict it may increase at a squared rate)—this expense is likely to drop over time. This means advanced models will become faster, cheaper, and more accessible, eventually allowing broader use across various applications.

One day, we eventually will reach a point that human reasoning for any task is no longer feasible. It is where artificial general intelligence (AGI) era begins, which we should be prepared for because of the really important consequences it can lead to.

Yavuz Selim Kıyak, MD, PhD (aka MedEdFlamingo)

Follow the flamingo on X (Twitter) at @MedEdFlamingo for daily content.
Subscribe to the flamingo’s YouTube channel.
LinkedIn is another option to follow.
Who is the flamingo?

Related #MedEd reading:

Kıyak, Y. S., & Emekli, E. (2024). ChatGPT prompts for generating multiple-choice questions in medical education and evidence on their validity: a literature review. Postgraduate medical journal, qgae065. https://academic.oup.com/pmj/advance-article/doi/10.1093/postmj/qgae065/7688383

Kıyak, Y. S., & Kononowicz, A. A. (2024). Case-based MCQ generator: A custom ChatGPT based on published prompts in the literature for automatic item generation. Medical Teacher, 1-3. https://www.tandfonline.com/doi/full/10.1080/0142159X.2024.2314723

MedEdFlamingo's Newsletter

What a medical educator should know about the new AI model, o1?

Discussion about this post