ChatGPT barely passed U.S. medical exam: expect AI doctors soon
A recent study by AI researchers at AnsibleHealth tested OpenAI's ChatGPT on the United States Medical Licensing Exam (USMLE), a standardized series of three exams that doctors must pass to obtain a U.S. medical license. The AI chatbot technically passed, but only barely and with average scores. The result may be lackluster, but it is still a landmark for the AI industry: the system passed without any specialized input from human trainers.
ChatGPT scored between 52.4% and 75% across the three USMLE exams, which puts it at or near the exam's 60% passing threshold. The researchers involved in the study believe this is the first time an AI has performed at or near the passing threshold for the notoriously difficult exam. Despite the average scores, ChatGPT was commended for producing "new, non-obvious, and clinically valid insights" in 88.9% of its responses.
The USMLE exams test candidates on basic science, clinical reasoning, medical management, and bioethics, making them a suitable test of ChatGPT's capabilities. Unlike previous generations of AI systems, ChatGPT relies on a large language model trained to predict the next words in a sequence from the context of the words that came before. As a result, it can generate word sequences that it never saw during training and that still read coherently.
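To make the idea of next-word prediction concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT works internally: it is a toy bigram model that only looks at the single previous word, and the corpus, the function names `train_bigram` and `generate`, and the `seed` parameter are all invented for illustration. It does show the same basic principle on a small scale: by chaining predictions, the model can produce word sequences that never appear verbatim in its training text.

```python
import random
from collections import defaultdict


def train_bigram(text):
    """Record, for every word, the words that followed it in the text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Build a sequence by repeatedly sampling a word seen after the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:  # no known continuation for this word: stop early
            break
        out.append(rng.choice(options))
    return " ".join(out)


# Hypothetical training text, made up for this example.
corpus = (
    "the patient has a fever . the doctor has a plan . "
    "the patient needs a doctor ."
)
model = train_bigram(corpus)
sentence = generate(model, "the", 6)
print(sentence)
```

Every adjacent word pair in the output was seen during training, but the sentence as a whole may be new. A real large language model conditions on far more context than one word and learns its probabilities with a neural network rather than a lookup table.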
Notably, despite its general-purpose training, ChatGPT outperformed PubMedGPT, a large language model trained exclusively on biomedical literature. The researchers are optimistic that ChatGPT's passing grade hints at a future where AI systems play an assisting role in medical education, as a prelude to eventual integration into clinical decision-making.
ChatGPT has recently passed other exams too, including an MBA-level exam given to business students at the University of Pennsylvania and a law exam given to students at the University of Minnesota Law School. On the law exam, ChatGPT barely passed with a C+. The researchers believe the big potential for the AI industry lies in the fact that a lawyer could use ChatGPT to produce a rough first draft and make their practice more effective.
While ChatGPT has shown impressive progress in reading comprehension and writing, it is abysmal at math. Researchers say the AI system performs only at a sixth-grade level in math, and it stumbles on basic arithmetic problems posed in natural language. Because of how its large language model is trained, ChatGPT delivers answers confidently even when they are completely divorced from reality.