ChatGPT has made waves since its launch in late 2022. The potential of the underlying technology has set off a race amongst Big Tech companies while also raising existential concerns. While these developments are generating interesting, even important, discussions around the broader field of artificial intelligence (AI), it is equally important to take a step back and consider what scientific studies have to say about ChatGPT in healthcare.
With that in mind, we set out to review the first studies assessing ChatGPT in the medical field. These articles, published between November 2022 and early March 2023, cover a range of topics, from the technology’s use in medical education to its assistance in radiologic decision-making. The publications are summarised in the table below.
| Title | Source | Summary |
| --- | --- | --- |
| ChatGPT: friend or foe? | The Lancet | The article discusses the benefits and ethical concerns of ChatGPT, and suggests the need for more oversight and investment in AI output detectors to address potential errors and biases in its output. |
| Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple Clinical and Research Scenarios | SpringerLink | The article examines the possibility of utilizing ChatGPT within the healthcare industry and accentuates its potential uses and constraints in clinical settings, scientific output, improper use in medicine and research, and deliberation of public health issues. It stresses the significance of educating individuals about the correct application of AI-driven language models. |
| AI chatbots not yet ready for clinical use | medRxiv | This article compares the performance of two generative AI models, ChatGPT and Foresight NLP, in forecasting relevant diagnoses based on clinical vignettes, while discussing important considerations and limitations of transformer-based chatbots for clinical use. |
| The potential impact of ChatGPT in clinical and translational medicine | PMC | ChatGPT has great potential in assisting basic research and accelerating the technological transformation of clinical and translational medicine, such as in drug discovery, disease prediction, diagnosis, and assessment of therapeutic targets, but it is important to use it as a tool to support, rather than replace, healthcare professionals in their decision-making process. |
| Does ChatGPT Provide Appropriate and Equitable Medical Advice?: A Vignette-Based, Clinical Evaluation Across Care Contexts | medRxiv | The study evaluated ChatGPT’s ability to provide appropriate and equitable medical advice by presenting it with 96 advice-seeking vignettes and found that while it consistently provided background information, it did not reliably offer appropriate and personalized medical advice. |
| Can artificial intelligence help for scientific writing? | BMC | The article discusses the potential use of OpenAI’s ChatGPT chatbot in scientific writing, such as assisting researchers in organizing material, generating initial drafts, and proofreading, but cautions that it should not replace human judgment and that ethical concerns, such as plagiarism and accessibility, need to be addressed through regulation. |
| Comparing human and artificial intelligence in writing for health journals: an exploratory study | medRxiv | The study aims to evaluate the quality of scientific writing produced by ChatGPT compared to human authors and highlights the need for solutions to manage potential misuse and hazards of the technology. |
| Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow | medRxiv | The study presents the potential use of artificial intelligence tools like ChatGPT in the clinical workflow and shows that it achieves an average performance of 71.8% across all vignettes and question types, although it has limitations inherent to the artificial intelligence model itself that need to be considered. |
| Assessing the Value of ChatGPT for Clinical Decision Support Optimization | medRxiv | The AI-generated suggestions were considered to be original and had a high level of clarity and relevance, with moderate usefulness, low acceptance, bias, inversion, and redundancy. |
| The future of medical education and research: Is ChatGPT a blessing or blight in disguise? | Taylor & Francis Online | ChatGPT in scientific research raises ethical concerns due to accountability issues, lack of critical thinking, and inaccuracy of content, and experts suggest it should only be used as an add-on to constructive writing and reviewing material at the moment. |
| An Explorative Assessment of ChatGPT as an Aid in Medical Education: Use it with Caution | medRxiv | ChatGPT can be used as a tool to assist educators, but it is not currently a dependable source of information for medical students and educators. |
| ChatGPT- versus human-generated answers to frequently asked questions about diabetes: a Turing test-inspired survey among employees of a Danish diabetes center | medRxiv | The study found that participants were able to distinguish between ChatGPT-generated answers and human-written answers somewhat better than flipping a coin, but participants who had used ChatGPT before could identify 10% more answers correctly than those who had not, suggesting that the structure of the text provided an important clue. |
| How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment | JMIR Publications | The paper demonstrates the ability of ChatGPT to accurately answer medical questions and provide logical explanations, making it a potentially useful tool for medical education and small group discussion. |
| Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma | medRxiv | The study analyzed ChatGPT’s responses on the management of cirrhosis and HCC and found limitations but also potential as an informational tool for patients and physicians to improve outcomes. |
| Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making | medRxiv | This study demonstrates that ChatGPT, a large language model, can assist in radiologic decision-making at the point of care, achieving moderate to high accuracy in determining appropriate imaging steps for breast cancer screening and breast pain evaluation, although limitations of the model, such as misalignment and “hallucinations”, must be considered when designing clinically-oriented prompts for use with large language models. |
| Analysis of large-language model versus human performance for genetics questions | medRxiv | The use of language models like ChatGPT in clinical genetics has the potential to provide rapid and accurate responses to genetics-related questions, aid healthcare professionals in diagnosis and treatment, and make genetic information more widely available to a non-expert audience. |
| Putting ChatGPT’s Medical Advice to the (Turing) Test | medRxiv | The article discusses a study that found AI-based chatbots to be weakly distinguishable from human providers in terms of responses and mildly positively trusted by respondents, with potential for use in healthcare administrative tasks and chronic disease management. |
| Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of its Successes and Shortcomings | medRxiv | The article discusses the performance of the ChatGPT language model in responding to questions on the OKAP exam in ophthalmology, finding that it achieved an accuracy comparable to that of a first-year resident, although it struggled with highly specialized topics, and discussing the limitations and potential of the model for clinical use in ophthalmology. |
| Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study | JEEHP | The study found that ChatGPT’s performance in a parasitology examination was lower than that of medical students in Korea, and its correct answer rate was not related to the items’ knowledge level. |
| How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment | medRxiv | The study shows that ChatGPT can be used as an educational tool as it has an amount of medical knowledge comparable to a third-year medical student and provides personalized and interpretable responses, creating an on-demand interactive learning environment for students to improve their information retention and learning experience. |
This article provides an overview of the topics and issues that those papers cover. In doing so, we hope to clarify the current scientific stance on ChatGPT’s potential in healthcare.
Three areas of focus for ChatGPT in medicine
Based on our search conducted in early March, we identified 21 studies that assessed ChatGPT in a medical context. From these, three main areas of focus emerged: clinical use, answering medical questions and assisting in education, and scientific writing and research. Rather than going into the specifics of each study, the following sections highlight the broad strokes of these focus areas.
1. Clinical uses of ChatGPT
A number of potential clinical use cases for ChatGPT have been proposed and tested by researchers. These include clinical data management, recruitment for clinical trials as well as assistance in clinical decision-making.
Studies have evaluated some of these possibilities. For example, researchers at Harvard Medical School piloted ChatGPT’s utility in radiologic decision-making and found that the AI tool could determine, with moderate accuracy, appropriate imaging steps for patients requiring breast cancer screening. In another study, researchers found that the generative AI model displayed high accuracy in navigating the clinical workflow of hypothetical patient cases.
However, researchers have also raised concerns about considering ChatGPT for clinical use. Such models are prone to biases embedded in their training data, which are drawn largely from online sources. In addition, they can generate “hallucinations”: plausible-sounding output that is factually incorrect or unrelated to the provided prompt.
2. ChatGPT’s medical education aptitude
Given ChatGPT’s question-and-answer style of interaction, using it to train medical students and to inform patients seems a logical step, and researchers have tested the software’s aptitude in such settings.
By evaluating its performance on the United States Medical Licensing Examination, one study found ChatGPT’s score to be comparable to that of a third-year medical student. The researchers suggest that it could serve as an on-demand interactive learning tool for students and small-group discussions. Others found that it could help medical educators draft course content and assessments.
In another case, researchers assessed ChatGPT’s responses to frequently asked questions regarding the management and care of patients with cirrhosis and hepatocellular carcinoma. They found the majority of responses to be accurate yet incomplete, suggesting that the tool could supplement patient education alongside standard care.
However, researchers also recommend expert oversight and caution due to the occurrence of incorrect information, which can be misinterpreted.
3. Scientific research with ChatGPT
Given its text generation prowess, academics have contemplated ChatGPT’s contribution to scientific writing. They found that its assistance can range from summarising data to writing a full draft of a paper.
Another group of researchers directly compared short scientific journal articles written by human authors and ChatGPT. Their tests revealed that while ChatGPT was more time-efficient, human authors displayed better performance on completeness, scientific content and credibility.
In response to the software’s uses in academic writing, academic journals have updated their policies on the use of AI-assisted tools in scientific writing, stating that such use should be declared and manual checks should be performed on AI-generated output.
With its reported lack of critical thinking, content inaccuracies and accountability issues, scientists suggest that ChatGPT can be better adopted as an add-on for reviewing and rephrasing text.
It’s worth bearing in mind that these studies assessed the current version of ChatGPT, which is based on the GPT-3.5 large language model. OpenAI has already launched, with limited access, its next-generation model, GPT-4, which reportedly performs 40% better than GPT-3.5 at producing factual responses, and future iterations can be expected to bring further improvements. Some of the limitations listed here might therefore be overcome, while new reasons for caution might arise as the technology advances.
Written by Dr. Bertalan Meskó & Dr. Pranavsingh Dhunnoo
The post ChatGPT In Healthcare: What Science Says appeared first on The Medical Futurist.