Can AI match a human dietitian in a medicated weight-loss program?
A new study published in Healthcare has assessed whether advanced large language models (LLMs), specifically ChatGPT-4o and ChatGPT-4o1 preview, can match the quality of human dietitian responses to complex questions from patients in a real-world medicated digital weight-loss service (DWLS).
As asynchronous DWLSs grow in popularity, patients often wait for a human team member to respond to their queries. This study set out to explore whether LLMs could deliver high-quality dietary support in real time, across a set of questions drawn directly from patient–clinician interactions within the Eucalyptus DWLS.
Neither ChatGPT-4o nor ChatGPT-4o1 preview was statistically outperformed by the human dietitians on any of the study's 10 questions or on any assessment criterion, including empathy and relatability.
Understanding the results
What did the study find?
Ten questions, five broad and five narrow, were developed by Eucalyptus dietitians based on common themes from real patient interactions. Responses from the two ChatGPT models and two human dietitians were scored by four independent assessors across four criteria: scientific correctness, comprehensibility, empathy/relatability, and actionability. All assessors were blinded to the source of each response.
The study revealed that:
- Neither LLM was statistically outperformed by human dietitians on any of the 10 questions, or on any of the four individual assessment criteria. No median question score from any respondent fell below seven out of ten.
- On the empathy/relatability criterion, widely cited as a key limitation of AI in healthcare, neither ChatGPT model was outscored by the human dietitians. This challenges the prevailing assumption that LLMs are meaningfully inferior to humans in conveying empathy across complex patient queries.
- ChatGPT-4o scored significantly higher than both human dietitians on the aggregated actionability criterion (p < 0.01, large effect size). This suggests the model may provide specific, practical guidance more consistently, possibly because LLMs follow instructions without the intuitive omissions that human responders sometimes make.
- Where statistically significant differences did emerge between individual respondents on specific questions, they were as likely to reflect variation between the two human dietitians as differences between humans and AI.
What this means for digital obesity care
This study is the first to assess ChatGPT-4o's capabilities in a medicated DWLS context, and the results suggest advanced LLMs have the potential to play a meaningful supporting role alongside human clinicians in these services.
If LLMs can reliably assist with lifestyle coaching at this level of complexity, the implications are significant: patients could receive real-time responses to their queries at any time of day, workforce pressures on dietitian teams could be reduced, and the cost of DWLSs could fall, improving access for lower socioeconomic groups, who are overrepresented in obesity statistics.
Importantly, this study does not suggest that LLMs should replace dietitians. Questions of patient privacy, algorithmic bias, and clinical safety mean that human oversight remains essential. The findings instead point toward a hybrid model in which AI supports, rather than supplants, the human care team.
