Doctors 1, Diagnosis Apps 0

In the battle over who diagnoses disease more accurately, it seems doctors still have online apps beat. A group of self-diagnosing apps got far fewer diagnoses correct than doctors did in new research published today in JAMA Internal Medicine.

About one in three U.S. adults has visited a website to check their symptoms, according to 2013 research by the Benton Foundation. Apps and websites “are very commonly used by the average person on the street,” says Ateev Mehrotra, a physician at Harvard Medical School.

In 2015 research published in The BMJ (formerly the British Medical Journal), Mehrotra and his team fed 23 symptom-checkers the symptoms from 45 standard patient cases, including those of patients later diagnosed with asthma and malaria. The team found that the checkers listed the correct diagnosis first about a third of the time.

In the new experiment, the researchers compared the checkers’ accuracy with that of 234 medical physicians, fellows, and residents. For each case, at least 20 doctors submitted their top three diagnoses through an online platform.

The physicians listed the correct diagnosis first about 72 percent of the time, compared with 34 percent of the time for the apps.

“Physicians are by no means perfect,” Mehrotra says. They can still get diagnoses wrong about 10 to 15 percent of the time.

However, he isn’t surprised by the performance of the self-diagnosing tools. “I wasn’t expecting them to surpass,” he says.

Others raise questions.

“There are major methodologic problems with the approach that was used,” Mark Graber, a fellow at Research Triangle Institute International who researches health care quality and diagnostics and was not involved in the study, writes in an email. “The physicians in the study had essentially as much time as they wanted to evaluate and research each case; this is hardly comparable to actual practice. It’s likely that the specific methods used overestimated the physician performance and underestimated ‘symptom checker’ performance.”

Mehrotra says, “We tried to make it as realistic as possible but I think those criticisms are fair.”

He says timing isn’t an issue for computer programs, but people should be careful interpreting the results for real life because the doctors were not making decisions under pressure. They were also not able to examine patients.

He also admits that the online sites may have been tweaked between when the study was run and today, but he doesn’t believe that these minor updates would have had a large effect on the outcome.

He believes it’s more realistic for apps to help doctors diagnose, instead of simply replacing them. That’s one goal of systems such as IBM’s Watson AI.

“I like the idea, we’re just not there yet,” he says.