Google’s AMIE Met Real Patients for the First Time: Here’s What Happened

Google’s AMIE, the conversational medical AI that had so far been tested with patient actors in simulated consultations, finally stepped into a real clinic. And I mean a real clinic: Beth Israel Deaconess Medical Center (BIDMC) in Boston, not a lab with actors playing sick.

The study, published as a preprint and detailed in a blog post by Mike Schaekermann and Alan Karthikesalingam, is exactly the kind of grounded evidence we need more of in medical AI. It’s prospective, IRB-approved, pre-registered, and single-center. No hype, just data.

The Setup: AI Takes a History Before the Doctor Walks In

AMIE was deployed to handle pre-visit clinical history taking for patients with new, non-emergency complaints—think a cough that won’t quit or a weird rash, not chest pain or seizures. Patients booked for either in-person or telehealth appointments were invited to chat with AMIE via a secure web link before their actual consultation.

Here’s the key safety net: every AI-patient interaction was overseen by a physician via a live video call with screen sharing. This physician, the “AI supervisor,” had a predefined set of safety criteria and could jump in if the system went off the rails. That’s not just a nice-to-have; it’s mandatory when you’re dealing with real people who might have real emergencies the AI misses.

The system then generated a transcript and summary for the clinician, giving them a head start on the visit. Think of it as a very thorough, AI-powered intake form that also happens to be conversational.

What They Found (and What They Didn’t)

The study was a feasibility trial, not a randomized controlled trial. So don’t expect a headline like “AI beats doctors at diagnosis.” The goal was to see if AMIE could safely and acceptably gather information in a live clinical workflow.

From what I’ve read, the system performed reasonably well at history taking. Patients generally found the interaction acceptable, and clinicians found the summaries useful. But the real takeaway is the safety framework. The physician oversight wasn’t just a checkbox; it was active and necessary. The blog doesn’t detail how many interventions occurred, but the fact that they had a structured set of criteria for intervention tells me they expected edge cases.

And they’re right to expect them. Medical AI in the wild is a different beast than in the lab. Patients don’t always articulate clearly. They might omit critical details. They might even lie (yes, patients do that). An AI trained on perfect simulated dialogues is likely to struggle with the messiness of real human communication.

Why This Matters More Than Another Benchmark

We’ve seen plenty of AI systems that ace multiple-choice medical exams or outperform residents on simulated cases. But those are controlled environments. This study is one of the first to put a conversational diagnostic AI in front of actual patients in a real clinic, under real time pressure, with real consequences.

It’s a small step, but it’s in the right direction. Google is treating this as a milestone in their “evidence roadmap,” and that language is telling. They know that regulatory approval and clinical adoption won’t come from a flashy demo. It’ll come from painstaking, boring, essential work like this.

The Catch: This Isn’t Autonomous AI

Let’s be clear: AMIE wasn’t left alone with patients. Every chat was supervised. That’s not a failure—it’s responsible science. But it means the path to truly autonomous diagnostic AI is still long. We need studies with larger populations, multiple sites, and randomized designs before we can even think about removing the human supervisor.

And then there’s the question of liability. If AMIE misses a red flag and the supervising physician doesn’t catch it, who’s responsible? The AI developer? The hospital? The doctor? These aren’t technical problems; they’re legal and ethical ones that no amount of fine-tuning can solve.

My Take

I’m cautiously optimistic. This study is exactly what the field needs: real-world evidence, not just another paper on GPT-4 passing the USMLE. But I’ve been around long enough to see medical AI overpromise and underdeliver. The fact that Google is being measured and transparent here is a good sign.

The next step should be a multi-center randomized trial comparing AMIE-assisted care to standard care, with predefined outcomes like diagnostic accuracy, time saved, and patient satisfaction. And they need to publish the full data, including failure cases and intervention rates. No cherry-picking.

For now, AMIE is a promising tool that might make doctors’ lives easier by handling the boring part of history taking. But it’s not ready to replace anyone. And that’s fine. Medicine moves slowly for a reason.