Founder & President Alison Darcy, PhD, sat down with Eric Topol, MD, a leading expert in A.I. and medicine, to discuss large language models (LLMs) and the opportunities they present for medicine.
Topol is the author of Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again, among other books. He is also a cardiologist and the Founder & Director of the Scripps Research Translational Institute.
This is the second of a two-part interview: “A.I. in Healthcare: The Hope and the Hype.” See part one here.
Now is the time for evidence generation: If we are going to get clinicians and the public comfortable with a new standard of care, we need the type of proof that comes from randomized trials and rigorous prospective work.
Fear is holding us back: We must acknowledge that LLMs have limitations and can be used to create dangerous fakes, but we also must admit that nothing in medicine is 100%. The key to getting the most from this powerful technology is balancing the freedom to innovate with the need to be responsible.
AI + Human: This is not an either/or situation. The combination of man and machine will enable us to make the most of this technology safely.
(edited for clarity)
Someone recently sent me a screenshot of the front cover of the Daily Mail from around 1995, or sometime pretty late into the nineties, that said, “It’s official! The Internet is a fad.” I think the way people talk about AI now is the way people talked about the Internet in the nineties–“It’s the best thing ever.” “It’s so dangerous.” In fact, it’s been a tool that has been both.
What do you think the media is getting right about LLMs? How do you see that conversation unfolding? Has it been helpful, balanced, unhelpful?
Some are feeding the pure hype, and some are taking a very harsh stance. You don’t hear, “Okay, here’s how we will take this across the goal line.” You don’t hear that, right? And that’s where we should be putting our efforts. Not just coming up with the design but also executing the work because that’s the missing link right now. We’ve got something so powerful, so pluripotent, but you have to prove it.
On the clinician side, it has to be very compelling. A lot of physicians and nurses and pharmacists and all the professionals are very leery about this stuff–not just because they think it’s a job threat, which hopefully most realize it’s not, but because the harm potential is there. So we have to have compelling evidence. That can come from randomized trials. It can come from really well-done, rigorous prospective work. But that’s what’s needed now. Whatever tool we have to work with, get that compelling evidence base out there so that people start to get comfortable with a new standard of care.
We did a survey a few years ago and tested the general public’s acceptance of digital therapeutics and the field in general. And we found that people are actually very discerning. They are definitely optimistic and eager to adopt these tools–if there is evidence to support them. And I think that’s a very good thing. Because again, I do worry about the undermining of public confidence. Where you have a lack of regulation, and you also have a great need, it creates a big vacuum that gets filled with, in some cases, just a lot of noise. Everybody’s making the same claims, and everybody is saying the same thing. And over time, the absence of data risks undermining confidence.
Your point is a really important one. If we have a fiasco because of premature adoption, it could really hurt the field. It could hurt the progress. It could get stigmatized. Hopefully, we all work together to usher this in properly.
In your Nature Medicine piece, you had a call to action for technologists and clinicians to come together and partner, which I also believe in. As clinicians, we can’t bury our heads in the sand. And you want to be at the table developing the technologies. Who do you think is doing this well right now?
It is happening so quickly. A lot of groups are trying to get first-mover advantage. We have a paper in Nature about foundation models in medicine. And we like to think that we’re coming up with really good ideas. The group at Harvard, led by Zak Kohane, is certainly one of the leading groups in the U.S. But there are other groups, like Pearse Keane and his colleagues at Moorfields in the U.K., who are doing some amazing things–starting with the retina but widening that out to the whole body, with the retina as the gateway. That’s a good example of this whole idea of foundation models. You start with one data type, the retinal photo, and then you expand your understanding of a person and their risk for various conditions.
We’re seeing different groups around the world–those are just a few that are really very thoughtful–that are waiting to take these tools, which, as you’ve already pointed out, are just going to keep getting better, and apply them to prove that this is truly transformative. And I believe that if there ever was something that fulfilled that descriptor, this is it.
There’s doing the work, and then there’s helping people more broadly understand the work in the appropriate context. You’ve done an amazing job at this through your writing. I’ve actually got a burning question: how do you have any time for anything else? But we can leave that to the end of the interview.
We have to find ways to get to people where they are receptive and eager to learn. But also, we just can’t be talking in abstract about excitement. In evidence, we trust. Once we have that, we’ll get many more receptors open to buy into it.
How important is explainability in medicine? Does it depend on intended use? What are your thoughts?
This has been a highly contested topic. I tend to side with the computer scientists here. A lot of things in medicine, we have no idea how they work. Electroconvulsive therapy for very severe depression–who knows how that works? But we use it. We use anesthesia. We have no idea how these agents actually work. So how can we hold machines to a different standard if the validation is really solid and compelling, even if we can’t fully explain it yet?
At the same time, we’re getting much better at deconstructing models–reverse engineering them to find the saliency maps and the features that drive their success and their accuracy. So I think we’re going to get to a kind of steady state where hopefully we’ll get validation and we’ll also get some explainability.
But this also carries over to the large language models because some people say, is this really artificial general intelligence? Well, it doesn’t really matter what it is if it’s working.
Given that perspective, I wonder: what do you think about this letter that was written to say we should pause for six months? And similarly, the Italian government has implemented a ban. [Edit note: it has since been lifted.]
This is really crazy stuff. We’re not going to have any ban. It’s not going to happen. It’s a race now with enormous commercial interests–not just the tech titans, Microsoft and Google, but far bigger than that. And there are so many benefits incubating; we don’t want to suppress that. But we do need to figure out ways to authenticate. When you have the pope wearing some down jacket, that’s not right. There should be an immediate signal that this is fake. We haven’t done that yet–that is, to basically use A.I. to determine whether something is genuine. That’s something we have to work on.
The idea that we could somehow keep this on ice for six months, or a year, or indefinitely is absurd. It’s never going to happen. It might sound good to certain people who are worried. And I’m worried, too. Being optimistic doesn’t mean you turn a blind eye toward where things go wrong.
The funny thing about AI scientists is they think that every problem with AI can be solved by using AI. That’s not necessarily true. But maybe here it is. What is genuine? What is basically fabricated? We’ve got to have a handle on that.
In the pandemic, one of the lessons for me, at least, was how quickly things can change if you have the political will. So what is the role of government here? Is it regulation?
We don’t prevent progress, but we have to have guardrails. Coming up with that right balance–we haven’t done that yet. Even with A.I. being narrow and unimodal, we still haven’t figured out how to exploit the autodidactic function.
The algorithms are frozen the day they get approved. If you could let them loose, they would just keep getting better and better. But the way we’ve handled unimodal A.I.–like, for example, radiology scans–has really hurt the power. We’ve diminished that remarkable power. So we haven’t yet found that balance, Ali. We haven’t figured out how to do it.
Eroding the model over time is ultimately not in the patient’s best interest. Eric, you must have a front-row seat here. What’s going on? Why haven’t we been able to figure out how to regulate sensibly?
This gives me fits, just because we haven’t really yet figured out how to derive maximal benefit and minimize risk. I’m hoping that we will take advantage of the lessons we’ve learned so far with images–which have been the sweet spot for A.I. in medicine for the last few years–and that the shortcomings will not repeat themselves.
The more data that goes into these models–the more tokens–the better the parameters. We acknowledge that this is dynamic. You can’t just approve a GPT-X on a given day and then not let any further inputs be incorporated. I do hope that we’ll figure this out.
Regulatory bodies throughout the world tend to be very conservative. They are aware of the power, and they are more afraid of it. We have to come up with the right balance and we haven’t gotten there yet.
I’ve heard people argue that it goes back to the Hippocratic Oath. “Do No Harm” has created an environment where we fear harm. If the Hippocratic Oath had been more like “Do Good,” it would be better suited to modern medicine. Now we’re in this place where we’re stuck.
I think the position of net benefit to the patients is correct. We seem to over-anchor on this large fear of any risk whatsoever. The truth is that lots of human-delivered services are associated–I mean, they’re all associated–with risk, because we are humans, and humans are human.
I’m so glad you mentioned that. There is a lot of unintended harm done in human-to-human interactions. And so, if we were to come up with a new oath, it would be “Maximize Net Benefit,” because fixating on harm is what holds us back. There is no intervention, medication, device, or diagnostic test that’s 100% foolproof.
Acknowledging that, so we’re not stymied and don’t hold back the progress that lies ahead–that’s really important.
A balanced approach is the only way to move forward. When you’re involved in creating something that is relatively new or uses new technology, if there is one place where it doesn’t work in a completely optimized way, it’s held up as “Aha, you see, evidence of harm.” It’s so damaging when you look at the overall picture and the overall potential and the actual realized benefit as well.
May I ask you one more question? What are you reading right now, and what books should everybody read?
I just read that GPT-4 book [The AI Revolution in Medicine: GPT-4 and Beyond], which was excellent. And I wrote a review on Substack about that. The one I’m just starting is by a friend of mine, Peter Hotez, on Deadly Anti-Science.
I’m a reading fanatic. I really enjoy all this stuff. There’s just never enough time to read all the books.