The last two years have seen a dizzying pace of innovation in foundational AI technologies. No transformative innovation happens overnight, and most innovations that move the needle are the result of years, even decades, of research, yet generative AI technologies have evolved rapidly since the launch of ChatGPT. This has major implications for the use cases that become possible in the real world. We all have experience with customer service call centers that frustrate us because the automated system does not understand even the most basic responses we provide. We review the transcription of a voice message created by our iPhone and find most of it nonsensical. Given this poor real-world performance, why would we trust any of these technologies to start functioning without human review and supervision?
There is indeed hope that the new generation of generative AI solutions can start delivering such performance. Because the hope is that these solutions can ultimately process information, understand it, and act on it, we refer to them as AI agents. Agents, because they act autonomously on our behalf. This would obviously be a major step forward for our lives and the economy, since we could offload activities currently done manually by humans to machines. This is not without historical precedent, and if these agents become fully autonomous soon, they simply represent the next generation of activities being automated by machines. It's important to keep in mind that a couple of centuries ago, roughly 90% of the US workforce was in farming. Back then, most farming activities had to be done by hand, and that required most of our people to be in farming just to feed everyone. Today, less than 1% of the US population is in farming, and we grow much more food than we did back then. This is because many of those activities were ultimately automated and handed off to machines. People in farming today are happy that they don't have to spend hours in the hot sun planting seeds, spreading fertilizer, and harvesting crops by hand. The rest of us, who don't have to be in farming to eat, can focus on other jobs that make modern life possible.
AI agents hold the promise of automating a whole new category of activities. These are activities that involve processing data and information, reasoning about what that information means, and taking actions as a result of that reasoning. These activities are usually part of what we call "white-collar jobs." Given the nature of these jobs, where information in the form of narratives, images, numbers, and more has to be cognitively processed and acted upon, automation is not easy. This is why we have suffered through automated call center solutions and chatbots that have to escalate us to a human agent as soon as we ask a tough question. Given the complexity of these activities and the level of delivery expected, the bar for automation is very high, especially in healthcare. After all, who would trust AI to handle questions from a patient without human supervision unless they're confident that the response will not harm the patient? If an AI agent is writing the notes for a doctor, it cannot record the wrong medications or other false information that would trigger a cascade of wrong conclusions and decisions and ultimately cause harm. As with anything autonomous, especially in healthcare, declaring victory will have to wait until we're fully confident that the output is safe and, ideally, on point. I emphasize safety first since our credo in healthcare is to "first do no harm."
Given the expanding adoption of AI scribes as an initial use case for AI in healthcare, it is obvious that the capabilities of this technology are improving. By "technology," I mean AI agents. While a doctor reviews the note before signing it, so the process is not truly autonomous, the act of creating the note from a recording of the conversation is agentic, since no human is involved in that part of the process. Just a couple of years ago, in my book, AI Doctor, The Rise of Artificial Intelligence in Healthcare, I wrote about the limitations of the technology at the time and how significant effort had to be expended to train the voice technology in each specialty for the output to be acceptable. While optimizing the AI scribe agent for each specialty will still yield better results, the baseline agents now perform at an acceptable level for most doctor-patient interactions, especially in primary care. Given that the language models are pretrained with unsupervised methods and then fine-tuned on data specific to each use case, optimizing for individual specialties can now be done faster and with better results.
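To make that pretrain-then-fine-tune pattern concrete, here is a minimal sketch of what specialty adaptation can look like in code. This is purely illustrative and not any vendor's actual pipeline: the base model ("distilgpt2") and the data file ("cardiology_transcripts.txt") are placeholders standing in for a large pretrained model and a corpus of de-identified specialty transcripts.

```python
# Minimal sketch: adapting a general-purpose, pretrained language model to a
# specialty by fine-tuning it on specialty-specific text. Model name and data
# path are placeholders, not a real clinical dataset or production model.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "distilgpt2"  # stand-in for a large pretrained base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style models lack a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical corpus of de-identified cardiology visit transcripts.
data = load_dataset("text", data_files={"train": "cardiology_transcripts.txt"})
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="scribe-cardiology",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # continues training the pretrained weights on specialty text
```

The key point of the sketch is that the heavy lifting was already done during unsupervised pretraining; the specialty pass only nudges the model toward the vocabulary and phrasing of that field, which is why it is now faster and cheaper than building specialty models from scratch.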
There are a number of other use cases emerging for AI agents in health and healthcare that we will cover in upcoming posts, but it is worth spending a bit of time in the next post discussing why AI agents are becoming possible now and not two years ago, when ChatGPT and large language models were first launched. This has a lot to do with the fact that the reasoning abilities of these agents are rapidly improving, and their ability to process multimodal data such as images, video, tabular data, and narratives has improved significantly in the last two years, especially in the past year. It also takes time to insert an agent into an end-to-end workflow and make sure that the integration is smooth and improves the workflow rather than creating more work.