Health systems and care providers must be vigilant in ensuring that the models they implement foster better care and promote health equity rather than perpetuate bias. Efforts must include a legal, regulatory, and compliance review that assigns responsibility for each element of an AI program and establishes how patient harm will be avoided. Governance teams need to work with clinical and scientific experts to ensure that any algorithms or AI systems have been vetted and tested, and to set guidelines for obtaining patients' consent to the use of AI and for informing patients about the role AI plays in diagnosis or treatment. Whether they appoint a Chief AI Officer (CAIO) or another executive to lead AI efforts, organizations need people at the highest levels who understand data and analytics as well as the nature of the health care business.
In January 2020, Google Health, the branch of Google focused on health-related research, clinical tools, and partnerships for health care services, released an AI model trained on more than 90,000 mammograms that the company said outperformed human radiologists. Google claimed that the algorithm could catch more false negatives (images that look normal but in fact contain breast cancer) than previous work. In a rebuttal published in the journal Nature, more than 19 coauthors affiliated with McGill University, the City University of New York (CUNY), Harvard University, and Stanford University argued that the lack of detailed methods and code in Google's research "undermines its scientific value."
In their rebuttal, the coauthors of the Nature commentary point out that Google's breast cancer model research lacks key details, including a description of model development and of the data processing and training pipelines used. Google omitted the definitions of several hyperparameters for the model's architecture (the configuration settings, fixed before training, that shape how the model learns), and it also didn't disclose the variables used to augment the dataset on which the model was trained.
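To make the reproducibility complaint concrete, the sketch below shows the kind of detail the commentary says was missing: named hyperparameter values and data-augmentation settings. Every name and number here is a hypothetical placeholder, not anything Google has published.

```python
# Hypothetical placeholders only: Google has not published its hyperparameters
# or augmentation settings, so these names and values simply illustrate the
# level of detail a reproduction attempt would need.
from dataclasses import dataclass, field


@dataclass
class TrainingConfig:
    # Architecture and optimization hyperparameters (placeholder values).
    learning_rate: float = 1e-4
    batch_size: int = 32
    epochs: int = 50
    dropout_rate: float = 0.2
    # Settings used to augment the mammogram training data (placeholders).
    augmentation: dict = field(default_factory=lambda: {
        "horizontal_flip": True,
        "rotation_degrees": 10,
        "brightness_jitter": 0.1,
    })


# Without values like these in the paper, independent teams cannot retrain the model.
print(TrainingConfig())
```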
Partly because of a reluctance to release code, datasets, and techniques, much of the data used today to train AI algorithms for diagnosing diseases may perpetuate inequalities. A team of U.K. scientists found that almost all eye disease datasets come from patients in North America, Europe, and China, meaning eye disease-diagnosing algorithms are less likely to work well for racial groups from underrepresented countries. A study of a UnitedHealth Group algorithm determined that it could underestimate by half the number of Black patients in need of greater care. And a growing body of work suggests that skin cancer-detecting algorithms tend to be less accurate on Black patients, in part because AI models are trained mostly on images of light-skinned patients.
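One way researchers surface disparities like these is to audit a model's error rates separately for each demographic group. The sketch below uses synthetic data and standard scikit-learn metrics (not any vendor's actual tooling) to show how a gap in sensitivity between groups becomes visible.

```python
# A minimal subgroup audit on synthetic data: compare the model's sensitivity
# (true-positive rate) across demographic groups. Group labels, data, and the
# simulated error pattern are all made up for illustration.
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
groups = np.array(["light_skin"] * 500 + ["dark_skin"] * 500)
y_true = rng.integers(0, 2, size=1000)   # synthetic ground-truth labels
y_pred = y_true.copy()

# Simulate a model that misses roughly 30% of positive cases in the underrepresented group.
missed = (groups == "dark_skin") & (y_true == 1) & (rng.random(1000) < 0.3)
y_pred[missed] = 0

for g in np.unique(groups):
    mask = groups == g
    print(f"{g}: sensitivity = {recall_score(y_true[mask], y_pred[mask]):.2f}")
```

An audit like this can only reveal disparities for groups that appear in the evaluation data, which is why the narrow geographic origin of the eye disease datasets is itself part of the problem.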