Transparency of data and AI algorithms is also a major concern. Transparency is relevant at multiple levels. First, in the case of supervised learning, , the accuracy of predictions relies heavily on the accuracy of the underlying annotations inputted into the algorithm. Poorly labeled data will yield poor results so transparency of labeling such that others can critically evaluate the training process for a supervised learning algorithm is paramount to ensuring accuracy.
AI companies need medical experts to annotate images to teach algorithms how to identify anomalies. Tech giants and governments are investing heavily in annotation and making the datasets publicly available to other researchers.
Google DeepMind partnered with Moorfield’s Eye Hospital to explore the use of AI in detecting eye diseases. Recently, DeepMind’s neural networks were able to recommend the correct referral decisions for 50 sight-threatening eye diseases with 94% accuracy.
This was just the Phase 1 of the study. But in order to train the algorithms, DeepMind invested significant time into labeling and cleaning up the database of OCT (Optical Coherence Tomography) scans — used for detection of eye conditions —and making it “AI ready.” Clinical labeling of the 14,884 scans in the dataset involved various trained ophthalmologists and optometrists who had to review the OCT scans. Alibaba had a similar story when it decided to venture into AI for diagnostics around 2016.
There are emerging machine learning methods that can assist in identifying the data to label and labeling the data. Machine learning models trained to support healthcare and life sciences organizations can help automatically normalize, index and structure data. These models can automate key parts o the data labeling process and save time while increasing accuracy.
For example, AstraZeneca has been experimenting with machine learning across all stages of research and development. Labeling the data is a time-consuming step, especially cases where it can take many thousands of tissue-sample images to train an accurate model. AstraZeneca uses a machine learning-powered, human-in-the-loop data-labeling and annotation service to automate some of the most tedious portions of this work, resulting in at least 50% less time spent cataloging samples.