The companies focused on using AI in drug discovery fall into two categories:
- Information engines and disease models inform general drug discovery and can be used by the wider scientific community at the earliest stages of development.
- Drug design and optimization vendors produce algorithms designed to improve the drug design process and develop candidates from inception through to preclinical testing.
Every biologist knows that proteins are the building blocks of life. Manipulating a protein’s function is often the basis of treatment, so understanding a protein’s structure and how it influences the cell is an important part of the drug development pathway. But a quick method of accurately deducing a protein’s structure has proved elusive for scientists, who can currently spend years in the lab trying to narrow down the possible structures. The CASP (Critical Assessment of Structure Prediction) competition, a worldwide community experiment that aims to solve this problem, was conceived in 1994. This biennial challenge for computational biologists is used to benchmark methods that predict the structure of a protein from its amino acid sequence alone.
The competition made headlines in 2018, when Google’s DeepMind entered with its AlphaFold program and significantly outperformed previous attempts. Two years later, the results of AlphaFold 2 were even more exciting: the program determined the shape of around two-thirds of proteins with an accuracy comparable to laboratory experiments. Some experts now suggest that the team may be able to solve in a matter of days problems that previously took years, which could transform the way diseases are treated.
In a previous post, I mentioned that much of the data from previous experiments with targets or molecules has been collected in a way that makes it hard to analyze. On top of that, medical literature can’t just be fed to an AI to extract insights. It has to be annotated, prioritized and structured before it can be useful to AI. NLP can help, but it first has to be trained to look for the right information, and some things, such as the quality of the data being analyzed, are beyond its scope. However, the effort might be worthwhile, as there is a massive number of hidden insights sitting in the current literature.
Emerging machine learning techniques can help us prepare data for drug discovery. Machine learning models can be trained to normalize, index and structure data for companies in the life sciences industry. AstraZeneca has been using machine learning for R&D, as well as in pathology to review tissue samples more quickly. Labeling data is always time-consuming, but it’s even more of a challenge in this case because it can take thousands of tissue sample images to train a model. AstraZeneca uses a machine learning-powered, human-in-the-loop data-labeling and annotation service to automate some of the most tedious portions of this work, resulting in at least 50% less time spent cataloging samples.
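To make the human-in-the-loop idea concrete, here is a minimal sketch in Python. It is not AstraZeneca's system or any vendor's API. The `Prediction` class, the `route_predictions` function, the 0.9 threshold and the sample data are all hypothetical, chosen only to illustrate one common pattern: labels the model is confident about are accepted automatically, and the rest are queued for a person to review.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A model's proposed label for one sample (hypothetical structure)."""
    sample_id: str
    label: str
    confidence: float  # model's estimated probability for the label, 0 to 1


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and a human review queue.

    The threshold is an assumed value; in practice it would be tuned
    against how often confident predictions turn out to be wrong.
    """
    auto_accepted = []
    needs_review = []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_review.append(p)
    # Put the least confident samples first so reviewers see them first.
    needs_review.sort(key=lambda p: p.confidence)
    return auto_accepted, needs_review


# Made-up example data, for illustration only.
predictions = [
    Prediction("slide-001", "tumor", 0.97),
    Prediction("slide-002", "normal", 0.62),
    Prediction("slide-003", "tumor", 0.88),
]

auto_accepted, needs_review = route_predictions(predictions)
print("auto-accepted:", [p.sample_id for p in auto_accepted])
print("needs review:", [p.sample_id for p in needs_review])
```

The time saving in a setup like this comes from the size of the review queue: the more predictions clear the threshold, the fewer samples a person has to label by hand. Corrected labels from the review queue can then be used to retrain the model.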