Data standardization is critical for aggregating data from different sources to train and use AI algorithms. It refers to the process of transforming data into a common format that can be understood across different tools and methodologies. This is a key concern because data are collected by different methods for different purposes and can be stored in a wide range of formats across a variety of database and information systems. Hence, the same data (e.g., a particular biomarker such as blood glucose) can be represented in many different ways across these systems. Healthcare data have also been shown to be more heterogeneous and variable than research data produced in other fields. To use these data effectively in AI-based technologies, they need to be standardized into a common format.
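To make this concrete, the short sketch below shows what "transforming into a common format" might look like for a blood glucose value stored three different ways. The source layouts, field names, and target format are invented for illustration; this is a toy mapping, not a reference implementation of any real standard.

```python
from dataclasses import dataclass

# Hypothetical raw records: the same blood glucose measurement as three
# different source systems might store it (field names and units vary).
SOURCE_RECORDS = [
    {"glucose_mgdl": 99, "pt": "A123"},                                     # System 1: mg/dL
    {"lab": "GLU", "value": 5.5, "unit": "mmol/L", "patient_id": "A123"},   # System 2: mmol/L
    {"test_name": "Blood Glucose", "result": "100 mg/dL", "mrn": "A123"},   # System 3: free text
]

@dataclass
class StandardObservation:
    """A common target format: one biomarker name, one unit, one patient ID field."""
    patient_id: str
    biomarker: str
    value_mgdl: float

def to_standard(record: dict) -> StandardObservation:
    """Map a raw record from any of the three source layouts onto the common format."""
    if "glucose_mgdl" in record:                      # System 1: already in mg/dL
        return StandardObservation(record["pt"], "blood_glucose", float(record["glucose_mgdl"]))
    if record.get("lab") == "GLU":                    # System 2: convert mmol/L -> mg/dL
        return StandardObservation(record["patient_id"], "blood_glucose", record["value"] * 18.0)
    if record.get("test_name") == "Blood Glucose":    # System 3: parse the free-text result
        value = float(record["result"].split()[0])
        return StandardObservation(record["mrn"], "blood_glucose", value)
    raise ValueError(f"Unrecognized source layout: {record}")

if __name__ == "__main__":
    for raw in SOURCE_RECORDS:
        print(to_standard(raw))
```

The point is that every per-source mapping like this has to be written, validated, and maintained before any of the data can be pooled for training or inference, which is why standardization is treated as a prerequisite rather than an afterthought.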
Interoperability will be essential given the multiple components of a typical clinical workflow. For example, in an AI-assisted radiology workflow, the algorithms for protocoling, study prioritization, feature analysis and extraction, and automated report generation could each conceivably be the product of a different specialized vendor, such as GE or Siemens. A set of standards would be necessary to allow these algorithms to integrate with one another and to run on different equipment. Without early efforts to optimize interoperability, the practical effectiveness of AI technologies will be severely limited.
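One way to picture what such a standard buys you is a shared contract that every component honors, so modules from different vendors can be chained or swapped freely. The sketch below assumes a made-up interface and study fields purely for illustration; it does not reflect any vendor's actual API.

```python
from abc import ABC, abstractmethod

class WorkflowStep(ABC):
    """Hypothetical shared contract every vendor component would implement, so that
    protocoling, prioritization, feature extraction, and report generation modules
    can be chained regardless of who built them."""

    @abstractmethod
    def run(self, study: dict) -> dict:
        """Accept a study in an agreed-upon format and return it with added results."""

class StudyPrioritizer(WorkflowStep):
    # Stand-in for one vendor's prioritization algorithm.
    def run(self, study: dict) -> dict:
        study["priority"] = "stat" if "hemorrhage" in study.get("indication", "") else "routine"
        return study

class ReportGenerator(WorkflowStep):
    # Stand-in for a different vendor's report-generation algorithm.
    def run(self, study: dict) -> dict:
        study["report"] = f"Study {study['id']} triaged as {study['priority']}."
        return study

def run_pipeline(study: dict, steps: list[WorkflowStep]) -> dict:
    """Because every step honors the same interface, components are interchangeable."""
    for step in steps:
        study = step.run(study)
    return study

if __name__ == "__main__":
    result = run_pipeline({"id": "CT-001", "indication": "suspected hemorrhage"},
                          [StudyPrioritizer(), ReportGenerator()])
    print(result["report"])
```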
Over the last decade, organizations have focused on digitizing healthcare. In the next decade, making sense of all this data will provide the biggest opportunity to transform care. However, this transformation will depend primarily on data flowing where it needs to go, at the right time, and on supporting this process in a way that is secure and protects patients’ health data.
Figure: A patient’s data resides in many different silos
It comes down to interoperability. It may not be the most exciting topic, but it is one of the most important, and one the industry needs to prioritize. Interoperability is a loaded term. It can mean that different data sources are actually connected to each other and can send and receive data. It can also mean that these systems speak the same language, so that when they share data, the receiving system understands the data coming in from the other database.
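The second, "same language" sense is the harder one, and a minimal sketch helps show why: even when two systems are connected, the receiver still needs a way to translate each sender's local vocabulary into concepts it recognizes. The system names and codes below are invented for illustration; real deployments rely on shared terminologies maintained for this purpose.

```python
# Hypothetical map from each sender's local codes to the receiver's shared concepts.
LOCAL_TO_SHARED = {
    "clinic_emr":   {"GLU-SER": "blood_glucose", "HBA1C": "hemoglobin_a1c"},
    "hospital_lis": {"GLUC":    "blood_glucose", "A1C":   "hemoglobin_a1c"},
}

def interpret(sender: str, local_code: str) -> str:
    """Translate a sender-specific code into the receiver's shared concept, or flag it."""
    try:
        return LOCAL_TO_SHARED[sender][local_code]
    except KeyError:
        return f"unmapped:{sender}/{local_code}"

if __name__ == "__main__":
    # The same concept arrives under two different local codes but is understood as one.
    print(interpret("clinic_emr", "GLU-SER"))   # -> blood_glucose
    print(interpret("hospital_lis", "GLUC"))    # -> blood_glucose
```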
In our next entry, we will discuss some of the emerging solutions for this.