Training algorithms that perform as intended in clinical practice requires large and diverse datasets, and to build them you need data from multiple institutions. Pooling data in this way reduces the risk of relying on data skewed toward a particular population and thereby introducing bias into the algorithm. Data are needed not only for initial training but also, on a continuing basis, for the ongoing training, validation, and improvement of AI algorithms. For widespread implementation, data may need to be shared across multiple institutions and potentially across nations. The data would need to be anonymized and de-identified, and informed consent processes would need to cover the possibility of wide distribution. At this scale of dissemination, the notions of patient confidentiality and privacy may need to be reimagined entirely. Consequently, cybersecurity measures will become increasingly important for addressing the risks of inappropriate use of datasets, inaccurate or inappropriate disclosures, and the limitations of de-identification techniques. Examples include data that are de-identified before leaving a medical center to train algorithms, and data shared between institutions that care for the same patients. Today's de-identification techniques are very good, but they are not perfect.
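A minimal Python sketch can make this imperfection concrete. It assumes a hypothetical record layout: the field names `patient_name`, `mrn`, `date_of_birth`, `diagnosis_code`, and `radiology_report`, and the helper functions, are illustrative only and do not come from any real system. Dropping the structured identifier fields leaves the name and medical record number sitting in the free-text report. Naively replacing those exact strings then removes them, but a variant such as "Ms. Doe" still slips through. Production de-identification pipelines typically use NLP-based detection rather than exact string matching, yet they can still miss identifiers hidden in free text.

```python
from typing import Dict, List

# Hypothetical toy record: three structured identifier fields plus one free-text report.
RECORD: Dict[str, str] = {
    "patient_name": "Jane Doe",
    "mrn": "12345678",
    "date_of_birth": "1961-04-02",
    "diagnosis_code": "C50.9",
    "radiology_report": (
        "Jane Doe (MRN 12345678) returns for follow-up imaging. "
        "Ms. Doe reports no new symptoms."
    ),
}

# Assumed list of structured identifier fields for this toy schema only.
IDENTIFIER_FIELDS = ("patient_name", "mrn", "date_of_birth")


def strip_structured_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Drop the identifier columns; free-text fields are passed through unchanged."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}


def find_exact_leaks(original: Dict[str, str], cleaned: Dict[str, str]) -> List[str]:
    """List identifier fields whose exact value still appears in any remaining field."""
    return sorted(
        field
        for field in IDENTIFIER_FIELDS
        if any(original[field] in text for text in cleaned.values())
    )


def redact_exact_values(original: Dict[str, str], cleaned: Dict[str, str]) -> Dict[str, str]:
    """Naively replace exact identifier strings inside every remaining field with a placeholder."""
    redacted = {}
    for key, text in cleaned.items():
        for field in IDENTIFIER_FIELDS:
            text = text.replace(original[field], "[REDACTED]")
        redacted[key] = text
    return redacted


if __name__ == "__main__":
    cleaned = strip_structured_fields(RECORD)
    print(find_exact_leaks(RECORD, cleaned))   # ['mrn', 'patient_name']
    redacted = redact_exact_values(RECORD, cleaned)
    print(find_exact_leaks(RECORD, redacted))  # []
    print(redacted["radiology_report"])        # surname "Doe" is still present
```

The exact-match check reports no remaining leaks after redaction, which is precisely the trap: the surname survives in "Ms. Doe", so a record can pass a simple check while still exposing the patient.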
You can remove patient-identifying information from structured fields, but such information may also appear in unstructured text such as physician notes or radiology reports. When it does, the data are not fully de-identified, and a patient's identity can be exposed.

As you can see, these are not isolated issues. Overcoming them requires big-picture thinking, national regulations and laws, and new approaches to collecting, storing, and sharing data. Such changes would threaten the business models of many current incumbents, and unless laws and regulations push the changes along, many of those incumbents will drag their feet. In fact, however, some recent laws and regulations, such as the General Data Protection Regulation (GDPR) in Europe, are making access to data and data sharing between institutions more difficult.