For several weeks now, we have been examining data and its importance to developing and running AI algorithms. When it comes to AI, the topic of data cannot be discussed enough! If a model is good, it is because it was developed on good data; the reverse is also true. In healthcare, data is hard to get. Healthcare data is highly sensitive and private: it can be used for very sinister purposes, and guarding it is critical for those who generate it in the course of providing care. Of course, this is all good and how it should be. However, it makes it difficult to gather enough data, or a sufficiently diverse dataset, to develop and validate AI models.
We are at the dawn of the first wave of such models, and those launched so far are not living up to the hype. Why? It appears they were not developed on datasets that represent the populations they are being used for. Why is that happening? Most likely, those developing these models can only get hold of so much data, or data drawn from a particular population, and that population is not representative of all the different populations in which the model could potentially be used.
We began the discussion of federated learning as one possible solution to this problem. The benefit of federated learning is that it potentially allows access to much larger and more diverse datasets. This is because the approach allows the data to stay in its native database: the owners of the data keep it in their own environment, but allow that data to be used in a distributed fashion to build AI models.
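To make the idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging: each site trains a copy of the model on its own local data, and a central server only ever receives the updated model weights, never the raw records. The site names, dataset shapes, and logistic-regression model below are all illustrative assumptions, not part of any real healthcare system.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train a simple logistic-regression model on one site's local data.
    Only the updated weights leave the site -- the raw data never does."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (preds - y) / len(y)      # gradient of the log-loss
        w -= lr * grad
    return w

def federated_average(global_weights, site_datasets):
    """One round of federated averaging: each site trains locally, then
    the server averages the returned weights, weighted by sample count."""
    updates, sizes = [], []
    for X, y in site_datasets:
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Hypothetical example: three "hospitals", each holding its own data.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 0.0])  # assumed underlying signal
sites = []
for _ in range(3):
    X = rng.normal(size=(100, 4))
    y = (X @ true_w + rng.normal(scale=0.1, size=100) > 0).astype(float)
    sites.append((X, y))

w = np.zeros(4)
for _ in range(20):                 # 20 communication rounds
    w = federated_average(w, sites)
```

After the rounds complete, the server holds a single global model shaped by all three datasets, even though no site ever shared a patient record. Real deployments add safeguards this sketch omits, such as secure aggregation and differential privacy.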