AI in healthcare: how does it work and what aspects should you consider?

Part of this blog post was published in SecurAware September 2018 - Insight - by Christiaan Hillen, Security Analyst at Secura

Artificial intelligence is having an increasing impact on healthcare. As with any computer system, there are security risks involved. Christiaan holds degrees in both information science and healthcare, which gives him an interesting cross-over perspective on these fields.

A year ago, the British artificial intelligence (AI) company DeepMind was involved in a ruling which found that London’s Royal Free hospital had failed to comply with the Data Protection Act. The hospital had provided the personal data of around 1.6 million patients as part of a trial to test an alert, diagnosis and detection system for acute kidney injury.
This was a nightmare scenario for anyone working with personal data, and in particular with healthcare data: not only because of the financial impact, but also because of the loss of patient trust. The promise of AI in helping healthcare professionals to provide better care is clear, but it divides people in the healthcare business. You are either a critic or an avid believer, to such a degree that questioning the benefits of AI is considered more heretical than questioning global warming.

Machine Learning
One of the core elements of AI is machine learning. Given a large dataset, we want to know something: does someone belong to group A (healthy) or to group B (sick)? In other words, we want to classify the instances in the dataset based on their attributes. For a computer to learn how to classify, it first needs to be trained to recognise and differentiate between people from group A and people from group B. The process is straightforward:

  • Get a large set of data containing people who are labelled as belonging to group A or to group B.
  • Split this set into smaller sets, each containing people from both group A and group B.
  • The computer gets one of these sets (the training set), with the labels A and B still attached.
  • The computer learns how to classify these people based on the attributes of each person. This step creates a model for the classification.
  • Next, the computer receives a random validation set, containing different people, which is used to refine the model. Repeat as needed.
  • The final set given to the computer is the (blind) test set, which is used to see whether the model is able to classify the subjects correctly.
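To make the steps above concrete, here is a minimal Python sketch. It is not how DeepMind or any production system works: it assumes a hypothetical toy dataset with two numeric attributes per person, and uses a nearest-centroid classifier (the "model" is simply the average attribute vector of each group) as a stand-in for a real learning algorithm. All function names are hypothetical.

```python
import random
from statistics import mean


def make_patients(n, seed=0):
    """Hypothetical toy data: two numeric attributes per patient, labelled 'A' (healthy) or 'B' (sick)."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        label = rng.choice("AB")
        centre = 0.0 if label == "A" else 3.0  # sick patients score higher on both attributes
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        patients.append((features, label))
    return patients


def split(data, train=0.6, validation=0.2, seed=0):
    """Shuffle the labelled data, then cut it into training, validation and (blind) test sets."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    a = int(len(rows) * train)
    b = a + int(len(rows) * validation)
    return rows[:a], rows[a:b], rows[b:]


def fit_centroids(rows):
    """Learning step: the 'model' is the average attribute vector of each class."""
    labels = {label for _, label in rows}
    return {
        label: tuple(mean(column) for column in zip(*[f for f, lbl in rows if lbl == label]))
        for label in labels
    }


def predict(model, features):
    """Assign the class whose average vector is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((x - c) ** 2 for x, c in zip(features, model[label])))


def accuracy(model, rows):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(model, f) == label for f, label in rows) / len(rows)


train_set, validation_set, test_set = split(make_patients(1000))
model = fit_centroids(train_set)
print("validation accuracy:", accuracy(model, validation_set))
print("test accuracy:", accuracy(model, test_set))
```

Because this toy model has no settings to tune, the validation set only serves as a check here; with a model that does have settings, you would compare validation accuracy between them and look at the test set only once, at the very end. A plain random shuffle also does not guarantee that every subset contains both groups; with small or unbalanced data a stratified split is the safer choice.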

In simple cases, this process can result in a model that is close to 100% accurate. In complex cases this is difficult to achieve; something like 97% is more realistic. The computer might classify a person as sick while they are healthy (a false positive), or as healthy while they are sick (a false negative). Both can be detrimental to the patient. Even with an accuracy of 97%, 48,000 of the 1.6 million patients would be classified incorrectly.
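The arithmetic behind these numbers, and the difference between the two kinds of error, fits in a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and group B (sick) is assumed to be the "positive" class.

```python
def misclassified(population, accuracy):
    """Expected number of wrongly classified people for a given accuracy."""
    return round(population * (1 - accuracy))


def error_counts(y_true, y_pred, positive="B"):
    """Count false positives (healthy classified as sick) and false negatives (sick classified as healthy)."""
    false_pos = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return false_pos, false_neg


print(misclassified(1_600_000, 0.97))  # 48000
print(misclassified(1_600_000, 0.94))  # 96000
print(error_counts(["A", "B", "B", "A", "A"], ["A", "A", "B", "B", "A"]))  # (1, 1)
```

Accuracy alone hides how the errors are split; in healthcare a false negative (a missed patient) and a false positive (an unnecessary treatment) usually have very different consequences.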

Imagine an attacker who goes unnoticed and is able to influence the model ever so slightly. Could he bring that 97% down to 94%, doubling the number of incorrect classifications? Even a drop of a single percentage point would mean 16,000 additional misclassifications among 1.6 million patients. How do you know whether your model has been altered? Security by design should be part of any AI system.
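One technical building block for detecting whether a model has been altered is to record a cryptographic tag of the stored model file when it is approved, and to verify that tag before every use. The sketch below assumes the model is serialised to a single file and uses HMAC-SHA256 from Python's standard `hmac` module, so that an attacker who can modify the file cannot simply recompute the tag without the secret key. The file name and key handling are hypothetical. Note that this only catches changes to the stored model; it says nothing about whether the training data was tampered with before training.

```python
import hashlib
import hmac
import os
import tempfile


def model_tag(path, key):
    """HMAC-SHA256 over the serialised model file, read in chunks."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_model(path, key, expected_tag):
    """True only if the file still matches the tag recorded when the model was approved."""
    return hmac.compare_digest(model_tag(path, key), expected_tag)


key = os.urandom(32)  # hypothetical: in practice kept in a secrets store, away from the model files
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.bin")  # hypothetical model file
    with open(path, "wb") as fh:
        fh.write(b"serialised model weights")
    tag = model_tag(path, key)  # record this at approval time
    before = verify_model(path, key, tag)
    with open(path, "ab") as fh:
        fh.write(b"\x00")  # simulate a tiny, unnoticed change
    after = verify_model(path, key, tag)

print(before, after)  # True False
```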

Computer says no
When decisions are based on learned models, the process may be so complex that the person operating the computer no longer understands why a decision was made. The computer simply tells the operator what the conclusion is. Thankfully, the GDPR provides some protection here, in Article 22: "The data subject shall have the right not to be subject to a decision based solely on automated processing [...]"

If you keep subjects in the dark about how a decision is made, chances are they will object. Transparency makes it clear to all parties what is being decided, and how. If you can’t explain why the computer says no, how can you be certain that the model is correct? Maybe someone with malicious intent made a change. Use the model as a guide, but always be able to verify results by hand, and keep logs to back up any claim and to investigate issues. This will also help with legal compliance: you need to be able to show how data is processed.
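What such a decision log could look like is sketched below: one JSON line per automated decision, recording which model version produced it and whether a human has reviewed it. The field names, the logger name and the example values are all hypothetical; which fields you actually need to record is a question for your legal department and auditors.

```python
import io
import json
import logging
from datetime import datetime, timezone


def make_audit_logger(stream):
    """Logger that writes one JSON line per decision to `stream` (call once per process)."""
    logger = logging.getLogger("decision_audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep audit records out of the general application log
    return logger


def log_decision(logger, patient_id, model_version, inputs, decision, reviewed_by=None):
    """Record which input, which model version and which outcome, plus any human reviewer."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "patient_id": patient_id,
        "model_version": model_version,
        "inputs": inputs,
        "decision": decision,
        "reviewed_by": reviewed_by,  # None: no human has verified this decision yet
    }
    logger.info(json.dumps(record, sort_keys=True))
    return record


stream = io.StringIO()
audit = make_audit_logger(stream)
log_decision(audit, "P-0001", "model-2018-09", {"creatinine": 1.9}, "B", reviewed_by="dr-jansen")
log_decision(audit, "P-0002", "model-2018-09", {"creatinine": 0.8}, "A")
print(stream.getvalue())
```

In a real deployment the stream would be append-only storage that the people operating the model cannot edit; `io.StringIO` is used here only to keep the example self-contained. Each call to `make_audit_logger` adds another handler to the same named logger, so it should be called once.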

Models typically have problems with outliers: data points that are very different from the usual, yet should still fall within a certain classification. By definition, the number of outliers in a dataset is low. You might not even find one in the random training set you generated. Such rare cases might be identified successfully by humans, but a computer that has never seen one might not be able to combine all the attributes to reach the right conclusion.
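A simple way to at least notice that a case looks unusual, and route it to a human instead of trusting the model, is a z-score check on individual attributes. The sketch assumes a single numeric attribute with hypothetical values and the common (but arbitrary) threshold of three standard deviations; it is an illustration, not a validated clinical method.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` standard deviations away from the mean (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


lab_values = [5.0] * 20 + [50.0]  # hypothetical lab results with one extreme case
print(flag_outliers(lab_values))  # [20]
```

Extreme values inflate the standard deviation they are measured against, so a z-score can mask outliers in small samples; median-based measures are more robust. More importantly, a case can be an outlier only in its combination of attributes, which a per-attribute check like this cannot see.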

Integrity checks on training data are important here, to be certain that outliers have not been removed or added.
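One way to make such an integrity check concrete is to keep a manifest of hashes of every training record at the time the model is trained, and to compare the data against it later. The sketch assumes records can be represented as JSON-serialisable dictionaries; the record fields and values are hypothetical. The manifest itself must be stored somewhere that the people with access to the training data cannot modify, otherwise it proves nothing.

```python
import hashlib
import json
from collections import Counter


def record_digest(record):
    """SHA-256 of one training record; sort_keys makes the digest independent of key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def manifest(records):
    """Multiset of record digests, so duplicated records are counted too."""
    return Counter(record_digest(r) for r in records)


def diff_manifests(old, new):
    """Return (added, removed): digests that occur more often in new, or more often in old."""
    return new - old, old - new


baseline = [  # hypothetical snapshot taken when the model was trained
    {"patient_id": 1, "creatinine": 0.9, "label": "A"},
    {"patient_id": 2, "creatinine": 7.5, "label": "B"},  # an outlier
]
current = [
    {"patient_id": 1, "creatinine": 0.9, "label": "A"},
    {"patient_id": 3, "creatinine": 1.1, "label": "A"},
]
added, removed = diff_manifests(manifest(baseline), manifest(current))
print(len(added), len(removed))  # 1 1
```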

Rare classes
In some datasets, there may be classes that are quite rare, yet closely related to a common class. Differentiating between these may be difficult, in particular if the small class is so rare that there are only a few instances of it in the dataset. If there are just one or two instances, the model will probably be able to identify those, but may fail to find others. Say that all instances of this rare class in the training data are male: the model might then include a rule that an instance needs to be male to belong to this class, and females will be rejected outright. Models can be racist, sexist and politically incorrect without any scruples.
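Before training, you can at least look for this situation in the data. The sketch below, with hypothetical field names and an arbitrary cut-off of five instances for a "rare" class, lists the attributes that take the same value for every instance of a rare class: those are the candidates for rules such as "must be male".

```python
from collections import Counter


def rare_class_report(rows, label_key="label", max_count=5):
    """For every class with at most max_count instances, list attributes with one single value across all of them."""
    counts = Counter(row[label_key] for row in rows)
    report = {}
    for label, n in counts.items():
        if n > max_count:
            continue  # common class: skip
        members = [row for row in rows if row[label_key] == label]
        attributes = {key for row in members for key in row if key != label_key}
        constant = {}
        for key in sorted(attributes):
            values = {row.get(key) for row in members}
            if len(values) == 1:
                constant[key] = values.pop()
        report[label] = {"count": n, "constant_attributes": constant}
    return report


rows = [{"label": "A", "sex": s, "age": 30 + i} for i, s in enumerate("MFMFMFMFMF")]
rows += [{"label": "C", "sex": "M", "age": 41}, {"label": "C", "sex": "M", "age": 67}]
print(rare_class_report(rows))
# {'C': {'count': 2, 'constant_attributes': {'sex': 'M'}}}
```

With only two instances, many attributes will be constant by pure chance, so the report is a list of things for a human to review, not proof that the model will learn a biased rule.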

Correlation without causality
When training a model, correlations may show up that have nothing to do with the actual class of an instance. It may be pure coincidence that the majority of patients whose patient number ends in a "5" belong to class A. As models have no (human) notion of what is and is not important, they may use such correlations to classify subjects, and do so successfully on the data at hand. Modern models are so complex that their inner workings and classification rules are nearly incomprehensible to humans. What takes a computer a moment to understand, or days to create, can take years of human effort to fully grasp. Are you more likely to be "A" if your patient number ends in a "5"? Probably not, but the computer might think so. And how did it reach that conclusion? Was it a programming error, a malicious insider, or a genuine result?
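Whether a pattern like the patient-number example is stronger than chance can be checked with a permutation test: shuffle the class labels many times and count how often the shuffled data shows a gap at least as large as the real one. The sketch uses hypothetical, randomly generated patient numbers and labels. A small p-value only shows that the association is unlikely to be coincidence in this dataset; it says nothing about causality, and when you test many attributes some will look significant by chance alone.

```python
import random


def rate_gap(digits, labels, digit="5", target="A"):
    """Share of class `target` among patients whose number ends in `digit`, minus that share among all others."""
    inside = [lbl for d, lbl in zip(digits, labels) if d == digit]
    outside = [lbl for d, lbl in zip(digits, labels) if d != digit]
    if not inside or not outside:
        return 0.0

    def share(group):
        return sum(lbl == target for lbl in group) / len(group)

    return share(inside) - share(outside)


def permutation_p_value(digits, labels, rounds=1000, seed=0):
    """Fraction of label shuffles giving a gap at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(rate_gap(digits, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        if abs(rate_gap(digits, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)


rng = random.Random(1)
numbers = [rng.randrange(100_000, 1_000_000) for _ in range(1000)]  # hypothetical patient numbers
digits = [str(n)[-1] for n in numbers]
noise_labels = [rng.choice("AB") for _ in numbers]  # class unrelated to the number
leaky_labels = ["A" if d == "5" else "B" for d in digits]  # class fully tied to the last digit
print(permutation_p_value(digits, noise_labels))
print(permutation_p_value(digits, leaky_labels))
```

For the unrelated labels the p-value will typically be large; for the labels that are fully determined by the last digit it is as small as 1,000 rounds allow.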

Test data
To train a model properly, more data is generally better. This may be why DeepMind used the data of 1.6 million patients. Such a dataset is probably a very good representation of the overall population. Using such data for these purposes might actually be allowed under the GDPR, but do talk to your legal department before considering such an undertaking.

An alternative is to create fictitious data. However, this leads to a whole new set of problems, which combine the issues in training a model mentioned above. How do you recreate outliers? How do you account for rare classes? Are you certain that all correlations (ignoring causality) are present in the set? If you randomise real data, you may lose key attributes that could improve classification. Data that is entirely made up may not be representative of the population at all, in particular when it comes to highly complex combinations of attributes.
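The loss that comes with randomising real data can be demonstrated in a few lines. The sketch assumes one particular form of randomisation, shuffling every attribute independently, applied to hypothetical data in which blood pressure rises with age. Each attribute keeps exactly the same values, and therefore the same distribution, but the relation between the attributes disappears.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def shuffle_columns(rows, seed=0):
    """Randomise data by shuffling every attribute (column) independently."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


rng = random.Random(42)
real = []
for _ in range(1000):
    age = rng.uniform(20, 80)
    blood_pressure = 100 + 0.5 * age + rng.gauss(0, 3)  # hypothetical age-related attribute
    real.append((age, blood_pressure))

fake = shuffle_columns(real)
print(pearson(*zip(*real)))  # strong positive correlation
print(pearson(*zip(*fake)))  # close to zero
```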

Test data generation is a hard problem, and we don't have a good solution for it. This is one of the reasons the GDPR allows personal data to be used for scientific purposes, albeit under strict conditions. It also mentions not using production data in a testing environment. Be sure to have regular checks with auditors to stay on the safe side.

If you are using AI methods to gain insight into your customer data, be sure you know what you are doing. If you can't explain to an independent auditor exactly what you are doing with the data, and which decisions are being made based on the model you created, you cannot expect your customers to understand it either.

The security aspect
Do you have logs for all processing of the data? And are you certain that nobody takes home some of the data to work on their personal laptop in the evening? Who is responsible for the data? Who do you need to call when something goes wrong? How are the models validated, and are they securely stored where only authorised personnel can work with them?

Datasets for machine learning can be used to great, and devastating, effect. They can be used for good and for evil. Protecting these datasets, and the models built with them, is therefore of paramount importance. Both the technical and the organisational aspects of security need to be in order. Work together with auditors on this: they know what you should keep track of and how to comply with legislation while offering the best healthcare. Pentesters can help identify weaknesses in technical security. With great datasets comes great responsibility.

© Secura 2020