AI Assessment in Clinical Applications

There is a considerable amount of talk about bias in AI as applied to clinical settings and how to eliminate it. The problem has now attracted the attention of standards bodies, with a view to legislating how AI systems should be validated and certified.

AI consists of three types of logic: inductive, deductive and abductive.

However, most modern references to AI are really talking about the inductive type, better known within the industry as Machine Learning (ML), which allocates data objects to classes based on the statistical distribution of their features.
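
As a minimal sketch of that idea (assuming scikit-learn, with data invented for illustration), a classifier learns the per-class distribution of the features and then allocates new objects to the most probable class:

```python
# A classifier assigns each data object to a class based on the statistical
# distribution of its features. Assumes scikit-learn; data is invented.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Feature vectors for labelled training objects (e.g. two clinical measurements)
X_train = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
y_train = np.array([0, 0, 1, 1])  # known correct classes

model = GaussianNB().fit(X_train, y_train)  # learns feature distributions per class
print(model.predict([[1.1, 2.0]]))          # allocates a new object to a class -> [0]
```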

Bias in machine learning cannot be eliminated, as it is intrinsic to the method. ML takes a sample of data objects whose features (attribute values) and correct classes are known, and trains a predictive classification model on that training data. Hence any biases in how the sample was collected are built into the model.
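
A toy demonstration of the point (a sketch assuming scikit-learn, with entirely synthetic numbers): if the collected sample under-represents one sub-population, the trained model can be near-useless for that group, and no amount of tuning removes the problem.

```python
# Collection bias is baked in: a model trained on a skewed sample performs
# well on the over-sampled population and near chance on the other.
# All data is synthetic; this illustrates the mechanism, not a clinical result.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two sub-populations whose class depends on different features
X_a = rng.normal(0, 1, (1000, 2)); y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(0, 1, (1000, 2)); y_b = (X_b[:, 1] > 0).astype(int)

# Biased collection: 95% of the training sample comes from population A
X_train = np.vstack([X_a[:950], X_b[:50]])
y_train = np.hstack([y_a[:950], y_b[:50]])
model = LogisticRegression().fit(X_train, y_train)

print("accuracy on A:", model.score(X_a, y_a))  # high
print("accuracy on B:", model.score(X_b, y_b))  # near chance
```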

But biases beyond the data collection process are also introduced along the pathway of developing a working ML classifier.

I don’t believe you can make learning algorithms bias-free. Just as with drugs, where there is always a percentage of people who have an adverse reaction, there will be a percentage of people for whom the classifier’s prediction will be wrong, because of:

a. The incompleteness of the learning sets;
b. The conflicting and contradictory content within the training set (illustrated in the sketch after this list);
c. The limitations of the learning model to represent the distributions in the training data;
d. The choice of attributes to represent the training examples;
e. The normalisations made of the training data;
f. The time scope of the chosen training data;
g. The limits of the expertise of the experts deciding on the gold-standard values;
h. The use of “large data” sets that are poorly curated, where “good data” is needed.
etc.
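
To make point (b) concrete, here is a tiny sketch (pure Python, with invented examples): when identical attribute profiles carry contradictory gold-standard labels, even the best possible classifier must be wrong on the minority label at each profile.

```python
# Contradictory training content puts a hard floor under the error rate:
# the optimal rule predicts the majority label per profile and is
# necessarily wrong on every minority-labelled example.
from collections import Counter

# Invented (attribute_profile, label) pairs containing conflicts
examples = [("fever+cough", "flu"), ("fever+cough", "flu"),
            ("fever+cough", "covid"),           # contradicts the two above
            ("rash", "measles"), ("rash", "allergy")]

errors = 0
for profile in set(p for p, _ in examples):
    labels = Counter(lab for p, lab in examples if p == profile)
    majority = labels.most_common(1)[0][1]     # best achievable per profile
    errors += sum(labels.values()) - majority  # the rest are irreducible errors

print(f"irreducible errors: {errors} of {len(examples)}")  # 2 of 5
```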

If legislators are going to leave AI technology otherwise regulation-free, then they should at least require the algorithms to be made available on an accessible web site where users can submit their own data to test a tool’s validity on their own data sets. Users can then have confidence, or not, that the specific tool is appropriate for their setting.
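
A minimal sketch of what such a self-service validation site could look like, assuming FastAPI and a pre-trained scikit-learn model; the endpoint, field names and model file are all hypothetical:

```python
# Hypothetical self-service validation endpoint: users submit their own
# labelled cases and get back the tool's accuracy on that data.
# Assumes a model saved as "model.joblib"; all names are invented.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load("model.joblib")  # the classifier under assessment

class ValidationRequest(BaseModel):
    features: list[list[float]]  # one feature vector per local case
    labels: list[int]            # the user's own gold-standard classes

@app.post("/validate")
def validate(req: ValidationRequest):
    X = np.array(req.features)
    y = np.array(req.labels)
    accuracy = float((model.predict(X) == y).mean())
    # the user judges whether this accuracy is acceptable for their setting
    return {"n_cases": len(y), "accuracy": accuracy}
```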

Training set data is typically collected in one of two ways.

1. Curated data collection: careful selection and curation of data examples to span a specific range of variables. Its most serious drawback is the manual labour needed to collate and curate the collection, so it is most popular for targeted objectives using smaller training sets.

2. Mass data collection: data is collected en masse from a range of sources, and data elements are captured by a “deep learning” strategy of compiling large feature vectors for each classification object using automated collation and compilation.

This approach is popular because it can be highly automated, and because it supports the fallacy that more data means better-quality results.
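
For a feel of how automatic this compilation can be, here is a sketch assuming scikit-learn’s HashingVectorizer, with invented records: huge feature vectors appear with no human curation at all, which is precisely the problem.

```python
# Mass collation sketch: raw records become large fixed-length feature
# vectors fully automatically. Assumes scikit-learn; records are invented.
from sklearn.feature_extraction.text import HashingVectorizer

raw_records = [
    "55yo male, chest pain, elevated troponin",
    "23yo female, rash, no fever",
]
vectorizer = HashingVectorizer(n_features=2**20)  # ~1M features per object
X = vectorizer.transform(raw_records)
print(X.shape)  # (2, 1048576) -- big, but nothing here checks data quality
```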

What we don’t need is more “big data”, but rather we need more “good data”.

How do we get GOOD DATA?

Delivering a machine learning system into production means a supply of potential learning material is flowing through it. Any sensible, well-engineered system will have a mechanism for identifying samples that are too borderline to classify safely; those samples are diverted into a Manual processing category.
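
A sketch of such a diversion mechanism, assuming a scikit-learn-style classifier with predict_proba; the threshold value is illustrative only:

```python
# Route predictions below a confidence threshold to Manual processing.
import numpy as np

CONFIDENCE_THRESHOLD = 0.8  # below this, a human must decide

def route(model, X):
    proba = model.predict_proba(X)
    top = proba.max(axis=1)                # confidence in the predicted class
    auto = top >= CONFIDENCE_THRESHOLD     # safe to classify automatically
    return {
        "auto_labels": model.predict(X[auto]),
        "manual_queue": X[~auto],          # borderline samples for human review
    }
```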

The challenge with the materials in the Manual class is how to use them to improve the resident system. A good deal of research has gone into Active Learning: the task of selecting, from a large pool of unlabelled samples, new materials to add to the training set.

There are two major dimensions to this selection process: samples that fall near the class boundaries, and samples whose attribute profiles are significantly different from anything in the training materials.

Automated detection of these two types of samples requires two different analytical methods, known as Uncertainty Sampling and Diversity Sampling respectively. An excellent text on these processes is Robert Munro, Human-in-the-Loop Machine Learning, MEAP Edition Version 7, Manning Publications, 2020.
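
Under one simple interpretation of each method (least confidence for uncertainty, nearest-training-example distance for diversity; Munro describes several variants), the two can be sketched as follows, assuming numpy and a classifier with predict_proba:

```python
# Uncertainty Sampling scores samples near class boundaries;
# Diversity Sampling scores samples unlike anything in the training data.
import numpy as np

def uncertainty_scores(model, X_pool):
    # least confidence: 1 - probability of the most likely class
    return 1.0 - model.predict_proba(X_pool).max(axis=1)

def diversity_scores(X_pool, X_train):
    # distance to the nearest training example: high means an attribute
    # profile significantly different from the training materials
    dists = np.linalg.norm(X_pool[:, None, :] - X_train[None, :, :], axis=2)
    return dists.min(axis=1)

def select(model, X_pool, X_train, k=10):
    # blend the two dimensions; a real system would tune this combination
    score = uncertainty_scores(model, X_pool) + diversity_scores(X_pool, X_train)
    return np.argsort(score)[-k:]  # indices of the k most informative samples
```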

GOOD DATA can be accumulated through a strategy of Continuous Process Improvement (CPI), and any worthwhile clinical Machine Learning system will need to have one in place; otherwise the system is trapped in an inescapable universe of self-fulfilling prophecies, with no way of learning what it doesn’t know, and should know, to be safe. A clinical ML system without CPI is hobbled by “unknown unknowns”, which can lead to errors both trivial and catastrophic.
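
The CPI loop itself is simple in outline, as in this sketch (function and variable names are hypothetical placeholders): the manually reviewed borderline cases flow back into the training materials, so each retraining cycle teaches the system exactly what it was least sure about.

```python
# One CPI cycle: fold human-labelled borderline cases back into the
# training set and retrain. Names are illustrative placeholders.
import numpy as np

def cpi_cycle(model, X_train, y_train, manual_queue, human_labels):
    X_train = np.vstack([X_train, manual_queue])   # reviewed borderline cases
    y_train = np.hstack([y_train, human_labels])   # their gold-standard labels
    model.fit(X_train, y_train)  # the retrained model now covers what it missed
    return model, X_train, y_train
```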
