A recent study shows how training deep-learning models on patients' own reports of their outcomes can reveal gaps in existing medical knowledge.
Research over the past few years has shown that deep learning can match expert-level performance on medical-imaging tasks. Other research, however, has shown that these models tend to perpetuate the biases present in their training data. In a health-care system already riddled with disparities, careless application of deep learning could make things worse.
A recent paper suggests a way to develop medical algorithms that help reverse, rather than entrench, existing inequities in health care: stop training algorithms merely to match human expert performance. An algorithm trained only to reproduce expert judgments will reproduce the experts' gaps and biases along with their skill.
The research examines a specific clinical example: disparities in the treatment of knee osteoarthritis between white and Black patients. To assess severity, a radiologist examines a knee x-ray and assigns a Kellgren–Lawrence grade (KLG), which scores the degree of structural damage based on radiographic features such as missing cartilage. Doctors then use that score to prescribe the appropriate treatment.
Data collected by the National Institutes of Health showed that Black patients reported higher levels of pain than white patients with the same degree of radiographic damage. Doctors, however, scored Black patients' pain as far less severe than what the patients themselves reported, and when prescribing treatment they tended to favor the radiologist's KLG score over the patient's self-reported pain.
One explanation is that Black patients report higher levels of pain in the hope of being taken more seriously. There is, however, an alternative explanation: the KLG methodology itself could be biased. It was developed decades ago in white British populations, so there may be radiographic markers of pain that appear more often in Black patients and simply aren't part of the KLG rubric.
To test this possibility, the researchers trained a deep-learning model to predict a patient's self-reported pain level directly from their knee x-ray. If the model had poor accuracy, that would suggest self-reported pain is largely arbitrary; if it had high accuracy, that would suggest self-reported pain correlates with real radiographic markers. The model turned out to be substantially more accurate than KLG at predicting self-reported pain, especially for Black patients, and using its predictions cut the racial disparity at each pain level almost in half.
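The disparity-reduction comparison can be illustrated with a toy calculation. Everything below is synthetic and hypothetical, not data or code from the study: for each severity score (KLG and a learned model's score), we regress self-reported pain on the score and measure the gap between groups in the leftover residuals. A score that leaves a smaller residual gap explains more of the pain that patients actually report.

```python
# Illustrative sketch with made-up numbers (NOT the study's data or method):
# compare how much of the between-group gap in self-reported pain each
# severity score accounts for.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def residual_gap(score, pain, group):
    """Mean residual for 'black' minus mean residual for 'white' after
    regressing self-reported pain on the given severity score."""
    a, b = fit_line(score, pain)
    resid = [p - (a + b * s) for s, p in zip(score, pain)]
    black = [r for r, g in zip(resid, group) if g == "black"]
    white = [r for r, g in zip(resid, group) if g == "white"]
    return sum(black) / len(black) - sum(white) / len(white)

# Hypothetical cohort: the KLG under-reads pain for the Black patients,
# while the model's score tracks self-reported pain more closely.
group = ["black"] * 4 + ["white"] * 4
klg   = [1, 2, 2, 3, 1, 2, 2, 3]   # radiologist's KLG
model = [3, 4, 4, 5, 1, 2, 2, 3]   # hypothetical learned score
pain  = [3, 4, 5, 6, 1, 2, 2, 3]   # self-reported pain

gap_klg = residual_gap(klg, pain, group)
gap_model = residual_gap(model, pain, group)
print(f"residual gap under KLG:   {gap_klg:.2f}")   # 2.50
print(f"residual gap under model: {gap_model:.2f}")  # 0.00
```

In this contrived example the model's score eliminates the residual gap entirely; the study reports roughly halving the disparity, but the direction of the comparison is the same: a score trained on self-reported pain leaves less of the between-group difference unexplained than the expert rubric does.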
The study shows that the standard way of scoring knee pain is indeed flawed, and disproportionately so for Black patients. It should prompt the medical community to revisit the KLG and update its scoring methodology, thereby reducing prevailing biases and inequities in health care.