Explainer: Thinking through the safe use of AI
As the use of artificial intelligence expands across healthcare, there are plenty of justifiable worries about what looks to be a new normal built around this powerful and fast-changing technology. There are labor concerns, widespread distress over fairness, ethics and equity – and, perhaps for some, fear of a dystopian future where intelligent machines grow too powerful.
But the promise of AI and machine learning is also enormous: predictive analytics could mean better health outcomes for individuals and potentially game-changing population health advancements while moving the needle on costs.
Finding a regulatory balance that capitalizes on the good and protects against the bad is a big challenge.
Government and healthcare leaders are more adamant than ever about addressing racial bias, protecting safety and “getting it right.” Getting it wrong could harm patients, erode trust and potentially create legal liabilities for healthcare organizations.
We spoke with Dr. Sonya Makhni, medical director of the Mayo Clinic Platform and senior associate consultant for the Department of Hospital Internal Medicine, about recent developments with healthcare AI and discussed some of the key challenges of tracking performance, generalizability and clinical validity.
Makhni explained how healthcare AI models should be assessed for use, offering readmissions AI as one example of why it is important to understand a specific model’s performance.
Q. What does it mean to deliver an AI solution in general?
A. An AI solution is more than just an algorithm – the solution also includes everything you need to make it work in a real workflow. There are a few key phases to consider when developing and delivering an AI solution.
First is the algorithm design and development phase. During this phase, solution developers should work closely with clinical stakeholders to understand the problem to be solved and the data that is available.
Next, the solution developers can start the process of algorithm development, which itself comprises many steps such as data procurement and preprocessing, model training and model testing (among several other important steps).
Following algorithm development, AI solutions need to be validated on third-party data, ideally by an independent party. An algorithm that performs well on the initial dataset may perform differently on a different dataset that represents different population demographics. External validation is a key step in understanding an algorithm’s generalizability and bias and should be completed for all clinical AI solutions.
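As a rough illustration (not a description of any particular validation tool), an external validation check often compares a trained model’s discrimination on its internal test set against an external cohort drawn from a different population. In the sketch below, the model object, cohort dataframes and column names are hypothetical placeholders.

```python
# A minimal sketch: comparing a trained binary classifier's discrimination
# on its development test set versus an external cohort from a different
# population. The model, dataframes and column names are placeholders.
from sklearn.metrics import roc_auc_score

def cohort_auc(model, cohort_df, feature_cols, outcome_col):
    """AUROC of an already-trained classifier on one cohort."""
    probs = model.predict_proba(cohort_df[feature_cols])[:, 1]
    return roc_auc_score(cohort_df[outcome_col], probs)

# Example usage (hypothetical names):
# internal_auc = cohort_auc(model, internal_test_df, features, "readmitted")
# external_auc = cohort_auc(model, external_cohort_df, features, "readmitted")
# A large gap between the two is a warning sign about generalizability.
```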
Solutions should also be tested in clinical workflows, which can be accomplished through pilot studies, prospective studies and trials, and ongoing real-world evidence studies.
Once an AI solution has been assessed for performance, generalizability, bias and clinical validity, we can start to think about how to integrate the algorithm into real clinical workflows. This is a critical and challenging step and requires significant consideration.
Clinical workflows are heterogeneous across health systems, clinical contexts, specialties and even end-users. It is important that prediction outputs are communicated to end-users at the right time, for the right patient, and in the right way. For example, if every AI solution required the end-user to navigate to a separate external digital workflow, these solutions may not achieve widespread adoption. Suboptimal integration into workflows may even perpetuate bias or worsen clinical outcomes.
It is important to work closely with clinical stakeholders, implementation scientists, and human-factors specialists if possible.
Finally, a solution must be monitored and refined for as long as the algorithm is in deployment. The performance of algorithms can change over time, and it is critical that AI solutions are periodically (or in real-time) assessed for both mathematical performance and clinical outcomes.
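As a simple sketch of what this ongoing monitoring could look like, assuming predictions and observed outcomes are logged with a timestamp, performance can be recomputed over rolling time windows; the column names below are illustrative placeholders, not a specific product’s schema.

```python
# A minimal sketch of periodic performance monitoring for a deployed model,
# assuming a log of predictions, observed outcomes and prediction dates.
import pandas as pd
from sklearn.metrics import roc_auc_score

def monthly_auc(log_df, score_col="predicted_risk",
                outcome_col="observed_outcome", time_col="prediction_date"):
    """AUROC per calendar month, so performance drift can be spotted."""
    log_df = log_df.copy()
    log_df["month"] = pd.to_datetime(log_df[time_col]).dt.to_period("M")
    return (
        log_df.groupby("month")
        .apply(lambda g: roc_auc_score(g[outcome_col], g[score_col]))
    )

# If a month's AUROC falls below an agreed threshold, that drop can trigger
# a review of the model, the data pipeline and the clinical workflow.
```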
Q. What are the points in the development of AI that can allow bias to creep in?
A. If leveraged effectively, AI can improve or even transform the way we diagnose and treat diseases.
However, assumptions and decisions are made during each step of the AI development lifecycle, and if incorrect these assumptions can lead to systematic errors. Such errors can skew the end result of an algorithm against a subgroup of patients and ultimately pose risks to healthcare equity. This phenomenon has been demonstrated in existing algorithms and is referred to as algorithmic bias.
For example, if we are designing an algorithm and choose an outcome variable that is inherently biased, then we may perpetuate bias through the use of this algorithm. Or, decisions made during the data preprocessing step might unintentionally negatively impact certain subgroups. Bias can be introduced and/or propagated during every phase, including deployment. Involving key stakeholders can help mitigate the risks and unintended impacts caused by algorithmic bias.
It is likely that almost all AI algorithms exhibit bias.
This does not mean the algorithm can’t be used; it does highlight the importance of transparency about where the algorithm is biased. An algorithm may perform well in one population and poorly in another; it can and should still be used in the former, where it may improve outcomes. However, it should not be used in the population for which it performs poorly.
Biased algorithms can still be useful, but only if we understand where it is appropriate and not appropriate to use them.
At Mayo Clinic Platform, we have developed a tool to validate algorithms and perform quantitative bias assessments so that we can help end-users better understand how to safely and appropriately use AI solutions in clinical care.
Q. What do AI users have to think through when they use tools like readmission AI?
A. Users of AI algorithms should use the AI development life cycle as a framework to understand where bias may potentially be introduced.
Ideally, users should be aware of the algorithm’s predictors and outcome variable, though this may be more challenging with more complex algorithms. Understanding the variables used as inputs and outputs of an algorithm can help end-users detect erroneous or problematic assumptions. For example, an outcome variable may be chosen that is itself biased.
End-users should also understand the training population used during model development. The AI solution may have been trained on a population that is not representative of the population where the model is to be applied, which is a reason to be cautious about the model’s generalizability. To that end, users should understand how well the algorithm performed during development and whether the algorithm was externally validated.
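One lightweight way to gauge representativeness, sketched below with hypothetical cohorts and column names, is to compare the distribution of key demographic variables in the training population against the local population where the model would be deployed.

```python
# A minimal sketch of checking how well a model's training population
# matches the local population. The dataframes and demographic columns
# are hypothetical placeholders.
import pandas as pd

def compare_demographics(train_df, local_df, col):
    """Side-by-side proportions of one demographic variable in each cohort."""
    return pd.DataFrame({
        "training": train_df[col].value_counts(normalize=True),
        "local": local_df[col].value_counts(normalize=True),
    }).fillna(0.0)

# Example usage (hypothetical names):
# compare_demographics(training_cohort, local_cohort, "race")
# compare_demographics(training_cohort, local_cohort, "age_band")
# Large mismatches suggest external validation results should carry more
# weight before the model is trusted locally.
```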
Ideally, all algorithms should undergo a bias assessment – quantitative and qualitative. This can help users understand mathematical performance in different subgroups that vary by race, age, gender, etc. Qualitative bias assessments conducted by solution developers can help alert users to situations that may arise in the future as a result of potential algorithmic bias; knowledge of these scenarios can help users better monitor and mitigate unintentional inequities in performance.
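As an illustrative sketch of a quantitative subgroup check (not the Mayo Clinic Platform tool itself), the same performance metric can be computed separately for each subgroup of interest; the column names below are placeholders.

```python
# A minimal sketch of a quantitative bias assessment: one performance
# metric computed per subgroup (e.g. race, age band, gender).
from sklearn.metrics import roc_auc_score

def auc_by_subgroup(df, group_col, score_col="predicted_risk",
                    outcome_col="observed_outcome"):
    """AUROC for each level of a subgroup column."""
    return {
        group: roc_auc_score(sub[outcome_col], sub[score_col])
        for group, sub in df.groupby(group_col)
    }

# Example usage (hypothetical names):
# auc_by_subgroup(scored_patients, "race")
# auc_by_subgroup(scored_patients, "age_band")
# Large gaps between subgroups flag where the model should be used with
# caution, or not at all.
```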
Readmission AI solutions should be assessed on similar factors.
Specifically, users should understand if there are certain subgroups where performance varies. These subgroups could consist of patients of different demographics, or even of patients with different diagnoses. This will help clinicians evaluate if and when the model’s predicted output is most appropriate and reliable.
Q. How do you think about AI risk and risk management?
A. Commonly, we think about risk as operational and regulatory risk. These relate to how a digital health solution adheres to privacy, security and regulatory laws, and they are critical to any assessment.
We should begin to consider clinical risk as well.
In other words, we should consider how an AI solution may impact clinical outcomes and what the potential risks are if an algorithm is incorrect or biased or if actions taken on an algorithm are incorrect or biased.
It is the responsibility of both the solution developers and the end-users to frame an AI solution in terms of risk to the best of their abilities.
There are likely many ways of doing this; at Mayo Clinic Platform, we have developed our own risk classification system in which AI solutions undergo a qualification process before external use.
Q. How can clinicians and health systems engage in the process of creating and delivering AI solutions?
A. Clinicians and solution developers should work together collaboratively throughout the AI development lifecycle and through solution deployment.
Active engagement from both parties is necessary in predicting potential areas of bias and/or suboptimal performance. This knowledge will help clarify contexts that are better suited to a given AI algorithm and those that perhaps require more monitoring and oversight. All relevant stakeholders should also be engaged during the deployment phase, and AI solutions should be carefully monitored and refined as necessary.
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.