The AMIA 2022 AI Showcase has been devised as a 3-stage submission process in which participants will be assessed at submissions 1 and 2 for their progression to the 2nd and 3rd presentations. The stages coincide with the three AMIA conferences:
Informatics Summit, March 21-24, 2022; Clinical Informatics Conference (CIC), May 24-26, 2022; and the Annual Symposium, November 5-9, 2022.
Submission requirements for each stage are:
Stage 1. System description, results from a study of algorithm performance, and an outline of the methods of the full evaluation plan.
Stage 2. Submission to address usability and workflow aspects of the AI system, including prototype usability or satisfaction, algorithm explainability, implementation lessons, and/or system use in context.
Stage 3. Submissions to summarize the comprehensive evaluation to include research from prior submissions with new research results that measure the impact of the tool.
Now that we have the abstracts for the Stage 1 conference, we can analyse the extent to which the community of practice could satisfy those criteria. We can also identify how the abstracts fall short of the submission requirements, and so help the authors close the gap between their submission and the putative requirements. Authors will also be able to ensure that their poster or presentation fills some of the gaps where they have the information.
There is also the open question of whether the “call to arms” in the Showcase was effective and how it might be improved in the future.
A succinct definition of the desirable contents of an Abstract might well be:
- What was done?
- Why was it done?
- What are the results?
- What is the practical significance?
- What is the theoretical significance?
The Clinical AI Showcase has now provided a set of abstracts for the presentations and posters, so how might we rate the informativeness of the abstracts against these criteria?
A review of the 22 abstracts has been conducted, and a summary based on these 5 content attributes, with items 4 and 5 merged, is provided in Table 2. The summaries have been designated by their key themes and collated into Table 1.
Using the publicised criteria for each of the three Stages of the AI Showcase, it would appear that only 9 of the submissions are conformant, i.e. the papers categorised as Extended ML model, Development of ML model, and Developed and Deployed. It is an open question whether the abstracts classed as Comparison of ML models fulfil the Stage 1 criteria. The authors would do well to clarify the coverage of their work in their full presentation/poster.
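For readers who want to check the collation themselves, the step from Table 2 to Table 1 is a simple tally of the Category column. The following is a minimal sketch, not the method actually used for this review. It assumes each abstract is keyed by its first author, that qualifiers such as "Planned Deployment" or "Not clinical application" are dropped, and that the two abstracts listed as not available are left out. The names `CATEGORIES`, `CONFORMANT`, `tally` and `conformant_count` are illustrative only.

```python
from collections import Counter

# Base category for each abstract in Table 2, keyed by first author.
# Assumption: qualifiers after the base category are dropped, and the
# two abstracts listed as "not available" are excluded.
CATEGORIES = {
    "Overgaard": "Design only",
    "Rossi": "Development of ML model",
    "Estiri": "Comparison of ML models",
    "Liu": "Comparison of ML models",
    "Patel": "Extended ML model",
    "Liang": "Development of ML model",
    "Patrick": "Developed and Deployed",
    "Eftekhari/Carlin": "Developed and Deployed",
    "Luo": "Development of ML model",
    "Pillai": "Comparison of ML models",
    "Chen": "Development of ML model",
    "Saleh": "Design only",
    "Mathur": "Methods study only",
    "Tsui": "Developed and Deployed",
    "Rasmy": "Development of ML model",
    "Wu": "Comparison of ML models",
    "Mao": "Comparison of ML models",
    "Ramnarine": "Design only",
}

# Categories this post treats as conformant with the Stage 1 criteria.
CONFORMANT = {
    "Extended ML model",
    "Development of ML model",
    "Developed and Deployed",
}


def tally(categories):
    """Count how many abstracts fall into each category."""
    return Counter(categories.values())


def conformant_count(counts):
    """Sum the counts of the categories treated as conformant."""
    return sum(n for cat, n in counts.items() if cat in CONFORMANT)


counts = tally(CATEGORIES)
for cat, n in counts.most_common():
    print(f"{cat}: {n}")
print("Conformant with Stage 1:", conformant_count(counts))
```

Changing the membership of `CONFORMANT`, for example by adding Comparison of ML models, shows how sensitive the conformance figure is to that open question.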
The other categories appear to exist in a certain type of limbo. The Methods study only group appears out of place given the objectives of the AI Showcase. The Design only group would appear to be stepping out somewhat prematurely, although the broader advertising for the Showcase certainly encouraged early stage entries. As mentioned in my previous blog (blog reference), it would be exceedingly difficult for teams to meet the purported content and deadlines for Stages 1 and 2 if the ideas for the project are at an embryonic design stage.
Teams with nearly or fully completed projects were able to submit works that also fulfilled many of the criteria for Stage 2 of the Showcase. The Developed and Deployed group showed progression in projects that had reached deployment but, with the exception of one paper that claimed its solution was installed at the bedside, none reported usability or workflow aspects.
Two abstracts described secondary use of ML rather than clinical applications; both of these papers were doing NLP.
Good Abstract Writing
Most abstracts provided reasonable descriptions of the work they had done or intended to do. It was rare for abstracts to describe their results or the significance of their work. This can undoubtedly be corrected in Stages 2 or 3 of the Showcase, where they are required to report on the assessment of their tool's practical use. Only one paper provided information on all four desirable abstract content items.
What Can the Showcase Learn and Do Better?
This Showcase has the admirable objective of encouraging researchers and clinical teams to perform AI projects to a better quality and in a more conclusive manner. However, its Stages cover a cornucopia of objectives set out in a timeline that is unrealistic for projects just starting and poorly co-ordinated for projects near to or at completion. This is well evidenced by the 40 or more ML projects included in the Conference programme that are not part of the Showcase. If the Showcase is to continue, as it should, then a more considered approach to staged objectives, encouragement of appropriate teams, and more thoughtful timing would be a great spur to its future success.
Might I Humbly Suggest (MIHS) that a more refined definition of the stages be spelled out so that:
a. groups just starting ML projects are provided with more systematic guidelines and milestones; and
b. groups in the middle of projects can ensure that they have planned for an appropriate level of completeness in their work.
Stage 1. What is the intended deliverable and Why it is necessary – Which clinical community has agreed to deployment and assessment.
Stage 2. What was done in the development – How the deliverable was created and what bench top verifications were conducted.
Stage 3. Deployment and Clinical Assessment – What were the issues in deployment. What were the methods and results of the clinical assessment. What arrangements have been made for the maintenance and improvement of the deliverable.
This definition excludes groups performing ML projects purely for their own investigative interest but without a specific participating clinical community. The place for their work is within the general programme of AMIA conferences. It also means that strictly speaking only 3 of the current acceptances would fit this definition for Stage 1, although 3 of the others could be contracted to fit this definition.
A concerning factor in the current timeline design is the short 2-month span between the deliverables for Stages 1 and 2. A group would pretty much have to have completed Stage 2 to submit to Stage 1 and be ready to submit to Stage 2 in 2 months.
Lastly, the cost of attending 3 AMIA conferences in the one year would be excessively taxing, especially for many younger scholars in the field. AMIA should provide a two-thirds discount on each conference to those team representatives who commit to entering the Showcase. This would be a great encouragement to get more teams involved.
|Category||Number of abstracts|
|Comparison of ML models||5|
|Extended ML model||1|
|Development of ML model||5|
|Developed and Deployed – No operational assessment||3|
|Methods study only||1|
|Design only||3|
|Paper||What was done||Why was it done||What are their results||What is the significance||Comments||Category|
|Overgaard – CDS Tool for asthma – Presentation||Design of desirable features of AI solution.||to make review of the patient record more efficient.||Unspecified||Unknown – no clinical deployment.||Paper is about the putative design for a risk assessment tool and data extraction from the EHR.||Design only|
|Rossi – 90-day Mortality Prediction Model – Presentation||Deployed 90-day mortality prediction model. **||To align patient preferences for advance care directives with therapeutic delivery, and improve rates of hospice admission and length of stay.||Unspecified||Unknown – clinical deployment planned.||Model is partially implemented with operationally endorsed workflows.||Development of ML model – Planned Deployment|
|Estiri – Unrecognised Bias in COVID Prediction Models – Presentation||Investigation of four COVID-19 prediction models for bias using an AI evaluation framework. **||AI algorithm biases could exacerbate health inequalities||Unspecified||Unknown – no clinical deployment.||Two bias topics are defined : (a) if the developed models show biases; (b) has the bias changed over time for patients not used in the development.||Comparison of ML models|
|Liu – Explainable ML to predict ICU Delirium – Presentation||A range of features were identified and a variety of MLs evaluated. Three prediction models were evaluated at 6, 12 and 24 hours. ****||To more accurately predict the onset of delirium||Described but numerics not provided||Implied due to described implementation design||Paper describes some aspect of all characteristics but not always completely.||Comparison of ML models|
|Patel – Explainable ML to predict periodontal disease – Presentation||“new” variables added to existing models revealed new associations.||Discover new information about risk factors.||Described but associations not provided||Unknown. No clinical assessment of the discovered associations, no clinical deployment.||AI methods not described.||Extended ML model|
|Liang – Wait time prediction for ER/ED – Virtual||Develop ML classifier (TensorFlow) to predict ED/ER wait times. Training and test sets described.||No explanation||Unspecified||Unknown – clinical deployment status unclear||Focus is on the ML processes with little other information||Development of ML model|
|Patrick – Deep Understanding for coding pathology reports – Virtual||Built a system to identify cancer pathology reports and code them for 5 data items (Site, Histology, Grade, Behaviour, Laterality)||California Cancer Registry requested an automated NLP pipeline to improve production line efficiencies.||Various accuracies provided||Improvements over manual case identification and coding provided.||The work of this blog author.||Developed and Deployed – no operational assessment – Not clinical application|
|Eftekhari/Carlin – ML sepsis prediction for hematopoietic cell transplant recipients – Poster||Deployed an early warning system that used EMR data for HCT patients. Pipeline of processing extending to clinical workflows.||Sepsis in HCT patients has a different manifestation to sepsis in other settings.||Only specified result is deployment||Unknown – no clinical assessment.||Deployment is described showing its complexity. No evaluations.||Developed and Deployed – no operational assessment|
|Luo – Evaluation of Deep Phenotype concept recognitions on external EHR datasets – Poster||Recognises Human Phenotype Ontology concepts in biomedical texts||No explanation||Unspecified||Unknown – no clinical deployment.||Abstract is the least informative. One sentence only.||Development of ML model – Not clinical application|
|Pillai – Quality Assurance for ML in Radiation Oncology – Poster||Five ML models were built and voting system devised to decide if a radiology treatment plan was Difficult or Not Difficult. Feature extraction was provided.||To improve clinical staff's scrutiny of difficult plans to reduce errors downstream. Feature extraction to improve interpretability and transparency. ****||Unspecified||System planned to be integrated into clinical workflow.||Mostly about ML approach but shows some forethought into downstream adoption.||Comparison of ML models – Deployment planned|
|Chen – Validation of prediction of Age-related macular degeneration – Poster||ML Model to predict later AMD degeneration using 80K images from 3K patients.||to predict the risk of progression to vision-threatening late AMD in subsequent years||Unspecified||Unknown – no clinical deployment.||Focus is on the ML processes with little other information.||Development of ML model|
|Saleh – Comparison of predictive models for paediatric deterioration – Poster||Plan to develop and implement ML model to augment prediction of paediatric clinical deterioration within the clinical workflow.||Detecting deterioration in paediatric cases is effective at only 41% using existing tools.||Unspecified – planning stage only||System planned to be integrated into clinical workflow.||Early conceptualisation stage. Well framed objective and attentive to clinical acceptability. No framing of datasets, variables and ML methods.||Design only|
|Shah – ML for medical education simulation of chest radiography – Poster||not available|
|Mathur – Translational aspects of AI – Poster||Evaluation of the TEHAI framework compared to other frameworks for AI with emphasis on translational and ethical features of model development and its deployment.||A lack of standard training data and the clinical barriers to introducing AI into the workplace warrant the development of an AI evaluation framework.||Qualitative assessment of 25 features by reviewers.||No in vitro evaluation – only qualitative assessment||This is an attempt to improve the evaluation criteria we should be using on AI systems. It fails to make a convincing case that it is a better method than alternatives.||Methods study only|
|Yu – Evaluating Pediatric sepsis predictive model – Poster||not available|
|Tsui – ML prediction for clinical deterioration and intervention – Poster||Built an ML for an Intensive care warning system for deterioration events and pharmacy interventions. It uses bedside monitor and EHR data, providing results in real-time.||No explanation||Unspecified||Unknown – no clinical assessment.||Operational system. Only a description of the deliverables – no evaluations||Developed and Deployed – no operational assessment|
|Rasmy – Evaluation of DL model for COVID-19 outcomes – Poster||A DL algorithm developed to predict for COVID-19 cases on admission: in-hospital mortality, need for mechanical ventilation, and long hospitalization.||No explanation||Unspecified – no numerics supplied||Unknown – no clinical deployment||Seems to concentrate solely on the DL modelling.||Development of ML model|
|Wu – ML for predicting 30-day cancer readmissions – Poster||ML models built to identify 30-day unplanned admissions for cancer patients.||Unplanned cancer readmissions have significantly poorer outcomes so the aim is to reduce them.||No results but promised in the poster/presentation||Unknown – no clinical assessment.||No ML details, just a justification in Abstract||Comparison of ML models|
|Mao – DL model for Vancomycin monitoring – Poster||A DL pharmacokinetic model for Vancomycin was compared to a Bayesian model||To provide a more accurate model of Vancomycin monitoring.||The DL model performed better than the Bayesian model. No numerics provided.||Unknown – no clinical assessment.||Focus is on the ML processes with little other information.||Comparison of ML models|
|Ramnarine – Policy for Mass Casualty Trauma triage – Poster||Design of a strategy to build an ML for ER/ED triage and retriage categorisation for mass casualty incidents||To fill a void in national standards for triage and retriage||Unspecified – design proposal only||Unknown – no clinical assessment of practicality of acceptance.||This is a proposal with no concrete strategy for implementation, and no indication of what would be used in the investigation in terms of either data source or ML strategy type.||Design only|