
Bridging the implementation gap of machine learning in healthcare
Martin G Seneviratne1, Nigam H Shah1, Larry Chu2

  1. Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
  2. Anesthesia, Stanford University, Stanford, California, USA

Correspondence to Dr Nigam H Shah, 1265 Welch Road, Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA 94305, USA; nigam@stanford.edu


Applications of machine learning to clinical data are now attaining levels of performance that match or exceed those of human clinicians.1–3 Fields involving image interpretation—radiology, pathology and dermatology—have led the charge, owing to the power of convolutional neural networks and the existence of standard data formats and large data repositories. We have also seen powerful diagnostic and predictive algorithms built using a range of other data types, including electronic health records (EHR), -omics, monitoring signals, insurance claims and patient-generated data.4 The looming extinction of doctors has captured the public imagination, with editorials such as ‘The AI Doctor Will See You Now’.5 The prevailing view among experts is more balanced: that doctors who use artificial intelligence (AI) will replace those who do not.6

Amid such inflated expectations, the elephant in the room is the implementation gap of machine learning in healthcare.7 8 Very few of these algorithms ever make it to the bedside, and even the most technology-literate academic medical centres are not routinely using AI in clinical workflows. A recent systematic review of deep learning applications using EHR data highlighted the need to focus on the last mile of implementation: ‘for direct clinical impact, deployment and automation of deep learning models must be considered’.9 The typical life-cycle of an algorithm remains: train on historical data, publish a good receiver operating characteristic (ROC) curve and then collect dust in the ‘model graveyard’.
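To make that life-cycle concrete, the following is a minimal sketch in Python using scikit-learn on synthetic stand-in data; the cohort size, feature count, outcome prevalence and model choice are purely illustrative assumptions, not any specific published pipeline. It shows how little effort the retrospective ‘train, report an AUROC’ step demands, which is typically where the story ends.

```python
# Illustrative sketch of the retrospective model life-cycle:
# train on 'historical' data, report a held-out AUROC, stop there.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a historical cohort: 5,000 patients,
# 30 EHR-derived features, ~10% outcome prevalence (all assumed).
X, y = make_classification(n_samples=5000, n_features=30,
                           weights=[0.9], random_state=0)

# Retrospective split: fit on past data, evaluate on a held-out slice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
auroc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUROC: {auroc:.3f}")  # the figure that gets published

# What rarely follows: prospective validation, workflow integration,
# deployment and monitoring -- the 'last mile' discussed above.
```

Nothing in the sketch touches deployment: no prospective evaluation, no workflow integration, no monitoring. That omission is precisely the gap this editorial describes.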

This raises the question: if model performance is so promising, why is there such a chasm between development and deployment? In order …
