Introduction
The emergence of machine learning (ML), an application of the more general field of artificial intelligence (AI) that automates statistical inference to detect patterns in data, has opened up entirely new domains of complex data analysis. This is especially applicable to high-dimensional datasets, where patterns and relationships are often not immediately apparent to investigators. ML approaches are increasingly promoted in medical research as a driver of a new generation of rapidly gained, data-derived scientific insights, as highlighted in the recent Topol review of technology in the NHS.1
Innovations in the implementation and portability of AI in healthcare mean that clinicians are already involved in developing ML applications, or in incorporating existing ‘off the shelf’ applications into clinical workflows. Investigators must work with frontline teams to understand some of the inherent limitations, biases and exclusions of these applications when critically appraising their utility in the clinical domain.
ML algorithms develop rules and hypotheses based on data. Large numbers of features, such as patient observations, keywords and syntheses of multiple parameters, are combined in models to optimise against a chosen outcome. Investigators aim to build a model from existing data that will perform equally well on future data.
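The gap between performance on existing data and on future data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cohort is synthetic, the single feature and its 20% label noise are invented, and the helper names (make_data, predict_1nn, accuracy) are hypothetical. The model is a simple nearest-neighbour rule that memorises the data it was built from.

```python
import random

def make_data(n, rng):
    """Synthetic cohort (assumed): one feature x in [0, 1]; the outcome is 1
    when x > 0.5, but 20% of labels are flipped to mimic noisy records."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Return the outcome of the closest training record (pure memorisation)."""
    return min(train, key=lambda record: abs(record[0] - x))[1]

def accuracy(train, data):
    """Fraction of records in `data` whose outcome the model predicts correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

rng = random.Random(0)          # fixed seed so the sketch is repeatable
existing = make_data(200, rng)  # data available when the model is built
future = make_data(200, rng)    # data the model meets after deployment

train_acc = accuracy(existing, existing)
future_acc = accuracy(existing, future)
print(f"accuracy on existing data: {train_acc:.2f}")
print(f"accuracy on future data:   {future_acc:.2f}")
```

In this sketch the model scores far better on the records it has already seen than on the new sample, which is why performance should be appraised on data that played no part in building the model.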
Bias exists in human society as an unreasonable assumption or prejudice towards a person or belief. In contrast, bias in ML exists as ‘the error that is introduced by approximating a real life problem, which may be extremely complicated, by a much simpler model’.2 An ML model may therefore be mathematically accurate (unbiased) with respect to a biased dataset, and vice versa.
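The statistical sense of bias quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not a method from the guidance cited: the ‘real life problem’ is an invented noise-free curve, y = x², the ‘much simpler model’ is a straight line fitted by ordinary least squares, and the function names (fit_line, approximation_error) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def approximation_error(n_points):
    """Mean squared error of the best straight line on noise-free y = x**2,
    sampled at n_points evenly spaced values of x in [0, 1]."""
    xs = [i / (n_points - 1) for i in range(n_points)]
    ys = [x ** 2 for x in xs]  # assumed 'true' relationship, no noise added
    slope, intercept = fit_line(xs, ys)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n_points

err_small = approximation_error(101)    # a modest dataset
err_large = approximation_error(10001)  # a hundred times more data
print(f"error with 101 points:   {err_small:.4f}")
print(f"error with 10001 points: {err_large:.4f}")
```

The error does not shrink towards zero as more data are added, because it comes from the mismatch between the simple model and the curved relationship rather than from noise or sample size.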
The Academy of Medical Royal Colleges has recently produced guidance on using ML in healthcare, and bias is one of its key concerns.3
Investigators must be cognisant of whether they seek to faithfully replicate the societal biases that exist within their …
Footnotes
Twitter @stumaitland
Contributors SM is the lead author and guarantor, and prepared the manuscript based on experience performing research in neuroinformatics and using machine learning on a regular basis. PME contributed to and reviewed the manuscript, and provided perspective from Irish practice. RB is the senior author, who reviewed the manuscript, directed patient involvement, and used experience as Digital Lead for NIHR North East/Cumbria to express pitfalls in clinical medicine.
Funding SM is funded by the NIHR Newcastle Biomedical Research Centre awarded to the Newcastle upon Tyne Hospitals NHS Foundation Trust and Newcastle University.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.