Big data and predictive analytics promise new predictive models and improvements in the practice of medicine, but 4 major hurdles must be overcome before they can be considered of clinical value: identification of risk-sensitive decisions, calibration, user trust, and data quality and heterogeneity, according to research published in JAMA.

Nilay D. Shah, PhD, from the Department of Health Sciences Research at the Mayo Clinic in Rochester, Minnesota, and colleagues reviewed some of the caveats associated with big data.

With the regular use of electronic health records (EHRs), enormous amounts of healthcare data have been generated. Natural language processing now allows the use of unstructured data found in clinical notes, as well as structured data. Connections between various data sources such as registries, claims data, and EHRs have created a platform on which big data can operate.

Yet the authors suggest that several barriers need to be addressed before the information generated by big data can be considered useful. The first is the thoughtful identification of risk-sensitive decisions. The authors point out that there are more than 1000 cardiovascular clinical prediction models, but only a few are used regularly to support clinical decision-making. They contend that prediction modelers rarely give attention to the formal properties that make a clinical decision risk-sensitive. A prediction model is likely to be used in clinical decision-making only when a clear decision can be made and the risk threshold for that decision is close to the population average risk.
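The point about risk thresholds can be illustrated with a minimal sketch. It is not taken from the JAMA article: the function name `fraction_decision_changed`, the rule "treat when predicted risk meets the threshold," and the example risks are all assumptions chosen for illustration. The sketch compares each patient's model-based decision with the default decision that would be made from the population average risk alone.

```python
def fraction_decision_changed(risks, threshold):
    """Share of patients whose model-based decision differs from the default.

    Hypothetical decision rule (an assumption, not from the article):
    treat when predicted risk >= threshold. The default decision applies
    the same rule to the population average risk for every patient.
    """
    average_risk = sum(risks) / len(risks)
    default_treat = average_risk >= threshold
    changed = sum(1 for r in risks if (r >= threshold) != default_treat)
    return changed / len(risks)


# Illustrative predicted risks with a population average of about 10%.
risks = [0.02, 0.04, 0.06, 0.08, 0.12, 0.14, 0.16, 0.18]

# Threshold near the average risk: the model changes many decisions.
near_average = fraction_decision_changed(risks, 0.11)

# Threshold far from the average risk: the model changes no decisions.
far_from_average = fraction_decision_changed(risks, 0.50)

print(near_average, far_from_average)
```

Under these assumptions, the model alters decisions for half of the patients when the threshold sits near the average risk and for none of them when the threshold is far above it, which is one way to read the authors' argument about when a model is likely to be used.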

Another barrier is calibration. Statistical performance may be broken down into discrimination (do individuals with the end point have higher-risk predictions than those without?) and calibration (do X of 100 patients with a risk prediction of X% develop the end point in question?). In practice, measures of discrimination are stressed over calibration, and model calibration is often less stable than discrimination. However, calibration remains key to appropriate decision-making, and the authors argue that poor calibration can lead to harmful decisions.
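The difference between the two measures can be shown with a small sketch. The data are invented for illustration and the function names are assumptions, not anything from the JAMA article. Discrimination is measured here with the c-statistic (the share of event/non-event pairs ranked correctly), and calibration with a simple "calibration-in-the-large" check (mean predicted risk minus observed event rate), which is only one of several ways to assess calibration.

```python
from itertools import product


def c_statistic(predictions, outcomes):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied predictions count as half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def calibration_in_the_large(predictions, outcomes):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(predictions) / len(predictions) - sum(outcomes) / len(outcomes)


# Invented example: 8 patients, 4 of whom develop the end point.
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
predictions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Halving every prediction keeps the ranking but understates risk.
halved = [p / 2 for p in predictions]

print(c_statistic(predictions, outcomes), c_statistic(halved, outcomes))
print(calibration_in_the_large(predictions, outcomes),
      calibration_in_the_large(halved, outcomes))
```

In this example the halved predictions rank patients exactly as well as the originals, so the c-statistic is unchanged, yet they substantially understate the observed event rate. A model evaluated on discrimination alone would look equally good in both cases.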

User trust, transparency, and commercial interests represent the third barrier. The authors note that businesses may have a vested interest in particular outcomes. Furthermore, once healthcare organizations invest in data, tools, and personnel in hopes of determining outcomes using big data, there is often pressure to produce results rapidly. Of great concern is the potential harm that may ensue from wasted resources and from inaccurate predictions that support poor decisions.

Finally, the quality of the predictive models is entirely dependent on the quality of the underlying data. EHR data may be incomplete or missing entirely; coding for billing purposes may supply data, but those data may be inaccurate or incomplete; EHR data are poorly standardized; and the structure of, and other information about, the natural language processing algorithms is usually not available.

The authors call for an independent agency that certifies prediction models and approaches to integrate them into clinical practice. They note that there is a need for thorough studies to assess the effect of prediction models on healthcare decisions and patient outcomes.


Shah ND, Steyerberg EW, Kent DM. Big data and predictive analytics: recalibrating expectations. JAMA. 2018;320:27-28.