Sepsis early warning systems in context

Sepsis is a leading cause of in-hospital mortality and cost, and earlier recognition and treatment consistently reduce deaths and complications. This has led hospitals to adopt automated early warning tools that continuously scan EHR data for early signs of deterioration. Over the past decade, these systems have evolved from simple rules-based triggers to machine learning models that use dozens of variables such as vital signs, lab values, comorbidities, and treatments to estimate sepsis risk in real time.[1]

By the late 2010s, many large health systems in the United States were using sepsis prediction or alerting tools tightly integrated into their enterprise EHRs. These models were typically proprietary and treated as part of the EHR feature set rather than as standalone AI products, which meant that validation, monitoring, and governance practices often followed traditional IT patterns rather than lifecycle management approaches tailored for AI.[2]

A surgeon's experience with a sepsis alert AI system

In 2024, I met a surgeon at a large academic hospital who described her experience with a sepsis early warning system built into the hospital's EHR. The system generated automated alerts, often at all hours of the day and night, warning that her post-operative patients were septic even when those patients were clinically stable. Over time, she found the alerts wrong so often that she came to regard them as "usually wrong" and largely ignored them in her daily practice. Her colleagues, she added, reacted similarly.

Importantly, she was not aware that a machine learning model was driving the alerts. To her, the notifications appeared as generic EHR system messages about sepsis risk rather than as outputs of an AI model. This combination of degraded perceived usefulness and lack of transparency is a textbook example of model drift and alert fatigue in clinical AI: the model's performance deteriorates relative to the current patient population and care patterns, while frontline clinicians lose trust and mentally "decommission" the tool.[3][4]

This anecdote serves as an illustrative case of an EHR-embedded sepsis early warning system whose behavior mirrors patterns documented in published evaluations of widely deployed sepsis prediction models.

What the evidence shows about sepsis prediction models

Research on sepsis prediction tools provides a quantitative backdrop that closely aligns with the surgeon's experience. A 2021 JAMA Internal Medicine study evaluated a widely implemented proprietary sepsis prediction model integrated into a commercial EHR across a large academic health system. The study found that the model:[2]

  • Generated alerts on a substantial proportion of hospitalized patients.

  • Identified only a minority of sepsis cases that clinicians had not already recognized.

  • Demonstrated poor calibration and lower discrimination than hospitals had assumed based on vendor materials.

A companion JAMA Network Open analysis quantified alerts from a sepsis prediction tool across 24 U.S. hospitals before and during the COVID-19 pandemic. The onset of COVID-19 was associated with roughly a 43 percent increase in sepsis alert volume without a commensurate improvement in sepsis detection, raising concerns about increased alert burden and potential alert fatigue among clinicians. FierceHealthcare's coverage of this work noted that the sepsis algorithm used within a major EHR system generated substantially more alerts during the pandemic, which may have contributed to alert fatigue and underscored the need for stronger oversight of clinical AI tools.[5][6][7]

Taken together, these studies and reports describe a pattern that aligns with the surgeon's story: sepsis early warning systems embedded in EHRs can produce large volumes of low-yield alerts, miss many true sepsis cases, and become less reliable over time as data and practice shift.[6][5][2]

Model drift and dataset shift in hospital AI

The STAT article "AI gone astray: How subtle shifts in patient data send popular algorithms reeling, undermining patient safety" provides a broader lens on model drift in hospital algorithms. The investigation shows how changes in patient populations, coding practices, and care patterns can cause the performance of widely used prediction models to degrade after deployment, even if they appeared acceptable in their original validation.[3]

In the context of sepsis AI, several types of dataset shift are particularly relevant.

  1. Changes in sepsis definitions and diagnostic criteria, such as the adoption of Sepsis-3, alter how sepsis is labeled in the data over time.

  2. Major disruptions like the COVID-19 pandemic introduce new illness patterns, treatment behaviors, and documentation practices that differ markedly from the data used to train or validate the original model.

  3. When a model is deployed across multiple institutions, local practice patterns, ordering behaviors, and data quality characteristics can differ significantly from the original training environment.[8][9][5][6][2]
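These kinds of shift can be made measurable. One common generic technique (not drawn from the cited studies) is the population stability index (PSI), which compares a feature's distribution at deployment against its training-era baseline. The feature, the simulated data, and the conventional 0.25 review threshold below are all illustrative:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a feature's deployment distribution against its
    training-era distribution. PSI near 0 means stable; values above
    roughly 0.25 are often treated as a sign of meaningful shift."""
    # Bin edges come from the reference (training) sample.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # A small floor avoids log-of-zero in empty bins.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Hypothetical example: a lab value whose distribution shifts after deployment.
rng = np.random.default_rng(0)
train_lactate = rng.normal(1.5, 0.5, 10_000)   # training-era data
covid_lactate = rng.normal(2.2, 0.8, 10_000)   # pandemic-era data
psi = population_stability_index(train_lactate, covid_lactate)
# A PSI this large would typically trigger a model review.
```

In practice a monitoring job would compute this per feature on a schedule, so that shifts like the pandemic-era changes described above surface as numbers rather than as anecdotes.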

Without continuous monitoring, recalibration, and structured drift analysis, a sepsis prediction model that initially provides some value can gradually become miscalibrated. It may overpredict risk in certain subgroups, underpredict in others, and generate alerts in contexts where clinicians perceive little clinical relevance, thereby eroding trust. The surgeon's impression that the alerts had become "usually wrong" captures the endpoint of this trajectory from the frontline clinician's perspective.[10]
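One way to make "continuous monitoring" concrete is a rolling-window check on alert precision and calibration. The sketch below is illustrative: the class name, the thresholds (`min_ppv`, `oe_bounds`), and the simulated drift are assumptions, and a real deployment would set its bounds through local validation:

```python
from collections import deque

class AlertDriftMonitor:
    """Track rolling alert precision (PPV) and calibration
    (observed/expected event ratio) over the last `window` alerts.
    Thresholds here are illustrative, not clinically validated."""

    def __init__(self, window=500, min_ppv=0.15, oe_bounds=(0.7, 1.3)):
        self.outcomes = deque(maxlen=window)   # 1 if the alert was a true case
        self.predicted = deque(maxlen=window)  # model's predicted probabilities
        self.min_ppv = min_ppv
        self.oe_bounds = oe_bounds

    def record(self, predicted_risk, was_sepsis):
        self.predicted.append(predicted_risk)
        self.outcomes.append(1 if was_sepsis else 0)

    def status(self):
        n = len(self.outcomes)
        if n == 0:
            return {"ppv": None, "oe_ratio": None, "flags": []}
        ppv = sum(self.outcomes) / n
        expected = sum(self.predicted) / n  # mean predicted risk among alerts
        oe = ppv / expected if expected > 0 else float("inf")
        flags = []
        if ppv < self.min_ppv:
            flags.append("low_ppv")  # alerts rarely confirmed: fatigue risk
        if not (self.oe_bounds[0] <= oe <= self.oe_bounds[1]):
            flags.append("miscalibrated")  # observed rate drifts from predicted
        return {"ppv": ppv, "oe_ratio": oe, "flags": flags}

monitor = AlertDriftMonitor(window=200)
# Simulate a drifted regime: the model predicts ~40% risk per alert,
# but only ~10% of alerted patients are confirmed septic.
for i in range(200):
    monitor.record(predicted_risk=0.4, was_sepsis=(i % 10 == 0))
# status() now reports both the low-PPV and miscalibration flags.
```

A monitor like this turns the surgeon's subjective "usually wrong" into a tracked metric with a pre-agreed escalation point.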

Alert fatigue and the human side of AI failure

Alert fatigue has long been recognized as a patient safety concern in hospitals, and it is particularly acute for sepsis alerts. When clinicians are exposed to repeated alarms that rarely change management or outcomes, they naturally begin to discount or ignore them, regardless of whether a specific alert is accurate.[7][11]

The multi-hospital JAMA Network Open analysis cited above shows that sepsis alert volumes can be very high and can spike further when external conditions change, such as during the COVID-19 pandemic. FierceHealthcare's reporting notes that this pattern likely contributed to alert fatigue, as clinicians were forced to sift through more alerts, many of which did not correspond to true sepsis or actionable deterioration.[5][6][7]

In the surgeon's case, repeated false or low-value alerts for clinically stable post-operative patients shifted her behavior from cautious attention to habitual dismissal. She no longer perceived the system as a helpful safety net; instead, it became another source of noise that competed with her clinical judgment and time. This represents not just a technical performance issue, but a failure in the design of the human-AI interaction and in the governance structures that should align the tool with clinician workflows and expectations.[9][12]

Why this happened: Readiness, governance, and design gaps

The sepsis alert system in this case did not fail solely because of its algorithm; it failed in the broader socio-technical system in which it was deployed. Several gaps stand out.

  1. AI readiness practices were incomplete. Many sepsis prediction tools were treated as static clinical decision support features rather than as evolving AI products requiring ongoing validation, recalibration, and clear performance thresholds over time. Hospitals often focused on one-time validation at go-live and configuration of thresholds, with less emphasis on longitudinal monitoring of calibration, sensitivity, and positive predictive value across different patient subgroups.[10][2]

  2. Governance structures for AI were underdeveloped. Responsibility for sepsis prediction tools frequently sat at the intersection of data science, clinical leadership, quality and safety, and IT, without a single accountable owner for lifecycle performance. In practice, this meant that systematic clinician feedback about false positives, alert fatigue, or missed cases did not consistently feed into structured model review, retraining decisions, or workflow redesign.[9][3]

  3. Transparency and training were insufficient. The surgeon's lack of awareness that a machine learning model was driving the alerts reflects a broader pattern in which AI is "hidden" inside EHR workflows. Clinicians may see a risk score or alert without any explanation of how it is generated, which features drive it, or what its known limitations are. This opacity reduces their ability to interpret model behavior, recognize drift, or differentiate between predictable quirks and genuine failures.[12][13][8]

  4. Journey mapping and service design around the alerts were shallow. Sepsis alerts affect multiple roles, including nurses, surgeons, hospitalists, intensivists, and rapid response teams, and they intersect with existing rounding, handoff, and escalation routines. If alert timing, channels, and content do not align with these real-world journeys, the system can feel intrusive, redundant, or misaligned with clinical priorities. In the surgeon's scenario, alerts that repeatedly flagged stable post-operative patients as septic likely reflected a mismatch between model design and the nuanced realities of surgical recovery pathways.

How sepsis alert technologies are evolving

Despite these challenges, sepsis prediction technology has continued to evolve, with several recent efforts focusing on more rigorous deployment and monitoring. Prospective studies of advanced machine learning systems, such as targeted real-time early warning systems for sepsis, have shown that when alerts are well integrated into clinical workflows and clinicians engage with them promptly, they can be associated with lower mortality and improved processes of care.[14]

Newer approaches also emphasize explainability and adaptive behavior. For example, some sepsis prediction algorithms have been designed to "learn to say I don't know," deferring judgment when model confidence is low to reduce the risk of overconfident false positives. Others use interpretable machine learning techniques to highlight which variables are driving a given risk estimate, making it easier for clinicians to understand and critique model outputs.[15][8]
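The abstention idea can be illustrated with a minimal sketch. The published algorithm's internal criteria for deferring are more sophisticated than this; here, hypothetical fixed thresholds simply carve out an "uncertain" band in which the system defers to asynchronous review rather than firing a hard alert:

```python
def triage_alert(risk, alert_threshold=0.7, uncertain_band=(0.4, 0.7)):
    """Three-way decision instead of a binary alert.
    Thresholds are illustrative placeholders, not values from the
    published algorithm."""
    low, high = uncertain_band
    if risk >= alert_threshold:
        return "alert"      # confident enough to interrupt a clinician
    if low <= risk < high:
        return "defer"      # route to asynchronous review, not a page
    return "no_alert"       # suppress: stay out of the workflow

# A borderline score is deferred rather than fired as a hard alert.
decisions = [triage_alert(r) for r in (0.9, 0.55, 0.1)]
```

The design choice worth noting is the middle outcome: by giving the system a way to say "I don't know," borderline cases stop contributing to the stream of confident-sounding false alarms that drives fatigue.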

On the governance side, there is increasing recognition that AI deployment requires formal frameworks for continuous monitoring, drift detection, and post-deployment evaluation. Recent reviews of data-driven early warning systems for sepsis recommend that health systems treat predictive models as living assets with defined performance metrics, retraining triggers, and retirement criteria. This aligns with broader trends in AI regulation and professional guidance, which emphasize transparency, accountability, and lifecycle management for high-risk clinical AI systems.[16][12][10]

Implications for life science and health technology leaders

For biopharma, medtech, diagnostics, and digital health organizations, this case offers several strategic lessons that extend beyond sepsis or hospital IT.

  1. AI products and AI-enabled services need lifecycle management analogous to that used for therapies and devices. This includes planned post-deployment evidence generation, safety monitoring, and performance reassessment in new populations and under changing conditions. Organizations should define clear quantitative success metrics and acceptable ranges for model performance, track them over time and by subgroup, and pre-specify escalation pathways when metrics fall outside those ranges.[16][10]

  2. Training and change management must be treated as core components of AI deployment, not afterthoughts. Users need to understand what the model is intended to do, how it typically behaves, where it is likely to fail, and how to integrate its outputs with their own judgment. In practice, this means designing structured education for clinicians or customers, equipping field teams to explain AI behavior and limitations, and creating channels for sustained two-way feedback rather than one-time go-live training.[13][17]

  3. Governance for AI should be explicitly cross-functional. Data science and engineering teams, clinical or scientific leadership, quality and safety, regulatory, and commercial stakeholders should share a structured governance framework with clear decision rights over model updates, retraining, and decommissioning. This is particularly important when life science organizations deploy AI-enabled tools that influence diagnosis, risk stratification, or treatment decisions in partnership with provider systems.[9][16]

  4. Journey mapping and service design are non-negotiable. AI that does not fit into the real journeys of clinicians, patients, or other users will be ignored, regardless of its technical sophistication. Journey mapping the workflows, pain points, and decision moments for each role, and then designing AI interactions that genuinely reduce friction or risk at those points, is critical to realizing value and sustaining trust.
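The subgroup tracking and pre-specified escalation pathways described above can be sketched as a simple periodic gate. All names and thresholds here are illustrative placeholders, not recommended clinical values:

```python
def evaluate_subgroups(metrics, ppv_floor=0.2, sens_floor=0.6):
    """Compare per-subgroup performance against pre-specified floors
    and return which subgroups need escalation. Floors are
    illustrative, not clinically validated thresholds."""
    escalations = []
    for subgroup, m in metrics.items():
        breaches = [name for name, value, floor in
                    [("ppv", m["ppv"], ppv_floor),
                     ("sensitivity", m["sensitivity"], sens_floor)]
                    if value < floor]
        if breaches:
            escalations.append({"subgroup": subgroup, "breaches": breaches})
    return escalations

# Hypothetical quarterly review numbers for three patient subgroups.
quarterly = {
    "post_operative": {"ppv": 0.08, "sensitivity": 0.70},  # low-yield alerts
    "medical_icu":    {"ppv": 0.31, "sensitivity": 0.55},  # missing cases
    "general_wards":  {"ppv": 0.25, "sensitivity": 0.68},  # within bounds
}
to_escalate = evaluate_subgroups(quarterly)
```

The point of pre-specifying the floors is governance, not statistics: a breach routes automatically to the accountable owner instead of waiting for frontline clinicians to stop trusting the tool, as happened in the surgeon's case.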

Lessons learned from a drifting sepsis alert AI

The surgeon's experience with an EHR-embedded sepsis early warning system captures the human reality of model drift and alert fatigue in clinical AI. A system that was designed to enhance safety instead became a source of noise that clinicians learned to ignore, in part because its performance degraded over time and in part because the deployment did not adequately consider transparency, governance, and workflow fit.

For senior leaders across life sciences and healthcare, the core lesson is that AI readiness, governance, and user-centered design are as important as algorithm selection. High-impact AI systems should be treated as evolving, high-stakes assets with clear accountability, continuous monitoring, and deliberate integration into human workflows. When these elements are in place, sepsis prediction tools and other clinical AI systems are far more likely to deliver sustained value and earn the trust of the clinicians and patients they are meant to serve.[2][5][10]

References

  1. Shimabukuro, D. W., Barton, C. W., Feldman, M. D., Mataraso, S. J., Das, R. "Effect of a Machine Learning-Based Severe Sepsis Prediction Algorithm on Patient Survival and Hospital Length of Stay: A Randomised Clinical Trial." BMJ Open Respiratory Research, vol. 4, no. 1, 2017, p. e000234. https://bmjopenrespres.bmj.com/content/4/1/e000234

  2. Henry, K. E., Kornfield, R., Sridharan, A., et al. "External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients." JAMA Internal Medicine, vol. 181, no. 8, 2021, pp. 1065–1070. https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2781307

  3. STAT. "AI gone astray: How subtle shifts in patient data send popular algorithms reeling, undermining patient safety." STAT, 27 Feb. 2022. https://www.statnews.com/2022/02/28/sepsis-hospital-algorithms-data-shift/

  4. Yala, A., et al. "AI Gone Astray: Technical Supplement." arXiv, 28 Feb. 2022, https://arxiv.org/abs/2203.16452

  5. Wong, A., Otles, E., Donnelly, J. P., et al. "Quantification of Sepsis Model Alerts in 24 US Hospitals Before and During the COVID-19 Pandemic." JAMA Network Open, vol. 4, no. 11, 2021, p. e2135286. https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2786356

  6. Michigan Medicine. "Study of 24 U.S. hospitals shows onset of COVID-19 led to spike in sepsis alerts." Michigan Medicine Health Lab, 5 Dec. 2021. https://www.michiganmedicine.org/health-lab/study-24-us-hospitals-shows-onset-covid-19-led-spike-sepsis-alerts

  7. FierceHealthcare. "Epic's sepsis algorithm may have caused alert fatigue with 43% alert increase during pandemic." FierceHealthcare, 22 Nov. 2021. https://www.fiercehealthcare.com/tech/epic-s-sepsis-algorithm-may-have-caused-alert-fatigue-43-alert-increase-during-pandemic

  8. Niederer, S. A., et al. "Advances in Data-Driven Early Warning Systems for Sepsis Recognition and Intervention in Critical Care." Critical Care Research and Practice, 2025. https://www.cureus.com/articles/386005

  9. Celi, L. A., Ghassemi, M., Stone, D. J. "Lessons in machine learning model deployment learned from sepsis." npj Digital Medicine, 2022. https://www.sciencedirect.com/science/article/pii/S2666634022003634

  10. Biffi, C., et al. "Longitudinal Model Shifts of Machine Learning–Based Clinical Risk Prediction Models." JMIR Medical Informatics, 2024. https://www.jmir.org/2024/1/e51409/

  11. Massachusetts Nurses Association. "State reports detail 11 patient deaths linked to alarm fatigue in Massachusetts." Massachusetts Nurses Association News, 29 Dec. 2011. https://www.massnurses.org/2011/12/29/state-reports-detail-11-patient-deaths-linked-to-alarm-fatigue-in-massachusetts/

  12. Ethics Unwrapped, University of Texas at Austin. "AI & Transparency: An Epic Deception." https://ethicsunwrapped.utexas.edu/case-study/a-i-transparency-an-epic-deception

  13. Henry, K. E., et al. "Epic Sepsis Model Inpatient Predictive Analytic Tool: A Validation Study." JAMA Network Open, vol. 6, no. 6, 2023, p. e2321499. https://pmc.ncbi.nlm.nih.gov/articles/PMC10317482/

  14. Adams, R., Henry, K. E., Sridharan, A., et al. "Prospective, Multi-Site Study of Patient Outcomes After Implementation of the TREWS Machine Learning-Based Early Warning System for Sepsis." Nature Medicine, vol. 28, 2022.

  15. Fan, Z., et al. "Artificial Intelligence Sepsis Prediction Algorithm Learns to Say 'I Don't Know.'" Journal of the American Medical Informatics Association, vol. 28, no. 9, 2021. https://pubmed.ncbi.nlm.nih.gov/34504260/

  16. Transforming sepsis management: AI-driven innovations in early detection and management. Advances in Data-Driven Early Warning Systems for Sepsis, 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12366378/

  17. American Medical Association. "What It Takes for Doctors to Trust AI-Triggered Sepsis Alerts." AMA Practice Management Digital Health, Apr. 2025. https://www.ama-assn.org/practice-management/digital-health/what-it-takes-doctors-trust-ai-triggered-sepsis-alerts
