— Episode 02 · Methodology log

The search, on screen · receipts for the field map

How I searched the literature for late-onset sepsis prediction models.

Every PubMed query I ran. What each one returned. Where the dead ends were. How the field map in Episode 02 was assembled. The methodology is the second deliverable.

Why this page exists.

Episode 02 of Road to CEPAS 2026 presents a map of the academic groups doing machine-learning work on late-onset sepsis (LOS) prediction in preterm infants. Seven groups, two systematic reviews, four anchor papers. The Substack post makes the case for what that map means.

This page is the working file. If you want to know how I got from "I'll search PubMed" to "here are the seven groups," this is the answer. Every search string is verbatim. Every result count is real. The dead ends, the corrections, and the moment the field's true shape became visible — all of it is here.

The point is not just transparency. The point is reproducibility. If you want to run the same search for your own corner of the literature, you can use this as a template.

The question I started with.

The framing for the search, agreed before I touched the PubMed tool:

What is the current state of evidence for AI/ML models predicting late-onset sepsis in preterm neonates before clinical recognition?

The narrowing matters. The broader question — "AI for neonatal sepsis" — is too big and includes early-onset sepsis prediction, which is a different clinical problem with stronger baseline predictors. LOS in preterm infants is where the field has done its most distinctive work, and it's the problem I work on. So the question was scoped accordingly.

Stage 1 — The broad cast.

The most obvious starting point: any neonatal sepsis paper using machine learning, last fifteen years.

Stage 1 · PubMed search

Broad ML for neonatal sepsis

query "neonatal sepsis" AND ("machine learning" OR "artificial intelligence" OR "deep learning" OR "prediction model" OR algorithm)

101 papers · 2010 to present · sorted by relevance
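The PubMed searches in this log ran inside a chat session. To rerun the Stage 1 query yourself, one option is NCBI's public E-utilities ESearch endpoint. The sketch below only builds the request URL. The helper name `build_esearch_url`, the `datetype=pdat` publication-date filter and the `retmax` value are my assumptions about how to mirror "2010 to present, sorted by relevance", not a record of what the chat tool did internally, so your count may not land exactly on 101.

```python
from datetime import date
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, mindate="2010", maxdate=None, retmax=20, sort="relevance"):
    """Build (but do not send) an ESearch request for a PubMed query.

    Hypothetical helper: the parameter choices are assumptions about how to
    mirror "2010 to present, sorted by relevance".
    """
    if maxdate is None:
        maxdate = str(date.today().year)  # "present" = the current year
    params = {
        "db": "pubmed",
        "term": term,          # the query string, verbatim from the log
        "datetype": "pdat",    # filter on publication date
        "mindate": mindate,    # ESearch expects mindate and maxdate as a pair
        "maxdate": maxdate,
        "retmax": retmax,      # PMIDs returned; the total count is reported regardless
        "sort": sort,
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


STAGE1 = ('"neonatal sepsis" AND ("machine learning" OR "artificial intelligence" '
          'OR "deep learning" OR "prediction model" OR algorithm)')

if __name__ == "__main__":
    print(build_esearch_url(STAGE1))
    # Sending it needs network access, for example:
    # import json, urllib.request
    # with urllib.request.urlopen(build_esearch_url(STAGE1)) as resp:
    #     print(json.load(resp)["esearchresult"]["count"])
```

If you script several stages, NCBI asks clients without an API key to stay at or below three requests per second.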

Top 20 reviewed by abstract. The breakdown: nine reviews, roughly five off-target papers, six original prediction studies. Two systematic reviews surfaced: Kainth 2024 (which I already had on file) and Persad 2021. The top original work was overwhelmingly what I later call Branch B: clinical and laboratory features at the moment of clinical suspicion, not continuous physiological monitoring. The UVA/HeRO lineage was visible but not dominant.

This is where the first methodological alarm should have rung but didn't. The search returned mostly Branch B literature because that's where the ML keyword vocabulary lives. The continuous-vital-signs branch — which is what I actually work on — uses different terminology and was under-represented in the top results.

Stage 2 — Narrowing to the right problem.

Tightening the search to LOS specifically, in preterm infants, with explicit ML methodology.

Stage 2 · PubMed search

LOS · preterm · ML

query ("late-onset sepsis" OR "late onset sepsis") AND (preterm OR premature OR VLBW) AND ("machine learning" OR "artificial intelligence" OR "deep learning" OR algorithm OR "prediction model")

40 papers · much tighter · four anchor papers immediately visible
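The Stage 2 query has a reusable shape: one parenthesised OR-block of synonyms per concept, joined with AND. A small builder makes that template explicit and lets you swap one concept block at a time, which is exactly how the later vital-signs query in Stage 3.6 differs from this one. The helper names are hypothetical, and the quoting rule (quote anything containing a space or hyphen, leave single words bare) is my assumption, chosen because it reproduces the queries in this log character for character.

```python
def quote_term(term: str) -> str:
    """Quote multi-word or hyphenated terms as phrases; leave single words bare."""
    if any(ch.isspace() for ch in term) or "-" in term:
        return f'"{term}"'
    return term


def or_block(terms) -> str:
    """One concept: synonyms joined with OR, wrapped in parentheses."""
    return "(" + " OR ".join(quote_term(t) for t in terms) + ")"


def and_blocks(*blocks) -> str:
    """A full query: concept blocks joined with AND."""
    return " AND ".join(or_block(b) for b in blocks)


# Concept blocks reused across stages (term lists copied from the log).
LOS = ["late-onset sepsis", "late onset sepsis"]
PRETERM = ["preterm", "premature", "VLBW"]
ML_METHODS = ["machine learning", "artificial intelligence", "deep learning",
              "algorithm", "prediction model"]
VITALS_DIALECT = ["heart rate variability", "heart rate characteristics",
                  "vital signs", "physiological monitoring"]

STAGE2 = and_blocks(LOS, PRETERM, ML_METHODS)
# Stage 3.6 swaps the ML vocabulary for the vital-signs vocabulary.
STAGE3_6 = and_blocks(VITALS_DIALECT, LOS + ["LOS"], ["preterm", "neonate"])

if __name__ == "__main__":
    print(STAGE2)
    print(STAGE3_6)
```

The design choice is that each concept lives in one list, so changing the population or the method vocabulary is a one-line edit rather than hand-editing a long boolean string.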

Berg 2023 — my group's paper — was in this set. So were Kausch 2023 (UVA), Yang 2024 (Eindhoven/Máxima), and an Antwerp paper I initially misremembered as "Innosens / Van Laerhoven."

A quick author check corrected the attribution:

query · author check "Van Laerhoven"[Author] AND (sepsis OR neonatal)

0 papers — wrong name

query · corrected "Van Laere"[Author] AND sepsis

3 papers · Meeus et al. 2023 confirmed · senior author David Van Laere

The paper is Meeus M, Beirnaert C, Mahieu L, Laukens K, Meysman P, Mulder A, Van Laere D. Clinical Decision Support for Improved Neonatal Care: The Development of a Machine Learning Model for the Prediction of Late-onset Sepsis and Necrotizing Enterocolitis. J Pediatr 2024;266:113869; elsewhere on this page it is cited as Meeus 2023. The commercial spinoff is Innocens BV, not "Innosens."

Worth noting in this log: small details like that get publicly corrected by doing the search live. The previous mental model — "Van Laerhoven, Innosens" — had been carried for years without verification.

Stage 3 — Snowball from the four anchors.

The PubMed "related articles" function uses word-weighted analysis of titles, abstracts, and MeSH terms — it's a computational similarity measure, not a citation graph. Useful for finding adjacent work that might not surface from keyword searches.

Stage 3 · PubMed find_related_articles

Snowball from the four anchors

seeds Berg 2023 (PMID 37369173) · Kausch 2023 (36593281) · Yang 2024 (39047574) · Meeus 2023 (38065281)

354 related PMIDs returned · top 30 reviewed
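find_related_articles is the name of the tool in my chat session. Outside it, one public route to PubMed's similar-articles neighbours is the E-utilities ELink endpoint with `linkname=pubmed_pubmed`. The sketch builds that request for the four seeds. The helper name is hypothetical, and whether the chat tool merged the seeds into one set or queried them one at a time is an assumption, so the total may not reproduce 354 exactly.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ELink endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# The four anchor seeds and their PMIDs, as listed in this log.
SEEDS = {
    "Berg 2023": "37369173",
    "Kausch 2023": "36593281",
    "Yang 2024": "39047574",
    "Meeus 2023": "38065281",
}


def build_elink_url(pmids, per_seed=False):
    """Build (but do not send) an ELink request for PubMed similar articles.

    Hypothetical helper. One comma-separated id parameter asks ELink for a
    single combined neighbour set; repeating id= asks for one linkset per
    seed, which keeps track of which anchor each neighbour came from.
    """
    params = [
        ("dbfrom", "pubmed"),
        ("db", "pubmed"),
        ("linkname", "pubmed_pubmed"),   # PubMed "similar articles" links
        ("cmd", "neighbor_score"),       # also return similarity scores
        ("retmode", "json"),
    ]
    if per_seed:
        params += [("id", pmid) for pmid in pmids]
    else:
        params.append(("id", ",".join(pmids)))
    return f"{ELINK}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_elink_url(SEEDS.values()))
```

The per-seed form is worth the extra bookkeeping if you want to see which anchor pulled in which neighbour, as with Sullivan 2021 sitting next to Kausch 2023.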

Two meaningful additions from the snowball:

Sullivan 2021 — the UVA precursor paper to Kausch 2023. 408 VLBW infants across three NICUs. The finding that apnea, bradycardia and desaturation events (ABDs) were equally common in LOS episodes and in episodes where sepsis was ruled out (47% vs 48%) comes from this paper. Foundation work for POWS.
Dierikx 2022 — a Dutch/Belgian antibiotic stewardship paper I was a co-author on. Adjacent rather than central to the prediction-model question.

No surprise new academic groups emerged from the snowball. At this point I thought I had the field. I was wrong about half of it — but it took the next stage to find out why.

Stage 3.5 — The Kainth SR reveals the second branch.

The Kainth et al. 2024 systematic review (10.1097/INF.0000000000004409) had appeared in Stage 1 results. Reading it end-to-end produced the single most important methodological finding of the search.

The Kainth SR explicitly states: "We excluded studies that developed ML models using only electronically calculated vital parameter indices." In other words, the entire continuous-vital-signs branch of the literature — Berg, Kausch, Yang, Meeus, the HeRO heritage — is deliberately outside the scope of this systematic review.

The Kainth SR also references a separate review for the vital-signs branch: Persad et al. 2021. And there is a second review on the vital-signs side, specifically on heart rate characteristics (HRC) monitoring for LOS: Koppens et al. 2023 from Amsterdam UMC (10.1159/000531118), which Stage 1 had not surfaced.

Kainth 2024 and Koppens 2023 are two systematic reviews of the same clinical problem, published within months of each other, with almost no overlap in cited literature. The reviews don't cite each other. They examine the same disease in the same population through two different methodological lenses.

Branch A — continuous physiological monitoring (HRC, HRV, HR/SpO2 time series). Reviewed by Persad 2021 and Koppens 2023. This is where Berg, Kausch, Yang, Meeus, and the HeRO/Moorman work live.

Branch B — snapshot clinical and laboratory features at the moment of clinical suspicion. Reviewed by Kainth 2024.

The lesson: searching the right field requires speaking its dialect. Branch A's vocabulary is "heart rate characteristics," "HRC index," "HRV," "visibility graph" — these terms don't reliably surface on machine-learning keyword searches. Three keyword searches had missed roughly half of the relevant literature.
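I triaged by reading abstracts, not with code. But the dialect point can be made concrete: a screen that only knows one branch's vocabulary only finds that branch. Below is a deliberately crude keyword vote, a hypothetical helper with term lists lifted from this page, that guesses which branch an abstract belongs to. It illustrates the failure mode; it is not a validated classifier and was not part of the method.

```python
import re

# Illustrative vocabularies taken from terms named in this log; not a validated filter.
BRANCH_A_TERMS = ["heart rate characteristics", "HRC", "heart rate variability",
                  "HRV", "visibility graph", "vital signs"]
BRANCH_B_TERMS = ["clinical features", "laboratory", "biomarker", "nomogram"]


def count_hits(text, terms):
    """Count case-insensitive matches that start at a word boundary.

    Only the start is anchored, so plurals such as "nomograms" still count.
    """
    return sum(len(re.findall(r"\b" + re.escape(t), text, flags=re.IGNORECASE))
               for t in terms)


def guess_branch(abstract):
    """Crude vocabulary vote: 'A', 'B', or 'unclear' on a tie (including zero hits)."""
    a = count_hits(abstract, BRANCH_A_TERMS)
    b = count_hits(abstract, BRANCH_B_TERMS)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "unclear"


if __name__ == "__main__":
    print(guess_branch("Heart rate characteristics (HRC) and HRV predicted sepsis."))
    print(guess_branch("A nomogram built from laboratory biomarker values."))
```

Notice that neither term list contains "machine learning": the vote works on domain vocabulary, which is precisely what the ML keyword searches were blind to.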

Stage 3.6 — Targeted terminology recovers the missing groups.

Once the dialect problem was named, the fix was straightforward: search Branch A's terminology directly.

Stage 3.6 · PubMed search

Vital-signs / HRC / HRV terminology

query ("heart rate variability" OR "heart rate characteristics" OR "vital signs" OR "physiological monitoring") AND ("late-onset sepsis" OR "late onset sepsis" OR LOS) AND (preterm OR neonate)

67 papers · several previously missed
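"Several previously missed" was a judgement made by eye. To make that step reproducible, one option is to keep each stage's PMID list (for example the `idlist` field of an NCBI ESearch JSON response) and diff it against everything already screened. A minimal sketch, with a hypothetical helper name and placeholder IDs rather than real results:

```python
def newly_found(current, *earlier):
    """PMIDs in the current stage that no earlier stage returned, in numeric order."""
    seen = set().union(*earlier)          # everything already screened
    return sorted(set(current) - seen, key=int)


if __name__ == "__main__":
    # Placeholder PMIDs for illustration only, not real search results.
    stage1 = {"1001", "1002"}
    stage2 = {"1002", "1003"}
    stage3_6 = {"1003", "1004", "1005"}
    print(newly_found(stage3_6, stage1, stage2))   # ['1004', '1005']
```

Sorting with `key=int` keeps PMIDs of different lengths in numeric rather than alphabetical order.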

Three new groups, plus a synthesis paper:

Leon et al. 2021 (Rennes, France) — visibility-graph HRV analysis. 49 preterm infants. AUC 0.877 at 6 h before antibiotic initiation. Methodologically distinctive among the Branch A groups.
Rio et al. 2021 (Lausanne, Switzerland) — the only independent external validation of the commercial HeRO score outside the US. Showed strong gestational-age-dependent performance drift.
Honoré et al. 2023 (Karolinska / KTH Stockholm) — Naive Bayes combining HR + RR + SpO2 + clinical features. Bridges both branches. The senior co-authors had written the Persad 2021 SR.
Koppens et al. 2023 SR (Amsterdam UMC) — the second systematic review identified above.

Stage 3.7 — Negative finding: the Chinese literature.

One additional question worth answering definitively before declaring the map complete: is there a Chinese Branch A literature that the English-language searches missed? Chinese neonatology research is data-rich and methodologically capable; a priori it would be plausible that several Chinese groups are doing continuous-vital-signs ML on LOS.

Stage 3.7 · PubMed search

Chinese affiliation check

query China[Affiliation] AND ("late-onset sepsis" OR "late onset sepsis") AND (preterm OR VLBW OR "very low birth weight") AND ("prediction" OR "machine learning" OR "model" OR "nomogram" OR "algorithm")

15+ Chinese LOS prediction papers · all Branch B

Every Chinese paper this search returned that uses ML or statistical prediction for LOS in preterm infants relies on clinical-feature or biomarker nomograms, not continuous physiological monitoring. The one Chinese contribution to Branch A is the methodology side of the Yang 2024 paper: first author Meicheng Yang and senior co-author Chengyu Liu are at Southeast University, Nanjing, but the clinical data are Dutch (Máxima MC).

Branch A LOS prediction for preterm infants is, almost entirely, a Western European and US research phenomenon. That's not a value judgment — it's a finding worth saying out loud, because it shapes what the field's evidence base actually represents.

Stage 4 — Comparative deep-read of the anchor papers.

Final stage: reading the four anchor model papers (Berg, Kausch, Yang, Meeus) plus three supporting papers (Sullivan 2021, Rio 2021, Koppens 2023 SR) end-to-end and comparing them across eight dimensions: cohort design, signal stack, ML methodology, validation rigor, performance, alarm policy, honest limitations, and unique contributions.

That comparison is the technical core of Episode 03 — A Field Full of ROC Curves. The full comparative table lives there. The findings drive the Episode 02 synthesis.

For Episode 02 specifically, the deep-read confirmed three things visible only when the papers are read together:

Technical convergence. Across signal stacks from 1/hour to 250 Hz, cohort sizes from 51 to 3,151, and ML methods from logistic regression to deep learning, AUCs cluster at 0.78–0.88. Roughly half to two-thirds of LOS episodes are detectable hours before clinical recognition.
Method consensus. Every paper that compared multiple ML methods (Berg, Kausch, Yang) found them equivalent to within roughly 0.01 AUC. Two of the three explicitly chose logistic regression for interpretability. This is not a deep-learning-shaped problem.
Deployment vacuum. No prospective trials of the newer models. The field's only RCT is 14 years old. One independent external validation (Rio 2021), showing real-world performance drift. The Koppens SR closes with an explicit call for an international RCT.
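Two numbers in that list are easy to conflate. An AUC in the 0.78–0.88 range is a ranking statement: it can be read as the probability that a randomly chosen positive (an LOS episode or a pre-LOS time window, depending on each paper's unit of analysis) scores higher than a randomly chosen negative. "Half to two-thirds detectable" is a different quantity: sensitivity at whatever alarm threshold a paper picked, which ties it to alarm policy rather than to AUC alone. A minimal sketch of both definitions, with hypothetical helper names and toy scores that come from no paper:

```python
def auc(pos_scores, neg_scores):
    """AUC as a Mann-Whitney probability: the chance a positive outscores a negative.

    Every positive/negative pair is compared; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity(pos_scores, threshold):
    """Share of positives whose score reaches the alarm threshold."""
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


if __name__ == "__main__":
    # Toy scores, not data from any of the papers.
    los = [0.9, 0.6]
    controls = [0.7, 0.2]
    print(auc(los, controls))            # 0.75
    print(sensitivity(los, 0.65))        # 0.5
```

The pairwise loop is just the definition made literal; for real cohorts a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice. Moving the threshold changes sensitivity without touching AUC at all.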

What this produced.

By the end of five staged searches, with roughly 200 abstracts triaged, the field map for Branch A (continuous-physiology ML for LOS in preterm infants) stabilised at seven academic groups plus two synthesis papers. The groups, anchor papers, and DOIs are listed in the Literature base section of the main /sepsis page.

The Substack post for Episode 02 — Mapping the Field — is the readable narrative of what this map means. This page is the receipts.

One small thing to flag honestly: the search ran across multiple chat sessions and was iterative. Stages 1–3 happened in one continuous live session with verbatim PubMed queries; Stages 3.5–4 happened in later conversations, including the comparative deep-read, which I did after obtaining the full text of the four paywalled anchor papers. The narrative above reconstructs the full methodology, not a single chat session.

If you want to reproduce the search yourself, the queries above are sufficient. If you spot a group I missed or a paper I should have anchored on, the contact channel is Substack comments — links below.

What's next.

Episode 03 — A Field Full of ROC Curves. The eight-dimension comparative table across the four anchor model papers, with explicit attention to signal-stack choices and alarm policy design. Those decisions matter more than the ML methodology and almost nobody writes about them clearly. (Now published.)

Episode 04 — Our Own Work, Warts and All. Critical self-appraisal of the Berg 2023 paper. The Episode 02 framing claims WKZ contributed an alarm-fatigue convention that the field adopted within 12 months. Episode 04 is the balance — what we got wrong, what didn't generalise, and what we'd do differently now.