Every PubMed query I ran. What each one returned. Where the dead ends were. How the field map in Episode 02 was assembled. The methodology is the second deliverable.
Episode 02 of Road to CEPAS 2026 presents a map of the academic groups doing machine-learning work on late-onset sepsis (LOS) prediction in preterm infants. Seven groups, two systematic reviews, four anchor papers. The Substack post argues for what that map means.
This page is the working file. If you want to know how I got from "I'll search PubMed" to "here are the seven groups," this is the answer. Every search string is verbatim. Every result count is real. The dead ends, the corrections, and the moment the field's true shape became visible — all of it is here.
The point is not just transparency. The point is reproducibility. If you want to run the same search for your own corner of the literature, you can use this as a template.
The framing for the search, agreed before I touched the PubMed tool:
The narrowing matters. The broader question — "AI for neonatal sepsis" — is too big and includes early-onset sepsis prediction, which is a different clinical problem with stronger baseline predictors. LOS in preterm infants is where the field has done its most distinctive work, and it's the problem I work on. So the question was scoped accordingly.
The most obvious starting point: any neonatal sepsis paper using machine learning, last fifteen years.
Top 20 reviewed by abstract. The breakdown: nine reviews, roughly five off-target papers, six original prediction studies. Two systematic reviews surfaced — Kainth 2024 (which I already had on file) and Persad 2021. The top original work was overwhelmingly Branch B — clinical and laboratory features at the moment of clinical suspicion, not continuous physiological monitoring. The UVA/HeRO lineage was visible but not dominant.
This is where the first methodological alarm should have rung but didn't. The search returned mostly Branch B literature because that's where the ML keyword vocabulary lives. The continuous-vital-signs branch — which is what I actually work on — uses different terminology and was under-represented in the top results.
Tightening the search to LOS specifically, in preterm infants, with explicit ML methodology.
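A tightened search of this kind is usually built from concept blocks: synonyms joined with OR inside each block, and the blocks joined with AND. The sketch below shows that structure only. The term lists and the helper names `or_block` and `build_query` are illustrative assumptions, not the verbatim query from this log; `[tiab]` is PubMed's Title/Abstract field tag.

```python
def or_block(terms, field="tiab"):
    """Join the synonyms for one concept into a parenthesised OR group."""
    tagged = [f'"{term}"[{field}]' for term in terms]
    return "(" + " OR ".join(tagged) + ")"


def build_query(*concept_blocks):
    """AND the concept blocks together so every concept must be present."""
    return " AND ".join(concept_blocks)


# Hypothetical term lists, for illustration only (not the query that was run).
condition = or_block(["late-onset sepsis", "late onset sepsis"])
population = or_block(["preterm", "premature", "very low birth weight"])
method = or_block(["machine learning", "deep learning", "artificial intelligence"])

query = build_query(condition, population, method)
print(query)
```

Keeping each concept in its own block makes it easy to swap one vocabulary for another without touching the rest of the query.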
Berg 2023 — my group's paper — was in this set. So were Kausch 2023 (UVA), Yang 2024 (Eindhoven/Máxima), and an Antwerp paper I initially misremembered as "Innosens / Van Laerhoven."
A quick author check corrected the last of these:
The paper is Meeus M, Beirnaert C, Mahieu L, Laukens K, Meysman P, Mulder A, Van Laere D. Clinical Decision Support for Improved Neonatal Care: The Development of a Machine Learning Model for the Prediction of Late-onset Sepsis and Necrotizing Enterocolitis. J Pediatr 2024;266:113869. The commercial spinoff is Innocens BV, not "Innosens."
Worth noting in this log: running the search live is what catches small details like this. I had carried the previous mental model, "Van Laerhoven, Innosens," for years without verifying it.
The PubMed "related articles" function uses word-weighted analysis of titles, abstracts, and MeSH terms — it's a computational similarity measure, not a citation graph. Useful for finding adjacent work that might not surface from keyword searches.
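For anyone scripting the snowball rather than clicking through the web interface, the NCBI E-utilities ELink endpoint exposes, as far as I understand it, the same similarity neighbours under the link name `pubmed_pubmed`. The sketch below only builds the request URL. The PMID is a made-up placeholder and the helper name `related_articles_url` is an assumption for illustration.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def related_articles_url(pmid):
    """Build an ELink URL asking for articles similar to one seed PMID."""
    params = {
        "dbfrom": "pubmed",           # database the seed ID comes from
        "db": "pubmed",               # database to find neighbours in
        "id": str(pmid),
        "linkname": "pubmed_pubmed",  # the similarity-based neighbour set
        "cmd": "neighbor_score",      # return neighbours with similarity scores
        "retmode": "json",
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# Placeholder PMID, not one of the seed papers from this search.
url = related_articles_url(12345678)
print(url)
```

Fetching the URL and triaging the returned IDs by abstract is the same manual step as in the keyword stages.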
Two meaningful additions from the snowball:
No surprise new academic groups emerged from the snowball. At this point I thought I had the field. I was wrong about half of it — but it took the next stage to find out why.
The Kainth et al. 2024 systematic review (10.1097/INF.0000000000004409) had appeared in Stage 1 results. Reading it end-to-end produced the single most important methodological finding of the search.
The Kainth SR explicitly states: "We excluded studies that developed ML models using only electronically calculated vital parameter indices." In other words, the entire continuous-vital-signs branch of the literature — Berg, Kausch, Yang, Meeus, the HeRO heritage — is deliberately outside the scope of this systematic review.
The Kainth SR also references a separate review for the vital-signs branch: Persad et al. 2021. There is also a second systematic review specifically on heart rate characteristics (HRC) monitoring for LOS, Koppens et al. 2023 from Amsterdam UMC (10.1159/000531118), which Stage 1 had not surfaced.
Two systematic reviews on the same clinical problem, published within months of each other, with almost no overlap in cited literature. The reviews don't cite each other. They review the same disease in the same population through two different methodological lenses.
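"Almost no overlap" can be made concrete by comparing the two reference lists as sets. The sketch below uses the Jaccard index (shared references divided by all distinct references). The identifiers are invented placeholders, not the actual reference lists of either review, so the number it prints says nothing about the real overlap.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty lists: define the overlap as zero
    return len(a & b) / len(union)


# Placeholder identifiers standing in for PMIDs or DOIs.
review_a_refs = {"ref01", "ref02", "ref03", "ref04", "ref05"}
review_b_refs = {"ref05", "ref06", "ref07", "ref08", "ref09"}

shared = review_a_refs & review_b_refs
score = jaccard(review_a_refs, review_b_refs)
print(f"shared references: {sorted(shared)}, Jaccard index: {score:.3f}")
```

In practice the fiddly part is normalising the reference lists to a common identifier before comparing them.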
The lesson: searching the right field requires speaking its dialect. Branch A's vocabulary is "heart rate characteristics," "HRC index," "HRV," "visibility graph" — these terms don't reliably surface on machine-learning keyword searches. Three keyword searches had missed roughly half of the relevant literature.
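The dialect problem is easy to demonstrate with a toy matcher. The sketch below checks one title against two vocabularies. The title is a hypothetical example written for this illustration, the machine-learning term list is an assumption, and the Branch A list uses the terms named above.

```python
ML_TERMS = [
    "machine learning",
    "deep learning",
    "artificial intelligence",
    "neural network",
]
BRANCH_A_TERMS = [
    "heart rate characteristics",
    "hrc index",
    "hrv",
    "visibility graph",
]


def hits(text, vocabulary):
    """Return the vocabulary terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


# Hypothetical title in Branch A's dialect.
title = "Heart rate characteristics monitoring for late-onset sepsis in preterm infants"

print("ML keyword hits:", hits(title, ML_TERMS))
print("Branch A hits:", hits(title, BRANCH_A_TERMS))
```

A paper titled like this is squarely on topic, yet a query built only from machine-learning keywords has nothing to match on.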
Once the dialect problem was named, the fix was straightforward: search Branch A's terminology directly.
Three new groups, plus a synthesis paper:
One additional question worth answering definitively before declaring the map complete: is there a Chinese Branch A literature that the English-language searches missed? Chinese neonatology research is data-rich and methodologically capable; a priori it would be plausible that several Chinese groups are doing continuous-vital-signs ML on LOS.
Every Chinese paper the search surfaced that uses ML or statistical prediction for LOS in preterm infants relies on clinical-features or biomarker nomograms, not continuous physiological monitoring. The one Chinese contribution to Branch A is the methodology side of the Yang 2024 paper: first author Meicheng Yang and senior co-author Chengyu Liu are at Southeast University, Nanjing, but the clinical data is Dutch (Máxima MC).
Branch A LOS prediction for preterm infants is, almost entirely, a Western European and US research phenomenon. That's not a value judgment — it's a finding worth saying out loud, because it shapes what the field's evidence base actually represents.
Final stage: reading the four anchor model papers (Berg, Kausch, Yang, Meeus) plus three supporting papers (Sullivan 2021, Rio 2021, Koppens 2023 SR) end-to-end and comparing them across eight dimensions: cohort design, signal stack, ML methodology, validation rigor, performance, alarm policy, honest limitations, and unique contributions.
That comparison is the technical core of Episode 03 — A Field Full of ROC Curves. The full comparative table lives there. The findings drive the Episode 02 synthesis.
For Episode 02 specifically, the deep-read confirmed three things visible only when the papers are read together:
By the end of five staged searches and roughly 200 abstracts triaged, the field map for Branch A — continuous-physiology ML for LOS in preterm infants — stabilised at seven academic groups plus two synthesis papers. The groups, anchor papers, and DOIs are listed on the main /sepsis page under the Literature base section.
The Substack post for Episode 02 — Mapping the Field — is the readable narrative of what this map means. This page is the receipts.
One small thing to flag honestly: the search was iterative and ran across multiple chat sessions. Stages 1–3 happened in one continuous live session with verbatim PubMed queries; Stages 3.5–4 happened in subsequent conversations, including the comparative deep-read, which I did after obtaining full text of the four paywalled anchor papers. The narrative above reconstructs the full methodology, not a single chat session.
If you want to reproduce the search yourself, the queries above are sufficient. If you spot a group I missed or a paper I should have anchored on, the contact channel is Substack comments — links below.
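For a scripted rerun, one option is the NCBI E-utilities ESearch endpoint, which takes a query string and returns a hit count plus a list of PMIDs. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the sample response is a canned stand-in with made-up IDs, and nothing here is the tool used for the original search.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, mindate=None, maxdate=None, retmax=20):
    """Build an ESearch URL for PubMed, optionally limited by publication date."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    if mindate and maxdate:
        # The date range must be given as a pair; pdat means publication date.
        params.update({"datetype": "pdat", "mindate": mindate, "maxdate": maxdate})
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(payload):
    """Extract (total hit count, list of PMIDs) from an ESearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


# Canned response in the shape ESearch returns; the IDs are made up.
sample = '{"esearchresult": {"count": "2", "idlist": ["11111111", "22222222"]}}'
count, pmids = parse_esearch(sample)
print(count, pmids)
```

To run it against the live service, fetch `esearch_url(...)` with `urllib.request.urlopen` and pass the decoded body to `parse_esearch`. Logging the query string, the date, and the returned count for each stage gives a record that can be compared with the counts reported here; counts drift as PubMed indexes new papers.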
Episode 03 — A Field Full of ROC Curves. The eight-dimension comparative table across the four anchor model papers, with explicit attention to signal-stack choices and alarm policy design. Those decisions matter more than the ML methodology and almost nobody writes about them clearly. (Now published.)
Episode 04 — Our Own Work, Warts and All. Critical self-appraisal of the Berg 2023 paper. The Episode 02 framing claims WKZ contributed an alarm-fatigue convention that the field adopted within 12 months. Episode 04 is the balance — what we got wrong, what didn't generalise, and what we'd do differently now.