Performance Metrics in Early Detection
The field of blood-based early cancer detection is advancing at remarkable speed. Yet with this rapid progress comes a growing challenge: without shared definitions and consistent terminology, we risk confusing clinicians, regulators, and ultimately the patients these innovations aim to help.
This was the focus of the recent Early Detection Summer Seminar hosted by the BLOODPAC Early Detection and Screening Working Group. Co-chair Christina Clarke (GRAIL) opened the session by highlighting the group’s mission: to align the field on a common lexicon and standard frameworks for clinical validation of blood-based early detection tests. “Right now, we’re not all using the same words for the same concepts,” Clarke noted. “As multi-cancer early detection (MCED) tests and pan-cancer approaches emerge, we must get on the same page about core performance metrics like sensitivity and specificity.”
The seminar featured Professor Peter Sasieni, Joint Lead of the Centre for Cancer Screening, Prevention, and Early Diagnosis at Queen Mary University of London, and Principal Investigator of the NHS-Galleri study. His talk, Modernizing Performance Statistics for Multi-cancer Screening Tests, underscored how cancer screening must be evaluated differently from traditional diagnostic testing.
Sasieni explained that screening is not simply a diagnostic test applied earlier in time. In diagnostic trials, every participant’s disease status is eventually confirmed. In screening, by contrast, it is both unethical and impractical to perform full diagnostic workups on everyone, particularly those with negative results. Screening also targets disease that exists on a continuum. Cancers are often invisible to imaging or biomarkers when they are very small, and the ability to detect them changes with size, stage, and subtype. Whereas a diagnostic test identifies what is already present and does not need to predict what happens in the future, a screening test must detect disease before symptoms arise, ideally at a point when interventions can change outcomes. As Sasieni put it, a screening test does not need to be perfect; it needs only to identify a subgroup for whom further testing is worthwhile.
To make this more tangible, he described screening as a sieve. A large population of asymptomatic individuals is progressively filtered, enriching for those most likely to harbor cancer. The fecal immunochemical test (FIT) for colorectal cancer illustrates the point. In symptomatic patients, any trace of blood is suspicious, but in a screening program thresholds are set to determine which patients require further investigation. FIT is therefore used to rule in cancer among those not seeking medical help, identifying who should undergo colonoscopy. In this way, the sieve analogy highlights that what we seek to determine is clinical validity, captured by the likelihood ratio. For screening, the focus is on the positive likelihood ratio, defined in terms of sensitivity and specificity. This ratio reflects how much more likely disease is in those who test positive compared to the overall population being tested.
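To make the arithmetic behind that idea concrete, here is a brief illustrative sketch of how the positive likelihood ratio is computed from sensitivity and specificity. The numbers are hypothetical and chosen only for illustration; they are not figures from the seminar.

```python
# Positive likelihood ratio: how much a positive result shifts the odds of disease.
# The sensitivity and specificity values below are hypothetical, for illustration only.

def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

# Example: a test with 60% sensitivity and 99% specificity
lr_plus = positive_likelihood_ratio(0.60, 0.99)
print(f"LR+ = {lr_plus:.1f}")  # LR+ = 60.0: disease odds are ~60x higher after a positive result
```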
Traditional metrics, however, can fall short. Sensitivity is notoriously difficult to measure, since it depends on tumor size, stage, subtype, and even population demographics. Specificity is equally unstable, often shifting with age or the presence of underlying conditions. Sasieni argued that rather than treating sensitivity and specificity as constants, the field should adopt approaches akin to age-standardization in survival statistics: methods that allow performance to be compared across reference populations. More importantly, he called for a pivot toward metrics that capture real-world utility. Diagnostic yield, the number of cancers detected per thousand screens, links performance directly to patient benefit. Positive predictive value (PPV) and negative predictive value (NPV) answer the questions patients care most about: if I test positive, how likely is it that I actually have cancer, and if I test negative, how safe am I? These, he stressed, should focus on actionable cancers that can be treated with curative intent.
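As a rough illustration of how these patient-facing metrics follow from prevalence, sensitivity, and specificity, the sketch below works through one hypothetical scenario; the input values are assumptions chosen only to show the relationships, not results from any actual test.

```python
# Hypothetical example linking prevalence, sensitivity, and specificity
# to diagnostic yield, PPV, and NPV. Illustrative values only.

def screening_metrics(prevalence, sensitivity, specificity, n_screened=1000):
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity

    ppv = true_pos / (true_pos + false_pos)    # P(cancer | positive test)
    npv = true_neg / (true_neg + false_neg)    # P(no cancer | negative test)
    diagnostic_yield = true_pos * n_screened   # cancers detected per n_screened screens
    return ppv, npv, diagnostic_yield

# Assume 1% prevalence, 50% sensitivity, 99% specificity (hypothetical)
ppv, npv, dy = screening_metrics(0.01, 0.50, 0.99)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}, yield = {dy:.1f} cancers per 1000 screens")
# PPV = 33.6%, NPV = 99.49%, yield = 5.0 cancers per 1000 screens
```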
Sasieni went further, suggesting practical thresholds to balance invasiveness with benefit. A simple blood or urine test could be justified if it detects at least one cancer per thousand screens. Low-dose CT scans, given radiation risks, should deliver a diagnostic yield of at least three percent. Surgery should not proceed unless the chance of finding cancer is thirty percent or higher. For MCED tests, he proposed benchmarks such as an overall positive likelihood ratio of at least ten, site-specific ratios of at least five, and a PPV of at least 7.5 percent, with site-specific PPVs of at least three percent.
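One way to see how those benchmarks fit together is to convert a positive likelihood ratio into an approximate PPV via pre- and post-test odds. The sketch below assumes, purely for illustration, that about one percent of the screened population has a detectable cancer; that figure is an assumption, not one taken from the talk.

```python
# Hypothetical check of how a positive likelihood ratio translates into PPV.
# The 1% prevalence used here is an illustrative assumption, not a figure from the seminar.

def ppv_from_lr(prevalence: float, lr_plus: float) -> float:
    """Convert pre-test odds to post-test odds with LR+, then back to a probability."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * lr_plus
    return post_test_odds / (1 + post_test_odds)

for lr in (5, 10, 20):
    print(f"LR+ = {lr:>2}: PPV ~ {ppv_from_lr(0.01, lr):.1%}")
# Under this assumed prevalence, LR+ = 10 corresponds to a PPV of roughly 9%,
# which would sit above the proposed 7.5% floor.
```

The exercise is only a back-of-the-envelope consistency check: the PPV any real test achieves depends on the actual prevalence of detectable, actionable cancers in the population screened.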
A Panel of Perspectives
Following Sasieni’s presentation, Bree Mitchell (Natera, Working Group co-chair) opened a discussion with Sasieni, Jody Hoyos (Prevent Cancer Foundation), and Donna Roscoe (former FDA). The conversation underscored how metrics must be interpreted not only by researchers but also by clinicians and patients.
Hoyos stressed that patients rarely think in terms of sensitivity and specificity. “Fundamentally, people want to know: does this work? Can I get it? Will it be paid for? What will it feel like? And am I going to be okay?” she said. Surveys from her foundation show that fear of pain, discomfort, costs, and false positives weigh heavily on people considering screening. “None of the benefits matter if patients won’t show up,” she reminded the group.
Roscoe emphasized the importance of clarity and context, noting that it is essential to understand how a test fits into clinical workflow, how it compares to existing tools, and how clinicians should act on its results.
Panelists agreed that some of the greatest confusion stems from terms like sensitivity, specificity, and false positive rates, which often create a false sense of certainty. Hoyos noted that many still conflate screening with diagnosis, while Sasieni pointed out that even when probabilities are explained, most people struggle to interpret small risks. The challenge, several noted, is not just refining the metrics but also finding ways to present them in plain language and visual formats that resonate with physicians and patients alike.
The panel also debated whether test performance should always be broken down by cancer type. Roscoe argued that regulators value granularity for apples-to-apples comparisons, but Sasieni countered that the promise of MCEDs lies in their collective benefit, not in excelling equally at every cancer. Hoyos suggested that transparency will be essential in the early years to build trust, even if critics focus on weaker aspects of performance.
The discussion also turned to the future, as more single- and multi-cancer blood tests come to market. Hoyos observed that patients will rely on their providers for guidance, while Roscoe emphasized the importance of physician-patient engagement after results are delivered. Sasieni raised concerns about equity: if screening requires lengthy consultations, it risks becoming a service only the well-resourced can access. He argued that cancer screening should remain a public health activity, simple and equitable, even as competition introduces complexity.
As the session ended, Mitchell thanked the panelists and audience, and everyone echoed the importance of continuing these conversations. The message was clear: building consensus on metrics is not just a technical exercise. It is about trust, transparency, and creating a foundation for the safe, effective adoption of blood-based screening in both clinical and public health contexts.
Watch the full seminar recording here.

