By Lambert Strether of Corrente.
On May 25 of this year, JAMA published Development of a Definition of Postacute Sequelae of SARS-CoV-2 Infection (“Definition”), an “original investigation” whose authors were drawn from the RECOVER Consortium, an initiative of the National Institutes of Health (NIH)[1]. This was an initially welcome development for Long Covid sufferers and activists, since questions had arisen about what exactly patients were getting for the billion dollars RECOVER was appropriated. From STAT:
The federal government has burned through more than $1 billion to study long Covid, an effort to help the millions of Americans who experience brain fog, fatigue, and other symptoms after recovering from a coronavirus infection.
There’s basically nothing to show for it.
The National Institutes of Health hasn’t signed up a single patient to test any potential treatments — despite a clear mandate from Congress to study them.
Instead, the NIH spent the majority of its money on broader, observational research that won’t directly bring relief to patients. But it still hasn’t published any findings from the patients who joined that study, almost two years after it started.
(The STAT article, NC hot take here on April 20, is worth reading in full.) Perhaps unfairly to NIH — one is tempted to say that the mountain has labored, and brought forth a coprolite — a CERN-level headcount may explain both RECOVER’s glacial pace, and its high cost:
That’s a lot of violin lessons for a lot of little Madisons!
“Definition” falls resoundingly into the research (and not treatment) bucket. In this post, I will first look at the public relations debacle (if debacle it was) that immediately followed its release; then I will look at its problematic methodology, and briefly conclude. (Please note that I feel qualified to speak on public relations and institutional issues; very much less so on research methodology, which actually involves (dread word) statistics. So I hope readers will bear with me and correct where necessary.)
The Public Relations Debacle
Our famously free press instantly framed “Definition” as a checklist of Long Covid (LC) symptoms. Here are the headlines. For the common reader:
12 key symptoms define long Covid, new study shows, bringing treatments closer CNN
Long COVID is defined by these 12 symptoms, new study finds CBS
Scientists Identify 12 Major Symptoms of Long Covid Smithsonian
These 12 symptoms may define long COVID, new study finds PBS News Hour
These Are the 12 Major Symptoms of Long COVID Daily Beast
(We will get to the actual so-called “12[2] Symptoms” when we look at methodology.) And for readers in the health industry:
For the first time, researchers identify 12 symptoms of long covid Chief Healthcare Executive
12 symptoms of long COVID, FDA Paxlovid approval & mpox vaccines with Andrea Garcia, JD, MPH AMA Update
Finally! These 12 symptoms define long COVID, say researchers ALM Benefits Pro
With these last three, we can easily see the CEO handing a copy of their “12 symptoms” article to a doctor, the doctor double-checking that headline against the AMA Update’s headline, and incorporating the NIH-branded 12-point checklist into their case notes going forward, and the medical coders at the insurance company (I love that word, “benefits”) nodding approvingly. At last, the clinicians have a checklist! They know what to do!
We’ll see why the whole notion of a checklist with twelve items is wrong and off-point for what “Definition” was actually, or at least putatively, trying to do, but for now it’s easy to see why the press went down this path (or over this cliff). Here is the press release from NIH that accompanied “Definition”’s publication in JAMA:
Researchers examined data from 9,764 adults, including 8,646 who had COVID-19 and 1,118 who did not have COVID-19. They assessed more than 30 symptoms across multiple body areas and organs and applied statistical analyses that identified 12 symptoms that most set apart those with and without long COVID: post-exertional malaise, fatigue, brain fog, dizziness, gastrointestinal symptoms, heart palpitations, issues with sexual desire or capacity, loss of smell or taste, thirst, chronic cough, chest pain, and abnormal movements.
They then established a scoring system based on patient-reported symptoms. By assigning points to each of the 12 symptoms, the team gave each patient a score based on symptom combinations. With these scores in hand, researchers identified a meaningful threshold for identifying participants with long COVID. They also found that certain symptoms occurred together and defined four subgroups or “clusters” with a range of impacts on health.
So there are 12 symptoms, right? Just like the headline says? Certainly, that’s what a normal reader would take away. And if a temporally pressed reporter goes to the JAMA original and searches on “12”, they find this:
Using the full cohort, LASSO identified 12 symptoms with corresponding scores ranging from 1 to 8 (Table 2). The optimal PASC score threshold used was 12 or greater
And if the reporter goes further and finds Table 2 (we’ll get there when we look at methodology), they will see, yes, 12 symptoms (in rank order identified by something called LASSO).
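The difference between a scored model and a checklist is easier to see in code. Here is a minimal sketch, assuming nothing more than what the quoted passage says: each listed symptom carries a point value, the values are summed, and a total of 12 or greater meets the threshold. The symptom names and point values below are placeholders invented for illustration and are not the values in Table 2; only the cutoff of 12 comes from the paper.

```python
# Placeholder point values, invented for illustration; NOT the values in Table 2.
HYPOTHETICAL_POINTS = {
    "symptom_a": 8,
    "symptom_b": 7,
    "symptom_c": 4,
    "symptom_d": 3,
    "symptom_e": 1,
}
CUTOFF = 12  # from the paper: "12 or greater"


def score(reported_symptoms):
    """Add up the points for each distinct reported symptom; unlisted symptoms add 0."""
    return sum(HYPOTHETICAL_POINTS.get(s, 0) for s in set(reported_symptoms))


def meets_cutoff(reported_symptoms):
    """True when the summed score reaches the cutoff."""
    return score(reported_symptoms) >= CUTOFF


two_heavy = ["symptom_a", "symptom_c"]                 # 8 + 4 = 12
one_point_short = ["symptom_b", "symptom_c"]           # 7 + 4 = 11
all_unlisted = ["unlisted_1", "unlisted_2", "unlisted_3"]

for name, reported in [
    ("two_heavy", two_heavy),
    ("one_point_short", one_point_short),
    ("all_unlisted", all_unlisted),
]:
    print(name, score(reported), meets_cutoff(reported))
```

Under these made-up weights, two symptoms can clear the bar, two others can miss it by a point, and any number of symptoms that are not on the list count for nothing. None of that resembles "has these 12 symptoms."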
So it’s easy to see how the headlines were written as they were written, and how the newsroom wrote the stories as they did. The wee problem: The twelve symptoms are not meant to be used clinically, for diagnosis.[3] Lisa McCorkell was the patient representative[4] for the paper, and has this to say:
But the press is not fully understanding the paper which could have dangerous downstream effects. Since the beginning of working on this paper I’ve done everything I could to ensure the model presented in this paper is not used clinically, and I’ll continue to do that. 6/
— Lisa McCorkell (@LisaAMcCorkell) May 27, 2023
Nevertheless, the “12 symptoms” are out of the barn and in the next county, and as a result, you get search results like this:
It’s very easy to imagine a harried ER nurse hearing “12 Symptoms” on the TV news[5], double-checking with a Google search, and then making clinical decisions based on a checklist not fit for purpose. Or, for that matter, a doctor.
Now, to be fair to the authors, once one grasps the idea that symptoms, even clusters of symptoms, can exist, and still not be suitable for diagnosis by a clinician, the careful language of “Definition” is clear, starting with the title: “Development of a Definition.” And in the Meaning section of the Abstract:
A framework for identifying PASC cases based on symptoms is a first step to defining PASC as a new condition. These findings require iterative refinement that further incorporates clinical features to arrive at actionable definitions of PASC.
Well and good, but do you see “framework” in the headlines? “Iterative”? “First step”? No? Now, I’d like to exonerate the authors of “Definition” (“They’re just scientists!”) for that debacle, but I cannot, completely. The authors are well-compensated, sophisticated, and aware professionals; PMC, in fact. I cannot believe that the Cochrane “fool’s gold” antimask study debacle went unobserved at NIH, especially in the press office. How was it possible that “Definition” was simply… printed as it was, and no strategic consideration given to shaping the likely coverage?[6] One obvious precautionary measure would have been a preprint, but for reasons unknown to me, NIH did not do that. A second obvious precautionary measure would have been to have the patient representative approve the press release. Ditto. Now let us turn to methodology.
The Problematic Methodology
First, I will look at issues with Table 2, which presents the key twelve-point checklist, and names the algorithm (although without explaining it). After that, I will branch out to a few larger issues. Again I issue a caveat that I’m not a Long Covid maven or a statistics maven, and I hope readers will correct and clarify where needed.
Here is Table 2:
First, some copy editing trifles (highlighted). On “PASC”: As WebMD says: “You might know this as ‘long COVID.’ Experts have coined a new term for it: post-acute sequelae SARS-CoV-2 infection (PASC).” Those lovable scamps, always inventing impenetrable jargon! (Bourdieu would chuckle at this.) On “Dizzines”: Come on. A serious journal doesn’t let a typo like that slip through (maybe they’re accustomed to fixing the preprints?). On “Supplement 3”: The text is highlighted as a link, but clicking it brings up the image, and doesn’t take you to the Supplement. These small errors are important[7], because they indicate that no editor took more than a cursory look at the most important table in the paper. On “LASSO,” hold that thought.
Second, the Long Covid Action Project points out that some obvious, and serious, symptoms are missing from the list:
[T]he next attempts at diagnostic criteria should take into account existing literature that shows more specifically defined symptoms for Long Covid, from objective findings. (E.g. PoTS, Vestibular issues, migraine, vs more vague symptoms like “headache” or “dizziness.”) [The Long Covid Action Project (LCAP)] noticed that while [Post-Exertional Malaise (PEM)] was used as a specific symptom with a high score to produce PASC-positive results, other suites of symptoms, like those in the neurologic category, could have produced an equal or higher score than PEM if questionnaires had not separated neuro-symptoms into multiple subtypes and reduced their total scores. This alone could have created a more scientifically accurate picture of the Long Covid population.
Third, these symptoms (missing, from the patient perspective; to be iterated, from the researcher’s perspective, at least one would hope) are the result of “Definition”’s methodology:
An understandable approach from scientists trained to zero in on the most clearly provable effects. But given the enormous breadth of COVID sequelae, this approach deemphasizes a ton of enormously impactful symptoms. We need solid measures of underlying organ damage.
— Clean Air Kits – Next-Gen Corsi-Rosenthal Boxes (@cleanairkits) May 26, 2023
Fourth, I would argue that the focus on the “most clearly provable effects” (as opposed to organ damage) is a result of the “LASSO” algorithm named in Table 2. I did a good deal of searching on LASSO, and discovered that most of the examples I could find, even the “real world” ones, were examples of how to run LASSO programs, rather than of how to choose LASSO over other algorithms. So that was discouraging. I believe (reinforcing the caveats, plural, given above) that I literally searched on “LASSO” “child of five” (“Explain it to me like I’m five”) to finally come up with this:
Lasso Regression is an essential variable selection technique for eliminating unnecessary variables from your model.
This method can be highly advantageous when some variables do not contribute any variance (predictability) to the model. Lasso Regression will automatically set their coefficients to zero in situations like this, excluding them from the analysis. For example, let’s say you have a skiing dataset and are building a model to see how fast someone goes down the mountain. This dataset has a variable referencing the user’s ability to make basketball shots. This obviously does not contribute any variance to the model – Lasso Regression will quickly identify this and eliminate these variables.
Since variables are being eliminated with Lasso Regression, the model becomes more interpretable and less complex.
Even more important than the model’s complexity is the shrinking of the subspace of your dataset. Since we eliminate these variables, our dataset shrinks in size (dimensionality). This is insanely advantageous for most machine learning models and has been shown to increase model accuracy in things like linear regression and least squares.
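For readers who prefer symbols, the "set their coefficients to zero" behavior can be written down. This is the standard textbook form of the lasso, offered with the same caveats as everything else in this section; it is not spelled out in “Definition” itself, and the notation (n observations, outcome y_i, predictors x_ij, penalty strength λ) is generic and assumed for illustration. The second line holds only in the idealized case where the predictors are standardized and uncorrelated with each other.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j\Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Idealized case: (1/n) \sum_i x_{ij} x_{ik} = 1 if j = k, and 0 otherwise
\hat{\beta}_j \;=\; \operatorname{sign}(\rho_j)\,\max\bigl(\lvert \rho_j \rvert - \lambda,\; 0\bigr),
\qquad
\rho_j \;=\; \frac{1}{n}\sum_{i=1}^{n} x_{ij}\, y_i
```

Any predictor whose association with the outcome, ρ_j, is no larger in size than the penalty λ gets a coefficient of exactly zero. That is the basketball variable dropping out of the skiing model.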
Since LC is said to have over 200 candidates for symptoms, you can see why a scientist trying to get their arms around the problem would be very happy to shrink those candidates to 12. But is that true to the disease?
Because LASSO (caveats, caveats) has one problem. From the same source:
One crucial aspect to consider is that Lasso Regression does not handle multicollinearity well.
Multicollinearity occurs when two or more highly correlated predictor variables make it difficult to determine their individual contributions to the model.
Amplifying:
Lasso can be sensitive to multicollinearity, which is when two or more predictors are highly correlated. In this case, Lasso may select one of the correlated predictors and exclude the other [“set their coefficients to zero”], even if both are important for predicting the target variable.
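A toy example makes the dropping behavior concrete. What follows is a minimal sketch, not the RECOVER analysis: a hand-rolled coordinate-descent lasso in plain Python, run on made-up data. Two questionnaire items always occur together (the limiting case of "highly correlated"), a third column is unrelated to the outcome, and the outcome is built from the two items equally. The function names, the +1/-1 coding for present/absent, and the penalty `alpha = 0.25` are all assumptions chosen for illustration.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=50):
    """Minimise (1/2n)*sum((y - Xb)^2) + alpha*sum(|b|), updating one coefficient at a time."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                pred = sum(X[i][k] * b[k] for k in range(p))
                # Residual with predictor j's own contribution added back in.
                partial = y[i] - pred + X[i][j] * b[j]
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, alpha) / (z / n)
    return b


# Made-up data: +1 = symptom reported, -1 = not reported.
item_a = [1, 1, -1, -1, 1, 1, -1, -1]
item_b = list(item_a)                      # always co-occurs with item_a
irrelevant = [1, -1, 1, -1, 1, -1, 1, -1]  # unrelated to the outcome
y = [a + b for a, b in zip(item_a, item_b)]  # both items contribute equally

X = [list(row) for row in zip(item_a, item_b, irrelevant)]
X_swapped = [list(row) for row in zip(item_b, item_a, irrelevant)]

coef = lasso_coordinate_descent(X, y, alpha=0.25)
coef_swapped = lasso_coordinate_descent(X_swapped, y, alpha=0.25)

print(dict(zip(["item_a", "item_b", "irrelevant"], coef)))
print(dict(zip(["item_b", "item_a", "irrelevant"], coef_swapped)))
```

In this toy case the whole coefficient lands on whichever duplicate the algorithm visits first, and the other is set to zero along with the irrelevant column, even though the outcome was built from both items. Real symptom data is correlated rather than duplicated, so the effect would be less clean.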
As Ted Nelson wrote, “Everything is deeply intertwingled” (i.e., multicollinear), and if there’s one thing we know about LC, it’s that it’s a disease of the whole body taken as a system, and not of a single organ:
There are some who seek to downplay Long Covid by saying the list of 200 possible symptoms makes it impossible to accurately diagnose and that it could be encompassing illnesses people might have gone on to develop anyway, but there are sound biological reasons for this condition to affect the body in so many different ways.
Angiotensin-converting enzyme receptor 2 (ACE2) is the socket SARS-CoV-2 plugs into to infect human cells. The virus can use other mechanisms to enter cells, but ACE2 is the most common method. ACE2 is widely expressed in the human body, with highest levels of expression in small intestine, testis, kidneys, heart, thyroid, and adipose (fat) tissue, but it is found almost everywhere, including the blood, spleen, bone marrow, brain, blood vessels, muscle, lungs, colon, liver, bladder, and adrenal gland.
Given how common the ACE2 receptor is, it is unsurprising SARS-CoV-2 can cause a very wide range of symptoms.
In other words, multicollinearity everywhere. Not basketball players vs. skiers at all.
So is LASSO even the right algorithm to handle entwinglement, like ACE2 receptors in every organ? Are there statistics mavens in the readership who can clarify? With that, I will leave the shaky ground of statistics and Table 2, and raise two other issues.
First, it’s clear that the population selected for “Definition” is unrepresentative of the LC population as a whole:
One important thing to remember is RECOVER study on #LongCOVID likely has a lot of patients with more mild version of Long COVID. To participate you have to go in for multiple voluntary medical appointments. People with severe illness and/or bedbound are less able to participate.
— Myra #KeepMasksInHealthCare (@myrabatchelder) May 28, 2023
If the patients in “Definition” are not so ill, that might also account for Table 2’s missing symptoms.
Second, “Definition”’s questionnaires should include measures of severity, and don’t:
Yes. PEM Q has challenges—”post-exertional malaise” is jargon & can lead to respondent self-selection. Plus, PEM can be delay onset.
Also: symptom lists should focus on severity. Mild, transient fatigue after an illness is common. Debilitating, persistent fatigue is different.
— zeynep tufekci (@zeynep) May 29, 2023
Conclusion
The Long Covid Action Project (materials here) is running a letter writing campaign: “Request for NIH to Retract RECOVER Study Regarding 12 Symptom PASC Score For Long Covid.” As of this writing, “only 3,082 more until our goal of 25,600.” You might consider dropping them a line.
Back to the checklist for one moment. One way to look at the checklist is — we’re talking [drumroll] the PMC here — as a set of complex eligibility requirements, whose function is, as usual, gatekeeping and denial:
what they did is create basically a means test to figure out a dx but for smthg that is still not fully understood. it’s premature and rly limited, & this will only further aid ppl already dismissive of lc
— Wendi Muse (@MuseWendi) June 3, 2023
If you score 12, HappyVille! If you score 11, Pain City! And no consideration given to the actual organ damage in your body. And after the last three years following CDC, I find it really, really difficult to give NIH the benefit of the doubt. If one believed that NIH was acting in bad faith, one would see “Definition” as a way to keep the funding gravy train rolling, and the “12 Symptoms” headlines as having the immediate and happy outcome of denying care to the unfit. Stay safe out there, and let’s save some lives!
NOTES
[1] Oddly, the JAMA paper is not yet listed on RECOVER’s publications page.
[2] “12” is such a clickbait-worthy brainworm. “12 Days of Christmas,” “12 apostles,” “12 steps,” “12 months,” “12 signs of the zodiac,” etc. One might wonder whether, if the number had been “9” or “14,” the uptake would have been so instant.
[3] To be fair to the sources, most of them mention this: Not CBS, Chief Healthcare Executive, or the Daily Beast, but CNN in paragraph 51, Smithsonian (9), PBS (20), AMA Update (10), and Benefits Pro (17).
[4] There was only one patient representative for the paper:
I wish so much there was more than me. Patient reps have such limited power but it builds as more are included. I tried to get other voices involved even just to review the paper before publication, but that was blocked.
— Lisa McCorkell (@LisaAMcCorkell) May 28, 2023
One seems low, especially given the headcount for the project.
[5] I was not able to find a nursing journal that covered the story.
[6] Unless it was, of course.
[7] Samuel Johnson: “When I take up the end of a web, and find it packthread, I do not expect, by looking further, to find embroidery.”