A chain of around 30,000 genetic letters was all it took to start the covid-19 nightmare, whose death toll is likely to exceed 20 million. Exactly how this story began has been much discussed. Many think that the emergence of covid-19 was a zoonosis: a spillover from wild animals, as so many new pathogens are, since the virus resembles a group of coronaviruses found in bats. Others have pointed to the enthusiastic coronavirus engineering taking place in laboratories around the world, but particularly in Wuhan, the Chinese city where the virus was first identified. In February 2021, a team of scientists assembled by the World Health Organization (WHO) to visit Wuhan said a lab leak was extremely unlikely. However, this conclusion was later challenged by the head of the WHO, who said that dismissing the theory was premature.
Two recent publications appear to have bolstered the case for a natural origin connected to a “wet market” in Wuhan. These markets sell live animals, often housed in poor conditions, and are known to be sites where new pathogens jump from animals to humans. The first cases of covid-19 were clustered around this market. But critics counter that there is so much missing data on the early days of the epidemic that this picture may be inaccurate.
The opposing idea of a leak from a laboratory is not implausible. Accidental release of viruses from laboratories is more common than many people realize. The 1977 flu epidemic is believed to have started this way. But an escaped virus does not imply an engineered virus. Virology labs are also full of non-engineered viruses.
Research such as that carried out in Wuhan offers various routes by which a virus could leak. A researcher on a field trip could have picked it up in the wild and then returned to Wuhan, spreading it to others there. Or someone could have been infected in the lab itself with a virus collected from the wild. But some argue that SARS-CoV-2 could have been assembled in a laboratory from other viruses that were already available, and then leaked.
Into this fray comes an analysis from an unlikely source. Alex Washburne is a mathematical biologist who runs Selva, a small microbiome science startup based in New York. He is an outsider, although in the past he has worked on virological models as a researcher at Montana State University. For this study, Dr. Washburne collaborated with two other scientists. One is Antonius VanDongen, an associate professor of pharmacology at Duke University in North Carolina. The other, Valentin Bruttel, is a molecular immunologist at the University of Würzburg, Germany. Dr. Washburne and Dr. VanDongen have been active advocates of research into the laboratory leak theory.
The trio base their claim on a novel method for detecting plausibly lab-created viruses. Their analysis, published Oct. 20 on bioRxiv, a preprint server, suggests SARS-CoV-2 has some genomic features that they say would appear if the virus had been stitched together through some form of genetic engineering. By examining how many of these putative stitching sites SARS-CoV-2 has, and how relatively short the pieces between them are, they try to assess how similar the virus is to others found in nature.
They start from the assumption that creating a genome as long as that of SARS-CoV-2 would mean combining shorter snippets from existing viruses. For a coronavirus genome assembly, they say an ideal arrangement would be to use between five and eight fragments, all under 8,000 letters. Such fragments are created using restriction enzymes. These are molecular scissors that cut genomic material at particular sequences of genetic letters. If a genome doesn’t have such restriction sites in the right places, researchers often create new ones of their own.
They argue that the distribution of restriction sites for two popular restriction enzymes, BsaI and BsmBI, is “anomalous” in the SARS-CoV-2 genome. And the length of the longest fragment is much shorter than you might expect. They determined this by taking 70 disparate coronavirus genomes (not including SARS-CoV-2) and cutting them into pieces with 214 commonly used restriction enzymes. From the resulting collection, they were able to calculate the expected fragment lengths when coronaviruses are cut into varying numbers of pieces.
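To make the idea of cutting a genome into pieces concrete, here is a minimal Python sketch of an in-silico digest. It is not the preprint authors' code. The function names are hypothetical, and as a simplifying assumption each cut is placed at the start of the recognition sequence, whereas BsaI and BsmBI actually cut a few letters away from it. The recognition sequences used are GGTCTC for BsaI and CGTCTC for BsmBI, each searched together with its reverse complement so that sites on either strand are counted.

```python
import re

# Recognition sequences (5'->3') for the two enzymes named in the text,
# each paired with its reverse complement to cover both DNA strands.
SITES = {
    "BsaI": ("GGTCTC", "GAGACC"),
    "BsmBI": ("CGTCTC", "GAGACG"),
}


def site_positions(genome, enzymes=("BsaI", "BsmBI")):
    """Return sorted 0-based start positions of all recognition sites."""
    genome = genome.upper()
    positions = set()
    for name in enzymes:
        for motif in SITES[name]:
            # A zero-width lookahead also finds overlapping occurrences.
            for match in re.finditer(f"(?={motif})", genome):
                positions.add(match.start())
    return sorted(positions)


def fragment_lengths(genome, enzymes=("BsaI", "BsmBI")):
    """Lengths of the pieces left after cutting at every site.

    Simplification (an assumption of this sketch): the cut is placed at
    the start of the recognition sequence.
    """
    cuts = [p for p in site_positions(genome, enzymes) if p > 0]
    bounds = [0] + cuts + [len(genome)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def longest_fragment_fraction(genome, enzymes=("BsaI", "BsmBI")):
    """Longest fragment as a fraction of the whole genome length."""
    return max(fragment_lengths(genome, enzymes)) / len(genome)


if __name__ == "__main__":
    # Toy sequence, not a real genome: one BsaI site and one BsmBI site.
    toy = "AAAA" + "GGTCTC" + "TTTTTTTT" + "GAGACG" + "CC"
    print(site_positions(toy))
    print(fragment_lengths(toy))
    print(longest_fragment_fraction(toy))
```

Running the same functions over many genomes and many enzymes would give a spread of fragment counts and longest-fragment lengths against which a single genome could be compared. That is the general shape of the comparison described above, though the preprint's actual statistical procedure may differ.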
The paper, which as a preprint has not received formal peer review and has not been accepted for publication in a journal, will be picked apart in the coming days, as it should be, because that is how science works. However, early reactions have been deeply divided. Francois Balloux, a professor of computational systems biology at University College London, said he found the results intriguing. “Unlike many of my colleagues, I was unable to identify any fatal flaws in reasoning and methodology. The distribution of BsaI/BsmBI restriction sites in SARS-CoV-2 is an outlier.” Dr. Balloux said that these findings should be evaluated in good faith. But Edward Holmes, an evolutionary biologist and virologist at the University of Sydney, said each of the features identified in the paper was natural and already found in other bat viruses. If someone were designing a virus, they would undoubtedly introduce some new ones. He added that “there are a whole range of technical reasons why this is complete nonsense.”
Sylvestre Marillonnet, a synthetic biologist at the Leibniz Institute for Plant Biochemistry in Germany, agreed that the number and distribution of these restriction sites did not seem entirely random, and that the number of silent mutations found at these sites did suggest that SARS-CoV-2 could have been engineered. (Silent mutations can be the result of engineers wanting to make changes to a sequence of genetic material without making changes to the proteins encoded by that sequence.) But Dr. Marillonnet also said that there are arguments against this hypothesis. One of them is the tiny length of one of the six fragments, something that “does not seem logical to me”.
The other point that Dr. Marillonnet makes is that the restriction sites need not have been present in the final sequence. “Why would people drop sites in and out of the genome when they don’t have to?” he wondered. Previous arguments in support of the possibility of a lab leak have emphasized that a manipulated virus would not need to have such indicators. However, Justin Kinney, a professor at the Cold Spring Harbor Laboratory in New York, said researchers have previously created coronaviruses and left such sites in the genome. He said the genetic signature indicates a virus ripe for further experiments and should be taken seriously, but cautioned that the paper needed rigorous peer review.
Erik van Nimwegen of the University of Basel says there are only small snippets of information and that it’s “hard to get anything definitive out of it”. He adds, “It really cannot be excluded at all that such a constellation of sites may have occurred by chance.” The authors of the article admit that this is the case. Kristian Andersen, a professor of immunology and microbiology at the Scripps Research Institute in La Jolla, California, described the pattern on Twitter as “random noise.”
Any conclusion that SARS-CoV-2 was engineered will be hotly contested. China denies that the virus came from a Chinese laboratory and has called for an investigation into whether it may have originated in the United States. Dr. Washburne and his colleagues say their predictions are testable. If a progenitor genome of SARS-CoV-2 is found in nature with restriction sites that are the same or intermediate, it would raise the likelihood that this pattern evolved by chance.
Any widely supported conclusion that the virus was genetically engineered would have profound ramifications, both political and scientific. It would put in a new light the behavior of the Chinese government in the early days of the outbreak, in particular its reluctance to share epidemiological data from those days. It would also raise questions about what was known, when, and by whom about the presumably accidental escape of an engineered virus. For now, this is a first draft of the science and should be treated as such. But the scrutineers are already at work. ■
Editor’s Note: The preprint “Endonuclease Fingerprint Indicates Synthetic Origin of SARS-CoV-2” by Bruttel, Washburne and VanDongen can be found on bioRxiv.