The 1918 influenza pandemic killed nearly 50 million people worldwide. Almost exactly 100 years later, SARS coronavirus-2 (SARS-CoV-2) exploded into human populations, leading to the COVID-19 pandemic that so far has infected nearly 400 million people and killed close to 6 million. The COVID pandemic, in addition to being a horrendous medical crisis, has impacted everything from economics to politics. For over two years the entire world has been dealing with SARS-CoV-2 prevention and containment procedures that have altered and disrupted daily lives across the globe. And SARS-CoV-2 is just one small virus. Virologists estimate that there are between 100 million and 1 trillion virus species on Earth, only a very small fraction of which have been identified. Admittedly, a great many of these viruses infect plants or bacteria and are unlikely to be a risk for humans. Still, even if we consider only mammals as the most likely source of viruses that could infect humans (zoonotic viruses), there are estimated to be at least 300,000 mammalian viruses. Currently, slightly over 9,000 viruses have been fully identified and assigned a species by the International Committee on Taxonomy of Viruses, so our catalog of viruses is woefully small compared to the total viral population (although tens of thousands of other viruses have been found but not yet well characterized or classified). Without more knowledge about our viral world, we remain at the mercy of novel viruses that are lurking in animal populations, just waiting for an opportunity to jump into humans.
Historically, most viruses were identified by isolating them from an infected person, plant, animal, or bacterium, a process that can be slow and difficult. These individual viral isolates were characterized to determine their properties and classify them into taxonomic groups. As DNA and RNA sequencing became commonplace, individual isolates could have their viral genomes fully sequenced to reveal detailed genetic information that allowed more refined classification and comparison of the viruses. More recently, “shotgun sequencing” has been used to greatly expand our ability to identify new viruses. Rather than laboriously isolating and purifying each virus, investigators simply take a sample from the test subject and sequence all of the DNA in it. For example, the sample might be a swab from the nasal cavity of a healthy person. Total sequencing will reveal a mixture of human DNA sequences, bacterial sequences from organisms residing in the nose, and the sequences of any viruses present in that location. Similar sequencing can be done to identify all the RNA present, and again the results would contain human, bacterial, and, if viruses were present, viral RNA. While such sequencing is now fairly routine, the more difficult and cumbersome part is searching through the voluminous sequence information to separate the host organism's sequences from the microbial sequences, especially when looking for novel viruses whose sequences are entirely unknown.
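To make that sorting problem concrete, here is a toy sketch in Python of one simple way to set aside reads that look like host DNA. It is only an illustration under stated assumptions: real pipelines align reads against complete reference genomes, and the function names, the 8-base word size, the 50% cutoff, and the short sequences below are all made up for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings ("words") of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_reads(reads, host_reference, k=8, threshold=0.5):
    """Split reads into (host, unexplained) lists.

    A read is called "host" when at least `threshold` of its k-mers also
    occur in the host reference. Both k and threshold are illustrative
    assumptions, not values taken from any published pipeline.
    """
    host_index = kmers(host_reference, k)
    host, unexplained = [], []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: nothing to compare
            unexplained.append(read)
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        (host if shared >= threshold else unexplained).append(read)
    return host, unexplained


# Hypothetical data: a tiny stand-in for a host genome and two reads.
host_reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTTGCA"
reads = [
    "GTTAGCCTAGGATCCG",      # copied from the host reference
    "TTTTCCCCAAAAGGGGTTTT",  # shares nothing with the host reference
]

host_reads, unexplained_reads = split_reads(reads, host_reference)
```

The reads left in `unexplained_reads` are the interesting ones: they are the candidates that might belong to bacteria or to viruses nobody has seen before.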
A new paper in the journal Nature describes an approach that takes advantage of the enormous amount of DNA and RNA sequence information that has been deposited in public databases. The sequences in these databases come from numerous different plants, animals, and humans, as well as environmental samples. Most of these deposited sequences were produced for studies that had nothing to do with viruses and have never been examined for possible viral sequences. The authors of the study developed a new search algorithm that allowed them to search 10.2 petabases (1.02 × 10^16 bases) of nucleotide sequences in public repositories rapidly and cheaply. They searched the databases for sequences related to that of an enzyme called RNA-dependent RNA polymerase (RdRp), which is specific to RNA viruses. Their search found 131,957 hits, each appearing to represent a previously unknown RNA virus, including 9 new coronaviruses. The authors also created a free website (https://serratus.io/) where anyone can explore the viral sequence information that they have curated. This rapidly advancing application of computational biology, coupled with large-scale genomic databases, is finding new viruses at a rate never dreamed of 10 years ago. Hopefully, this expanding knowledge of the virome will help scientists and epidemiologists identify potentially dangerous new viruses before they become novel human pathogens.
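For readers curious what searching nucleotide data for a protein involves, the general idea can be sketched in a few lines of Python: translate each read in all six reading frames and look for a stretch of amino acids of interest. This is a toy illustration and not the study's algorithm. The function names, the example reads, and the query peptide "MGDDW" are hypothetical, and a real search scores approximate similarity rather than requiring an exact match.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Yield the protein translation of all six reading frames of a read."""
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_complement):
        for offset in range(3):
            yield translate(strand[offset:])


def reads_with_peptide(reads, peptide):
    """Return the reads that encode `peptide` exactly in any reading frame."""
    return [r for r in reads if any(peptide in f for f in six_frames(r))]


# Hypothetical reads; "MGDDW" is a made-up query, not a real viral signature.
reads = [
    "CCATGGGTGATGACTGGA",  # encodes MGDDW in the third forward frame
    "CCAGTCATCACCCAT",     # encodes MGDDW on the opposite strand
    "ACGTACGTACGTACGTAC",  # does not encode MGDDW in any frame
]

matches = reads_with_peptide(reads, "MGDDW")
```

Searching at the protein level is a sensible design choice for this kind of hunt, because distantly related viruses can keep similar proteins even after their underlying nucleotide sequences have drifted far apart.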