One of the most startling findings from sequencing the human genome is that humans aren’t entirely human. Approximately 8% of our DNA is viral, the result of ancient viruses invading our ancestors’ cells and inserting their DNA into the genome. These insertions are scattered throughout our genomes and are primarily the remnants of a family of viruses called retroviruses. Human immunodeficiency virus (HIV) is one well-known retrovirus, but there are many other members of this family. Retroviruses have an RNA genome and a unique life cycle (Fig. 1) that requires them to integrate their viral genome into the host cell’s DNA.

When retroviruses infect a cell, they bring in their viral RNA genome along with an enzyme called reverse transcriptase (RT). Inside the cell, the RT enzyme copies the viral RNA into a double-stranded DNA version, which is then inserted into the cell’s DNA. This inserted retroviral sequence is called a provirus. Once inserted, the provirus becomes a permanent part of that cell’s genetic information because neither the cell nor the virus has a mechanism to remove the viral DNA. The integrated provirus remains in that cell and in all of the cell’s progeny for as long as they survive. The proviral sequence mimics our own genes and is copied into RNAs by the same cellular machinery that expresses our cellular genes. The proviral RNAs are translated into viral proteins that assemble into new viral particles. The largest proviral RNA, corresponding to the entire proviral DNA sequence, becomes the new viral genome and is packaged into the new viral particles. Newly made viral particles are released from the cell and can infect other cells in the individual or be transmitted to other people to spread the infection.

Most retroviral infections of the kind described above occur in the general cells of our bodies, the so-called somatic cells. Somatic events are confined to the infected individual and are not passed on to their progeny. When that individual dies, whatever proviruses they possessed in their somatic cells are lost. However, if the proviral insertion occurs in germline cells (egg and sperm), then there is a chance that it will pass to the next generation. If an egg or sperm carrying a provirus is used for fertilization and the embryo survives, then the resulting offspring carries the proviral sequence in the DNA of every cell in their body, including all their egg or sperm cells. When this individual reproduces, their offspring and any future descendants will also carry the provirus, and the viral sequence becomes genetically “fixed” in this lineage. While such events are incredibly rare, over millions of years of hominid evolution our ancestors slowly accumulated these viral invasions, resulting in 8% of modern human DNA being retroviral. These retroviral sequences residing in our genomes are collectively referred to as human endogenous retroviruses (HERVs).
One of the enduring questions about HERVs is their functional role, if any, in our cells. Most HERVs are only fragments of the original viral genome, so they can no longer produce viruses in our cells. Nonetheless, some of the remaining viral genome fragments contain intact genes for viral proteins and/or regulatory sequences that could affect the expression of nearby cellular genes. We know that at least a few of our “human genes” were actually derived from retroviral DNA. For example, the human syncytin gene originated as a retroviral gene known as Env, which encodes a protein in the membrane envelope of the virion. Our ancestors co-opted this viral gene, and it now encodes a protein required for placental development in pregnancy. However, the role played by most HERVs in our cellular biology is unknown. Originally, most of these viral DNA fragments were believed to be largely inactive in healthy tissues and not to be expressed as RNAs or proteins except in certain diseases such as cancers. More recent studies have challenged this belief, and a new paper in the journal PLoS Biology further refutes the idea that HERVs are inactive, at least for one group of HERVs known as the HERV-K family subgroup HML-2. Members of the HML-2 subgroup entered the human genome at different times in evolutionary history, with the oldest members acquired over 5 million years ago and the newest around 200,000 years ago. HML-2 sequences are found on 21 out of our 23 chromosomes, including both the X and Y chromosomes.
To investigate HML-2 expression in humans, the study authors performed a comprehensive analysis using a database of autopsy tissue samples from 950 individuals. The database contained 54 types of normal tissues, including all the major organs such as brain, kidney, heart, lungs, and liver. Each sample was tested for the presence of HML-2 RNA, an indicator that the integrated HML-2 DNAs were expressed in that tissue type. All 54 tissue types showed expression of HML-2 sequences, although there was considerable variation in expression from individual to individual. Overall, the highest levels of expression were found in the brain (particularly the cerebellum), the pituitary gland, and the thyroid gland. High levels were also observed in the testis. Although these findings contradict the belief that HERV sequences are largely inactive in healthy tissues, the significance of this widespread HML-2 expression in the brain and other tissues remains undetermined. Intriguingly, many of our viral passengers are not quiescent entities that slumber in our genomes but instead remain active in our cells. It will be important to determine whether this expression is merely background activity or whether it has biological consequences, either favorable or unfavorable, for some cells. Many mysteries remain about the viruses within and how these evolutionary parasites may still be affecting our health thousands of millennia after their entry into human genomes.