Scientists release the first complete sequence of a human Y chromosome


Males who feel they are misunderstood – by their wives or all women in the world – can now heave a sigh of relief. For decades, the Y chromosome – one of the two human sex chromosomes – has been notoriously challenging for the genomics community to sequence due to the complexity of its structure.

Now, this elusive area of the genome has been fully sequenced, an accomplishment that finally completes the set of end-to-end human chromosomes and adds 30 million new bases to the human genome reference – mostly from hard-to-sequence satellite DNA. These bases reveal 41 additional protein-coding genes and provide vital insight for those studying important questions related to reproduction, evolution, and population change.

The Telomere-to-Telomere (T2T) consortium, comprising dozens of researchers, most of them from around the US, made the discovery. The group is co-led by Karen Miga, assistant professor of biomolecular engineering at the University of California at Santa Cruz (UCSC). They have just announced the achievement in a new paper published in the prestigious journal Nature under the title “Researchers assemble the first complete sequence of a human Y chromosome.” The complete annotated Y chromosome reference can be accessed on the UCSC Genome Browser and via GitHub.

“Just a few years ago, half of the human Y chromosome was missing from the reference – the challenging, complex satellite areas,” said Dr. Monika Cechova, co-lead author on the paper and postdoctoral scholar in biomolecular engineering at UCSC. “Back then we didn’t even know if it could be sequenced, it was so puzzling. This is really a huge shift in what’s possible.”

When scientists and clinicians study an individual’s genome, they compare that individual’s DNA to a standard reference to determine where there is variation. Until now, the Y chromosome portion of the human genome reference has contained large gaps that made it difficult to understand variation and associated disease.

Geneticist Karen Miga is seen at a laboratory, used in research involving the human Y chromosome, at the University of California, in Santa Cruz, California, US, February 10, 2022. (credit: Carolyn Lagattuta, UC Santa Cruz/Handout via REUTERS)

Why is it so hard to decode the Y chromosome?

The structure of the Y chromosome has been challenging to decode because some of the DNA is organized in palindromes – long sequences that are the same forward and backward – that can span more than a million base pairs. In addition, a very large part of the Y chromosome that was missing from the previous version of the Y reference is satellite DNA – large, highly repetitive regions of non-protein-coding DNA. On the Y chromosome, two satellites are interlinked, further complicating the sequencing process.
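To make the idea of a DNA palindrome concrete, here is a minimal Python sketch. It assumes the usual molecular-biology sense of the term: a stretch of DNA is palindromic when it reads the same as its reverse complement, that is, the opposite strand read in the opposite direction. The function names and the short test sequences are illustrative only and are not taken from the paper; real Y chromosome palindromes are vastly longer and not perfectly symmetrical.

```python
# Toy illustration of a DNA palindrome (names and sequences are hypothetical).
# Assumption: "palindrome" means the sequence equals its reverse complement.

# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in the opposite direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence is identical to its reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == "__main__":
    print(is_dna_palindrome("GAATTC"))   # True: reads the same on both strands
    print(is_dna_palindrome("GATTACA"))  # False
```

Because the two arms of a palindrome are near-copies of each other, a sequencing read that falls entirely inside one arm looks almost identical to a read from the other arm, which is part of what makes these regions hard to assemble.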

The researchers were able to read the sequences of the Y chromosome without gaps thanks to advances in long-read sequencing technology and new, innovative computational assembly methods that could deal with the repetitive sequences and transform the raw data from sequencing into a usable resource. These new assembly methods allowed the team to tackle some of the particularly challenging aspects of the Y chromosome, such as pinpointing precisely where an inversion occurs in a palindromic sequence. The methods established in the paper will allow scientists to complete more end-to-end reads of human Y chromosomes to get a better understanding of how this genetic material affects the diverse human population.
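Why longer reads help with repetitive DNA can be shown with a toy example. The sketch below is not the consortium's assembly method; it is a simplified illustration, using an invented 42-base "genome" and a hypothetical `placements` helper, of how a read shorter than a repeat fits in several places while a read that spans the repeat and its flanking sequence fits in only one.

```python
# Toy illustration only: an invented sequence, not real Y chromosome data.

def placements(genome: str, read: str) -> list[int]:
    """Return every position where the read matches the genome exactly."""
    last_start = len(genome) - len(read)
    return [i for i in range(last_start + 1) if genome.startswith(read, i)]


# Unique left flank + six copies of a 5-base repeat + unique right flank.
genome = "ACGTAC" + "GATTA" * 6 + "CCGGTT"

short_read = "GATTAGATTA"   # 10 bases, lies entirely inside the repeat
long_read = genome[4:38]    # 34 bases, spans the repeat and both flanks

if __name__ == "__main__":
    print(placements(genome, short_read))  # [6, 11, 16, 21, 26]: ambiguous
    print(placements(genome, long_read))   # [4]: a single placement
```

The short read could have come from any of five positions, so an assembler cannot tell how many copies of the repeat exist. The long read is anchored by the unique sequence on either side, which fixes both its position and the number of repeat copies it contains.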

“It was the Y chromosome that lacked the most sequences from the previous reference genome,” said Dr. Arang Rhie, a computer scientist at the US National Human Genome Research Institute and the paper’s lead author. “It was always irritating knowing we were missing half the Y whenever we tried to do any reference-based analysis. I was really excited to curate the first complete Y, to see what we were actually missing, and what we can now do.”

In 2018, Miga and her colleagues released the first complete map of a human centromere on the Y chromosome. Now, just five years later, the T2T consortium has filled in 30 million additional base pairs, building on the first fully sequenced human genome, which was released in 2022.

The Y chromosome is most commonly associated with individuals assigned male at birth, but may be found in others, such as intersex people. The sex characteristics regulated by DNA on the Y chromosome are also not equivalent to an individual’s gender identity. While there are relatively few genes on the Y chromosome, the ones that are present are complex and dynamic, and code for important functions such as spermatogenesis, the production of sperm. The complete Y chromosome reference will allow scientists to study many features of this part of the human genome in a way that has never before been possible.

Having a clearer picture of the Y chromosome makes it easier to track genes across generations of inheritance and learn how the location and content of genes have changed over time. The 30 million new bases will also be crucial for studying genome evolution.

The complete sequence also reveals important features of medically relevant regions. One such section of the Y chromosome is called the azoospermia factor region, a stretch of DNA containing several genes known to be involved in sperm production. With the newly completed sequence, the researchers studied the structure of a set of inverted repeats or “palindromes” in the azoospermia factor region.

“It is exciting to be able to finally see these sequences in the densely packed regions for the first time. Finally, we can design experiments to test the impact and function of these previously unexplored parts of the Y chromosome,” Miga said.

While the complete human Y chromosome will open the door to many new discoveries, the researchers plan to further improve the study of this region by including the Y chromosome in future versions of the human pangenome. This new reference for genomics combines the genomic information of multiple people from various ancestral backgrounds, with the ultimate aim of enabling more equitable research and clinical discoveries, such as helping to diagnose disease, predict medical outcomes, and guide treatments.