T2T-Primates is a project of the Telomere-to-Telomere (T2T) consortium, led by the Makova, Phillippy, and Eichler labs. The project aims to produce complete, diploid assemblies of key non-human primate species and is currently focused on gorilla, bonobo, chimpanzee, orangutan, and gibbon. Following the approach of the human T2T-CHM13 project, all species have been sequenced with high-coverage PacBio HiFi (~60x) and Oxford Nanopore ultra-long (~40x) reads. For haplotype phasing, Dovetail Hi-C data was generated for all genomes, and Strand-seq data is expected. Parental Illumina data was collected for bonobo and gorilla, for which familial trios were available.
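As a rough illustration of what these coverage targets mean in sequencing yield: coverage $C$ is the total number of sequenced bases ($N$ reads of mean length $L$) divided by the genome size $G$. Assuming a haploid genome size of roughly 3 Gb (an illustrative round figure for great apes, not one stated in this README), the targets work out approximately as follows:

```latex
C = \frac{N L}{G}
\quad\Longrightarrow\quad
N L = C\,G,
\qquad
\underbrace{60 \times 3\ \text{Gb} \approx 180\ \text{Gb}}_{\text{PacBio HiFi}},
\qquad
\underbrace{40 \times 3\ \text{Gb} \approx 120\ \text{Gb}}_{\text{ONT ultra-long}}
```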
Phase one of the project is focused on completing the sex chromosomes; phase two will focus on finishing the autosomes of bonobo and gorilla; and phase three will focus on the remaining genomes. The project is currently in phase one, with draft T2T sex chromosome assemblies now available for all genomes.
Version 1 diploid assemblies were generated with Verkko v1.1, and contigs were assigned to chromosomes and oriented by alignment to the previous reference assemblies. Both X and Y chromosomes are complete for all species listed below. Gorilla and bonobo were phased using familial trios, and all others using Hi-C:
- Gorilla gorilla (gorilla)
- Pan paniscus (bonobo)
- Pan troglodytes (chimpanzee)
- Pongo abelii (Sumatran orangutan)
- Pongo pygmaeus (Bornean orangutan)
- Symphalangus syndactylus (siamang gibbon)
All generated sequencing data and assemblies are available for browsing and download from GenomeArk.
Files are generously hosted by Amazon Web Services under `s3://genomeark`. Although the files are also available over HTTP, download performance is improved by using the Amazon Web Services command-line interface (AWS CLI), and references should be amended to use the `s3://` addressing scheme. Adjusting `max_concurrent_requests` and related settings, as described in the AWS CLI S3 configuration guide, will improve download performance further.
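The URL conversion and CLI tuning described above can be sketched in Python. This is a minimal sketch, assuming GenomeArk HTTP links use the standard S3 virtual-hosted form (`https://genomeark.s3.amazonaws.com/<key>`, optionally with a region in the hostname) or the path-style form (`https://s3.amazonaws.com/genomeark/<key>`). The helper names `http_to_s3`, `download_command`, and `concurrency_command`, the example object key, and the concurrency value of 20 are hypothetical. The helpers only build `aws` command lines: `aws s3 cp --no-sign-request` for anonymous reads from the public bucket, and `aws configure set default.s3.max_concurrent_requests` for raising transfer concurrency.

```python
from urllib.parse import urlparse

BUCKET = "genomeark"


def http_to_s3(url: str) -> str:
    """Convert an HTTP(S) or s3:// link to a GenomeArk object into an s3:// URI.

    Assumption: links are S3 virtual-hosted style
    (genomeark.s3[.<region>].amazonaws.com/<key>) or path style
    (s3[.<region>].amazonaws.com/genomeark/<key>).
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lstrip("/")

    if parsed.scheme == "s3" and host == BUCKET:
        key = path  # already an s3:// URI
    elif host.startswith(BUCKET + ".s3") and host.endswith(".amazonaws.com"):
        key = path  # virtual-hosted style: bucket is in the hostname
    elif (host.startswith("s3") and host.endswith(".amazonaws.com")
          and path.startswith(BUCKET + "/")):
        key = path[len(BUCKET) + 1:]  # path style: bucket is the first path segment
    else:
        raise ValueError(f"not a recognised GenomeArk URL: {url}")
    return f"s3://{BUCKET}/{key}"


def download_command(s3_uri: str, dest: str = ".") -> list[str]:
    """Build an AWS CLI copy command for a public object or prefix."""
    # --no-sign-request lets the CLI read a public bucket without credentials.
    cmd = ["aws", "s3", "cp", "--no-sign-request", s3_uri, dest]
    if s3_uri.endswith("/"):
        cmd.append("--recursive")  # treat a trailing slash as a whole prefix
    return cmd


def concurrency_command(n: int = 20) -> list[str]:
    """Build the command that raises S3 transfer concurrency (CLI default is 10)."""
    return ["aws", "configure", "set", "default.s3.max_concurrent_requests", str(n)]


if __name__ == "__main__":
    # Hypothetical object key, for illustration only.
    url = "https://genomeark.s3.amazonaws.com/species/Gorilla_gorilla/example.fa.gz"
    print(" ".join(concurrency_command(20)))
    print(" ".join(download_command(http_to_s3(url))))
```

Building argument lists instead of invoking the CLI keeps the sketch free of network access; pass each list to `subprocess.run` to actually execute it.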
All data is released to the public domain (CC0), and we encourage its reuse. However, because we are still finishing and analyzing these genomes, we encourage you to contact us if you are interested in contributing, so that effort is not duplicated. The following working groups have been formed:
- Assembly
- Annotation
- Sex chromosome genomics
- Comparative and evolutionary genomics
- Segmental duplications
- Acrocentric chromosomes
- Satellite DNAs
- Mobile elements
- Pangenomics
For any problems related to this dataset, please open an issue in this GitHub repository. For general questions regarding the project, please contact adam.phillippy@nih.gov. More information about our consortium can be found on the T2T homepage.
* Dec 2022. Initial release.