A centromere mapping and annotation pipeline for T2T human genome assemblies implemented in Snakemake.

Example figures:

- Chr1 α-satellite higher-order repeat structure, centromere dip regions, and self-identity plot
- Chr12 α-satellite HOR arrays
- Cumulative α-satellite HOR array lengths


Inputs:

- Verkko or hifiasm human genome assemblies
- PacBio HiFi reads used in the assemblies
- CHM13 reference genome assembly
- (Optional) Unaligned BAM files with 5mC modifications at CpG sites


Outputs:

- Complete and correctly assembled centromere sequences and their regions, validated by NucFlag.
- Centromere alpha-satellite higher-order repeat (HOR) array lengths via censtats.
- RepeatMasker and HumAS-SD alpha-satellite HOR monomer annotations and plots.
- ModDotPlot sequence identity plots.
- Combined sequence identity and HOR array structure plots via cenplot.
- (Optional) Centromere dip regions (CDRs) with CDR-Finder.

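
HOR array lengths like those listed above are, in essence, sums of interval lengths per chromosome. The following is a minimal sketch of that arithmetic only, not the censtats implementation: it assumes a BED-like, tab-separated input (chromosome, 0-based start, half-open end), and the function name `hor_array_lengths` and the example coordinates are hypothetical.

```python
from collections import defaultdict


def hor_array_lengths(bed_lines):
    """Sum interval lengths per chromosome from BED-like lines.

    Assumes tab-separated columns: chrom, start (0-based), end (half-open).
    This input layout is an assumption for illustration only.
    """
    totals = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        # For half-open intervals, length is simply end - start.
        totals[chrom] += int(end) - int(start)
    return dict(totals)


# Hypothetical intervals, not real centromere coordinates.
example = [
    "chr1\t100\t600",
    "chr1\t800\t1000",
    "chr12\t0\t250",
]
print(hor_array_lengths(example))
```

Summing per chromosome keeps the sketch independent of any particular tool's output; a real workflow would read the files the pipeline actually produces.
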
Read the docs on the CenMAP wiki.
To run tests, refer to the wiki page.