Multiple genome alignments
Multiple alignments are calculated between groups of genomes. These are used to calculate ancestral sequences, age of base, conservation scores and constrained elements.
Alignments available
Name | Genomes | Method used |
---|---|---|
60 amniota vertebrates | Alpine marmot, Arabian camel, Argentine black and white tegu, Australian saltwater crocodile, Beluga whale, Blue whale, Bonobo, Chacoan peccary, Chicken, Chimpanzee, Chinese hamster CHOK1GS, Common canary, Common wall lizard, Crab-eating macaque, Dingo, Dog, Domestic yak, Duck, Eastern brown snake, Elephant, Eurasian red squirrel, Gibbon, Goat, Golden eagle, Goodes thornscrub tortoise, Gorilla, Great Tit, Greater horseshoe bat, Green anole, Guinea Pig, Horse, Human, Hybrid - Bos Indicus, Indian cobra, Japanese quail, Kakapo, Leopard, Lion, Macaque, Mouse, Mouse Lemur, Narwhal, Northern American deer mouse, Olive baboon, Opossum, Pig, Platypus, Prairie vole, Rabbit, Ryukyu mouse, Shrew mouse, Sperm whale, Sumatran orangutan, Three-toed box turtle, Turkey, Vaquita, Vervet-AGM, White-tufted-ear marmoset, Yarkand deer, Zebra finch | Mercator-Pecan |
10 primates | Bonobo, Chimpanzee, Crab-eating macaque, Gibbon, Gorilla, Human, Macaque, Mouse Lemur, Sumatran orangutan, Vervet-AGM | EPO |
17 sauropsids | Argentine black and white tegu, Australian saltwater crocodile, Chicken, Common canary, Common wall lizard, Duck, Eastern brown snake, Golden eagle, Goodes thornscrub tortoise, Great Tit, Green anole, Indian cobra, Japanese quail, Kakapo, Three-toed box turtle, Turkey, Zebra finch | EPO |
32 fish | Asian bonytongue, Atlantic salmon, Brown trout, Channel bull blenny, Coho salmon, Common carp, European seabass, Fugu, Gilthead seabream, Goldfish, Greater amberjack, Guppy, Indian medaka, Japanese medaka HdrR, Javanese ricefish, Large yellow croaker, Lumpfish, Mexican tetra, Nile tilapia, Orange clownfish, Pinecone soldierfish, Platyfish, Rainbow trout, Reedfish, Siamese fighting fish, Spotted gar, Tetraodon, Tongue sole, Turbot, Turquoise killifish, Zebra mbuna, Zebrafish | EPO |
44 eutherian mammals | Alpine marmot, Arabian camel, Beluga whale, Blue whale, Bonobo, Cattle, Chacoan peccary, Chimpanzee, Chinese hamster CHOK1GS, Crab-eating macaque, Dingo, Dog, Domestic yak, Elephant, Eurasian red squirrel, Gibbon, Goat, Gorilla, Greater horseshoe bat, Guinea Pig, Horse, Human, Hybrid - Bos Indicus, Leopard, Lion, Macaque, Mouse, Mouse Lemur, Narwhal, Northern American deer mouse, Norway rat - BN/NHsdMcwi, Olive baboon, Pig, Prairie vole, Rabbit, Ryukyu mouse, Sheep, Shrew mouse, Sperm whale, Sumatran orangutan, Vaquita, Vervet-AGM, White-tufted-ear marmoset, Yarkand deer | EPO |
22 murinae | Mouse, Mouse 129S1/SvImJ, Mouse A/J, Mouse AKR/J, Mouse BALB/cJ, Mouse C3H/HeJ, Mouse C57BL/6NJ, Mouse CAST/EiJ, Mouse CBA/J, Mouse DBA/2J, Mouse FVB/NJ, Mouse JF1/MsJ, Mouse LP/J, Mouse NOD/ShiLtJ, Mouse NZO/HlLtJ, Mouse PWK/PhJ, Mouse WSB/EiJ, Norway rat - BN/NHsdMcwi, Ryukyu mouse, Shrew mouse, Steppe mouse, Western wild mouse | EPO |
27 sauropsids | Abingdon island giant tortoise, African ostrich, Argentine black and white tegu, Australian saltwater crocodile, Blue-ringed sea krait, Chicken, Chinese softshell turtle, Collared flycatcher, Common canary, Common wall lizard, Duck, Eastern brown snake, Golden eagle, Goodes thornscrub tortoise, Great Tit, Green anole, Indian cobra, Japanese quail, Kakapo, Mainland tiger snake, Medium ground-finch, Painted turtle, Pink-footed goose, Three-toed box turtle, Tuatara, Turkey, Zebra finch | EPO-Extended |
24 primates | Black snub-nosed monkey, Bolivian squirrel monkey, Bonobo, Bushbaby, Chimpanzee, Coquerel's sifaka, Crab-eating macaque, Drill, Gibbon, Golden snub-nosed monkey, Gorilla, Greater bamboo lemur, Human, Ma's night monkey, Macaque, Mouse Lemur, Olive baboon, Panamanian white-faced capuchin, Pig-tailed macaque, Sooty mangabey, Sumatran orangutan, Tarsier, Vervet-AGM, White-tufted-ear marmoset | EPO-Extended |
65 fish | Amazon molly, Asian bonytongue, Atlantic cod, Atlantic herring, Atlantic salmon, Ballan wrasse, Barramundi perch, Bicolor damselfish, Brown trout, Burton's mouthbrooder, Channel bull blenny, Channel catfish, Chinese medaka, Chinook salmon, Climbing perch, Clown anemonefish, Coho salmon, Common carp, Denticle herring, Eastern happy, Electric eel, European seabass, Fugu, Gilthead seabream, Golden-line barbel, Goldfish, Greater amberjack, Guppy, Huchen, Indian medaka, Japanese medaka HdrR, Javanese ricefish, Large yellow croaker, Lumpfish, Lyretail cichlid, Makobe Island cichlid, Mangrove rivulus, Mexican tetra, Midas cichlid, Mummichog, Nile tilapia, Northern pike, Orange clownfish, Paramormyrops kingsleyae, Pike-perch, Pinecone soldierfish, Platyfish, Rainbow trout, Red-bellied piranha, Reedfish, Sailfin molly, Sheepshead minnow, Siamese fighting fish, Spiny chromis, Spotted gar, Stickleback, Tetraodon, Tiger tail seahorse, Tongue sole, Turbot, Turquoise killifish, Yellowtail amberjack, Zebra mbuna, Zebrafish, Zig-zag eel | EPO-Extended |
92 eutherian mammals | Alpaca, Alpine marmot, American bison, American black bear, American mink, Arabian camel, Arctic ground squirrel, Armadillo, Beluga whale, Black snub-nosed monkey, Blue whale, Bolivian squirrel monkey, Bonobo, Bushbaby, Cattle, Chacoan peccary, Chimpanzee, Chinese hamster CHOK1GS, Chinese hamster PICR, Coquerel's sifaka, Crab-eating macaque, Degu, Dingo, Dog, Dolphin, Domestic cat, Domestic yak, Donkey, Drill, Elephant, Eurasian red squirrel, Ferret, Giant panda, Gibbon, Goat, Golden Hamster, Golden snub-nosed monkey, Gorilla, Greater bamboo lemur, Greater horseshoe bat, Guinea Pig, Hedgehog, Horse, Human, Hybrid - Bos Indicus, Hyrax, Kangaroo rat, Leopard, Lesser Egyptian jerboa, Lesser hedgehog tenrec, Lion, Long-tailed chinchilla, Ma's night monkey, Macaque, Megabat, Microbat, Mouse, Mouse Lemur, Naked mole-rat female, Narwhal, Northern American deer mouse, Norway rat - BN/NHsdMcwi, Olive baboon, Panamanian white-faced capuchin, Pig, Pig-tailed macaque, Pika, Polar bear, Prairie vole, Rabbit, Red fox, Ryukyu mouse, Sheep, Shrew, Shrew mouse, Siberian musk deer, Sloth, Sooty mangabey, Sperm whale, Squirrel, Steppe mouse, Sumatran orangutan, Tarsier, Tiger, Tree Shrew, Upper Galilee mountains blind mole rat, Vaquita, Vervet-AGM, Western wild mouse, White-tufted-ear marmoset, Wild yak, Yarkand deer | EPO-Extended |
23 pig breeds | Cattle, Horse, Pig, Pig - Bama miniature, Pig - Bamei, Pig - Berkshire, Pig - Duroc, Pig - Hampshire, Pig - Jinhua, Pig - Landrace, Pig - Largewhite, Pig - Meishan, Pig - NIHS-2020, Pig - Ningxiang, Pig - Ossabaw miniature, Pig - PB115, Pig - Pietrain, Pig - Rongchang, Pig - Tibetan, Pig - Wuzhishan, Pig - euw1 (european wild boar), Pig USMARC, Sheep | EPO-Extended |
10 fowl | Chicken, Chicken (Red Jungle fowl), Chicken (paternal White leghorn layer), Helmeted guineafowl, Indian peafowl, Japanese quail, Mallard, Muscovy Duck (domestic type), Pink-footed goose, Ring-necked pheasant | Cactus |
31 primates | Angola colobus, Black snub-nosed monkey, Bolivian squirrel monkey, Bonobo, Bushbaby, Chicken, Chimpanzee, Coquerel's sifaka, Crab-eating macaque, Drill, Gelada, Gibbon, Golden snub-nosed monkey, Gorilla, Greater bamboo lemur, Green anole, Human, Ma's night monkey, Macaque, Mouse, Mouse Lemur, Olive baboon, Panamanian white-faced capuchin, Pig-tailed macaque, Sooty mangabey, Sumatran orangutan, Tarsier, Tropical clawed frog, Ugandan red Colobus, Vervet-AGM, White-tufted-ear marmoset | Cactus |
26 rodents | Alpine marmot, American beaver, Arctic ground squirrel, Brazilian guinea pig, Chicken, Chinese hamster PICR, Damara mole rat, Daurian ground squirrel, Degu, Eurasian red squirrel, Golden Hamster, Green anole, Guinea Pig, Human, Kangaroo rat, Lesser Egyptian jerboa, Long-tailed chinchilla, Mongolian gerbil, Mouse, Naked mole-rat female, Northern American deer mouse, Prairie vole, Squirrel, Steppe mouse, Tropical clawed frog, Upper Galilee mountains blind mole rat | Cactus |
Alignment methods
PECAN Multiple Alignment
Pecan [1] is used to build global multiple genomic alignments, listed as Mercator-Pecan in the table above. First, Mercator builds a synteny map between the genomes; Pecan then builds alignments within these syntenic regions.
Pecan is a global multiple sequence alignment program that makes the probabilistic consistency methodology practical for large numbers of sequences of almost arbitrary length. It takes as input a set of sequences and a phylogenetic tree. Its parameters and heuristics are highly user-configurable; it is written entirely in Java and also requires exonerate [2] to be installed.
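The probabilistic consistency idea, as popularised by ProbCons, re-estimates the posterior probability that residue x_i aligns to residue y_j by averaging over every sequence z in the input set S: P'(x_i ~ y_j) = (1/|S|) Σ_z Σ_k P(x_i ~ z_k) P(z_k ~ y_j), where a sequence aligned to itself gives the identity matrix. The plain-Python sketch below shows only this generic transformation. It assumes the pairwise posterior match matrices have already been computed (for example by a pair HMM, not shown), and `consistency_transform` is a hypothetical name, not part of Pecan's Java code base.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an n x m matrix and an m x p matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def consistency_transform(
    lengths: Dict[str, int], post: Dict[Tuple[str, str], Matrix]
) -> Dict[Tuple[str, str], Matrix]:
    """Re-estimate every pairwise posterior match matrix through all sequences z.

    lengths: sequence name -> sequence length
    post: (x, y) -> matrix whose [i][j] entry is P(x_i aligned to y_j);
          only one orientation of each pair needs to be supplied.
    """
    names = list(lengths)

    def get(a: str, b: str) -> Matrix:
        if a == b:
            # a residue is always aligned to itself
            return identity(lengths[a])
        if (a, b) in post:
            return post[(a, b)]
        return transpose(post[(b, a)])

    result = {}
    for x in names:
        for y in names:
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:
                # sum over k of P(x_i ~ z_k) * P(z_k ~ y_j)
                prod = matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j]
            # average over all |S| sequences, including x and y themselves
            result[(x, y)] = [[v / len(names) for v in row] for row in acc]
    return result
```

With only two sequences the transformation leaves the posteriors unchanged; the extra evidence comes from third sequences that agree (or disagree) with the direct pairwise estimate.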
EPO Multiple Alignment
The EPO (Enredo, Pecan, Ortheus) [1] pipeline is a three-step pipeline for whole-genome multiple alignment.
- Enredo produces colinear segments from extant genomes, handling rearrangements, deletions and duplications.
- Pecan, as described above, is used to align these segments.
- Finally, Ortheus is used to create genome-wide ancestral sequence reconstructions.
The pipeline requires alignments of so-called anchor sequences, which are described separately.
EPO-Extended Multiple Alignment
Because Ortheus has difficulty running on fragmented assemblies, the pipeline comes in two flavours:
- The plain EPO pipeline is run on the chromosome-level genomes; these alignments are listed as EPO in the table above.
- The scaffold-level genomes are then projected onto the EPO alignments using LastZ-net pairwise alignments; these alignments are listed as EPO-Extended.
By construction, each pair of EPO and EPO-Extended alignments represents exactly the same alignment of the chromosome-level genomes.
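To make the projection step more concrete, the sketch below shows one simple way a scaffold-level genome could be threaded into an existing alignment block through its pairwise alignment with a chromosome-level genome. It is an illustration under stated assumptions, not the Compara implementation: the pairwise alignment is reduced to hypothetical ungapped blocks on the forward strand, `project_row` and `map_ref_to_query` are made-up names, and query bases that are inserted relative to the reference are simply dropped.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical ungapped pairwise block: (ref_start, query_start, length),
# 0-based, sorted by ref_start and non-overlapping on the reference.
Block = Tuple[int, int, int]


def map_ref_to_query(blocks: List[Block], starts: List[int], ref_pos: int) -> Optional[int]:
    """Return the query position aligned to ref_pos, or None if unaligned."""
    i = bisect_right(starts, ref_pos) - 1
    if i < 0:
        return None
    ref_start, query_start, length = blocks[i]
    if ref_pos < ref_start + length:
        return query_start + (ref_pos - ref_start)
    return None


def project_row(ref_row: str, ref_start: int, blocks: List[Block], query_seq: str) -> str:
    """Build a new alignment row for the query genome, column by column.

    ref_row is the reference genome's gapped row in an existing multiple
    alignment block, whose first base is at reference position ref_start.
    """
    starts = [b[0] for b in blocks]
    projected = []
    ref_pos = ref_start
    for ch in ref_row:
        if ch == "-":
            # the reference has no base in this column, so nothing is projected
            projected.append("-")
            continue
        query_pos = map_ref_to_query(blocks, starts, ref_pos)
        projected.append(query_seq[query_pos] if query_pos is not None else "-")
        ref_pos += 1
    return "".join(projected)


print(project_row("AC-GT", 10, [(10, 0, 2), (12, 5, 2)], "acxxxgt"))  # ac-gt
```

Because the reference row is never modified, the chromosome-level part of the alignment stays identical before and after projection, which is the property stated above for EPO and EPO-Extended.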
Progressive Cactus
Progressive Cactus [3] is a next-generation aligner that stores whole-genome alignments in a graph structure. Genomes can be added incrementally, which makes it scalable to hundreds of genomes.
The Ensembl Compara Perl API provides access to Cactus alignment data in one of two ways: via a HAL file (CACTUS_HAL) or via a database (CACTUS_DB).
Cactus alignment via HAL file
Alignments of type CACTUS_HAL are accessed via a HAL file [4]. For performance reasons, alignment blocks shorter than a threshold of approximately one thousandth of the size of the genomic region being accessed are filtered out. Within each alignment block, aligned sequences are deduplicated per genome: for each genome, only the aligned sequence with the greatest number of nucleotides is kept.
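The two CACTUS_HAL access-time rules can be sketched as follows. This is a hypothetical plain-Python model, not the Compara Perl API: blocks are dictionaries of gapped rows keyed by genome, block length is assumed to mean the number of alignment columns, and the threshold is applied as exactly `region_length / 1000` rather than the approximate value used in practice. Breaking ties between equally long rows by keeping the first one is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlignmentBlock:
    # genome name -> one or more gapped rows ('-' = gap), all of equal length
    rows: Dict[str, List[str]]

    @property
    def length(self) -> int:
        # assumption: block length is measured in alignment columns
        return len(next(iter(self.rows.values()))[0])


def nucleotide_count(row: str) -> int:
    return sum(1 for c in row if c != "-")


def filter_blocks(blocks: List[AlignmentBlock], region_length: int,
                  fraction: float = 1 / 1000) -> List[AlignmentBlock]:
    """Drop short blocks, then keep one row per genome in each remaining block."""
    min_length = region_length * fraction
    kept = []
    for block in blocks:
        if block.length < min_length:
            continue
        # max() returns the first of several rows with the same nucleotide count
        best = {genome: [max(rows, key=nucleotide_count)]
                for genome, rows in block.rows.items()}
        kept.append(AlignmentBlock(best))
    return kept
```

For a 100 kb region the default threshold is 100 columns, so a 150-column block survives while a 50-column block is discarded. Passing `fraction=1/100` applies the stricter length rule that the Compara Perl API uses at access time for CACTUS_DB alignments, although those alignments are already deduplicated when they are loaded.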
Cactus alignment via database
Alignments of type CACTUS_DB are preloaded from a HAL file into a MySQL database following an approach similar to that used by cactus-hal2maf [3] (version 2.9.7):
- Dump a MAF alignment file for a given reference genome (e.g. Homo sapiens) and sequence region (typically 500 kilobases in length) using hal2maf [4] (version 2.2) with command-line options `--noAncestors --unique`.
- Filter out aligned sequences with fewer than 5 nucleotides, and filter out alignment blocks with fewer than 20 alignment columns.
- Normalise the alignment to merge smaller alignment blocks using taffy (commit 5221c50) with command-line options `--filterGapCausingDupes --maximumBlockLengthToMerge 8000 --maximumGapLength 1200`.
- Deduplicate alignments per genome within each MAF block using the mafDuplicateFilter command of mafTools [5] (ComparativeGenomicsToolkit version, commit 259e5b4) with command-line option `--keep-first`.
- Load the MAF alignment blocks into the output MySQL database.
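The windowing, filtering and deduplication rules in the list above can be modelled on a toy in-memory representation, as sketched below. This is not a substitute for hal2maf, taffy or mafTools: all function names are hypothetical, applying the column filter before the row filter is an assumption, the taffy normalisation step is not modelled, and `keep_first` reflects our reading of `--keep-first` (keep the first row seen for each genome in a block).

```python
from typing import Iterator, List, Tuple

# A MAF-like block: (genome, gapped row) pairs in file order.
MafBlock = List[Tuple[str, str]]


def regions(sequence_length: int, window: int = 500_000) -> Iterator[Tuple[int, int]]:
    """Split a reference sequence into consecutive half-open windows."""
    for start in range(0, sequence_length, window):
        yield start, min(start + window, sequence_length)


def filter_block(block: MafBlock, min_nucleotides: int = 5,
                 min_columns: int = 20) -> MafBlock:
    """Return the block without short rows, or [] if the block is too narrow."""
    if not block or len(block[0][1]) < min_columns:
        return []
    return [(genome, row) for genome, row in block
            if sum(c != "-" for c in row) >= min_nucleotides]


def keep_first(block: MafBlock) -> MafBlock:
    """Keep only the first row seen for each genome."""
    seen = set()
    deduplicated = []
    for genome, row in block:
        if genome not in seen:
            seen.add(genome)
            deduplicated.append((genome, row))
    return deduplicated


def preprocess(blocks: List[MafBlock]) -> List[MafBlock]:
    # The taffy normalisation between filtering and deduplication is not modelled.
    result = []
    for block in blocks:
        filtered = filter_block(block)
        if filtered:
            result.append(keep_first(filtered))
    return result
```

For example, `list(regions(1_200_000))` yields the windows `(0, 500000)`, `(500000, 1000000)` and `(1000000, 1200000)`, each of which would be dumped and processed separately.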
CACTUS_DB alignments are also filtered by the Compara Perl API at access time, with the minimum block length set to one hundredth of the size of the accessed region.
References
1. Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. "Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs." Genome Res. 2008 Nov;18(11):1814-1828.
2. Slater GS, Birney E. "Automated generation of heuristics for biological sequence comparison." BMC Bioinformatics. 2005 Feb;6:31.
3. Armstrong J, Hickey G, Diekhans M, et al. "Progressive Cactus is a multiple-genome aligner for the thousand-genome era." Nature. 2020 Nov;587(7833):246-251.
4. Hickey G, Paten B, Earl D, Zerbino D, Haussler D. "HAL: a hierarchical format for storing and analyzing multiple genome alignments." Bioinformatics. 2013 May;29(10):1341-1342.
5. Earl D, Nguyen N, Hickey G, et al. "Alignathon: a competitive assessment of whole-genome alignment methods." Genome Res. 2014 Dec;24(12):2077-2089.