Assembly
The Amazon molly (Poecilia formosa) genome sequence was produced in October 2013 by the Aquatic Genome Models Consortium.
The genome is 1 Gb in length and consists of 3,985 top-level sequences, all of which are unplaced scaffolds (built from 31,058 contigs). The contig N50 of the submitted assembly is 57.47 kb and the scaffold N50 is 1.574 Mb. The N50 is the length such that 50% of the assembled genome lies in blocks of that length or longer.
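The N50 definition above can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are illustrative assumptions, not values from the Amazon molly assembly; the sketch sorts the lengths from longest to shortest and returns the length at which the running total first reaches half of the assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: the N50 is the length L such
    that sequences of length >= L contain at least 50% of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2                  # 50% of the assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy contig lengths (made up for this example), total = 300
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Applied to the real contig or scaffold lengths of an assembly, the same calculation would be expected to yield figures such as the 57.47 kb and 1.574 Mb quoted above, although the submitter's exact tooling is not stated here.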
Gene annotation
The gene set was built using a mixed approach. Because species-specific sequences were lacking but RNASeq data for Amazon molly were available from Washington University, the final gene set comprises models based on orthologous proteins from the vertebrate division of UniProtKB, the longest translations of some stickleback gene models from Ensembl 73, and models built from RNASeq data.
A total of 8,162 gene models were built exclusively from RNASeq data. The RNASeq data were also used to add UTRs to gene models. The total gene set contains 23,615 protein-coding genes, with a further 679 ncRNAs and 60 pseudogenes.
More information
General information about this species can be found on Wikipedia.
Statistics
Summary
Assembly | Poecilia_formosa-5.1.2, INSDC Assembly GCA_000485575.1, Oct 2013 |
Base Pairs | 748,923,461 |
Golden Path Length | 748,923,461 |
Annotation provider | Ensembl |
Annotation method | Full genebuild |
Genebuild started | Nov 2013 |
Genebuild released | Jul 2014 |
Genebuild last updated/patched | Aug 2014 |
Database version | 113.512 |
Gene counts
Coding genes | 23,615 |
Non coding genes | 679 |
Small non coding genes | 665 |
Misc non coding genes | 14 |
Pseudogenes | 60 |
Gene transcripts | 31,637 |
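As a quick consistency check on the gene counts above, the sketch below confirms that the small and miscellaneous non-coding categories add up to the non-coding total, and derives an average number of transcripts per gene. The dictionary keys are illustrative names, and treating the transcript count as covering coding genes, non-coding genes and pseudogenes together is an assumption, not something the table states.

```python
# Counts copied from the gene counts table; key names are illustrative.
gene_counts = {
    "coding": 23_615,
    "non_coding": 679,
    "small_non_coding": 665,
    "misc_non_coding": 14,
    "pseudogenes": 60,
    "transcripts": 31_637,
}

# The two non-coding subcategories should sum to the non-coding total.
non_coding_sum = gene_counts["small_non_coding"] + gene_counts["misc_non_coding"]
assert non_coding_sum == gene_counts["non_coding"]

# Assumption: transcripts are counted across all three gene classes.
total_genes = (
    gene_counts["coding"]
    + gene_counts["non_coding"]
    + gene_counts["pseudogenes"]
)
transcripts_per_gene = gene_counts["transcripts"] / total_genes

print(total_genes)  # 23,615 + 679 + 60 = 24354
print(f"{transcripts_per_gene:.2f} transcripts per gene on average")
```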
Other
Genscan gene predictions | 45,660 |