Amazon molly assembly and gene annotation

Name: Ensembl Amazon molly Gene Set
Creator: Ensembl
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: genebuild, transcripts, transcription, alignment, loci

Assembly

The Amazon molly (Poecilia formosa) genome sequence was produced in October 2013 by the Aquatic Genome Models Consortium.

The genome is 1Gb in length, consisting of 3,985 toplevel sequences, all of which are unplaced scaffolds (from 31,058 contigs). The N50 of the contigs of the submitted assembly is 57.47 Kb and the N50 of the scaffolds is 1.574 Mb. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer.
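The N50 definition above can be made concrete with a short sketch. This is an illustrative calculation only, not the pipeline used to produce the reported assembly statistics; the function name `n50` and the toy lengths are hypothetical and were chosen for the example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least 50% of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of
        # the total is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical values, not Amazon molly data).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: 80 + 70 = 150, half of the 300 total
```

Applied to the contig lengths of an assembly, the same calculation gives the contig N50; applied to the scaffold lengths, it gives the scaffold N50.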

Gene annotation

The gene set was built using a mixed approach. Because species-specific sequences were lacking but RNASeq data for Amazon molly were available from Washington University, the final gene set comprises models based on orthologous proteins from the vertebrate division of UniProtKB, the longest translations of some stickleback gene models from Ensembl 73, and models built from RNASeq data.

A total of 8,162 gene models were made exclusively from RNASeq data. The data were also used to add UTRs to gene models. The total gene set contains 23,615 protein-coding genes, with a further 679 ncRNAs and 60 pseudogenes.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	Poecilia_formosa-5.1.2, INSDC Assembly GCA_000485575.1, Oct 2013
Base Pairs	748,923,461
Golden Path Length	748,923,461
Annotation provider	Ensembl
Annotation method	Full genebuild
Genebuild started	Nov 2013
Genebuild released	Jul 2014
Genebuild last updated/patched	Aug 2014
Database version	115.512

Gene counts

Coding genes	23,615
Non coding genes	679
Small non coding genes	665
Misc non coding genes	14
Pseudogenes	60
Gene transcripts	31,637

Coding genes: a gene or transcript that contains an open reading frame (ORF).

Pseudogenes: a gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.

Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons being separated by introns. The exons and introns are transcribed and then the introns are spliced out. Transcripts may or may not encode a protein.

Other

Genscan gene predictions	45,660
