Spotted gar assembly and gene annotation

Name: Ensembl Spotted gar Gene Set
Creator: Ensembl
License: https://www.apache.org/licenses/LICENSE-2.0
Keywords: genebuild, transcripts, transcription, alignment, loci

Assembly

This site displays annotation on version 1 (January 2012) of the spotted gar (Lepisosteus oculatus) genome assembly, known as 'LepOcu1'.

It was produced by the Broad Institute of MIT and Harvard. The primary assembly comprises 29 chromosomes and 1,896 unplaced scaffolds. The 45,199 contigs included in this assembly have an N50 of 68 kb. The N50 is the length such that 50% of the assembled genome lies in contigs of that length or longer.
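The N50 definition above can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are made up for illustration; this is not the assembler's own code, and the lengths are not taken from LepOcu1.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first length at which the running sum reaches half
        # of the total assembled length is the N50.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

With these toy lengths the total is 300, and the two longest contigs (80 and 70) together reach 150, so the N50 is 70.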

Gene annotation

The spotted gar LepOcu1 assembly was annotated using the standard Ensembl gene annotation system, incorporating RNA-Seq data. The annotation process is described in the document below.

Detailed information on genebuild (PDF)

RNA-Seq data set

In addition to the main set, we predicted gene models for each tissue type using the RNA-Seq pipeline. We ran BLASTp on these models against UniProt vertebrate proteins of protein existence (PE) levels 1 and 2 to confirm the open reading frame (ORF). The best BLAST hit is displayed as transcript supporting evidence.
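As a rough sketch of how a best BLAST hit per model could be selected, the example below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout, where column 12 is the bit score) and keeps the highest-scoring subject for each query. Ranking by bit score is an assumption; the source does not say how the Ensembl pipeline ranks hits. The function name `best_hits` and the model and protein identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to its (subject ID, bit score) with the top bit score."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        bitscore = float(row[11])  # 12th column of -outfmt 6
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical hits: model and protein identifiers are invented.
example = (
    "model1\tP11111\t92.0\t300\t24\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "model1\tP22222\t75.0\t280\t70\t0\t5\t284\t3\t282\t1e-90\t350\n"
    "model2\tP33333\t88.5\t150\t17\t0\t1\t150\t10\t159\t1e-70\t260\n"
)

for query, (subject, score) in best_hits(example).items():
    print(query, subject, score)
```

Restricting the search database to PE level 1 and 2 proteins would happen before this step, when the UniProt set is prepared.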

The tissue-specific sets of transcript models built using our RNA-Seq pipeline are as follows:

Tissue	Number of gene models
Brain	17523
Embryo	18376
Eye	18017
Heart	16323
Kidney	18037
Larvae	18577
Liver	16487
Muscle	14740
Skin	17959
Testis	18409
Merged	19683

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	LepOcu1, INSDC Assembly GCA_000242695.1, Dec 2011
Base Pairs	945,878,036
Golden Path Length	945,878,036
Annotation provider	Ensembl
Annotation method	Full genebuild
Genebuild started	Jan 2012
Genebuild released	Dec 2013
Genebuild last updated/patched	Oct 2016
Database version	115.1

Gene counts

Coding genes	18,341
Non coding genes	4,932
Small non coding genes	2,593
Long non coding genes	2,313
Misc non coding genes	26
Pseudogenes	42
Gene transcripts	27,887

Coding genes: genes or transcripts that contain an open reading frame (ORF).
Pseudogenes: genes that have homology to known protein-coding genes but contain a frameshift and/or stop codon(s) that disrupt the ORF. They are thought to have arisen through duplication followed by loss of function.
Gene transcripts: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons separated by introns. The exons and introns are transcribed, and the introns are then spliced out. Transcripts may or may not encode a protein.
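The non-coding figures in the gene counts can be cross-checked with a few lines of arithmetic. This sketch only confirms that the small, long and miscellaneous non-coding categories add up to the reported non-coding total; the dictionary keys and the function name are chosen for illustration.

```python
# Counts copied from the gene counts table.
counts = {
    "coding": 18341,
    "non_coding": 4932,
    "small_non_coding": 2593,
    "long_non_coding": 2313,
    "misc_non_coding": 26,
    "pseudogenes": 42,
}


def non_coding_subtotal(c):
    """Sum the three non-coding subcategories."""
    return c["small_non_coding"] + c["long_non_coding"] + c["misc_non_coding"]


subtotal = non_coding_subtotal(counts)
print(subtotal, subtotal == counts["non_coding"])
```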

Other

Genscan gene predictions	30,348
