BiomaRt, Bioconductor R package
The Bioconductor BiomaRt R package is a quick, easy and powerful way to access BioMart directly from the R console.
The following documentation uses R 3.2 and Bioconductor version 3.1.
Summary
- How to install the Bioconductor BiomaRt R package
- Bioconductor BiomaRt R package documentation
- Bioconductor BiomaRt R examples with the Ensembl Gene mart
How to install the Bioconductor BiomaRt R package
First make sure you have installed the R software on your computer.
Then, run the following commands to install the Bioconductor BiomaRt R package:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt")
Bioconductor BiomaRt R package documentation
More information about the Bioconductor BiomaRt R package, including its documentation, can be found on the BiomaRt Bioconductor page.
Bioconductor BiomaRt R examples with the Ensembl Gene mart
listEnsembl & listDatasets
To list all the Ensembl marts available on the ensembl.org website, run the "listEnsembl" function:
> library(biomaRt)
> listEnsembl()
  biomart    version
1 ensembl    Ensembl Genes 79
2 snp        Ensembl Variation 79
3 regulation Ensembl Regulation 79
You can give an Ensembl archive version as a parameter to get the list of archived Ensembl marts, for example for the Ensembl GRCh37 or release 78 marts:
> listEnsembl(GRCh=37)
  biomart    version
1 ensembl    Ensembl Genes
2 snp        Ensembl Variation
3 regulation Ensembl Regulation
> listEnsembl(version=78)
  biomart    version
1 ensembl    Ensembl Genes 78
2 snp        Ensembl Variation 78
3 regulation Ensembl Regulation 78
The "listDatasets" function will give you the list of all the species available (mart datasets) for a given mart:
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl")
> head(listDatasets(ensembl))
  dataset                        description                                version
1 oanatinus_gene_ensembl         Ornithorhynchus anatinus genes (OANA5)     OANA5
2 cporcellus_gene_ensembl        Cavia porcellus genes (cavPor3)            cavPor3
3 gaculeatus_gene_ensembl        Gasterosteus aculeatus genes (BROADS1)     BROADS1
4 lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3)         loxAfr3
5 itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
6 choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1)        choHof1
You can also use listDatasets with the Ensembl GRCh37 and archived marts:
> library(biomaRt)
> grch37 = useEnsembl(biomart="ensembl",GRCh=37)
> listDatasets(grch37)[31:35,]
   dataset                  description                                version
31 hsapiens_gene_ensembl    Homo sapiens genes (GRCh37.p13)            GRCh37.p13
32 mfuro_gene_ensembl       Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
33 tbelangeri_gene_ensembl  Tupaia belangeri genes (tupBel1)           tupBel1
34 ggallus_gene_ensembl     Gallus gallus genes (Galgal4)              Galgal4
35 xtropicalis_gene_ensembl Xenopus tropicalis genes (JGI4.2)          JGI4.2
> ensembl78 = useEnsembl(biomart="ensembl",version=78)
> listDatasets(ensembl78)[31:35,]
   dataset                 description                                version
31 mlucifugus_gene_ensembl Myotis lucifugus genes (myoLuc2)           myoLuc2
32 hsapiens_gene_ensembl   Homo sapiens genes (GRCh38)                GRCh38
33 pformosa_gene_ensembl   Poecilia formosa genes (PoeFor_5.1.2)      PoeFor_5.1.2
34 mfuro_gene_ensembl      Mustela putorius furo genes (MusPutFur1.0) MusPutFur1.0
35 tbelangeri_gene_ensembl Tupaia belangeri genes (tupBel1)           tupBel1
useEnsembl
The "useEnsembl" function allows you to connect to an Ensembl website mart by specifying the biomart and dataset parameters. For example, to connect to the human dataset (GRCh38) of the live Ensembl gene mart:
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
To connect to the human dataset of the Ensembl GRCh37 or release 78 gene marts:
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl", GRCh=37)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl", version=78)
listMarts, listDatasets and useMart for the Ensembl mirrors
You can connect to the following Ensembl mirrors using the listMarts, listDatasets and useMart functions:
- Ensembl US West: http://uswest.ensembl.org/index.html
- Ensembl US East: http://useast.ensembl.org/index.html
- Ensembl Asia: http://asia.ensembl.org/index.html
For example, to connect to the Ensembl US West mirror:
> library(biomaRt)
> listMarts(host="uswest.ensembl.org")
  biomart              version
1 ENSEMBL_MART_ENSEMBL Ensembl Genes 79
2 ENSEMBL_MART_SNP     Ensembl Variation 79
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 79
4 ENSEMBL_MART_VEGA    Vega 59
5 pride                PRIDE (EBI UK)
> ensembl_us_west = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="uswest.ensembl.org")
> head(listDatasets(ensembl_us_west))
  dataset                        description                                version
1 oanatinus_gene_ensembl         Ornithorhynchus anatinus genes (OANA5)     OANA5
2 cporcellus_gene_ensembl        Cavia porcellus genes (cavPor3)            cavPor3
3 gaculeatus_gene_ensembl        Gasterosteus aculeatus genes (BROADS1)     BROADS1
4 lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3)         loxAfr3
5 itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
6 choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1)        choHof1
Please note that the useMart function always requires both the biomart and host parameters when connecting to an Ensembl mirror website.
listFilters & listAttributes
The "listFilters" function will give you the list of available filters for a given mart and species:
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> head(listFilters(ensembl))
  name            description
1 chromosome_name Chromosome name
2 start           Gene Start (bp)
3 end             Gene End (bp)
4 band_start      Band Start
5 band_end        Band End
6 marker_start    Marker Start
The "listAttributes" function will give you the list of the available attributes for a given mart and species:
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> head(listAttributes(ensembl))
  name                  description
1 ensembl_gene_id       Ensembl Gene ID
2 ensembl_transcript_id Ensembl Transcript ID
3 ensembl_peptide_id    Ensembl Protein ID
4 ensembl_exon_id       Ensembl Exon ID
5 description           Description
6 chromosome_name       Chromosome Name
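Since listFilters and listAttributes return ordinary data frames with "name" and "description" columns, as the output above shows, they can be searched with base R. The following is a minimal sketch that runs without a network connection: "attrs" is a hand-built stand-in containing only the six rows shown above, and "find_rows" is a hypothetical helper written for this illustration, not a biomaRt function.

```r
# Stand-in for head(listAttributes(ensembl)): only the six rows shown above,
# typed by hand so the example runs offline.
attrs <- data.frame(
  name = c("ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id",
           "ensembl_exon_id", "description", "chromosome_name"),
  description = c("Ensembl Gene ID", "Ensembl Transcript ID", "Ensembl Protein ID",
                  "Ensembl Exon ID", "Description", "Chromosome Name"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of biomaRt): keep the rows whose name or
# description matches a case-insensitive pattern.
find_rows <- function(df, pattern) {
  hit <- grepl(pattern, df$name, ignore.case = TRUE) |
    grepl(pattern, df$description, ignore.case = TRUE)
  df[hit, , drop = FALSE]
}

transcript_attrs <- find_rows(attrs, "transcript")
print(transcript_attrs)
```

With a live connection, the same helper could presumably be applied to the full result of listAttributes(ensembl) or listFilters(ensembl) to locate the names to pass to a query.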
getBM
The "getBM" function allows you to build a BioMart query using a list of mart filters and attributes.
Example query: Fetch the Ensembl gene IDs, transcript IDs, HGNC symbols and chromosome locations of all genes located on human chromosome 1
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> chr1_genes <- getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'hgnc_symbol',
+                                  'chromosome_name', 'start_position', 'end_position'),
+                     filters = 'chromosome_name', values = "1", mart = ensembl)
> head(chr1_genes)
  ensembl_gene_id ensembl_transcript_id hgnc_symbol chromosome_name start_position end_position
1 ENSG00000231510 ENST00000443270                   1               5086459        5090899
2 ENSG00000162444 ENST00000315901       RBP7        1               9997206        10016020
3 ENSG00000162444 ENST00000294435       RBP7        1               9997206        10016020
4 ENSG00000270171 ENST00000602640                   1               7693124        7694844
5 ENSG00000225643 ENST00000412797                   1               25581478       25590356
6 ENSG00000116497 ENST00000530710       S100PBP     1               32816767       32858879
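Because the query above requests ensembl_transcript_id, the result has one row per transcript, so a gene ID can appear more than once (ENSG00000162444 appears twice in the output shown). A minimal offline sketch of collapsing such a result to one row per gene with base R follows. The "chr1_head" data frame is a hand-typed copy of the six rows above rather than a live query result, the variable names are illustrative, and it is an assumption that missing HGNC symbols are returned as empty strings.

```r
# Hand-typed copy of the six rows shown above (not a live getBM result).
# Assumption: genes without an HGNC symbol carry an empty string.
chr1_head <- data.frame(
  ensembl_gene_id = c("ENSG00000231510", "ENSG00000162444", "ENSG00000162444",
                      "ENSG00000270171", "ENSG00000225643", "ENSG00000116497"),
  ensembl_transcript_id = c("ENST00000443270", "ENST00000315901", "ENST00000294435",
                            "ENST00000602640", "ENST00000412797", "ENST00000530710"),
  hgnc_symbol = c("", "RBP7", "RBP7", "", "", "S100PBP"),
  start_position = c(5086459, 9997206, 9997206, 7693124, 25581478, 32816767),
  end_position = c(5090899, 10016020, 10016020, 7694844, 25590356, 32858879),
  stringsAsFactors = FALSE
)

# Drop the transcript column, then remove duplicated rows to get one row per gene.
gene_cols <- c("ensembl_gene_id", "hgnc_symbol", "start_position", "end_position")
genes <- unique(chr1_head[, gene_cols])

# Number of transcripts returned for each gene.
transcripts_per_gene <- table(chr1_head$ensembl_gene_id)

# Gene span in base pairs; Ensembl coordinates are 1-based and inclusive.
genes$length_bp <- genes$end_position - genes$start_position + 1

print(genes)
print(transcripts_per_gene)
```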
Example query: Fetch the Ensembl gene and transcript IDs, HGNC symbols and UniProt Swiss-Prot accessions mapped to the human Ensembl Gene ID "ENSG00000139618"
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> hgnc_swissprot <- getBM(attributes=c('ensembl_gene_id', 'ensembl_transcript_id',
+                                      'hgnc_symbol', 'uniprot_swissprot'),
+                         filters = 'ensembl_gene_id', values = 'ENSG00000139618', mart = ensembl)
> hgnc_swissprot
  ensembl_gene_id ensembl_transcript_id hgnc_symbol uniprot_swissprot
1 ENSG00000139618 ENST00000380152       BRCA2       P51587
2 ENSG00000139618 ENST00000528762       BRCA2
3 ENSG00000139618 ENST00000470094       BRCA2
4 ENSG00000139618 ENST00000544455       BRCA2       P51587
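In the result above, only two of the four BRCA2 transcripts carry a Swiss-Prot accession. A minimal offline sketch of keeping only the mapped rows follows. The "brca2_demo" data frame is a hand-typed copy of the output above, not a live query result, and it is an assumption that unmapped values arrive as empty strings; the check also treats NA as unmapped in case they do not.

```r
# Hand-typed copy of the four rows shown above (not a live getBM result).
# Assumption: transcripts without a Swiss-Prot mapping carry an empty string.
brca2_demo <- data.frame(
  ensembl_gene_id = rep("ENSG00000139618", 4),
  ensembl_transcript_id = c("ENST00000380152", "ENST00000528762",
                            "ENST00000470094", "ENST00000544455"),
  hgnc_symbol = rep("BRCA2", 4),
  uniprot_swissprot = c("P51587", "", "", "P51587"),
  stringsAsFactors = FALSE
)

# TRUE where an accession is present: neither NA nor an empty string.
has_accession <- !is.na(brca2_demo$uniprot_swissprot) &
  nzchar(brca2_demo$uniprot_swissprot)

# Rows with a Swiss-Prot mapping, and the distinct accessions among them.
mapped <- brca2_demo[has_accession, ]
accessions <- unique(mapped$uniprot_swissprot)

print(mapped)
print(accessions)
```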