Post-GWAS Analysis Pipeline

The Post-GWAS Analysis Pipeline lets you upload a tab-delimited file of GWAS summary statistics. The pipeline fine-maps the variant p-values and effect sizes and colocalises them with GTEx eQTL summary statistics. This highlights likely causal candidate genes and the tissues in which their effects take place. The tool is currently in beta and may change at short notice as we fix bugs or add features.

Input

The input for the Post-GWAS Analysis Pipeline is a tab-delimited GWAS summary statistics file in the format defined by the NHGRI-EBI GWAS Catalog. The file can be zipped, and the maximum file size is 20 MB.

An example file is shown below: 

chromosome	base_pair_location	variant_id	effect_allele	other_allele	beta	standard_error	p_value
1	2035379	rs10910029	a	g	-0.159	0.1035	8.41E-05
1	2035684	rs10910030	t	c	0.1788	0.1033	5.51E-05
1	2035799	rs10752741	a	g	0.184	0.1034	5.02E-05
1	2035977	rs10752742	t	g	0.1558	0.1033	8.99E-05

Only the variant_id, p_value and beta columns are required for the Post-GWAS Analysis Pipeline to work. The pipeline finds these columns by reading the header line of your input file, so please make sure the headers are spelled exactly as shown. The column order does not matter.
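Before uploading, you may want to check locally that your file has the three required headers and is under the size limit. The following Python sketch does that; it is not part of the pipeline. It assumes that a zipped upload means gzip compression with a `.gz` extension and that 20 MB means 20,000,000 bytes. The function names are hypothetical.

```python
import gzip
import os

# Columns the pipeline needs, as described above; their order in the file does not matter.
REQUIRED_COLUMNS = {"variant_id", "p_value", "beta"}
# Assumption: 20 MB read as decimal megabytes (the stricter of the two readings).
MAX_UPLOAD_BYTES = 20_000_000


def missing_columns(header_line: str) -> list[str]:
    """Return the required column names absent from a tab-delimited header line."""
    present = {name.strip() for name in header_line.rstrip("\r\n").split("\t")}
    return sorted(REQUIRED_COLUMNS - present)


def read_header(path: str) -> str:
    """Read the first line of a plain-text or gzip-compressed (assumed .gz) file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


def check_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 20 MB")
    missing = missing_columns(read_header(path))
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems
```

For example, `check_file("my_gwas_sumstats.tsv")` (a hypothetical file name) returns an empty list when the file passes both checks. A header that uses `p-value` instead of `p_value` would be reported as missing `p_value`.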

Running a job

To run a job, select your file from your computer. From the drop-down, choose the 1000 Genomes super-population that best represents your study; it is used to calculate linkage disequilibrium (LD).

Click Run.

The Post-GWAS Analysis Pipeline filters your data by the p-value of each association. The run time therefore depends on the number of variants that pass the p-value threshold, which usually correlates with the number of individuals in the study.
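To get a rough idea of how many variants will drive the run time, you can count the variants below a p-value threshold yourself. This sketch is only an approximation: the threshold the pipeline actually applies is not given here, so it defaults to the conventional genome-wide significance level of 5e-8 as an assumption. The function name is hypothetical, and the example text is a trimmed copy of the example file above using the p_value header.

```python
import csv
import io

# Assumption: the common genome-wide significance convention, not the pipeline's documented threshold.
GENOME_WIDE_SIGNIFICANCE = 5e-8

EXAMPLE = (
    "variant_id\tbeta\tp_value\n"
    "rs10910029\t-0.159\t8.41E-05\n"
    "rs10910030\t0.1788\t5.51E-05\n"
    "rs10752741\t0.184\t5.02E-05\n"
    "rs10752742\t0.1558\t8.99E-05\n"
)


def significant_hits(tsv_text: str, threshold: float = GENOME_WIDE_SIGNIFICANCE) -> list[str]:
    """Return the variant_id of every row whose p_value is below the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["variant_id"] for row in reader if float(row["p_value"]) < threshold]
```

None of the four example variants reaches 5e-8, so `significant_hits(EXAMPLE)` returns an empty list, while a looser threshold of 6e-5 keeps rs10910030 and rs10752741.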

Your job will appear in the jobs table, and its status will show Done when it has finished. Click [Download Results] to get your data.

Output

Your output will be a zipped folder, which you can expand to give you three files:

1. A short HTML report.

2. A summary report in TSV format, called output2.tsv.

3. A detailed output file in TSV format, called postgap_output.tsv.
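If you prefer to work with the results programmatically, the sketch below opens the downloaded archive with Python's standard `zipfile` module and finds the three files by name. It assumes the download is a standard `.zip` archive. Since the HTML report's file name is not given here, it simply matches any `.html` member. The function name is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# File names as given above; the HTML report's name is not documented, so any .html file is matched.
SUMMARY_NAME = "output2.tsv"
DETAILED_NAME = "postgap_output.tsv"


def locate_outputs(zip_path: str) -> dict[str, str]:
    """Map each expected output to its member path inside the results archive."""
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Zip member names always use forward slashes, so PurePosixPath is safe here.
            name = PurePosixPath(member).name
            if name == SUMMARY_NAME:
                found["summary"] = member
            elif name == DETAILED_NAME:
                found["detailed"] = member
            elif name.lower().endswith(".html"):
                found["html_report"] = member
    return found
```

Matching on the base name means the sketch works whether or not the archive wraps the files in a top-level folder.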

Short HTML report

The short HTML report consists of three tables: Genes, SNPs and Pathways.

The SNPs table lists variants that either passed the threshold in the GWAS itself or are in LD with GWAS variants, based on 1000 Genomes populations. Each variant is shown with a linked gene, that is, a gene whose expression the Genotype-Tissue Expression (GTEx) project has shown to change with the variant in a particular tissue. That tissue is also shown in the table. The score combines the p-value of the variant/phenotype association from the GWAS with the p-value of the variant/gene expression association. The table shows the top ten rows, ranked by this score.

The SNPs table has the following columns:

SNP: A variant either from the GWAS, or in LD with a variant from the GWAS, based on 1000 Genomes. The ID links to the variant in Ensembl.
Gene: A gene shown by GTEx to have its expression affected by the variant.
Tissue: The tissue in which the variant/gene expression association was identified.
Posterior: Score combining the GWAS p-value with the GTEx p-value.
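The posterior itself comes from the pipeline's fine-mapping and colocalisation, which this page does not detail, so the sketch below does not compute it. It only illustrates the final step described above: given scored SNP/gene/tissue rows, keep the ten with the highest score. The class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpAssociation:
    """One SNP/gene/tissue row, mirroring the columns of the SNPs table."""
    snp: str
    gene: str
    tissue: str
    posterior: float


def top_snps(rows: list[SnpAssociation], n: int = 10) -> list[SnpAssociation]:
    """Return the n rows with the highest posterior, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row.posterior)
```

`heapq.nlargest` avoids sorting the full list, which can matter when a large GWAS yields many SNP/gene/tissue rows.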

The Genes table combines data from the SNPs table, bringing together all instances of each gene. The cluster shows the location of all the variants found to be linked to the gene. The score here combines the scores of all those SNPs. The top ten genes are shown, ranked by this score.

The Genes table has the following columns:

Gene: A gene whose expression is affected by variants from the GWAS. The ID links to the gene in Ensembl.
Cluster: The location of the variants that affect the expression of this gene.
Tissue: The tissue in which the variant/gene expression association was identified.
Posterior: Score combining the SNP scores of all SNPs in the cluster.
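The rule used to combine SNP scores into a gene score is not specified here, so this sketch leaves it as a parameter. It groups SNP rows by gene and tissue (an assumption based on the columns of the Genes table) and applies whichever combining function you pass in, for example `sum` or `max`. Neither is claimed to be what the pipeline uses. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable


def gene_scores(
    rows: Iterable[tuple[str, str, str, float]],
    combine: Callable[[list[float]], float],
) -> dict[tuple[str, str], float]:
    """Group (snp, gene, tissue, posterior) rows by (gene, tissue) and combine their scores."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for _snp, gene, tissue, posterior in rows:
        grouped[(gene, tissue)].append(posterior)
    # Assumption: the combining rule is supplied by the caller, not taken from the pipeline.
    return {key: combine(scores) for key, scores in grouped.items()}
```

Keeping `combine` as an argument makes it easy to compare different aggregation rules on the same SNP rows.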

The Pathways table groups the associated genes by Reactome pathway. Each pathway's score combines the gene scores of all the identified genes in that pathway. The top ten pathways are shown, ranked by this score.

The Pathways table has the following columns:

stId: A Reactome pathway involving genes linked to the phenotype. The ID links to the pathway in Reactome.
Name: The name of the Reactome pathway.
Score: Score combining the gene scores.
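Because one gene can belong to several Reactome pathways, pathway scoring is a many-to-many lookup rather than a simple grouping. The sketch below assumes you already have gene scores and a mapping from pathway IDs to their member genes (for example, obtained from Reactome). The combining rule is left as a parameter because it is not specified here, and the function name is hypothetical.

```python
from typing import Callable, Mapping


def pathway_scores(
    gene_score: Mapping[str, float],
    pathway_genes: Mapping[str, set[str]],
    combine: Callable[[list[float]], float],
) -> dict[str, float]:
    """Score each pathway from the scores of the identified genes it contains."""
    scores = {}
    for pathway_id, members in pathway_genes.items():
        # Only genes that were actually identified (i.e. have a score) contribute.
        hits = [gene_score[gene] for gene in sorted(members) if gene in gene_score]
        if hits:  # skip pathways with no identified genes
            scores[pathway_id] = combine(hits)
    return scores
```

A gene that appears in several pathways contributes its score to each of them.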

output2.tsv

This TSV summary report contains information about all the SNPs, grouped into clusters. Unlike the HTML report, which shows only the top ten entries, it lists every cluster.

The file has the following columns:

Gene ID: A gene whose expression is affected by variants from the GWAS. The ID refers to the gene in Ensembl.
Cluster description: The location of the variants that affect the expression of this gene.
SNP ID: A variant in the cluster, either from the GWAS or in LD with a variant from the GWAS, based on 1000 Genomes. The ID refers to the variant in Ensembl.
SNP posterior probability: Score combining the GWAS p-value with the GTEx p-value.
Tissue: The tissue in which the variant/gene expression association was identified.
Cluster posterior probability: Score combining the SNP scores of all SNPs in the cluster.
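To work with this file in Python, you can read it with the standard `csv` module and collect the SNPs of each cluster. The header strings below are assumptions taken from the column names listed above; check them against your own output2.tsv. The function name is hypothetical.

```python
import csv
from collections import defaultdict

# Assumed header names, taken from the column list above; verify them against your file.
CLUSTER_COLUMN = "Cluster description"
SNP_COLUMN = "SNP ID"
SNP_POSTERIOR_COLUMN = "SNP posterior probability"


def snps_by_cluster(path: str) -> dict[str, list[tuple[str, float]]]:
    """Read output2.tsv and list (SNP ID, SNP posterior) pairs for each cluster."""
    clusters: dict[str, list[tuple[str, float]]] = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            snp = (row[SNP_COLUMN], float(row[SNP_POSTERIOR_COLUMN]))
            clusters[row[CLUSTER_COLUMN]].append(snp)
    return dict(clusters)
```

Because `csv.DictReader` looks columns up by header name, the sketch does not depend on the column order in the file.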

postgap_output.tsv

This TSV file contains the full output of the Post-GWAS Analysis Pipeline. Its format is described in detail in the Post-GWAS Analysis Pipeline wiki.