AD brain-specific TWAS analysis

Table of Contents

Introduction

Here, we perform discovery and replication of AD gene-level associations in brain tissues using fqtl models.

Setup

import glob
import itertools as it
import os.path
import numpy as np
import pandas as pd
import scipy.linalg as sl
import scipy.special as sp
import scipy.stats as st
import statsmodels.api as sm
%matplotlib inline
%config InlineBackend.figure_formats = set(['retina'])
import colorcet
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'Nimbus Sans'
plt.rcParams['figure.facecolor'] = 'w'

Data

Fitted elastic net models

Gamazon et al. 2015 trained elastic net models on GTEx v6p. Download the fitted models from PredictDB.

curl -OL "https://zenodo.org/record/3572842/files/GTEx-V6p-1KG-2016-11-16.tar.gz"
tar xf GTEx-V6p-1KG-2016-11-16.tar.gz
rm GTEx-V6p-1KG-2016-11-16.tar.gz

The database uses rsid as the key for the SNPs. Download the dbSNP annotation to convert rsids to (chromosome, start, end).

curl -O --ftp-pasv "https://ftp.ncbi.nih.gov/snp/pre_build152/organisms/human_9606_b151_GRCh37p13/VCF/00-common_all.vcf.gz" 
curl -O --ftp-pasv "https://ftp.ncbi.nih.gov/snp/pre_build152/organisms/human_9606_b151_GRCh37p13/VCF/00-common_all.vcf.gz.tbi"

UTMOST summary statistics

Download UTMOST summary statistics.

curl --ftp-pasv -OL "ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/UTMOST/load_utmost_sumstats/utmost.cross.tissue.sumstats.igap.discovery.txt"
curl --ftp-pasv -OL "ftp://ftp.biostat.wisc.edu/pub/lu_group/Projects/UTMOST/load_utmost_sumstats/utmost.cross.tissue.sumstats.gwax.replicate.txt"

Results

sqtl associations

Read the results.

n_chunks = 29
igap_sqtl_stat = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/imputed_IGAP_Alzheimer-{i}.sqtl.stat.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)])
igap_sqtl_pval = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/imputed_IGAP_Alzheimer-{i}.sqtl.pval.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)])
gwax_sqtl_stat = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/ad-gwax-hg37-{i}.sqtl.stat.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)])
gwax_sqtl_pval = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/ad-gwax-hg37-{i}.sqtl.pval.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)])

Find the significant associations in the discovery cohort (IGAP; FDR 5%).

igap_sqtl_sig = igap_sqtl_pval.apply(lambda x: sm.stats.multipletests(x, alpha=0.1, method='fdr_bh')[0])

For the significant associations, determine whether they replicated (GWAX; FDR 5%).

gwax_sqtl_rep = gwax_sqtl_pval[igap_sqtl_sig].apply(lambda x: sm.stats.multipletests(x, alpha=0.1, method='fdr_bh')[0])

fqtl associations

Read the results.

n_chunks = 29
igap_fqtl_stat = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/imputed_IGAP_Alzheimer-{i}.fqtl.stat.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)], sort=True)
igap_fqtl_pval = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/imputed_IGAP_Alzheimer-{i}.fqtl.pval.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)], sort=True)
gwax_fqtl_stat = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/ad-gwax-hg37-{i}.fqtl.stat.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)], sort=True)
gwax_fqtl_pval = pd.concat([pd.read_csv(f'/broad/hptmp/aksarkar/twas/ad-gwax-hg37-{i}.fqtl.pval.txt.gz', index_col=0, sep='\t') for i in range(n_chunks)], sort=True)

Find the significant associations in the discovery cohort (IGAP; FDR 5%), and determine whether they replicated (GWAX; FDR 5%).

igap_fqtl_sig = igap_fqtl_pval.apply(lambda x: sm.stats.multipletests(x, alpha=0.1, method='fdr_bh')[0])
gwax_fqtl_rep = gwax_fqtl_pval[igap_fqtl_sig].apply(lambda x: sm.stats.multipletests(x, alpha=0.1, method='fdr_bh')[0])

UTMOST associations

Read the data.

igap_utmost_pval = pd.read_csv('/broad/hptmp/aksarkar/utmost.cross.tissue.sumstats.igap.discovery.txt', sep='\t', index_col=0)
gwax_utmost_pval = pd.read_csv('/broad/hptmp/aksarkar/utmost.cross.tissue.sumstats.gwax.replicate.txt', sep='\t', index_col=0)
igap_utmost_sig = sm.stats.multipletests(igap_utmost_pval['pvalue'], alpha=0.1, method='fdr_bh')[0]
gwax_utmost_rep = sm.stats.multipletests(gwax_utmost_pval.loc[igap_utmost_sig, 'pvalue'], alpha=0.1, method='fdr_bh')[0]

Comparison

Compare the number of discovered/replicated associations in brain tissues between sqtl/fqtl.

grid = np.arange(gwax_fqtl_rep.shape[1])
plt.clf()
fig, ax = plt.subplots(2, 1, sharex=True)
fig.set_size_inches(7.5, 5.5)

ax[0].bar(grid, igap_sqtl_sig.sum(), color='k', label='sqtl')
ax[0].bar(grid, igap_fqtl_sig.sum(), color='r', alpha=0.5, label='fqtl')
ax[0].legend()
ax[0].set_ylabel('Num disc assoc')

ax[1].bar(grid, gwax_sqtl_rep.sum() / igap_sqtl_sig.sum(), color='k')
ax[1].bar(grid, gwax_fqtl_rep.sum() / igap_fqtl_sig.sum(), color='r', alpha=0.5)
ax[1].set_xticks(grid)
ax[1].set_xticklabels(gwax_fqtl_rep.columns, rotation=90)
ax[1].set_xlabel('Tissue')
ax[1].set_ylabel('Prop repl')
fig.tight_layout()

sqtl-fqtl-ad-repl.png

Compare the total number of associations (across tissues) between sqtl, fqtl, and UTMOST.

pd.Series({'sqtl': igap_sqtl_sig.agg(any, axis=1).sum(),
           'fqtl': igap_fqtl_sig.agg(any, axis=1).sum(),
           'utmost': igap_utmost_sig.sum()})
sqtl       73
fqtl       37
utmost    192
dtype: int64

Compare the total number of replications.

pd.Series({'sqtl': gwax_sqtl_rep.agg(any, axis=1).sum(),
           'fqtl': gwax_fqtl_rep.agg(any, axis=1).sum(),
           'utmost': gwax_utmost_rep.sum()})
sqtl      26
fqtl      10
utmost    29
dtype: int64

Author: Abhishek Sarkar

Created: 2020-04-27 Mon 16:25

Validate