Example usage

Here we demonstrate how to run longGWAS analyses using the example dataset provided in the testdata GitHub repository. This dataset supports cross-sectional, longitudinal, and survival analyses.

To access the data, clone the repository like so:

git clone https://github.com/AMCalejandro/testdata
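Once the repository is cloned, it can be worth confirming that the input files named in the parameter listings of this section are where the pipeline will look for them. The following is a minimal sketch and not part of longGWAS: check_inputs is a hypothetical helper, the relative paths are copied from the listings, and the chr[1-3].vcf pattern is assumed to stand for chr1.vcf, chr2.vcf, and chr3.vcf.

```shell
# check_inputs BASE_DIR: print every expected example file missing under BASE_DIR.
# Returns 0 when all files are present, 1 otherwise.
# The file list is an assumption taken from the parameter listings in this section.
check_inputs() {
  base="$1"
  missing=0
  for rel in \
    example/genotype/chr1.vcf \
    example/genotype/chr2.vcf \
    example/genotype/chr3.vcf \
    example/covariates.tsv \
    example/phenotype.cs.tsv
  do
    if [ ! -f "$base/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_inputs "$PWD" from the directory that holds the example folder lists any file that is not in place.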

Cross-sectional analysis

The default longGWAS parameters are set to run a cross-sectional analysis. We can perform this analysis by running the following Nextflow command:

nextflow run michael-ta/longitudinal-GWAS-pipeline -profile standard -r main

The default parameters are specified in the nextflow.config file (shown in the Nextflow Configuration section). To pass custom parameters, add options from the Command line Parameters section to the command above. For example:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
--chunk_size 60000 --minor_allele_freq '0.01' --dataset 'TEST_2' \
-profile standard -r main

If you intend to customise multiple parameters, we recommend modifying the params.yml file and passing it with the -params-file <params.yml> argument. The YAML file contains all the parameters from the Command line Parameters section, which makes them more convenient to modify. For example, we can achieve the same result as the command above by changing the chunk_size, minor_allele_freq, and dataset parameters, marked below with the -> symbol:

# Input files
input                 = "$PWD/example/genotype/chr[1-3].vcf"
covarfile             = "$PWD/example/covariates.tsv"
phenofile             = "$PWD/example/phenotype.cs.tsv"

# Variable names
pheno_name            = 'y'
covariates            = 'SEX age_at_baseline'
study_col             = 'study_arm'
time_col              = 'study_days'

# Model variables
longitudinal_flag     = false
survival_flag         = false
linear_flag           = true
chunk_flag            = true
-> chunk_size            = 60000
plink_chunk_size      = 10000

# Parameters for genetic QC
r2thres               = -9
-> minor_allele_freq     = '0.01'
minor_allele_ct       = '20'
kinship               = '0.177'
ancestry              = 'EUR'
assembly              = 'hg19'

# Identifier for the input genotype files - useful to cache results
-> dataset               = 'TEST_2'

# Generate manhattan plot with result files
mh_plot               = true

The analysis can then be run like so:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
-params-file params.yml \
-profile standard -r main
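Nextflow reads the file given to -params-file as YAML or JSON, so inside the file each entry takes the form key: value. As a minimal sketch, the three overrides from the command-line example could be written to a file as follows. The helper name write_params is hypothetical, and the sketch assumes that parameters left out of the file keep their nextflow.config defaults.

```shell
# write_params FILE: write the three overridden parameters to FILE in YAML syntax.
# Only the values changed in the example are included (an assumption: all other
# parameters are expected to fall back to the defaults in nextflow.config).
write_params() {
  cat > "$1" <<'EOF'
chunk_size: 60000
minor_allele_freq: '0.01'
dataset: 'TEST_2'
EOF
}
```

Running write_params params.yml creates the file, which can then be passed with -params-file as in the command above.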

Longitudinal analysis

To run a longitudinal analysis, we will need to change the input phenofile, as well as activate the longitudinal_flag. We can do this by specifying these parameters in the Nextflow command:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
--phenofile "$PWD/example/phenotype.lt.tsv" --longitudinal_flag true --dataset 'LONG' \
-profile standard -r main

Alternatively, we can pass the params.yml file using the -params-file option:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
-params-file params.yml \
-profile standard -r main

The modified parameters in params.yml are marked with the -> symbol:

# Input files
input                 = "$PWD/example/genotype/chr[1-3].vcf"
covarfile             = "$PWD/example/covariates.tsv"
-> phenofile             = "$PWD/example/phenotype.lt.tsv"

# Variable names
pheno_name            = 'y'
covariates            = 'SEX age_at_baseline'
study_col             = 'study_arm'
time_col              = 'study_days'

# Model variables
-> longitudinal_flag     = true
survival_flag         = false
-> linear_flag           = false
chunk_flag            = true
chunk_size            = 30000
plink_chunk_size      = 10000

# Parameters for genetic QC
r2thres               = -9
minor_allele_freq     = '0.05'
minor_allele_ct       = '20'
kinship               = '0.177'
ancestry              = 'EUR'
assembly              = 'hg19'

# Identifier for the input genotype files - useful to cache results
-> dataset               = 'LONG'

# Generate manhattan plot with result files
mh_plot               = true
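In each parameter listing of this section, exactly one of linear_flag, longitudinal_flag, and survival_flag is set to true. A quick sanity check before launching could look like the sketch below. model_type is a hypothetical helper, and whether the pipeline itself rejects other combinations is not stated here, so treat the rule as an assumption.

```shell
# model_type LINEAR LONGITUDINAL SURVIVAL: each argument is "true" or "false".
# Prints the selected analysis; returns 1 unless exactly one flag is true.
# The one-flag rule is an assumption inferred from the listings in this section.
model_type() {
  case "$1,$2,$3" in
    true,false,false) echo "cross-sectional" ;;
    false,true,false) echo "longitudinal" ;;
    false,false,true) echo "survival" ;;
    *) echo "expected exactly one model flag set to true" >&2; return 1 ;;
  esac
}
```

For the longitudinal listing above, model_type false true false prints longitudinal.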

Survival analysis

To run a survival analysis, we will need to change the input phenofile, as well as activate the survival_flag. We can do this by specifying these parameters in the Nextflow command:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
--phenofile "$PWD/example/phenotype.surv.tsv" --survival_flag true --dataset 'SURV' \
-profile standard -r main

Alternatively, we can pass the params.yml file like so:

nextflow run michael-ta/longitudinal-GWAS-pipeline \
-params-file params.yml \
-profile standard -r main

The modified parameters in params.yml are marked with the -> symbol:

# Input files
input                 = "$PWD/example/genotype/chr[1-3].vcf"
covarfile             = "$PWD/example/covariates.tsv"
-> phenofile             = "$PWD/example/phenotype.surv.tsv"

# Variable names
pheno_name            = 'y'
covariates            = 'SEX age_at_baseline'
study_col             = 'study_arm'
time_col              = 'study_days'

# Model variables
longitudinal_flag     = false
-> survival_flag         = true
-> linear_flag           = false
chunk_flag            = true
chunk_size            = 30000
plink_chunk_size      = 10000

# Parameters for genetic QC
r2thres               = -9
minor_allele_freq     = '0.05'
minor_allele_ct       = '20'
kinship               = '0.177'
ancestry              = 'EUR'
assembly              = 'hg19'

# Identifier for the input genotype files - useful to cache results
-> dataset               = 'SURV'

# Generate manhattan plot with result files
mh_plot               = true
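The three analyses differ only in the phenotype file, the model flag, and the dataset label. The sketch below prints the corresponding command for each so they can be reviewed before running. longgwas_cmd is a hypothetical helper that uses only options appearing in the commands of this section; the TEST_2 label for the cross-sectional run is borrowed from the earlier example and is an assumption.

```shell
# longgwas_cmd TYPE: print (not run) the nextflow command for TYPE: cs, lt, or surv.
# TYPE doubles as the infix of the example phenotype file name.
longgwas_cmd() {
  case "$1" in
    cs)   extra="--dataset 'TEST_2'" ;;
    lt)   extra="--longitudinal_flag true --dataset 'LONG'" ;;
    surv) extra="--survival_flag true --dataset 'SURV'" ;;
    *)    echo "unknown analysis type: $1" >&2; return 1 ;;
  esac
  # \$PWD is escaped so the printed command keeps the variable unexpanded.
  echo "nextflow run michael-ta/longitudinal-GWAS-pipeline" \
       "--phenofile \"\$PWD/example/phenotype.$1.tsv\" $extra" \
       "-profile standard -r main"
}

# Print one command per analysis type.
for analysis in cs lt surv; do
  longgwas_cmd "$analysis"
done
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed line can be pasted into the shell once it has been checked.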