Title: | Genome-Wide Association Study with SLOPE |
---|---|
Description: | Genome-wide association study (GWAS) performed with SLOPE, short for Sorted L-One Penalized Estimation, a method for estimating the vector of coefficients in linear model. In the first step of GWAS, SNPs are clumped according to their correlations and distances. Then, SLOPE is performed on data where each clump has one representative. |
Authors: | Damian Brzyski [aut], Christine Peterson [aut], Emmanuel J. Candes [aut], Malgorzata Bogdan [aut], Chiara Sabatti [aut], Piotr Sobczyk [cre, aut] |
Maintainer: | Piotr Sobczyk <[email protected]> |
License: | GPL-3 |
Version: | 0.38.1 |
Built: | 2025-03-09 03:56:51 UTC |
Source: | https://github.com/psobczyk/geneslope |
Clumping procedure performed on SNPs, columns of matrix X
, from
object of class screeningResult
,
which is an output of function screen_snps
.
SNPs are clustered based on their correlations. For details see package vignette.
clump_snps(screenResult, rho = 0.5, pValues = NULL, verbose = TRUE)
clump_snps(screenResult, rho = 0.5, pValues = NULL, verbose = TRUE)
screenResult |
object of class screeningResult |
rho |
numeric, minimal correlation between two SNPs to be assigned to one clump |
pValues |
numeric vector, p-values for SNPs computed outside geneSLOPE, eg. with EMMAX |
verbose |
logical, if TRUE (default) progress bar is shown |
object of class clumpingResult
A result of procedure for snp clumping produced by clump_snps
Always a named list of eleven elements
X
numeric matrix, consists of one snp representative for each clump
y
numeric vector, phenotype
SNPnumber
numeric vector, which columns in SNP matrix X_all
are related to clumps representatives
SNPclumps
list of numeric vectors, which columns in SNP matrix
X_all
are related to clump members
X_info
data.frame, mapping information about SNPs from .map file.
Copied from the result of screening procedure.
selectedSnpsNumbers
numeric vector, which rows of X_info
matrix are related to selected clump representatives
X_all
numeric matrix, all the snps that passed screening procedure
numberOfSnps
numeric, total number of SNPs before screening procedure
selectedSnpsNumbersScreening
numeric vector, which rows of X_info
data.frame are related to snps that passed screening
pVals
numeric vector, p-values from marginal tests for each snp
pValMax
numeric, p-value used in screening procedure
Computes sequences for SLOPE according to several pre-defined methods.
create_lambda(n, p, fdr = 0.2, method = c("bhq", "gaussian"))
create_lambda(n, p, fdr = 0.2, method = c("bhq", "gaussian"))
n |
number of observations |
p |
number of variables |
fdr |
target False Discovery Rate (FDR) |
method |
method to use for computing |
The following methods for computing are supported:
bhq
: Computes sequence inspired by Benjamini-Hochberg (BHq)
procedure
gaussian
: Computes modified BHq sequence inspired by
Gaussian designs
Package geneSLOPE performes genome-wide association study (GWAS) with SLOPE, short for Sorted L-One Penalized Estimation. SLOPE is a method for estimating the vector of coefficients in linear model. For details about it see references.
GWAS is splitted into three steps.
In the first step data is read using bigmemory package and immediatly screened using marginal tests for each SNP
SNPs are clumped based on their correlations
SLOPE is performed on data where each clump has one representative (therefore we ensure that variables in linear model are not strognly correlated)
Version: 0.38.1
Malgorzata Bogdan, Damian Brzyski, Emmanuel J. Candes, Christine Peterson, Chiara Sabatti, Piotr Sobczyk
Maintainer: Piotr Sobczyk [email protected]
SLOPE – Adaptive Variable Selection via Convex Optimization, Malgorzata Bogdan, Ewout van den Berg, Chiara Sabatti, Weijie Su and Emmanuel Candes
famFile <- system.file("extdata", "plinkPhenotypeExample.fam", package = "geneSLOPE") mapFile <- system.file("extdata", "plinkMapExample.map", package = "geneSLOPE") snpsFile <- system.file("extdata", "plinkDataExample.raw", package = "geneSLOPE") phe <- read_phenotype(filename = famFile) screening.result <- screen_snps(snpsFile, mapFile, phe, pValMax = 0.05, chunkSize = 1e2) clumping.result <- clump_snps(screening.result, rho = 0.3, verbose = TRUE) slope.result <- select_snps(clumping.result, fdr=0.1) ## Not run: gui_geneSLOPE() ## End(Not run)
famFile <- system.file("extdata", "plinkPhenotypeExample.fam", package = "geneSLOPE") mapFile <- system.file("extdata", "plinkMapExample.map", package = "geneSLOPE") snpsFile <- system.file("extdata", "plinkDataExample.raw", package = "geneSLOPE") phe <- read_phenotype(filename = famFile) screening.result <- screen_snps(snpsFile, mapFile, phe, pValMax = 0.05, chunkSize = 1e2) clumping.result <- clump_snps(screening.result, rho = 0.3, verbose = TRUE) slope.result <- select_snps(clumping.result, fdr=0.1) ## Not run: gui_geneSLOPE() ## End(Not run)
A graphical user interface for performing Genome-wide Association Study with SLOPE
gui_geneSLOPE()
gui_geneSLOPE()
requires installing shiny package
null
identify_clump
identify_clump(x, ...)
identify_clump(x, ...)
x |
appropiate class object |
... |
other arguments |
Enable interactive selection of snps in plot. Return clump number.
Identify clump number in clumpingResult class plot
## S3 method for class 'clumpingResult' identify_clump(x, ...)
## S3 method for class 'clumpingResult' identify_clump(x, ...)
x |
clumpingResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Identify clump number in selectionResult class plot
## S3 method for class 'selectionResult' identify_clump(x, ...)
## S3 method for class 'selectionResult' identify_clump(x, ...)
x |
selectionResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Phenotype data
Always a named list of two elements
y
numeric vector, phenotype
yInfo
data.frame, additional information about observations
provied in .fam file
Plot selectionResult class object
## S3 method for class 'selectionResult' plot(x, chromosomeNumber = NULL, clumpNumber = NULL, ...)
## S3 method for class 'selectionResult' plot(x, chromosomeNumber = NULL, clumpNumber = NULL, ...)
x |
selectionResult class object |
chromosomeNumber |
optional parameter, only selected chromosome will be plotted |
clumpNumber |
optional parameter, only SNPs from selected clump will be plotted |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Print clumpingResult class object
## S3 method for class 'clumpingResult' print(x, ...)
## S3 method for class 'clumpingResult' print(x, ...)
x |
clumpingResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Print phenotypeData class object
## S3 method for class 'phenotypeData' print(x, ...)
## S3 method for class 'phenotypeData' print(x, ...)
x |
phenotypeData class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Print function for class screeningResult class
## S3 method for class 'screeningResult' print(x, ...)
## S3 method for class 'screeningResult' print(x, ...)
x |
screeningResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Print selectionResult class object
## S3 method for class 'selectionResult' print(x, ...)
## S3 method for class 'selectionResult' print(x, ...)
x |
selectionResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Nothing.
Reading phenotype data from file. It is assumed, that data is given in .fam file. In this format, first column is family id (FID), second is individual id (IID), third is Paternal individual ID (PAT), fourth is Maternal individual ID (MAT), fifth is SEX and sixth and last is PHENOTYPE. If file has only four columns, then it is assumed that PAT and MAT columns are missing. If there is only one column, then it is assumed that only phenotype is provided.
read_phenotype(filename, sep = " ", header = FALSE, stringAsFactors = FALSE)
read_phenotype(filename, sep = " ", header = FALSE, stringAsFactors = FALSE)
filename |
character, name of file with phenotype |
sep |
character, field seperator in file |
header |
logical, does first row of file contain variables names |
stringAsFactors |
logical, should character vectors be converted to factors? |
object of class phenotypeData
Reading .raw file that was previously exported from PLINK - see details. Additional information about SNP mapping is read from .map file.
screen_snps( rawFile, mapFile = "", phenotype, pValMax = 0.05, chunkSize = 100, verbose = TRUE )
screen_snps( rawFile, mapFile = "", phenotype, pValMax = 0.05, chunkSize = 100, verbose = TRUE )
rawFile |
character, name of .raw file |
mapFile |
character, name of .map file |
phenotype |
numeric vector or an object of class |
pValMax |
numeric, p-value threshold value used for screening |
chunkSize |
integer, number of snps that will be processed together. The bigger chunkSize is, the faster function works but computer might run out of RAM |
verbose |
if TRUE (default) information about progress is printed |
Exporting data from PLINK
To import data to R, it needs to be exported from
PLINK using the option "–recodeAD"
The PLINK command should therefore look like
plink --file input --recodeAD --out output
.
For more information, please refer to:
http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml
object of class screeningResult
A result of procedure for snp clumping produced by screen_snps
Always a named list of eight elements
X
numeric matrix, consists of snps that passed screening
y
numeric vector, phenotype
X_info
data.frame, SNP info from .map file
pVals
numeric vector, p-values from marginal tests for each snp
numberOfSnps
numeric, total number of SNPs in .raw file
selectedSnpsNumbers
numeric vector, which rows of X_info
data.frame are related to snps that passed screening
pValMax
numeric, p-value used in screening procedure
phenotypeInfo
data.frame, additional information about observations
provied in phenotypeData
object
Performs GWAS with SLOPE on given snp matrix and phenotype. At first clumping procedure is performed. Highly correlated (that is stronger than parameter rho) snps are clustered. Then SLOPE is used on snp matrix which contains one representative for each clump.
select_snps( clumpingResult, fdr = 0.1, type = c("slope", "smt"), lambda = "gaussian", sigma = NULL, verbose = TRUE )
select_snps( clumpingResult, fdr = 0.1, type = c("slope", "smt"), lambda = "gaussian", sigma = NULL, verbose = TRUE )
clumpingResult |
clumpProcedure output |
fdr |
numeric, False Discovery Rate for SLOPE |
type |
method for snp selection. slope (default value) is SLOPE on clump representatives, smt is Benjamini-Hochberg procedure on single marker test p-values for clump representatives |
lambda |
lambda for SLOPE. See |
sigma |
numeric, sigma for SLOPE |
verbose |
logical, if TRUE progress bar is printed |
object of class selectionResult
## Not run: slope.result <- select_snps(clumping.result, fdr=0.1) ## End(Not run)
## Not run: slope.result <- select_snps(clumping.result, fdr=0.1) ## End(Not run)
A result of applying SLOPE to matrix of SNPs obtained by
clumping produced. Result of function select_snps
Always a named list of eighteen elements
X
numeric matrix, consists of one snp representative for each clump
selected by SLOPE
effects
numeric vector, coefficients in linear model build on
snps selected by SLOPE
R2
numeric, value of R-squared in linear model build on
snps selected by SLOPE
selectedSNPs
which columns in matrix X_all
are related to snps selected by SLOPE
y
selectedClumps list of numeric vectors, which columns in SNP matrix
X_all
are related to clump members selected by SLOPE
lambda
numeric vector, lambda values used by SLOPE procedure
y
numeric vector, phenotype
clumpRepresentatives
numeric vector, which columns in SNP matrix X_all
are related to clumps representatives
clumps
list of numeric vectors, which columns in SNP matrix
X_all
are related to clump members
X_info
data.frame, mapping information about SNPs from .map file.
Copied from the result of clumping procedure
X_clumps
numeric matrix, consists of one snp representative for each clump
X_all
numeric matrix, all the snps that passed screening procedure
selectedSnpsNumbers
numeric vector, which rows of X_info
data.frame are related to snps that were selected by SLOPE
clumpingRepresentativesNumbers
numeric vector, which rows of X_info
data.frame are related to snps that are clump represenatives
screenedSNPsNumbers
numeric vector, which rows of X_info
data.frame are related to snps that passed screening
numberOfSnps
numeric, total number of SNPs before screening procedure
pValMax
numeric, p-value used in screening procedure
fdr
numeric, false discovery rate used by SLOPE
screeningResult
clumpingResult
select_snps
SLOPE
Summary clumpingResult class object
## S3 method for class 'clumpingResult' summary(object, ...)
## S3 method for class 'clumpingResult' summary(object, ...)
object |
clumpingResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Summary phenotypeData class object
## S3 method for class 'phenotypeData' summary(object, ...)
## S3 method for class 'phenotypeData' summary(object, ...)
object |
phenotypeData class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Summary function for class screeningResult
## S3 method for class 'screeningResult' summary(object, ...)
## S3 method for class 'screeningResult' summary(object, ...)
object |
screeningResult class object |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |
Summary selectionResult class object
## S3 method for class 'selectionResult' summary(object, clumpNumber = NULL, ...)
## S3 method for class 'selectionResult' summary(object, clumpNumber = NULL, ...)
object |
selectionResult class object |
clumpNumber |
number of clump to be summarized |
... |
Further arguments to be passed to or from other methods. They are ignored in this function. |