%\VignetteIndexEntry{CGEN Vignette} %\VignettePackage{CGEN} %\VigetteDepends{CGEN} \documentclass[a4paper]{article} \begin{document} \title{CGEN(Case-control.GENetics) Package} \maketitle <>= library(CGEN) @ \section*{Example of snp.logistic} Load the ovarian cancer data and print the first 5 rows. <>= data(Xdata, package="CGEN") Xdata[1:5, ] @ For this analysis, the main effects will be "age.group", "n.children", and "oral.years". We will let "age.group" be a categorical variable in the model and we will create dummy variables for it. The dummy variables will be called "age.group\_1", "age.group\_2", $\ldots$ "age.group\_5". <>= for (a in unique(Xdata[, "age.group"])) { dummyVar <- paste("age.group_", a, sep="") Xdata[, dummyVar] <- 0 temp <- Xdata[, "age.group"] == a if (any(temp)) Xdata[temp, dummyVar] <- 1 } @ To determine the baseline category, and if any categories need to be combined, get the frequency counts for the age.group variable by case-control status. <>= table(Xdata[, "case.control"], Xdata[, "age.group"], exclude=NULL) @ We will let "age.group\_4" will be the reference category, "case.control" be the response variable and "BRCA.status" be the SNP variable. Let the variables "oral.years" and "n.children" also interact with the SNP variable. Also let the stratification variable for the constrained maximum likelihood method (CML) be "ethnic.group". <>= mainVars <- c("oral.years", "n.children", "age.group_1", "age.group_2", "age.group_3", "age.group_5") fit <- snp.logistic(Xdata, "case.control", "BRCA.status", main.vars=mainVars, int.vars=c("oral.years", "n.children"), strata.var="ethnic.group") @ Compute a summary table for the models. <>= getSummary(fit) @ Compute Wald tests for the main effect of the SNP and interactions. <>= getWaldTest(fit, c("BRCA.status", "BRCA.status:oral.years", "BRCA.status:n.children")) @ \section*{Example of snp.matched} First let us use "age.group1", "gynSurgery.history" and "BRCA.history" to match the subjects finely into small sets. We will perform the matching only within each ethnic group. We check the case control distribution within ethnic groups. <>= table(Xdata$case.control, Xdata$ethnic.group) @ Thus, allowing matched sets of size 3 should be enough to match all the subjects in each ethnic group. For illustration, let us use maximum matched set size of 4 for ethnic groups 1 and 2 and that of 3 for ethnic group 3. Let us use daisy to compute the distance matrix, which automatically chooses Gower's distance if there are one or more categorical variables. <>= library("cluster") size <- ifelse(Xdata$ethnic.group == 3, 3, 4) d <- daisy(Xdata[,c("age.group_1","gynSurgery.history","BRCA.history")]) mx <- getMatchedSets(d, CC=TRUE, NN=TRUE, ccs.var = Xdata$case.control, strata.var = Xdata$ethnic.group, size = size, fixed = TRUE) @ The return object mx contains vectors corresponding to CC and NN matching as well as corresponding summary matrices tblCC and tblNN. Summaries can be inspected to see how many matched sets of each size were created (along rows) for each of the ethnic groups (along columns). The strata vectors are then appended to the data.frame, before calling the analysis function snp.matched. <>= mx$CC[1:10] mx$tblCC Xdata <- cbind(Xdata, CCStrat = mx$CC, NNStrat = mx$NN) Xdata[1:5,] @ We will look at the interaction of BRCA.status with oral.years and n.children using formulas. <>= intVars <- ~ oral.years + n.children snpVars <- ~ BRCA.status fit <- snp.matched(Xdata, "case.control", snp.vars=snpVars, main.vars=intVars, int.vars=intVars, cc.var="CCStrat", nn.var="NNStrat") @ Compute a summary table for the fitted CLR and CCL models. <>= getSummary(fit, method = c("CLR", "CCL")) @ Compute Wald tests for the omnibus effect of BRCA.status for the HCL method. <>= getWaldTest(fit$HCL, c( "BRCA.status" , "BRCA.status:oral.years" , "BRCA.status:n.children")) @ \section*{Session Information} <>= sessionInfo() @ \end{document}