and standard deviation, it is more straightforward to use the t.test() function to calculate the relevant quantities.

5.1.8 Proportion and 95% confidence interval

Example: 11.2

binom.test(sum(x), length(x))
prop.test(sum(x), length(x))

Note: The binom.test() function calculates an exact Clopper–Pearson confidence interval based on the F distribution [25] using the first argument as the number of successes and the second argument as the number of trials, while prop.test() calculates an approximate confidence interval by inverting the score test. Both allow specification of the probability under the null hypothesis. The conf.level option can be used to change the default confidence level.
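The two interval methods can be compared side by side. The following is a minimal sketch on simulated 0/1 data; the vector x, the seed, the null proportion 0.5, and the 90% level are illustrative assumptions, not values from the text.

```r
# Hypothetical data: 100 Bernoulli draws with success probability 0.3
set.seed(42)
x = rbinom(100, size = 1, prob = 0.3)

# Exact (Clopper-Pearson) interval; p= sets the null proportion
exact = binom.test(sum(x), length(x), p = 0.5, conf.level = 0.90)

# Approximate interval obtained by inverting the score test
score = prop.test(sum(x), length(x), p = 0.5, conf.level = 0.90)

exact$estimate    # observed proportion
exact$conf.int    # lower and upper limits of the exact interval
score$conf.int    # lower and upper limits of the approximate interval
```

Both functions return an object of class "htest", so components such as estimate, conf.int, and p.value are extracted the same way.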

5.1.9 Maximum likelihood estimation of parameters

Example: 5.7.1 See also 3.1.1 (probability density functions).

library(MASS)

fitdistr(x, "densityfunction")

Note: Options for densityfunction include beta, cauchy, chi-squared, exponential, f, gamma, geometric, log-normal, lognormal, logistic, negative binomial, normal, Poisson, t, and weibull.
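As an illustration, a normal distribution can be fit to simulated data. This is a sketch only; the data, seed, and choice of the normal family are assumptions made for the example.

```r
library(MASS)

# Hypothetical data: 500 draws from a normal distribution
set.seed(1)
x = rnorm(500, mean = 10, sd = 2)

fit = fitdistr(x, "normal")
fit$estimate    # maximum likelihood estimates of the mean and sd
fit$sd          # estimated standard errors of those estimates
fit$loglik      # maximized log-likelihood
```

For the normal family the maximum likelihood estimate of the mean is the sample mean, which gives a quick check on the fitted values.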

5.2 Bivariate statistics

5.2.1 Epidemiologic statistics

Example: 5.7.3

sum(x==0&y==0)*sum(x==1&y==1)/(sum(x==0&y==1)*sum(x==1&y==0))
or
tab1 = table(x, y)
tab1[1,1]*tab1[2,2]/(tab1[1,2]*tab1[2,1])
or
glm1 = glm(y ~ x, family=binomial)
exp(glm1$coef[2])
or
library(epitools)
oddsratio.fisher(x, y)
oddsratio.wald(x, y)
riskratio(x, y)
riskratio.wald(x, y)

Note: The epitab() function in the epitools package provides a general interface to many epidemiologic statistics, while expand.table() can be used to create individual level data from a table of counts (see generalized linear models, 7.1).
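The table-based and regression-based odds ratios should agree for a single binary exposure. The sketch below checks this on made-up data using only base R; the vectors x and y are hypothetical.

```r
# Hypothetical 0/1 exposure (x) and outcome (y), 50 subjects per exposure group
x = c(rep(0, 50), rep(1, 50))
y = c(rep(0, 35), rep(1, 15), rep(0, 20), rep(1, 30))

# Odds ratio from the 2 x 2 table of counts
tab1 = table(x, y)
or_table = tab1[1,1]*tab1[2,2]/(tab1[1,2]*tab1[2,1])

# Odds ratio from logistic regression: exponentiated slope
glm1 = glm(y ~ x, family=binomial)
or_glm = exp(coef(glm1)[2])

# Wald 95% confidence interval for the odds ratio
exp(confint.default(glm1)[2, ])
```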

5.2.2 Test characteristics

The sensitivity of a test is defined as the probability that someone with the disease (D = 1) tests positive (T = 1), while the specificity is the probability that someone without the disease (D = 0) tests negative (T = 0). For a dichotomous screening measure, the sensitivity and specificity can be defined as P(D = 1, T = 1)/P(D = 1) and P(D = 0, T = 0)/P(D = 0), respectively (see also receiver operating characteristic curves, 8.5.7).

sens = sum(D==1&T==1)/sum(D==1)
spec = sum(D==0&T==0)/sum(D==0)
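A small worked case may help fix the definitions. The vectors below are hypothetical; note that assigning to T masks R's built-in shorthand for TRUE for the rest of the session.

```r
# Hypothetical disease status and dichotomous test result for 10 subjects
D = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
T = c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)   # masks the TRUE shorthand T

sens = sum(D==1&T==1)/sum(D==1)   # 3 of the 4 diseased subjects test positive
spec = sum(D==0&T==0)/sum(D==0)   # 4 of the 6 non-diseased subjects test negative

# The same quantities appear as row proportions of the cross-tabulation
prop.table(table(D, T), margin = 1)
```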

Note: Sensitivity and specificity for an outcome D can be calculated for each value of a continuous measure T using the following code.

library(ROCR)

pred = prediction(T, D)

diagobj = performance(pred, "sens", "spec")
sens = slot(diagobj, "y.values")[[1]]   # "sens" is the measure, stored in y.values
spec = slot(diagobj, "x.values")[[1]]   # "spec" is the x.measure, stored in x.values
cut = slot(diagobj, "alpha.values")[[1]]
diagmat = cbind(cut, sens, spec)

head(diagmat, 10)

Note: The ROCR package facilitates the calculation of test characteristics, including sensitivity and specificity. The prediction() function takes as arguments the continuous measure and outcome. The returned object can be used to calculate quantities of interest (see help(performance) for a comprehensive list). The slot() function is used to return the desired sensitivity and specificity values for each cut score, where [[1]] denotes the first element of the returned list (see help(list) and help(Extract)).
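The same kind of cut-score table can be sketched in base R, which makes the underlying computation explicit. The simulated data and the rule "positive when T is at or above the cut" are assumptions for this illustration; ROCR's own cutoff convention may differ.

```r
# Hypothetical outcome and continuous measure (higher on average when D = 1)
set.seed(7)
D = rbinom(200, 1, 0.4)
T = rnorm(200, mean = D)

# Candidate cut scores: every distinct observed value, in increasing order
cut = sort(unique(T))

# Classify as positive when T >= cut, then compute both characteristics
sens = sapply(cut, function(k) sum(T >= k & D == 1) / sum(D == 1))
spec = sapply(cut, function(k) sum(T <  k & D == 0) / sum(D == 0))

diagmat = cbind(cut, sens, spec)
head(diagmat, 10)
```

As the cut score rises, sensitivity can only fall and specificity can only rise, which is the trade-off summarized by a receiver operating characteristic curve.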

5.2.3 Correlation

Examples: 5.7.2 and 8.7.7

pearsoncorr = cor(x, y)
spearmancorr = cor(x, y, method="spearman")
kendalltau = cor(x, y, method="kendall")
or
cormat = cor(cbind(x1, ..., xk))

Note: Specifying method="spearman" or method="kendall" as an option to cor() generates the Spearman or Kendall correlation coefficients, respectively. A matrix of variables (created with cbind()) can be used to generate the correlation between a set of variables. The use option for cor() specifies how missing values are handled (for example, "all.obs", "complete.obs", or "pairwise.complete.obs"). The cor.test() function can carry out a test (or calculate the confidence interval) for a correlation.
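The effect of the use option is easiest to see with a few missing values. The following sketch uses simulated data; the vectors, seed, and positions of the missing values are assumptions for the example.

```r
# Hypothetical data with two missing values in y
set.seed(3)
x = rnorm(50)
y = x + rnorm(50)
y[c(5, 17)] = NA

cor(x, y)                                          # NA under the default use="everything"
cor(x, y, use="complete.obs")                      # Pearson, complete pairs only
cor(x, y, method="spearman", use="complete.obs")   # Spearman, complete pairs only

# Test and confidence interval; incomplete pairs are dropped
ct = cor.test(x, y)
ct$conf.int
ct$p.value
```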

5.2.4 Kappa (agreement)

library(irr)

kappa2(data.frame(x, y))

Note: The kappa2() function takes a dataframe (see A.4.6) as an argument. Weights can be specified as an option.
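For intuition, unweighted Cohen's kappa can also be computed directly as (po - pe)/(1 - pe), where po is the observed agreement and pe the agreement expected by chance. This is a base R sketch on made-up ratings, not the irr implementation.

```r
# Hypothetical ratings from two raters on 10 subjects
x = c("a","a","b","b","a","b","a","a","b","b")
y = c("a","a","b","a","a","b","b","a","b","b")

# Square table over a common set of categories
lev = sort(unique(c(x, y)))
tab = table(factor(x, levels=lev), factor(y, levels=lev))
p = tab / sum(tab)

po = sum(diag(p))                    # observed proportion of agreement
pe = sum(rowSums(p) * colSums(p))    # agreement expected by chance
kap = (po - pe) / (1 - pe)           # unweighted Cohen's kappa
kap
```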


5.3 Contingency tables

5.3.1 Display cross-classification table

Example: 5.7.3

Contingency tables show the group membership across categorical (grouping) variables. They are also known as cross-classification tables, cross-tabulations, and two-way tables.

library(gmodels)
CrossTable(x, y)
or
mytab = table(y, x)
addmargins(mytab)
or
library(mosaic)
tally(~ y + x, margins=TRUE, data=ds)
or
library(prettyR)
xtab(y ~ x, data=ds)

Note: The CrossTable() function in the gmodels package provides a flexible means to generate crosstabs. It supports the missing.include option to add a category for missing values, handling of unused factor levels, and emulation of SPSS or SAS output, with cell, row, and/or column percentages. For the table() function, the exclude=NULL option includes categories for missing values. The addmargins() function adds (by default) the row and column totals to a table. The colSums() and colMeans() functions (and their equivalents for rows, rowSums() and rowMeans()) can be used to efficiently calculate sums and means for numeric matrices, arrays, and dataframes. The tally() function in the mosaic package supports a modeling language for categorical tables, including a | operator to stratify by a third variable. Options for the tally() function include format= (percent, proportion, or count) and margins=. Additional options for table display are provided in the prettyR package xtab() function.
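The base R route can be tried without any add-on packages. The sketch below uses a small made-up dataframe ds; the variable names and values are hypothetical.

```r
# Hypothetical dataframe with a grouping variable x and an outcome y
ds = data.frame(
  x = c("ctl","ctl","trt","trt","trt","ctl","trt","ctl"),
  y = c("no","yes","yes","yes","no","no","yes","no")
)

mytab = table(ds$y, ds$x, dnn = c("y", "x"))
addmargins(mytab)                # counts with row and column totals
prop.table(mytab, margin = 2)    # proportions within each column (level of x)
colSums(mytab)                   # column totals only
```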

5.3.2 Displaying missing value categories in a table

It can be useful to display tables including missing values as a separate category (see 11.4.4.1).

table(x1, x2, useNA="ifany")
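The difference from the default behavior shows up in the table totals. In this sketch the vectors x1 and x2 are made up, each with one missing value.

```r
# Hypothetical variables, each with one missing value
x1 = c(1, 0, NA, 1, 0, 1)
x2 = c("a", "b", "a", NA, "b", "a")

tab = table(x1, x2)                     # pairs with any missing value are dropped
tabNA = table(x1, x2, useNA="ifany")    # NA appears as its own row and column

sum(tab)      # number of complete pairs
sum(tabNA)    # all observations
tabNA
```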

5.3.3 Pearson chi-square statistic

Example: 5.7.3

chisq.test(x, y)
chisq.test(ymat)

Note: The chisq.test() command can accept either two class vectors or a table of counts. By default, a continuity correction is used (the option correct=FALSE turns this off). A version with more verbose output (e.g., expected cell counts) can be found in the xchisq.test() function in the mosaic package.
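The components of the returned object can be inspected directly, which is one way to see the expected counts in base R. The 2 x 2 table of counts below is hypothetical.

```r
# Hypothetical 2 x 2 table of counts
ymat = matrix(c(30, 20,
                15, 35), nrow=2, byrow=TRUE)

res = chisq.test(ymat, correct=FALSE)   # no continuity correction
res$expected     # expected counts under independence
res$statistic    # Pearson chi-square statistic
res$p.value

# With the default continuity correction the statistic is smaller
res_cc = chisq.test(ymat)
res_cc$statistic
```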

5.3.4 Cochran–Mantel–Haenszel test

The Cochran–Mantel–Haenszel test gives an assessment of the relationship between X1 and X2, stratified by (or controlling for) X3. The analysis provides a way to adjust for the possible confounding effects of X3 without having to estimate parameters for them.

mantelhaen.test(x1, x2, x3)
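The function also accepts a three-dimensional table of counts in place of three vectors. The sketch below uses a hypothetical 2 x 2 x 2 array, with the strata of x3 along the third dimension.

```r
# Hypothetical counts: x1 by x2 within two strata (A and B) of x3
counts = array(c(20, 10, 15, 25,
                 12, 18,  8, 22),
               dim = c(2, 2, 2),
               dimnames = list(x1 = c("0","1"), x2 = c("0","1"), x3 = c("A","B")))

res = mantelhaen.test(counts)
res$statistic    # Cochran-Mantel-Haenszel chi-square statistic
res$p.value
res$estimate     # Mantel-Haenszel estimate of the common odds ratio
```

For 2 x 2 x K tables the result includes an estimate of the common odds ratio, pooled across strata.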

5.3.5 Cramér’s V

Cramér’s V (or phi coefficient) is a measure of association for nominal variables.

library(vcd)
assocstats(table(x, y))
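The statistic can also be computed by hand from the Pearson chi-square statistic as sqrt(chi2 / (n (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. This base R sketch uses made-up data and is not the vcd implementation.

```r
# Hypothetical nominal variables: x has three levels, y has two
x = c(rep("a", 40), rep("b", 30), rep("c", 30))
y = c(rep("u", 25), rep("v", 15),
      rep("u", 10), rep("v", 20),
      rep("u", 5),  rep("v", 25))

tab = table(x, y)
chi2 = as.numeric(chisq.test(tab, correct=FALSE)$statistic)
n = sum(tab)          # total number of observations
k = min(dim(tab))     # smaller of the number of rows and columns

V = sqrt(chi2 / (n * (k - 1)))
V
```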

5.3.6 Fisher’s exact test

Example: 5.7.3

fisher.test(y, x)
fisher.test(ymat)

Note: The fisher.test() command can accept either two class vectors or a table of counts (here denoted by ymat). For tables with many rows and/or columns, p-values can be computed using Monte Carlo simulation using the simulate.p.value option. The Monte Carlo p-value can be considerably less compute-intensive for large sample sizes.
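Both uses can be sketched with small made-up tables of counts. The matrices, the seed, and the number of Monte Carlo replicates B are assumptions for this example.

```r
# Hypothetical 2 x 2 table of counts
ymat = matrix(c(8, 2,
                1, 9), nrow=2, byrow=TRUE)
res = fisher.test(ymat)
res$p.value
res$estimate    # conditional maximum likelihood estimate of the odds ratio
res$conf.int

# Hypothetical 3 x 3 table: Monte Carlo p-value from B simulated tables
big = matrix(c(6, 3, 2,
               4, 5, 3,
               1, 4, 7), nrow=3, byrow=TRUE)
set.seed(11)
sim = fisher.test(big, simulate.p.value=TRUE, B=5000)
sim$p.value
```

The odds ratio reported for a 2 x 2 table is a conditional estimate, so it generally differs somewhat from the simple cross-product ratio of the counts.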

5.3.7 McNemar’s test

McNemar’s test tests the null hypothesis that the proportions are equal across matched pairs, for example, when two raters assess a population.

mcnemar.test(y, x)

Note: The mcnemar.test() command can accept either two class vectors or a matrix with counts.
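With a matrix of counts, only the discordant pairs (the off-diagonal cells) drive the statistic. The paired ratings below are hypothetical.

```r
# Hypothetical paired ratings: rows are rater 1, columns are rater 2
counts = matrix(c(40, 12,
                   5, 43), nrow=2, byrow=TRUE,
                dimnames = list(rater1 = c("neg","pos"),
                                rater2 = c("neg","pos")))

res = mcnemar.test(counts)                      # continuity correction by default
res_nc = mcnemar.test(counts, correct=FALSE)    # uncorrected statistic

res$statistic
res$p.value
res_nc$statistic
```

Here the 12 and 5 discordant pairs determine both versions of the statistic; the 40 and 43 concordant pairs do not enter the calculation.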
