              Child
Parent        buckled   unbuckled
buckled       56        8
unbuckled     2         16
We can enter these numbers into R in several ways.
Creating the table as a combination of the row (or column) vectors is done as follows:
> rbind(c(56,8),c(2,16)) # combine rows
     [,1] [,2]
[1,]   56    8
[2,]    2   16
> cbind(c(56,2),c(8,16)) # bind as columns
     [,1] [,2]
[1,]   56    8
[2,]    2   16
Combining rows (or columns) of numeric vectors results in a matrix, a rectangular collection of numbers. We can also make a matrix directly using the matrix() function. To enter the numbers we need only supply them as a single vector and specify the correct size. In this case we have two rows. The data entry would look like this:
> x = matrix(c(56,2,8,16),nrow=2)
> x
     [,1] [,2]
[1,]   56    8
[2,]    2   16
The data is filled in column by column. Set byrow=TRUE to do this row by row.
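As a minimal sketch of the difference, the same counts can be supplied in row order and read in with byrow=TRUE. The names by_col and by_row are introduced here for illustration only.

```r
# Seat-belt counts entered two ways.
by_col = matrix(c(56, 2, 8, 16), nrow = 2)                # filled column by column (the default)
by_row = matrix(c(56, 8, 2, 16), nrow = 2, byrow = TRUE)  # filled row by row

# Both calls should describe the same 2 x 2 matrix.
print(identical(by_col, by_row))
```

Entering the values row by row is often easier when copying from a printed table, since that is the order in which the table is read.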
Alternatively, we may enter the data using the edit() function. When called on a matrix, this opens a spreadsheet-style editor (if one is available). Thus the commands
> x = matrix(1) # need to initialize x
> x = edit(x) # will edit matrix with spreadsheet
will open the spreadsheet and store the answer into x when done. The 1 will be the first entry. We can edit this as needed.
Giving names to a matrix
It isn’t necessary, but it is nice to give the matrix row and column names. The rownames() and colnames() functions will do so. As with the names() function, these are used in a slightly different manner. As they modify the attributes of the matrix, the functions appear on the left side of the assignment.
> rownames(x) = c("buckled","unbuckled")
> colnames(x) = c("buckled","unbuckled")
> x
          buckled unbuckled
buckled        56         8
unbuckled       2        16
The dimnames() function can set both at once and allows us to specify variable names. A list is used to specify these, as made by list(). Lists are discussed further in Chapter 4. For this usage, the variable name and values are given in name=value format. The row variable comes first, then the column.
> tmp = c("buckled","unbuckled") # less typing
> dimnames(x) = list(parent=tmp,child=tmp) # uses a named list
> x
           child
parent      buckled unbuckled
  buckled        56         8
  unbuckled       2        16
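Once the dimensions are named, entries can be looked up by label rather than by position. The following is a minimal sketch that rebuilds the matrix so it runs on its own. The names seat and lab are introduced here, and the labels follow the seat-belt table at the start of the section (buckled first).

```r
seat = matrix(c(56, 2, 8, 16), nrow = 2)
lab = c("buckled", "unbuckled")
dimnames(seat) = list(parent = lab, child = lab)  # row variable first, then column

seat["buckled", "unbuckled"]  # count for: parent buckled, child unbuckled
names(dimnames(seat))         # the variable names attached to rows and columns
```

Indexing by label keeps the code readable and guards against mixing up which row or column is which.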
If the matrix is made with rbind(), then names for the row vectors can be specified in name=value format. Furthermore, column names will come from the vectors if present.
> x = c(56,8); names(x) = c("buckled","unbuckled")
> y = c(2,16)
> rbind(buckled=x,unbuckled=y) # names rows, columns come from x
          buckled unbuckled
buckled        56         8
unbuckled       2        16
3.1.2 Making two-way tables from unsummarized data
With unsummarized data, two-way tables are made with the table() function, just as in the univariate case. If the two data vectors are x and y, then the command table(x, y) will create the table.
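A small sketch with made-up data shows the shape of the result. The vectors parent and child below are hypothetical and are not taken from the seat-belt survey.

```r
# Hypothetical paired observations: one entry per parent/child pair.
parent = c("buckled", "buckled", "unbuckled", "buckled", "unbuckled")
child  = c("buckled", "unbuckled", "unbuckled", "buckled", "unbuckled")

tbl = table(parent, child)  # cross-classify the two vectors
tbl
```

The first argument supplies the rows and the second the columns, and each cell counts how many pairs fall in that combination.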
■ Example 3.1: Is past performance an indicator of future performance?
A common belief is that an A student in one class will be an A student in the next. Is this so? The data set grades (UsingR) contains the grades students received in a math class and their grades in a previous math class.
> library(UsingR) # once per session
> grades
    prev grade
1     B+    B+
2     A-    A-
3     B+    A-
…
122    B     B
> attach(grades)
> table(prev, grade) # also table(grades) works
    grade
prev  A A- B+  B B- C+  C  D  F
  A  15  3  1  4  0  0  3  2  0
  A-  3  1  1  0  0  0  0  0  0
  B+  0  2  2  1  2  0  0  1  1
  B   0  1  1  4  3  1  3  0  2
  B-  0  1  0  2  0  0  1  0  0
  C+  1  1  0  0  0  0  1  0  0
  C   1  0  0  1  1  3  5  9  7
  D   0  0  0  1  0  0  4  3  1
  F   1  0  0  1  1  1  3  4 11
A quick glance at the table indicates that the current grade relates quite a bit to the previous grade. Of those students whose previous grade was an A, fifteen got an A in the next class; only three of the students whose previous grade was a B or worse received an A in the next class.
3.1.3 Marginal distributions of two-way tables
A two-way table involves two variables. The distribution of each variable separately is called its marginal distribution. The marginal distributions can be found from the table by summing across the rows or down the columns. The sum() function won’t work on its own, as it adds all the values in the table. Rather, we need to apply the sum() function to just the rows or just the columns. This is done with the function apply(). The command apply(x, 1, sum) will sum the rows, and apply(x, 2, sum) will sum the columns. The margin.table() function conveniently implements this. Just remember that 1 is for rows and 2 is for columns.
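The relationship between sum(), apply(), and margin.table() can be checked with a short sketch. It rebuilds the seat-belt counts under the name seat, which is introduced here so the block runs on its own.

```r
seat = matrix(c(56, 2, 8, 16), nrow = 2)  # seat-belt counts, parents in rows

sum(seat)            # adds every cell: the grand total, not a margin
apply(seat, 1, sum)  # 1 = rows
apply(seat, 2, sum)  # 2 = columns

# margin.table() should agree with apply() on each margin.
all(margin.table(seat, 1) == apply(seat, 1, sum))
all(margin.table(seat, 2) == apply(seat, 2, sum))
```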
For the seat-belt data stored in x we have:
> x
           child
parent      buckled unbuckled
  buckled        56         8
  unbuckled       2        16
> margin.table(x,1) # row sum is for parents
[1] 64 18
> margin.table(x,2) # column sum for kids
[1] 58 24
The two marginal distributions are similar: the majority in each case wore seat belts. Alternatively, the function addmargins() will return the marginal distributions by extending the table. For example:
> addmargins(x)
           child
parent      buckled unbuckled Sum
  buckled        56         8  64
  unbuckled       2        16  18
  Sum            58        24  82
Looking at the marginal distributions of the grade data also shows two similar distributions:
> margin.table(table(prev,grade),1) # previous. Also table(prev)
prev
 A A- B+  B B- C+  C  D  F
28  5  9 15  4  3 27  9 22
> margin.table(table(prev,grade),2) # current
grade
 A A- B+  B B- C+  C  D  F
21  9  5 14  7  5 20 19 22
The grade distributions, surprisingly, are somewhat “well-shaped.”
3.1.4 Conditional distributions of two-way tables
We may be interested in comparing the various rows of a two-way table. For example, is there a difference in the grade a student gets if her previous grade is a B or a C? Or does the fact that a parent wears a seat belt affect the chance a child does? These questions are answered by comparing the rows or columns in a two-way table. It is usually much easier to compare proportions or percentages and not the absolute counts.
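One way to obtain such proportions is the base R function prop.table(). The sketch below assumes the seat-belt counts are laid out with parents in rows and children in columns. The names seat, lab, and row_props are introduced for this example only.

```r
seat = matrix(c(56, 2, 8, 16), nrow = 2)
lab = c("buckled", "unbuckled")
dimnames(seat) = list(parent = lab, child = lab)

# Divide each row by its row total, so that each row sums to 1.
row_props = prop.table(seat, 1)
round(row_props, 2)
```

Each row can then be read as the distribution of the child's behavior given the parent's, which makes rows with very different totals directly comparable.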
For example, to answer the question of whether a parent wearing a seat belt changes the chance a child does, we might want to consider Table 3.2.