
Since R is a free software project written by volunteers, there are no paid support options available directly from the R Foundation. However, extensive resources are available to help users.

In addition to the manuals, FAQs, newsletter, wiki, task views, and books listed on the www.r-project.org Web page, there are a number of mailing lists that exist to help answer questions. Because of the volume of postings, it is important to carefully read the posting guide at http://www.r-project.org/posting-guide.html prior to submitting a question. These guidelines are intended to help leverage the value of the list, to avoid embarrassment, and to optimize the allocation of limited resources to technical issues.

As in any general purpose statistical software package, bugs exist. More information about the process of determining whether and how to report a problem can be found in the R FAQ as well as via the information available using help(bug.report).

Chapter 2

Data management

This chapter reviews basic data management, beginning with accessing external datasets, such as those stored in spreadsheets, ASCII files, or foreign formats.

Important tasks such as creating datasets and manipulating variables are discussed in detail. In addition, key mathematical, statistical, and probability functions are introduced.

2.1 Input

In this section we address aspects of data input. Data are organized in dataframes (1.5.6), or connected series of rectangular arrays, which can be saved as platform independent objects.

2.1.1 Native dataset

Example: See 5.7

load(file="dir_location\\savedfile") # Windows only

load(file="dir_location/savedfile") # all OS (including Windows)

Forward slash is supported as a directory delimiter on all operating systems; a double backslash is also supported under Windows. The file savedfile is created by save() (see 2.2.1).
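The round trip between save() and load() can be sketched as follows. This is a minimal illustration, not the book's example: the dataframe contents and the use of tempfile() (to avoid assuming any directory layout) are assumptions made here.

```r
# Create a small illustrative dataframe (hypothetical data).
ds <- data.frame(id = 1:3, score = c(10.5, 11.0, 5.0))

# Save it to a temporary file in R's native format.
savedfile <- tempfile(fileext = ".RData")
save(ds, file = savedfile)

# Remove the object, then restore it under its original name.
rm(ds)
restored_names <- load(file = savedfile)  # load() returns the names of the restored objects
print(restored_names)
print(ds)
```

Note that load() recreates objects under the names they had when saved, so it can silently overwrite an existing object with the same name.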


2.1.2 Fixed format text files

See also 2.1.4 (read more complex fixed files) and 7.4.1 (read variable format files)

# Windows only

ds = read.table("dir_location\\file.txt", header=TRUE)

# all OS (including Windows)

ds = read.table("dir_location/file.txt", header=TRUE)

Forward slash is supported as a directory delimiter on all operating systems; a double backslash is also supported under Windows. If the first row of the file includes the names of the variables, these entries will be used to create appropriate names (reserved characters such as '$' or '[' are changed to '.') for each of the columns in the dataset. If the first row does not include the names, the header option can be left off (or set to FALSE), and the variables will be called V1, V2, ..., Vn. The read.table() function can support reading from a URL as a filename (see also 2.1.7). Files can be browsed interactively using file.choose() (see 2.7.7).
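The effect of the header option on column names can be seen in a small sketch. The file contents below are hypothetical and are written to a temporary file so the example is self-contained.

```r
# Write a small whitespace-delimited file whose header contains reserved characters.
txtfile <- tempfile(fileext = ".txt")
writeLines(c("id cost$ x[1",
             "1 10.49 3",
             "2 11.00 4"), txtfile)

# With header=TRUE the first row supplies the column names,
# with reserved characters replaced by periods.
with_header <- read.table(txtfile, header = TRUE)
print(names(with_header))

# Without a header the columns receive default names;
# skip=1 drops the name row so that only data are read.
no_header <- read.table(txtfile, skip = 1)
print(names(no_header))
```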

2.1.3 Other fixed files

See also 2.1.4 (read more complex fixed files) and 7.4.1 (read variable format files)

ds = readLines("file.txt")

or

ds = scan("file.txt")

The readLines() function returns a character vector with length equal to the number of lines read (see also file()). A limit on the number of lines to be read can be specified through the n option of readLines() or the nlines option of scan(). The scan() function returns a vector (numeric by default).
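The difference between the two functions can be illustrated with a short sketch. The file contents are hypothetical; in base R the line limit is given by the n argument of readLines() and the nlines argument of scan().

```r
# Write three lines of whitespace-separated numbers to a temporary file.
rawfile <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6", "7 8 9"), rawfile)

all_lines <- readLines(rawfile)          # character vector, one element per line
first_two <- readLines(rawfile, n = 2)   # n limits the number of lines read

values <- scan(rawfile, quiet = TRUE)                 # numeric vector of all fields
first_row <- scan(rawfile, nlines = 1, quiet = TRUE)  # fields from the first line only

print(all_lines)
print(values)
```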

2.1.4 Reading more complex text files

See also 2.1.2 (read fixed files) and 7.4.1 (read variable format files).

Text data files often contain data in special formats. One common example is date variables. As an example below we consider the following data.

1 AGKE 08/03/1999 $10.49
2 SBKE 12/18/2002 $11.00
3 SEKK 10/23/1995 $5.00


tmpds = read.table("file_location/filename.dat")
id = tmpds$V1
initials = tmpds$V2
datevar = as.Date(as.character(tmpds$V3), "%m/%d/%Y")
cost = as.numeric(substring(tmpds$V4, 2))
ds = data.frame(id, initials, datevar, cost)
rm(tmpds, id, initials, datevar, cost)

This task is accomplished by first reading the dataset (with default names from read.table() denoted V1 through V4). These objects can be manipulated using as.character() to undo the default coding as factor variables, and coerced to the appropriate data types. For the cost variable, the dollar signs are removed using the substring() function (Section 2.4.4). Finally, the individual variables are gathered together as a dataframe.
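A self-contained variant of these steps is sketched below. It writes the three sample records to a temporary file (an assumption made here so that no external file is needed) and builds the dataframe directly rather than through intermediate vectors.

```r
# Write the three sample records shown above to a temporary file.
datfile <- tempfile(fileext = ".dat")
writeLines(c("1 AGKE 08/03/1999 $10.49",
             "2 SBKE 12/18/2002 $11.00",
             "3 SEKK 10/23/1995 $5.00"), datfile)

# Read with default column names V1 through V4.
tmpds <- read.table(datfile)

# Coerce each column to its appropriate type while assembling the dataframe.
ds <- data.frame(
  id = tmpds$V1,
  initials = as.character(tmpds$V2),
  datevar = as.Date(as.character(tmpds$V3), "%m/%d/%Y"),
  cost = as.numeric(substring(as.character(tmpds$V4), 2))  # drop the leading "$"
)
rm(tmpds)
str(ds)
```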

2.1.5 Comma-separated value (CSV) files

Example: See 2.13.1

Comma-separated value (CSV) files are a common data interchange format that is straightforward to read and write.

ds = read.csv("dir_location/file.csv")

A limit on the number of lines to be read can be specified through the nrows option. The command read.csv(file.choose()) can be used to browse files interactively (see Section 2.7.7). The comma-separated file can be given as a URL (see 2.1.7).
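A brief sketch of writing and re-reading a CSV file, including the nrows limit, follows. The data values and the temporary file are illustrative assumptions.

```r
# Build a small dataframe and write it as CSV (without row names).
original <- data.frame(id = 1:3,
                       initials = c("AGKE", "SBKE", "SEKK"),
                       cost = c(10.49, 11.00, 5.00))
csvfile <- tempfile(fileext = ".csv")
write.csv(original, csvfile, row.names = FALSE)

# Read the whole file, then only the first two data rows.
ds <- read.csv(csvfile)
first_two <- read.csv(csvfile, nrows = 2)

print(ds)
print(first_two)
```

Setting row.names = FALSE on output keeps write.csv() from adding an extra unnamed column that read.csv() would otherwise read back as data.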

2.1.6 Reading datasets in other formats

Example: See 6.6

library(foreign)
ds = read.dbf("filename.dbf")         # DBase
ds = read.epiinfo("filename.epiinfo") # Epi Info
ds = read.mtp("filename.mtp")         # Minitab worksheet
ds = read.octave("filename.octave")   # Octave
ds = read.ssd("filename.ssd")         # SAS version 6
ds = read.xport("filename.xport")     # SAS XPORT file
ds = read.spss("filename.sav")        # SPSS
ds = read.dta("filename.dta")         # Stata
ds = read.systat("filename.sys")      # Systat

The foreign library can read DBase, Stata, Epi Info, Minitab, Octave, SAS version 6, SAS Xport, SPSS, and Systat files (with the caveat that SAS version 6 files may be platform dependent). The read.ssd() function will only work if SAS is installed on the local machine (as it needs to run SAS in order to read the dataset).

2.1.7 URL

Example: See 3.6.1

Data can be read from the Web by specifying a uniform resource locator (URL). Many of the data input functions also support accessing data in this manner (see also 2.1.3).

urlhandle = url("http://www.math.smith.edu/r/testdata")
ds = readLines(urlhandle)

or

ds = read.table("http://www.math.smith.edu/r/testdata")

or

ds = read.csv("http://www.math.smith.edu/r/file.csv")

The readLines() function reads arbitrary text, while read.table() can be used to read a file with cases corresponding to lines and variables to fields in the file (the header option sets variable names to entries in the first line).

The read.csv() function can be used to read comma-separated values. Access through proxy servers, as well as specification of usernames and passwords, is provided by the function download.file(). For read.table() and read.csv(), a limit on the number of lines to be read can be specified through the nrows option.
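The contrast between reading arbitrary text and reading cases and variables can be sketched without network access. In the example below a local temporary file stands in for the remote resource, and file() plays the role that url() plays above; both are assumptions made for illustration, and the file contents are hypothetical.

```r
# A local stand-in for a remote text resource (hypothetical contents).
src <- tempfile(fileext = ".txt")
writeLines(c("id initials cost",
             "1 AGKE 10.49",
             "2 SBKE 11.00"), src)

# Reading through a connection returns the raw lines, header included.
handle <- file(src)
raw_text <- readLines(handle)
close(handle)

# read.table() treats lines as cases and fields as variables;
# header=TRUE takes the variable names from the first line.
ds <- read.table(src, header = TRUE)

print(raw_text)
print(ds)
```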

2.1.8 XML (extensible markup language)

A sample (flat) XML form of the HELP dataset can be found at http://www.math.smith.edu/r/data/help.xml. The first 10 lines of the file consist of:

<?xml version="1.0" encoding="iso-8859-1" ?>

<TABLE>

<HELP>

<id> 1 </id>

<e2b1 Missing="." />

<g1b1> 0 </g1b1>

<i11 Missing="." />

<pcs1> 54.2258263 </pcs1>

<mcs1> 52.2347984 </mcs1>

<cesd1> 7 </cesd1>

Here we consider reading simple files of this form. While support is available for reading more complex types of XML files, these typically require considerable additional sophistication.
