

Our application is designed to read trajectory files regardless of their source format; however, different GPS data sources provide different data formats. Therefore, the core of our application implements a file interpreter that extracts attributes from the input data files containing GPS trajectory records. To do so, the user must specify the format (fields/attributes) of the input data as it appears in the source files. These specifications are provided through the TDDF.

The TDDF is a user-specified script containing the descriptions of the input data files, similar to a Data Description Language (DDL), which helps the parser identify trajectory records and attributes. The TDDF scope contains both attribute declarations and commands to be executed while parsing the data. We introduce a set of declarative keywords for the TDDF, covering both attribute declarations (Data Definition Keywords) and command declarations (Data Control Keywords). Identifiers and spatial-temporal attributes have a special tag, since they represent the core of trajectory data. The scope of the TDDF was designed based on a survey of existing spatial-temporal trajectory formats, in order to cover a wide range of trajectory datasets. Below we describe the Data Definition Keywords for attribute declarations and the Data Control Keywords for command declarations, and give some examples.

TDDF Grammar

Predefined Keywords: Predefined keywords help the parser identify important parameters and commands in the input data. Tables 3.1 and 3.2 list the predefined keywords and their meanings.

Default Command Values: Although necessary for the data interpreter, some commands are provided with a default parameter/value in case they are not provided by the user. Table 3.2 shows the command keywords and their respective default values. All keywords are case-sensitive.

3.1. TRAJECTORY DATA INTEGRATION AND REPRESENTATION

Keyword            Type             Description
ID                 Attribute Name   Trajectory Identifier
COORDINATES        Attribute Name   List of Trajectory Coordinates
X                  Attribute Name   Coordinate X (or Longitude) value
Y                  Attribute Name   Coordinate Y (or Latitude) value
LON                Attribute Name   Coordinate Longitude value
LAT                Attribute Name   Coordinate Latitude value
TIME               Attribute Name   Coordinate Time-Stamp
INTEGER            Attribute Type   Integer number
DECIMAL            Attribute Type   Decimal number
STRING             Attribute Type   String character
BOOLEAN            Attribute Type   Logic type (True/False)
CHAR               Attribute Type   Single character
DATETIME           Attribute Type   Date and time (Java DateTimeFormat)
DELTAINTEGER       Attribute Type   Integer delta compressed number
DELTADECIMAL       Attribute Type   Decimal delta compressed number
ARRAY              Attribute Type   Array type (List)
CARTESIAN          Command Value    Cartesian coordinates (x,y)
GEOGRAPHIC         Command Value    Geographic coordinates (longitude,latitude)
LN                 Command Value    Line-break
LS                 Command Value    Line-space
EOF                Command Value    End-of-File
SPATIAL TEMPORAL   Output Format    Outputs spatial-temporal attributes only
SPATIAL            Output Format    Outputs spatial attributes only
ALL                Output Format    Outputs all attributes
#                  Comment Marker   Line comment symbol

Table 3.1: TDDF Data Definition Keywords.

Keyword         Type           Description                     Default Value
RECORDS DELIM   Command Name   Data Records Delimiter          LN (Line-break)
IGNORE ATTR     Command Name   Ignore Input Attribute          –
IGNORE LINES    Command Name   Ignore Input File Line(s)       –
AUTO ID         Command Name   Auto generate ID attribute      –
COORD SYSTEM    Command Name   Spatial coordinates system      GEOGRAPHIC
DECIMAL PREC    Command Name   Precision for decimal numbers   5
SAMPLE          Command Name   Load a sample of the dataset    1.0 (100%)
OUTPUT FORMAT   Command Name   User-specified output format    ALL

Table 3.2: TDDF Data Control Keywords.

TDDF Syntax and Semantics

For each attribute of the data record, one must provide the attribute’s NAME, TYPE and DELIMITER, separated by a space or tab.

NAME: Name of the field/attribute.
TYPE: Type of the field/attribute to read.
DELIMITER: Field delimiter in the input file.

When providing the TDDF script, the user must declare one attribute per line, in the exact order the attributes appear in the input files. The parser reads each attribute’s value until the given field DELIMITER is reached. Attribute names must be unique within the TDDF.

Commands, on the other hand, are declared in the form NAME and VALUE.

NAME: Name of the command.
VALUE: The command’s input parameter/value.
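The declaration rules above can be illustrated with a minimal Python sketch. It is not the application's interpreter: the function name parse_tddf, the tuple layout, and the sample script are assumptions made for illustration. Command names are taken from Table 3.2 and matched as line prefixes; every other non-comment line is read as NAME TYPE DELIMITER. ARRAY declarations and DATETIME patterns containing spaces are deliberately left out.

```python
# Command names as listed in Table 3.2.
COMMANDS = (
    "RECORDS DELIM", "IGNORE ATTR", "IGNORE LINES", "AUTO ID",
    "COORD SYSTEM", "DECIMAL PREC", "SAMPLE", "OUTPUT FORMAT",
)


def parse_tddf(script):
    """Split a TDDF script into attribute and command declarations.

    Returns (attributes, commands). `attributes` is an ordered list of
    (name, type, delimiter) tuples; `commands` maps a command name to its
    value. Hypothetical helper, not part of the original application.
    """
    attributes, commands = [], {}
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):      # blank line or line comment
            continue
        command = next((c for c in COMMANDS if line.startswith(c + " ")), None)
        if command == "IGNORE ATTR":
            # Positional command: it keeps its place among the attributes,
            # and its value is the delimiter of the ignored attribute.
            attributes.append((None, "IGNORE", line[len(command):].strip()))
            continue
        if command is not None:
            commands[command] = line[len(command):].strip()
            continue
        parts = line.split()                      # separated by space or tab
        if len(parts) != 3:
            raise ValueError(f"expected NAME TYPE DELIMITER, got: {line!r}")
        name, attr_type, delimiter = parts
        if any(name == existing for existing, _, _ in attributes):
            raise ValueError(f"duplicate attribute name: {name}")
        attributes.append((name, attr_type, delimiter))
    return attributes, commands


# Hypothetical script for records laid out as "42;unused,3.5" (one per line).
script = """
# sample TDDF
DECIMAL PREC 3
ID INTEGER ;
IGNORE ATTR ,
speed DECIMAL LN
"""
attributes, commands = parse_tddf(script)
print(attributes)
print(commands)
```

Keeping the attributes in a list preserves declaration order, which matters because the parser consumes fields in exactly that order.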

The attribute keyword ID describes the identifier field of each trajectory record. Since not all input datasets in our research provide an ID for the trajectory records, we provide the command AUTO ID to generate the records’ IDs automatically. An example of the AUTO ID command syntax is given as follows:

AUTO ID prefix
# Outputs the ID attribute as a “String”: “prefix 1”, “prefix 2”, ...

AUTO ID 10
# Outputs the ID attribute as an “Integer”, starting from the given number: 10, 11, 12, ...

Either the trajectory ID attribute field or the AUTO ID command should be provided in the input TDDF. If both are omitted, the application will use “AUTO ID 1” by default.
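The two AUTO ID behaviours can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name auto_id_generator is hypothetical, and the separator placed between the prefix and the counter (an underscore here) is a guess, since the text does not state it unambiguously.

```python
from itertools import count, islice


def auto_id_generator(value="1", sep="_"):
    """Yield record IDs for a given AUTO ID command value.

    A numeric value yields integers starting at that number. Any other
    value is used as a prefix for string IDs numbered from 1.
    `sep` is an assumption of this sketch, not taken from the source.
    """
    try:
        start = int(value)
    except ValueError:
        start = None
    if start is not None:
        yield from count(start)          # 10, 11, 12, ...
    else:
        for n in count(1):
            yield f"{value}{sep}{n}"     # prefix_1, prefix_2, ...


print(list(islice(auto_id_generator("10"), 3)))      # [10, 11, 12]
print(list(islice(auto_id_generator("prefix"), 2)))  # ['prefix_1', 'prefix_2']
print(list(islice(auto_id_generator(), 2)))          # default "AUTO ID 1": [1, 2]
```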

The attribute keyword COORDINATES is a mandatory field and describes the list of coordinate points of the trajectory records. COORDINATES must be declared as an ARRAY type, followed by the description of the spatial-temporal attributes (i.e. X, Y, TIME in the CARTESIAN system, or LON, LAT, TIME in the GEOGRAPHIC system) and any semantic attributes of the coordinate points, in the same order they appear in the input data files. The spatial-temporal fields X, Y, TIME, or LON, LAT, TIME, are mandatory in a COORDINATES attribute declaration.

The command RECORDS DELIM tells the parser where a data record ends. In most GPS trajectory datasets in our research, data records are organized in one of three ways: one trajectory record per file line (RECORDS DELIM LN), one trajectory record per file (RECORDS DELIM EOF), or many records per file separated by a delimiter character or word c (RECORDS DELIM c). The parser reads a data record until the given delimiter is found.
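A minimal Python sketch of the three RECORDS DELIM cases is given below. The function name split_records is hypothetical, and dropping empty records (for example after a trailing line break) is an assumption of this sketch rather than documented behaviour.

```python
def split_records(text, records_delim="LN"):
    """Split the content of one input file into raw data records."""
    if records_delim == "LN":         # one trajectory record per file line
        records = text.splitlines()
    elif records_delim == "EOF":      # the whole file is a single record
        records = [text]
    else:                             # custom delimiter character or word
        records = text.split(records_delim)
    # Assumption: records that contain only whitespace are discarded.
    return [record for record in records if record.strip()]


print(split_records("t1,1.0,2.0\nt2,3.0,4.0\n"))          # two records
print(split_records("t1,1.0,2.0\nt1,3.0,4.0\n", "EOF"))   # one record
print(split_records("t1,1.0,2.0#t2,3.0,4.0", "#"))        # two records
```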

The command IGNORE LINES tells the parser to ignore the given lines in all input data files. For instance, the following command ignores lines 1 to 5 and line 7 in the input data files.

IGNORE LINES [1-5,7]
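As an illustration, the value of IGNORE LINES can be expanded into a set of line numbers and applied to a file's lines. The sketch below is in Python; the helper names are hypothetical, and it assumes line numbers are 1-based and ranges are inclusive, which matches the example above.

```python
def parse_ignore_lines(value):
    """Expand a value such as "[1-5,7]" into a set of line numbers."""
    lines = set()
    for part in value.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:                            # inclusive range, e.g. 1-5
            first, last = (int(p) for p in part.split("-", 1))
            lines.update(range(first, last + 1))
        else:                                      # single line number
            lines.add(int(part))
    return lines


def drop_ignored(file_lines, ignored):
    """Keep the lines whose 1-based number is not in `ignored`."""
    return [line for number, line in enumerate(file_lines, start=1)
            if number not in ignored]


ignored = parse_ignore_lines("[1-5,7]")
print(sorted(ignored))                             # [1, 2, 3, 4, 5, 7]
print(drop_ignored(list("abcdefgh"), ignored))     # ['f', 'h']
```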

The command IGNORE ATTR, on the other hand, ignores the attribute at the position of its declaration in all data records, and it is followed by the attribute’s delimiter. Both the IGNORE LINES and IGNORE ATTR commands are useful, for instance, when not all data records, file lines, or attributes from the input dataset are necessary for the user application.

The command DECIMAL PREC tells the parser the number of decimal places d to consider in decimal values; the default value is d = 5. Attributes declared as DECIMAL will be converted to an integer number in the format value × 10^d, and compressed using lossless delta-compression to reduce storage space.
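The conversion and compression steps can be sketched in Python as follows. The text does not specify which delta scheme the application uses, so this sketch assumes plain first-order differences (the first value followed by successive differences); the function names and sample longitudes are hypothetical.

```python
from itertools import accumulate


def to_scaled_integers(values, precision=5):
    """Convert decimals to integers as value * 10^d, with d = precision."""
    factor = 10 ** precision
    return [round(value * factor) for value in values]


def delta_encode(numbers):
    """Keep the first number, then store each difference to its predecessor."""
    return numbers[:1] + [b - a for a, b in zip(numbers, numbers[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


longitudes = [116.30712, 116.30715, 116.30721]   # hypothetical sample values
scaled = to_scaled_integers(longitudes, precision=5)
deltas = delta_encode(scaled)
print(scaled)   # [11630712, 11630715, 11630721]
print(deltas)   # [11630712, 3, 6]
assert delta_decode(deltas) == scaled
```

Consecutive GPS points are usually close together, so the differences are small integers that need fewer digits than the full values. The delta step itself is lossless on the scaled integers; any digits beyond d decimal places are dropped earlier, by the scaling step.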

The command SAMPLE tells the data loader to randomly select a sample of the input dataset for reading and parsing. The sampling value must be in the range ]0.0, 1.0] and specifies the fraction of data records to read (1.0 means 100%). The SAMPLE command is particularly useful for large datasets and debugging purposes.
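One possible way to implement such sampling is shown in the Python sketch below. The text does not say how the sample is drawn, so this sketch assumes independent per-record selection, which returns approximately (not exactly) the requested fraction; the function name and the seed parameter are hypothetical.

```python
import random


def sample_records(records, fraction=1.0, seed=None):
    """Randomly keep roughly `fraction` of the records."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("SAMPLE must be in the range ]0.0, 1.0]")
    if fraction == 1.0:                 # default: load the whole dataset
        return list(records)
    rng = random.Random(seed)           # seed only for reproducible runs
    # Assumption: each record is kept independently with probability `fraction`.
    return [record for record in records if rng.random() < fraction]


subset = sample_records(range(1000), fraction=0.1, seed=42)
print(len(subset))                      # close to 100, varies with the seed
```

Per-record selection works in a single pass without knowing the dataset size in advance, which suits large inputs.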

DATETIME values are declared and parsed using Java’s DateTimeFormatter [2]. DATETIME types must be declared as DATETIME[“pattern”], where “pattern” describes the attribute using the DateTimeFormatter format.

Array Type Syntax: Array (or list) types are declared by specifying the attributes in the array, i.e. each attribute’s NAME, TYPE and DELIMITER. The general syntax of an array declaration is:

ARRAY( NAME TYPE DELIMITER ... )

Arrays can be single-valued or multi-valued (e.g. objects) and may hold any of the predefined data types. The parser reads each value until the given field delimiter is reached. Attributes in the array are specified in the exact order they appear in the source file, as in any other attribute declaration. The following are some examples of array type declarations for the COORDINATES field.

Example 1: Trajectory coordinates as an array/list of spatial-temporal points, comma separated.

ARRAY( X DECIMAL , Y DECIMAL , TIME INTEGER , )

Example 2: Trajectory coordinates as an array/list of spatial-temporal points, with weight and type attributes, one coordinate per file line, separated by semicolons.

ARRAY( X DECIMAL ; Y DECIMAL ; TIME INTEGER ; weight DECIMAL ; type STRING LN )

[2] https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
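To show how the declarations in Examples 1 and 2 could drive a reader, here is a minimal Python sketch. It is an illustration under assumptions, not the application's parser: the function names, the type-cast table, and the sample data are hypothetical, only the LN delimiter keyword is mapped, and the array is assumed to extend to the end of the given data.

```python
CASTS = {"DECIMAL": float, "INTEGER": int, "STRING": str}
DELIMITERS = {"LN": "\n"}          # keyword delimiter from Table 3.1


def parse_array_declaration(declaration):
    """Return the (name, type, delimiter) triples inside ARRAY( ... )."""
    text = declaration.strip()
    if not (text.startswith("ARRAY(") and text.endswith(")")):
        raise ValueError("not an ARRAY declaration")
    tokens = text[len("ARRAY("):-1].split()
    if not tokens or len(tokens) % 3 != 0:
        raise ValueError("expected NAME TYPE DELIMITER triples")
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]


def read_array(data, fields):
    """Read repeated groups of `fields` from `data` until it is exhausted."""
    items, pos = [], 0
    while pos < len(data):
        item = {}
        for name, attr_type, delim in fields:
            delim = DELIMITERS.get(delim, delim)
            end = data.find(delim, pos)
            if end == -1:          # the last value may lack its delimiter
                end = len(data)
            item[name] = CASTS[attr_type](data[pos:end].strip())
            pos = end + len(delim)
        items.append(item)
    return items


# Example 1: comma-separated spatial-temporal points.
fields1 = parse_array_declaration(
    "ARRAY( X DECIMAL , Y DECIMAL , TIME INTEGER , )")
print(read_array("1.0,2.0,10,1.5,2.5,20", fields1))

# Example 2: one coordinate per line, fields separated by semicolons.
fields2 = parse_array_declaration(
    "ARRAY( X DECIMAL ; Y DECIMAL ; TIME INTEGER ; weight DECIMAL ; type STRING LN )")
print(read_array("1.0;2.0;10;0.5;stop\n1.5;2.5;20;0.7;move\n", fields2))
```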
