CAPÍTULO 2 DISEÑO DEL SISTEMA
2.7. P AQUETE D EPENDENCIA
Data Services allows you to import and export metadata for XML documents that you can use as sources or targets in jobs.
XML documents are hierarchical and the set of properties describing their structure is stored in separate format files. These format files describe the data contained in the XML document and the relationships among the data elements, the schema. The format of an XML file or message (.xml) can be specified using either a document type definition (.dtd) or XML Schema (.xsd).
Data flows can read and write data to messages or files based on a specified DTD format or XML Schema. You can use the same DTD format or XML Schema to describe multiple XML sources or targets.
Data Services uses Nested Relational Data Modeling (NRDM) to structure imported metadata from format documents, such as xsd or dtd files, into an internal schema to use for hierarchical documents.
Importing metadata from a DTD file
You can import metadata from either an existing XML file (with a reference to a DTD) or a DTD file. If you import the metadata from an XML file, Data Services automatically retrieves the DTD for that XML file.
When importing a DTD format, Data Services reads the defined elements and attributes, and ignores other parts, such as text and comments, from the file definition. This allows you to modify imported XML data and edit the datatype as needed.
Importing metadata from an XML schema
For an XML document that contains, for example, information to place a sales order, such as order header, customer, and line items, the corresponding XML schema includes the order structure and the relationship between the data as shown:
Figure 24: Importing Metadata from an XML Schema
When importing an XML Schema, Data Services reads the defined elements and attributes, and imports:
• Document structure
• Table and column names
• Datatype of each column
• Nested table and column attributes
Note: While XML Schemas make a distinction between elements and attributes, Data Services imports and converts them all to nested table and column attributes.
Nested data in XML files
Sales orders are presented using nested data. For example, the line items in a sales order are related to a single header and are represented using a nested schema.
Each row of the sales order data set contains a nested line item schema as shown:
Figure 25: Example of Nested Data 1
Using the nested data method can be more concise (no repeated information), and can scale to present a deeper level of hierarchical complexity.
To expand on the example above, columns inside a nested schema can also contain columns.
There is a unique instance of each nested schema for each row at each level of the relationship as shown:
Generalizing further with nested data, each row at each level can have any number of columns containing nested schemas.
Data Services maps nested data to a separate schema implicitly related to a single row and column of the parent schema. This mechanism is called Nested Relational Data Modeling (NRDM). NRDM provides a way to view and manipulate
hierarchical relationships within data flow sources, targets, and transforms.
In Data Services, you can see the structure of nested data in the input and output schemas of sources, targets, and transforms in data flows.
Unnesting data
Loading a data set that contains nested schemas into a relational target requires that the nested rows be unnested.
For example, a sales order may use a nested schema to define the relationship between the order header and the order line items. To load the data into relational schemas, the multilevels of data must be unnested.
Unnesting a schema produces a cross-product of the top-level schema (parent) and the nested schema (child).
Figure 27: Unnesting Data 1
You can also load different columns from different nesting levels into different schemas. For example, a sales order can be flattened so that the order number is maintained separately with each line-item and the header and line-item information are loaded into separate schemas.
Figure 28: Unnesting Data 2
Data Services allows you to unnest any number of nested schemas at any depth.
No matter how many levels are involved, the result of unnesting schemas is a cross product of the parent and child schemas.
When two or more levels of unnesting occur, the inner-most child is unnested first, then the result—the cross product of the parent and the inner-most child is then unnested from its parent, and continuing to the top-level schema.
Keep in mind that unnesting all schemas to create a cross product of all data might not produce the results you intend. For example, if an order includes multiple customer values such as ship-to and bill-to addresses, flattening a sales order by unnesting customer and line-item schemas produces rows of data that might not be useful for processing the order.
Lesson Summary
You should now be able to:
• Import data from XML documents
• Unnest data in XML documents
Unit Summary
You should now be able to:
• Crate various types of Datastores
• Crate various types of Datastores
• Define system configurations in Data Services
• Defining flat file formats as a basis for a Datastore
• Create a Data Services Excel file format
• Import data from XML documents
• Unnest data in XML documents
Test Your Knowledge
1. What is the difference between a Datastore and a repository?
2. What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
3. Which is not a Datastore type?
Choose the correct answer(s).
□ A Database?
□ B Application
□ C Adapter
□ D File Format
4. What is the difference between a repository and a Datastore?
5. What is the difference between a Datastore and a repository?
6. What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
7. Which is not a Datastore type?
Choose the correct answer(s).
□ A Database?
□ B Application
□ C Adapter
□ D File Format
8. What is the difference between a repository and a Datastore?
Answers
1. What is the difference between a Datastore and a repository?
Answer: A Datastore is a connection to a database.
2. What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
Answer: You can use an object's options and properties settings to manipulate Data Services objects. Options control the operation of objects.
Properties document the object.
3. Which is not a Datastore type?
Answer: D
The File Format is used to create a Datastore and is not a type.
4. What is the difference between a repository and a Datastore?
Answer: A repository is a set of tables that hold system objects, source and target metadata, and transformation rules. A Datastore is an actual connection to a database that holds data.
5. What is the difference between a Datastore and a repository?
Answer: A Datastore is a connection to a database.
6. What are the two methods in which metadata can be manipulated in Data Services objects and what does each of these do?
Answer: You can use an object's options and properties settings to manipulate Data Services objects. Options control the operation of objects.
Properties document the object.
7. Which is not a Datastore type?
Answer: D
The File Format is used to create a Datastore and is not a type.
8. What is the difference between a repository and a Datastore?
Answer: A repository is a set of tables that hold system objects, source and target metadata, and transformation rules. A Datastore is an actual connection to a database that holds data.