Identifies, parses, validates, and formats the following data: address, person name, organization name, occupational title, phone number, and email address.

Note

This topic applies only to the Application Function Modeler tool in SAP HANA studio.

Address reference data comes in the form of country-specific directories. For information about downloading and deploying directories, see “Smart Data Quality Directories” in the Administration Guide for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality.

Only one input source is allowed.

Note

General Properties

Table 7: General options

Option Description

Name The name for the output target. This can be named the same as the input source.

Display name The name shown in the Palette pane.

Note

This option can only be changed when creating a template. It cannot be changed when using the node outside of a template.

Description (Optional.) Provides a comment about the operation. For example, "Cleanse customer data."

Input Fields tab

Use the Input fields tab to select and map your input data. Your input data might already be mapped to the output fields. You can check on the Input tab along the left side of the screen. If the fields are not mapped, or if you want to change the mapping, you can use the Input fields tab on the General properties panel to do so.

The most common Cleanse input fields are listed by category. Click Show Additional Fields to add more fields to the list. In the Address and Person categories, you can choose the input format that matches how your address or person data is stored in the fields.

Format of input data Description

Composite

Address: Use fields from this group when the input address data consists of fields with address, city, region, and postal code data completely in free form. For example, the address data may reside in three fields that contain the various address elements fielded inconsistently from one record to another. The order of mapping free-form fields is significant. See the description for mapping to the Free Form fields in the Cleanse Input Columns [page 53] topic.

Person: Use the Person field from this group when the input data has a single field for person data. For example, the name John Louis Maxwell is in one Name field.

Discrete

Address: Use fields from this group only when the input address data consists of fields from the SAP Business Suite data model. If your schema is similar to that of the SAP Business Suite, but not exactly, then you should use fields from the Hybrid group instead.

Person: Use fields from this group when the input data consists of two or more fields for person data.

Hybrid

Address: Use fields from this group when the input address data consists of one or more free-form fields for the street portion of the address, and discrete fields for city, region, and postal code. The order of mapping free-form fields is significant. See the description for mapping to the Free Form fields in the Cleanse Input Columns [page 53] topic.

Person: Use fields from this group when the input data consists of one or more free-form fields and also has some additional information in one or more fields for the name data. For example, the column for First Name might contain only the first name for a person, such as John. The Last Name field might contain the last name with an honorary postname (such as Ph.D.) or a maturity postname (such as Jr).

For all input fields, click in the Mapping column to select the input data that should be mapped to this field. If you have an input source connected to the Cleanse node, you will see the list of input fields in the Mapping list. See Cleanse Input Fields [page 53].

Output Fields tab

The Output Fields tab in the General properties panel lists all of the available output fields for the Cleanse node. The Cleanse node can enrich your data when you select additional output fields. For example, it can include address assignment levels by changing the option in the Enabled column to True. See Cleanse Output Fields [page 55].

Settings tab

Use the Settings tab in the General properties panel to select your formatting preferences.

Table 8: Email

Option Description

Casing Specifies the casing format.

Upper: Data is output in all capital letters. For example, [email protected].

Lower: Data is output in all lowercase letters. For example, [email protected].

Table 9: Phone

Option Description

N.A. Phone Format Specifies the format for North American phone numbers.

Parens: Encloses the area code in parentheses and separates the remaining sections with one hyphen. For example, (800) 123-4567.

Periods: Separates all sections with periods. For example, 800.123.4567.

Hyphens: Separates all sections with hyphens. For example, 800-123-4567.

Table 10: Firm, Title, Person, and Person or Firm

Option Description

Diacritics Specifies whether to retain diacritical characters on output.

Include: Retains the diacritical characters. For example, Hernández or Telecomunicações São Paulo.

Remove: Replaces diacritical characters such as accent marks, umlauts, and so on with the ASCII equivalent. For example, Hernandez or Telecomunicacoes Sao Paulo.

Casing Specifies the casing format.

Mixed: Data is output in mixed case. For example, MacArthur Inc.

Upper: Data is output in upper case. For example, MACARTHUR INC.
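The effect of the Diacritics and Casing options above can be approximated outside the Cleanse node as follows. This is only a minimal sketch for illustration, not the algorithm the Cleanse node uses: the function names `remove_diacritics` and `apply_settings` are hypothetical, and the approach (Unicode decomposition followed by dropping combining marks) is an assumption. It does not reproduce language-specific transliterations, and it leaves Mixed casing untouched because the linguistic rules behind a result such as MacArthur are not described in this topic.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose characters, then drop the combining marks (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def apply_settings(text: str, diacritics: str = "Include", casing: str = "Mixed") -> str:
    """Hypothetical helper that mirrors the Diacritics and Casing option values."""
    if diacritics == "Remove":
        text = remove_diacritics(text)
    if casing == "Upper":
        text = text.upper()
    # "Include" and "Mixed" return the text as provided in this sketch.
    return text


if __name__ == "__main__":
    print(apply_settings("Telecomunicações São Paulo", diacritics="Remove"))
    print(apply_settings("MacArthur Inc.", casing="Upper"))
```

The two calls use the example values from the table: the first removes the accents from Telecomunicações São Paulo, and the second converts MacArthur Inc. to upper case.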

Option Description

Cleanse Domain When a country field is input to the Cleanse node, then the person, title, firm, and person-or-firm data is cleansed according to linguistic norms in the input country. Use this setting to select which language/region domain you want to use by default when cleansing data for records that have a blank country, or for all records when a country field is not available. If all input data is from one region, then select one domain. For example, for data in the United States and Canada, select EN_US | GLOBAL. If your data spans multiple linguistic regions, then select multiple domains, ordering them beginning with the domain that is most prevalent in your data. For example, for data in DACH (Germany, Austria, Switzerland), select DE | FR | IT | GLOBAL.

Select the domains you want to include.

● GLOBAL - Global (Required as the last domain listed.)
● AR - Arabic
● ZH - Chinese
● CS - Czech
● DA - Danish
● NL - Dutch
● EN_US - English (United States & Canada)
● EN_GB - English (United Kingdom & Ireland)
● EN_AU - English (Australia & New Zealand)
● EN_IN - English (India)
● FR - French
● DE - German
● HU - Hungarian
● ID - Indonesian
● IT - Italian
● JA - Japanese
● MS - Malay
● NO - Norwegian
● PL - Polish
● PT_BR - Portuguese (Brazil)
● PT_PT - Portuguese (Portugal)
● RO - Romanian
● RU - Russian
● SK - Slovak
● ES_MX - Spanish (Latin America)
● ES_ES - Spanish (Spain)
● SV - Swedish
● TR - Turkish
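The ordering rule for the Cleanse Domain setting (most prevalent domain first, GLOBAL last) can be checked with a small helper before you enter the value. This is a sketch under assumptions: the pipe-separated string is simply the notation used in the examples above, and `parse_cleanse_domains` is a hypothetical function, not part of any SAP HANA API.

```python
# Domain codes taken from the list above.
VALID_DOMAINS = {
    "GLOBAL", "AR", "ZH", "CS", "DA", "NL", "EN_US", "EN_GB", "EN_AU", "EN_IN",
    "FR", "DE", "HU", "ID", "IT", "JA", "MS", "NO", "PL", "PT_BR", "PT_PT",
    "RO", "RU", "SK", "ES_MX", "ES_ES", "SV", "TR",
}


def parse_cleanse_domains(setting: str) -> list:
    """Split a value such as 'DE | FR | IT | GLOBAL' and check the documented rules."""
    domains = [part.strip() for part in setting.split("|")]
    unknown = [d for d in domains if d not in VALID_DOMAINS]
    if unknown:
        raise ValueError(f"Unknown domain code(s): {unknown}")
    if len(set(domains)) != len(domains):
        raise ValueError("Each domain may be listed only once")
    if domains[-1] != "GLOBAL":
        raise ValueError("GLOBAL is required as the last domain listed")
    return domains


if __name__ == "__main__":
    print(parse_cleanse_domains("DE | FR | IT | GLOBAL"))
```

The helper keeps the order it is given, because the order expresses which domain is most prevalent in your data.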

Option Description

Output Format When a country field is input to the Cleanse node, then the person, title, firm, and person-or-firm data is output according to cultural norms in the input country. Use this setting to select the cultural domain you want to use by default when cleansing data for records that have a blank country, or for all records when a country field is not available.

For example, when selecting one of the English domains, if you output person name data to discrete fields, the first name is output to First Name, the middle name to Middle Name, and the full last name to Last Name (nothing is output to Last Name 2), and if you output to the composite Person field, the name is ordered as first name - middle name - last name - maturity postname - honorary postname with a space between each word.

When selecting one of the Spanish domains, the output format is a little different. If you output to discrete fields, it outputs the paternal last name to Last Name and the maternal last name to Last Name 2.

When selecting the Chinese domain, if you output to discrete fields, it outputs the given name to First Name and the family name to Last Name (nothing is output to Middle Name or Last Name 2). If you output to the composite Person field, the name is ordered as last name - first name without any spaces between the words.

The valid values are the same as Cleanse Domain, but you may only select one domain, and Global is not an option.
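The Output Format behavior described above can be illustrated with a short sketch. It covers only the cases stated in this topic (composite output for the English and Chinese domains, and last-name fields for English, Spanish, and Chinese). The function names, the dictionary keys, and the sample names are hypothetical and chosen for illustration only.

```python
def compose_person(parts: dict, output_format: str) -> str:
    """Build the composite Person value for the domains described in this topic."""
    if output_format.startswith("EN_"):
        # first - middle - last - maturity postname - honorary postname, space-separated
        order = ("first", "middle", "last", "maturity", "honorary")
        return " ".join(parts[key] for key in order if parts.get(key))
    if output_format == "ZH":
        # last name - first name, with no space between the words
        return parts.get("last", "") + parts.get("first", "")
    raise NotImplementedError("Composite order for this domain is not described here")


def last_name_fields(family_names: list, output_format: str) -> dict:
    """Distribute family names across the Last Name and Last Name 2 fields."""
    if output_format.startswith("ES_"):
        # Paternal last name to Last Name, maternal last name to Last Name 2
        maternal = family_names[1] if len(family_names) > 1 else ""
        return {"Last Name": family_names[0], "Last Name 2": maternal}
    if output_format.startswith("EN_") or output_format == "ZH":
        # Full last name to Last Name, nothing to Last Name 2
        return {"Last Name": " ".join(family_names), "Last Name 2": ""}
    raise NotImplementedError("Field layout for this domain is not described here")


if __name__ == "__main__":
    name = {"first": "John", "middle": "Louis", "last": "Maxwell", "maturity": "Jr"}
    print(compose_person(name, "EN_US"))
    print(last_name_fields(["García", "López"], "ES_ES"))  # illustrative sample data
```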

Table 11: Address

Option Description

Country Identification Mode Specifies what to do for addresses that are input without a country. This may be the result of the country field not being populated for all addresses, or because all addresses are from the same country and there is no country field because the country is assumed.

Assign: The Cleanse node attempts to determine the country by looking at the rest of the address data. Select this option when there is a country field. This option also improves performance if the operation cache is used.

Constant: The Cleanse node does not attempt to determine the country. Instead, it uses the country provided in the Default Country setting. Because selecting this option results in performance degradation, it is recommended that you attempt to assign country data so that the country name or country code for those addresses is input before the cleansing process.

Default Country When the Country Identification Mode is set to Assign, then the country selected in Default Country is used for addresses whose country the Cleanse node cannot determine. In this scenario, it is considered a best practice to select NONE, unless you are certain all addresses with a blank country are from a single country. Selecting NONE also improves performance if the operation cache is used. When the Country Identification Mode is set to Constant, then the country selected in Default Country is used for all addresses.

Diacritics Specifies whether to retain diacritical characters on output.

Include: Retains the diacritical characters. For example, Münchner Str 100.

Remove: Replaces diacritical characters with the ASCII equivalent. For example, Muenchner Str 100.

Casing Specifies the casing format.

Mixed: Data is output in mixed case. For example, Main Street South.

Upper: Data is output in upper case. For example, MAIN STREET SOUTH.
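The interaction between Country Identification Mode and Default Country for an address that arrives without a country can be summarized in a few lines. This is a sketch of the decision logic as described above, not the Cleanse node implementation: `resolve_missing_country` is a hypothetical name, and `detect` stands in for whatever country detection the node performs on the rest of the address data.

```python
from typing import Callable, Optional


def resolve_missing_country(
    address: str,
    mode: str,
    default_country: str,
    detect: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the country to use for an address that was input without one."""
    if mode == "Constant":
        # No detection is attempted; the Default Country is used as is.
        return default_country
    if mode == "Assign":
        detected = detect(address)
        if detected:
            return detected
        # Fall back to the Default Country unless it is set to NONE.
        return None if default_country == "NONE" else default_country
    raise ValueError(f"Unknown Country Identification Mode: {mode}")


if __name__ == "__main__":
    # Stand-in detector for illustration: it never recognizes a country.
    never_detects = lambda address: None
    print(resolve_missing_country("100 Main St", "Assign", "NONE", never_detects))
    print(resolve_missing_country("100 Main St", "Constant", "US", never_detects))
```

With Assign and a Default Country of NONE, an address whose country cannot be determined stays without a country, which matches the best practice noted above for mixed-country data.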

Option Description

Street Formatting Specifies how to format the street data.

Abbr No Punctuation: Uses a shortened form of common address types (street types, directionals, and secondary designators) without punctuation. For example, 100 N Main St Ste 201.

Abbr With Punctuation: Uses a shortened form of common address types with punctuation. For example, 100 N. Main St. Ste. 201.

Expand: Uses the full form of common address types. For example, 100 North Main Street Suite 201.

Expand Primary Secondary No Punctuation: Uses the full form of street type and directional, but abbreviates the secondary designator without punctuation. For example, 100 North Main Street Ste 201.

Expand Primary Secondary With Punctuation: Uses the full form of street type and directional, but abbreviates the secondary designator with punctuation. For example, 100 North Main Street Ste. 201.

Country Common: Uses the most common format of the country where the address is located.

Region Formatting Specifies how to format the region name (for example, state or province).

Abbreviate: Uses the abbreviated form of the region. For example, NY or ON.

Note

In some countries it is not acceptable to abbreviate region names. In those cases, the cleansed region is fully spelled out, even when you set the option to abbreviate.

Expand: Uses the full form of the region. For example, New York or Ontario.

Country Common: Uses the most common format of the country where the address is located.

Postal Formatting Specifies how to format postal box addresses.

Note

In some countries it is not acceptable to fully spell out the form of the postal address. In other countries, it is not acceptable to include periods in the abbreviated form. In these cases, the cleansed addresses meet the country-specific requirements, even when you select a different option.

Abbr No Punctuation: Uses a shortened form of the postal address without punctuation. For example, PO Box 1209.

Abbr With Punctuation: Uses a shortened form of the postal address with punctuation. For example, P.O. Box 1209.

Expand: Uses the full form of the postal address. For example, Post Office Box 1209.

Country Common: Uses the most common format of the country where the address is located.
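As a quick reference, the three explicit Postal Formatting values map to the forms used in the examples above. The sketch below only mirrors those English-language examples. The names `POSTAL_BOX_FORMS` and `format_postal_box` are hypothetical, and Country Common is not modeled because its result depends on the country of the address. As the note above explains, the Cleanse node may override the selected option to meet country-specific requirements.

```python
# Option values and output forms taken from the examples in the table above.
POSTAL_BOX_FORMS = {
    "Abbr No Punctuation": "PO Box",
    "Abbr With Punctuation": "P.O. Box",
    "Expand": "Post Office Box",
}


def format_postal_box(box_number: str, option: str) -> str:
    """Render a postal box address in the form selected by the option."""
    if option not in POSTAL_BOX_FORMS:
        raise ValueError(f"Option not modeled in this sketch: {option}")
    return f"{POSTAL_BOX_FORMS[option]} {box_number}"


if __name__ == "__main__":
    for option in POSTAL_BOX_FORMS:
        print(option, "->", format_postal_box("1209", option))
```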

Mappings

The Mappings tab shows how the input column names are mapped to output column names. If you have a large table, you can use Filter pattern to search for specific columns. See the "Using the Mapping Editor" topic in the SAP HANA Developer Guide.

Input data

Select the input data General tab by clicking Input_<n>.

Table 12: General

Option Description

Name The name of the input source. You can rename this source.

Kind Identifies the type of input source. For example, table, column, scalar.

Table 13: Signature

Option Description

Name The column name in the input source. This can be named the same as the output from the previous node.

Type The type of data contained in the column, for example, Nvarchar, Decimal, Date, and so on.

Length The number of characters allowed in the column.

Scale The number of digits to the right of the decimal point. This is used when the data type is a decimal.

Nullable Indicates whether the column can be null.

Use the Add, Remove, Up and Down buttons to edit the input fields accordingly.

Table 14: Fixed Content

Option Description

Fixed Content Enable to have the input table of the node saved with the flowgraph file. Otherwise, it is placed in a separate table connected to the node. For more information, see the SAP HANA Developer Guide topic "Flowgraphs".

Output data

One data target is allowed.

Table 15: General

Option Description

Name The name of the output target. You can rename this target.

Kind Identifies the type of output target.

Table 16: Signature

Option Description

Name The column name in the output target. This can be named the same as the output from the previous node.

Type The type of data contained in the column, for example, Nvarchar, Decimal, Date, and so on.

Length The number of characters allowed in the column.

Scale The number of digits to the right of the decimal point. This is used when the data type is a decimal.

Nullable Indicates whether the column can be null.

Use the Add, Remove, Up and Down buttons to edit the input fields accordingly.

Annotations

Create comments for users. For example, you might want to make a note of some particular settings in this flowgraph so that the administrator can schedule or understand certain customizations. The annotations are written to a table. See the "Application Function Modeler" section of the SAP HANA Developer Guide.

All

Shows all of the options in one screen. It includes General, Mappings, and Annotations.

Related Information

Cleanse Input Columns [page 53]

Cleanse Output Columns [page 55]

5.5.1 Cleanse Configuration in Web-based Development Workbench

Identifies, parses, validates, and formats the following data: address, person name, organization name, occupational title, phone number, and email address.

Note

This topic applies to the SAP HANA Web-based Development Workbench only.

Address reference data comes in the form of country-specific directories. For information about downloading and deploying directories, see “Smart Data Quality Directories” in the Administration Guide for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality.

Only one input source is allowed.

Note

Prior to configuring the Cleanse node, be sure that you have been assigned the proper permissions. See the Administration Guide for SAP HANA Smart Data Integration and SAP HANA Smart Data Quality for more information.

Note

The Cleanse node is available for real-time processing.

To configure the Cleanse node

1. Click the Cleanse node, place it on the canvas, and connect the source data or the previous node. The Cleanse Configuration window appears.

2. Select any additional columns to output. The default columns are automatically mapped based on the input data.

3. (Optional) Add or remove entire input categories by selecting or de-selecting the checkbox next to the component name, such as Person, Firm and Address.

Note

The categories shown are based on the input data. You will only see those categories if your data contains that type of information. For example, if your input data does not contain email data, then the email component is not shown.

4. (Optional) To add or remove specific columns, click the pencil icon next to the category name. For example, if you want to remove Address2 and Address3 from the Address category, de-select those columns in the Edit Component window, and then click OK.

5. (Optional) To edit the content types, click Edit Defaults > Edit Content Types. Review the column names and content types, making changes as necessary by clicking the down arrow next to the content type and selecting a different content type. Click Apply.

6. (Optional) To change the format and settings for this flowgraph, click Edit Defaults > Edit Settings. For more information about the options available on the Cleanse Settings window, see Change Default Cleanse Settings [page 42].

7. Click Next.

8. Based on the input data provided, information is shown about the output columns. Click the right and left arrows to make any final formatting changes and additions to the output columns. You can also make these changes on the Cleanse Settings window.

Tip

To view the fields that will be output, place the cursor over the number in the blue dot on the Cleanse Configuration window.

9. Review the suggested actions. To implement the suggested action, click Apply and then OK for each action you want included.

10. (Optional) To include additional output fields such as address assignment levels or information codes, click Customize Manually. For each category, select the type of additional information that you want to add. Click the checkbox next to each output field, then click Apply.

11. Click Finish.

Related Information

Change Default Cleanse Settings [page 42]

Cleanse Output Fields [page 55]

5.5.1.1 Change Default Cleanse Settings

Set the cleanse preferences.

Context

The Cleanse settings are used as a template for all future projects using the Cleanse node. These settings can be overridden for each project.

Procedure

1. To open the Default Cleanse Settings window, click Edit Defaults > Edit Settings.

2. Select the component, and set the preferred options.

Option Description Component

Casing Specifies the casing format.

Mixed case: Converts data to initial capitals. For example, if the input data is JOHN MCKAY, then it is output as John McKay.