Thursday, December 19, 2013

Module 9: More Transformations


Module Objectives
After completing this unit, you should be able to understand:
–Map Operation transformer
–Table Comparison transformer
–History Preserving transformer
–Auditing transformer
–Handling Slowly Changing Dimensions

MAP_OPERATION
• Allows conversions between data manipulation operations.
• The Map_Operation transform allows you to change operation codes on data sets to produce the desired output.
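Conceptually, Map_Operation applies a configured mapping to each row's operation code. Data Integrator does this through its transform editor, but the logic can be sketched in Python (the mapping shown, e.g. UPDATE to INSERT and DELETE to DISCARD, is a hypothetical configuration, not a default):

```python
# Minimal sketch (not the actual Data Integrator engine) of a
# Map_Operation-style step that remaps operation codes on a data set.

def map_operation(rows, op_map):
    """Remap each row's operation code; rows mapped to DISCARD are dropped."""
    out = []
    for op, data in rows:
        new_op = op_map.get(op, op)        # unmapped codes pass through
        if new_op != "DISCARD":
            out.append((new_op, data))
    return out

rows = [("NORMAL", {"id": 1}), ("UPDATE", {"id": 2}), ("DELETE", {"id": 3})]
# Hypothetical configuration: treat updates as inserts, discard deletes
print(map_operation(rows, {"UPDATE": "INSERT", "DELETE": "DISCARD"}))
```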

TABLE_COMPARISON
• Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT, UPDATE, or DELETE.
• The Table_Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated.

DATA INPUTS
• The data set from a source or the output from another transform. Only rows flagged as NORMAL are considered by the transform. This data is referred to as the input data set.
• The specification for a database table to compare to the input data set. This table is referred to as the comparison table.

COMPARISON METHOD
Row-by-row select
—Select this option to have the transform look up the target table using SQL every time it receives an input row. This option is best if the target table is large compared to the number of rows the transform will receive as input.
Cached comparison table
—Select this option to load the comparison table into memory. In this case, queries to the comparison table access memory rather than the actual table. This option is best when you are comparing the entire target table. Data Integrator uses pageable cache as the default. If the table fits in the available memory, you can change the cache type to in-memory in the data flow Properties.
Sorted input
—Select this option to read the comparison table in the order of the primary key column(s) using sequential read. This option improves performance because Data Integrator reads the comparison table only once.


TRANSFORM OPTIONS
•Input primary key column(s)
–The input data set columns that uniquely identify each row. These columns must be present in the comparison table with the same column names and data types.
•Compare columns
–(Optional) Improves performance by comparing only the sub-set of columns you drag into this box from the input schema. If no columns are listed, all columns in the input data set that are also in the comparison table are used as compare columns.
•Generated Key Column
–For an UPDATE, the output data set will contain the largest generated key found for a given primary key.
•Detect deleted row(s) from comparison table
–(Optional) Generates DELETEs for all rows that are in the comparison table and not in the input set. Assumes the input set represents the complete data set. By default this option is turned off.

DATA OUTPUTS
• A data set containing rows flagged as INSERT, UPDATE or DELETE.
• This data set contains only the rows that make up the difference between the two input sources: one from the input to the transform (input data set), and one from a database table you specify in the transform (the comparison table).
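The comparison logic above can be sketched in Python: rows present only in the input set become INSERTs, rows whose compared values differ become UPDATEs, and (when "detect deleted rows" is enabled) rows present only in the comparison table become DELETEs. The keys and column names are illustrative:

```python
# Hedged sketch of Table_Comparison-style difference detection between an
# input data set and a comparison table, both keyed by primary key.

def table_comparison(input_rows, comparison, detect_deletes=False):
    out = []
    for k, row in input_rows.items():
        if k not in comparison:
            out.append(("INSERT", row))        # new row
        elif row != comparison[k]:
            out.append(("UPDATE", row))        # changed row
    if detect_deletes:                          # optional, off by default
        for k, row in comparison.items():
            if k not in input_rows:
                out.append(("DELETE", row))    # row gone from input
    return out

source = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
target = {2: {"name": "Robert"}, 3: {"name": "Cal"}}
print(table_comparison(source, target, detect_deletes=True))
# INSERT Ann, UPDATE Bob, DELETE Cal
```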





HISTORY_PRESERVING
• The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row.
• You can indicate in which columns the transform identifies changes to be preserved.
• If the values of certain columns change, this transform creates a new row for each row flagged as UPDATE in the input data set.

DATA INPUTS
• A data set that is the result of a comparison between two images of the same data in which changed data from the newer image are flagged as UPDATE rows and new data from the newer image are flagged as INSERT rows.
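A rough Python sketch of this behavior, assuming the upstream comparison has already flagged the rows: an UPDATE whose preserved (history-tracked) columns changed is converted into an INSERT so the old row survives, while other UPDATEs pass through. The column names (`state`, `phone`) are illustrative:

```python
# Sketch of History_Preserving-style logic over flagged rows.
# old_image maps primary key -> the existing (older) row values.

def history_preserving(rows, old_image, preserve_cols, key="id"):
    out = []
    for op, row in rows:
        if op == "UPDATE":
            old = old_image[row[key]]
            if any(row[c] != old[c] for c in preserve_cols):
                out.append(("INSERT", row))   # preserve history: new row
            else:
                out.append(("UPDATE", row))   # normal in-place update
        else:
            out.append((op, row))             # INSERTs pass through
    return out

old = {1: {"id": 1, "state": "CA", "phone": "555-0100"}}
rows = [("UPDATE", {"id": 1, "state": "NY", "phone": "555-0100"})]
print(history_preserving(rows, old, preserve_cols=["state"]))
```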



SCD-2
•Date Based
•Flag Based

FLAG BASED DIMENSION
• Compare columns
–Rows flagged as INSERT should be inserted with the current flag set to 'A'.
–If the input is flagged as UPDATE because of a phone number change, the row should be updated in place; the current flag does not change.
–If the input is flagged as UPDATE because of a state change, the new row should be inserted with current flag 'A' and the existing row should be updated with current flag 'I'.
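The flag-based rules above can be sketched as follows, assuming 'A' marks the active row and 'I' an inactive (historical) row, with `state` as the history-tracked column and `phone` as a non-tracked column (all names illustrative):

```python
# Sketch of flag-based SCD-2 maintenance on a dimension table (a list of
# row dicts). Tracked-column changes retire the active row and insert a
# new one; non-tracked changes are applied in place.

def apply_scd2(dim, key, op, row):
    if op == "INSERT":
        dim.append({**row, "current_flag": "A"})
    elif op == "UPDATE":
        active = next(r for r in dim
                      if r[key] == row[key] and r["current_flag"] == "A")
        if active["state"] != row["state"]:       # tracked change
            active["current_flag"] = "I"          # retire old version
            dim.append({**row, "current_flag": "A"})
        else:                                     # e.g. phone change
            active.update(row)                    # update in place

dim = []
apply_scd2(dim, "id", "INSERT", {"id": 1, "state": "CA", "phone": "555-0100"})
apply_scd2(dim, "id", "UPDATE", {"id": 1, "state": "CA", "phone": "555-0199"})
apply_scd2(dim, "id", "UPDATE", {"id": 1, "state": "NY", "phone": "555-0199"})
# dim now holds the retired CA row (flag 'I') and the active NY row (flag 'A')
```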

AUDITING
• Auditing provides a way to ensure that a data flow loads correct data into the warehouse. Use auditing to perform the following tasks:
• Define audit points to collect run-time statistics about the data that flows out of objects.
• Define rules with these audit statistics to ensure that the data at the following points in a data flow is what you expect:
–Extracted from sources
–Processed by transforms
–Loaded into targets
• Generate a run-time notification that includes the audit rule that failed.

AUDIT CONFIGURATION
Audit label—The unique name in the data flow that Data Integrator generates for the audit statistics collected for each audit function that you define. You use these labels to define audit rules for the data flow.
Audit rule—A Boolean expression in which you use audit labels to verify the Data Integrator job. If you define multiple rules in a data flow, all rules must succeed or the audit fails.
Actions on audit failure—One or more of three ways to generate notification of an audit rule (or rules) failure: email, custom script, raise exception.
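A minimal sketch of how such rules behave: statistics collected at audit points are bound to labels, and every Boolean rule over those labels must hold or the audit fails. The label names (`Count_Source`, `Count_Target`) are hypothetical, not Data Integrator's generated names:

```python
# Illustrative audit-rule evaluation: all rules must pass, otherwise the
# failed rule names would drive a notification (email, script, exception).

def check_audit(stats, rules):
    """Return names of failed rules; an empty list means the audit passed."""
    return [name for name, rule in rules.items() if not rule(stats)]

stats = {"Count_Source": 1000, "Count_Target": 998}   # collected at audit points
rules = {"row_counts_match":
             lambda s: s["Count_Source"] == s["Count_Target"]}
failed = check_audit(stats, rules)
print(failed)  # a non-empty list would trigger the configured failure action
```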







