Wednesday, December 18, 2013

Module 8: Date Generation, Key Generation and Data Transfer Transforms

Module Objectives
After completing this unit, you should understand:
–Date_Generation transform
–Key_Generation transform
–Data_Transfer transform
DATE_GENERATION TRANSFORM
• Produces a series of dates incremented as you specify.
• Use this transform to produce the key values for a time dimension target.
• From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.
• Data outputs: A data set with a single column named DI_GENERATED_DATE containing the date sequence.
• The rows generated are flagged as INSERT.
• Generated dates can range from 1900.01.01 through 9999.12.31.
OPTIONS
• Start date – The first date in the output sequence. Specify this date using the format YYYY.MM.DD, where YYYY is a year, MM is a month value, and DD is a day value.
• End date – The last date in the output sequence. Use the same format as for Start date.
• Increment – The interval between dates in the output sequence. Select Daily, Weekly, or Monthly.
• Join rank – A positive integer indicating the weight of the output data set if the data set is used in a join. Sources in the join are accessed in order based on their join ranks; the highest-ranked source is accessed first to construct the join.
• Cache – Select this check box to hold the output from the transform in memory for use in subsequent transforms. Select Cache only if the resulting data set is small enough to fit in memory.

KEY_GENERATION TRANSFORM
• Generates new keys for new rows in a data set.
• When it is necessary to generate artificial keys in a table, the Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
• The transform expects the generated key column to be part of the input schema.
OPTIONS
• Table name
–The fully qualified name of the source table from which the maximum existing key is determined. This table must already be imported into the repository.
• Generated key column
–The column in the key source table that contains the existing key values. A column with the same name must exist in the input data set.
• Increment value
–The interval between generated key values.
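To make the lookup-and-increment behavior concrete, here is a minimal Python sketch, with sqlite3 standing in for the key source table; the table and column names (dim_customer, cust_key) are invented for illustration:

```python
import sqlite3

def key_generation(conn, table, key_column, rows, increment=1):
    """Mimic Key_Generation: start from the table's maximum existing key
    and fill the (already present) key column of each incoming row."""
    max_key = conn.execute(
        f"SELECT COALESCE(MAX({key_column}), 0) FROM {table}"
    ).fetchone()[0]
    for row in rows:
        max_key += increment
        row[key_column] = max_key  # the column must exist in the input schema
    return rows

# Hypothetical usage against an in-memory table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_key INTEGER, name TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (100, 'existing')")
new_rows = [{"cust_key": None, "name": "a"}, {"cust_key": None, "name": "b"}]
print(key_generation(conn, "dim_customer", "cust_key", new_rows))
# -> new rows receive keys 101 and 102
```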

DATA TRANSFER TRANSFORM
• Data Integrator generates SQL SELECT statements to retrieve the data from source databases.
• Data Integrator automatically distributes the processing workload by pushing down as much as possible to the source database server.
• Pushing down operations provides the following advantages:
–It uses the power of the database server to execute SELECT operations (such as joins, GROUP BY, and common functions such as decode and string functions). The database is often optimized for these operations.
–It minimizes the amount of data sent over the network. Fewer rows need to be retrieved when the SQL statements include filters or aggregations.
• You can also do a full push down from the source to the target, which means Data Integrator sends SQL INSERT INTO ... SELECT statements to the target database (see the sketch after this list).
• The Data_Transfer transform writes the data from a source, or the output from another transform, into a transfer object and subsequently reads data from the transfer object.
• The transfer type can be a relational database table, persistent cache table, or file.
• Use the Data_Transfer transform to push down operations to the database server when the transfer type is a database table.
• You can push down resource-consuming operations such as joins, GROUP BY, and sorts.
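As a hedged illustration of a full push down, the sketch below uses sqlite3, with invented sales and sales_summary tables standing in for source and target in the same datastore. The point it demonstrates is that the whole data flow collapses into one INSERT INTO ... SELECT statement that the database executes by itself, with no rows passing through the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (country TEXT, region TEXT, amount REAL);
CREATE TABLE sales_summary (country TEXT, region TEXT, total REAL);
INSERT INTO sales VALUES ('US','West',10),('US','West',5),('DE','South',7);
""")

-- comment style note: the statement below is the full push down itself --
# Full push down: extraction, aggregation, and load happen in one SQL
# statement on the database server; the engine only issues the statement.
conn.execute("""
INSERT INTO sales_summary (country, region, total)
SELECT country, region, SUM(amount)
FROM sales
GROUP BY country, region
""")
print(conn.execute("SELECT * FROM sales_summary").fetchall())
```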

EXAMPLE
• This simple data flow contains a Query transform that does a lookup of sales subtotals and groups the results by country and region.
• Suppose the GROUP BY operation processes millions of rows. Data Integrator cannot push the GROUP BY operation down to the database, because the Query before it contains a lookup_ext function, which Data Integrator cannot push down.
• You can add a Data_Transfer transform to split the lookup_ext function and the GROUP BY operation into two sub data flows, enabling Data Integrator to push the GROUP BY down to the target database (simulated in the sketch below).
• When you execute the job, Data Integrator displays messages for each sub data flow. You can also watch the transfer table being created and dropped during the job run.
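The following Python sketch simulates the split under the same assumptions as before (invented table names, sqlite3 in place of a real database): sub data flow 1 performs the lookup_ext-style enrichment row by row in the engine and lands the result in a transfer table; sub data flow 2 is then a plain GROUP BY that the database can run entirely on its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (cust_id INTEGER, amount REAL);
CREATE TABLE customers (cust_id INTEGER, country TEXT, region TEXT);
INSERT INTO orders VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO customers VALUES (1, 'US', 'West'), (2, 'DE', 'South');
CREATE TABLE transfer_1 (country TEXT, region TEXT, amount REAL);
""")

# Sub data flow 1: the lookup runs in the engine, row by row,
# and its output is written to the transfer table.
lookup = {cid: (c, r) for cid, c, r in conn.execute("SELECT * FROM customers")}
for cid, amount in conn.execute("SELECT * FROM orders"):
    country, region = lookup.get(cid, (None, None))
    conn.execute("INSERT INTO transfer_1 VALUES (?, ?, ?)",
                 (country, region, amount))

# Sub data flow 2: now a plain GROUP BY over the transfer table,
# which the database can execute by itself.
rows = conn.execute("""
    SELECT country, region, SUM(amount)
    FROM transfer_1 GROUP BY country, region
""").fetchall()
print(rows)  # e.g. [('DE', 'South', 7.0), ('US', 'West', 15.0)]
```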

BUILT IN TRANSFORMS & OPERATION CODES
• Transforms manipulate data inputs and produce one or more output data sets.
• Operation codes maintain the status of each row in the data sets that are inputs to and outputs from objects in data flows.

OPERATION CODES
• NORMAL – Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted from a source table. Most transforms operate only on rows flagged as NORMAL.
• INSERT – Rows can be flagged as INSERT by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as INSERT. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as INSERT as input.
• DELETE – Rows flagged as DELETE are ignored by the target and are not loaded. Rows can be flagged as DELETE by the Map_Operation and Table_Comparison transforms. Only the History_Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.
• UPDATE – Rows can be flagged as UPDATE by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as UPDATE. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as UPDATE as input.
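As a mental model only (this is not the product API), a row can be pictured as its data plus an operation code, and a Map_Operation-style step as a remapping of those codes, where mapping a code to discard drops the row from the output:

```python
from dataclasses import dataclass

# Mental model only: a row is its data plus an operation code.
@dataclass
class Row:
    data: dict
    op: str = "NORMAL"  # NORMAL, INSERT, UPDATE, or DELETE

def map_operation(rows, mapping):
    """Mimic a Map_Operation-style step: remap each row's op code;
    a target code of 'DISCARD' drops the row from the output."""
    out = []
    for r in rows:
        new_op = mapping.get(r.op, r.op)
        if new_op != "DISCARD":
            out.append(Row(r.data, new_op))
    return out

rows = [Row({"id": 1}), Row({"id": 2}, "UPDATE"), Row({"id": 3}, "DELETE")]
# e.g. turn deletes into updates, as History_Preserving can be set to do:
print(map_operation(rows, {"DELETE": "UPDATE"}))
```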
