Tuesday, December 17, 2013

Module 7: Built-in transformers

Module Objectives
After completing this unit , you should be able to understand about:
–Query transformer
–Various options of Query transformer
–Joining and aggregation through query transformer
–Built-in functions used in Query transformer
–Case Transformer
–Merge Transformer
QUERY TRANSFORM
The Query transform is by far the most commonly used transform. It retrieves a data set that satisfies conditions that you specify

The Query transform can perform the following operations:
To choose (filter) the data to extract from sources
Join data from multiple sources
Map columns from input to output schemas
Perform transformations and functions on the data
Perform data nesting and un-nesting
Add new columns, nested schemas, and function results to the output schema
Assign primary keys to output columns
• Query transform can also be used for following.
–Performing calculations
–Joining sources
–Aggregation




Built-in Functions used in Query Transformer
FUNCTIONS
•Define built in functions.
•Differentiate between functions and transforms.
•List types of operations available for functions
•Using functions in expressions.
•Use date and time functions and date_generation
transform to build a dimension table.
•Use lookup functions to look up status in a table.
•Understand other functions.
BUILT IN FUNCTIONS
•Functions take input values and produce a return value.
Functions also operate on individual values passed to them
•Input values can be parameters passed into data low, values
from column of data or variables defined inside a script.
You can use functions in expressions that include script and
Conditional statements.
DIFFERENTIATING BETWEEN FUNCTIONS AND TRANSFORMS
•Functions operate on single values, such as values in
specific columns in data set.
•Transforms operate on data set , creating, updating &
Deleting rows of data.
TYPES OF OPERATIONS FOR FUNCTIONS
Operative Type Description
Aggregate Generates à a single value from a set of values. Aggregate functions , such as max & min & count.
Iterative Maintains state information from one invocation to another. The life of
iterative function’s state information is the execution life of the query in which they  are included. E.g. Lookup.
Stateless State information is not maintained from one invocation to the next.
Stateless functions such as to_char or month can be used anywhere expressions
are allowed.

OTHER TYPES OF FUNCTIONS
•Database & application functions –
Functions specific to you DBMS.
•Custom functions-Functions you define yourself.
USING FUNCTIONS IN EXPRESSIONS
•Functions are typically used to add:
–Columns based on some other value (lookup function)
–Generated key fields
•You can use functions in:
–Transforms : for e.g Query , Case and SQL transforms.
–Scripts: these are single-use objects used to call functions and
assign values to variables to work flow.
–Conditionals: these are single-use objects used to implement
if/then/else logic in work flow. Conditionals and their
components-if expressions , then and else diagrams-are
included in the scope of parent control flow’s variables and
parameters.
–Other custom functions
CUSTOM FUNCTIONS
–Written by the user in Data Integrator scripting language
–Reusable objects
–Managed through the function wizard
Consider these guidelines when you create your own functions:
• Functions can call other functions.
• Functions cannot call themselves.
• Functions cannot participate in a cycle of recursive calls. For example, function A cannot call function B, which calls function A.
• Functions return a value.
• Functions can have parameters for input, output, or both. However, data
flows cannot pass parameters of type output or input/output.
Before creating a custom function, you must know the input, output, and
return values and their data types. The return value is predefined to be return.
USING BUILT IN FUNCTIONS
• The built in functions for date and time and built-in
date_generation transform are useful when building a
time dimension table.
• to_char
–To convert date to string.
• to_date
–To convert a string to a date.
• Month
–To determine the month in which the given date fails.
• Quarter
–To determine the quarter in which given date fails.
USE LOOKUP FUNCTIONS TO LOOK UP STATUS IN ATABLE
•A specialized type of join, similar to an SQL outer join
–A SQL outer join may return multiple matches for a single record in the
outer table.
–Lookup functions always return exactly the same no. of records that are
in the source (outer) table.
–Sophisticated caching options
–A default value when no match is found.


LOOKUP_EXT()
• While all lookup functions return one row for each row in the source they
differ by how they choose which of several matching rows to return:
–Lookup_ext()
•Allows specification of an Order by column and Return policy(Min,Max) to
return the record with highest/lowest value in a given field. For e.g. a
surrogate key.
•This function also extends functionality by allowing you to :
–Return multiple columns from a single lookup.
–Choose from more operators to specify a lookup condition.
–Specify a return policy for lookup.
–Call lookup_ext, using Function Wizard, in the query output.


USE DATABASE TYPE FUNCTIONS TORETURN INFORMATION ON DATA SOURCES
• db_type
–function returns the database type of the data store configuration in
use at runtime. This function is useful if your datastore has multiple
configurations. Syntax: db_type(ds_name)
• db_version
–Function returns the database version of datastore configuration in use at
runtime.
Syntax: db_version(ds_name)
• db_database_name
–Function returns the database name of the datastore configuration in use
at runtime.
Syntax:db_database_name(ds_name)

DB FUNCTIONS
• db_owner :
–Function returns the real owner name for the datastore configuration that is
in use at runtime.This function is useful if your datastore has multiple
configurations because with multiple configurations, you can use alias owner
instead of database owner names.
Syntax : db_owner(ds_name,alias_name)
• decode :
–Function to return an expression based on the first condition in the specified
list of conditions and expressions that evaluates to TRUE. It provides an
alternate way to write nested if then else functions.

Syntax : decode(condition_and expression_list,default expression)


Print this post

No comments: