Friday, December 13, 2013

Module 5: Data Services Objects


Module Objectives
After completing this unit , you should be able to understand about:
  • Projects
  • Jobs
  • Data Flows
  • Work flows
  • Scripts
  • Transforms
  • Data Stores
  • Importing from through Data Stores and File Format
DATA SERVICES OBJECTS
• Projects
• Jobs
• Data Flows
• Work flows
• Scripts
• Transforms
PROJECTS & JOBS
Projects
–A projectis an object that allows you to group jobs. A project is the highest level of organization offered by Data Integrator.
–Opening a project makes one group of objects easily accessible in the user interface.
Jobs
–A job is the only object you can execute. You can manually execute and test jobs in development.
–In production, you can schedule batch jobs.
–A job is made up of steps you want executed
together.

DATA FLOW
Data flows extract, transform, and load data.
• Everything having to do with data, including reading sources, transforming data, and loading targets, occurs inside a data flow.
• The lines connecting objects in a data flow represent the flow of data through data transformation steps.


WORK FLOW
A work flow defines the decision-making process for executing data flows.
For example, elements in a work flow can determine the path of execution based on a value set by a previous job or can indicate an alternative path if something goes wrong in the primary path.


TRANSFORMS
• Data Integrator includes objects called transforms.
• Transforms operate on data sets.
• Transforms manipulate input sets and produce one or more output sets.
• E.g.
–Query
–Case
–Date Generator
–Etc.

SCRIPTS
Scripts are single-use objects used to call functions and assign values to variables in a work flow.
• For example, you can use the SQL function in a script to determine the most recent update time for a table and then assign that value to a variable.
• You can then assign the variable to a parameter that passes into a data flow and identifies the rows to extract from a source.

OBJECT USE
Single Use Objects
–Some objects are defined only within the context of a single job or data flow, for example scripts and specific transform definitions.
Re-usable Objects
–A reusable object has a single definition; all calls to the object refer to that definition. If you change the definition of the object in one place, you are changing the object in all other places in which it appears.
RELATIONSHIP BETWEEN THE OBJECTS
• Jobs are composed of work flows and/or data flows
• Work flow is an incorporation of several data flows
• Data flow is the process by which source data is transformed into
target data
• Work flow:
 • Data flow:


OBJECT HIERARCHY

DEVELOPMENT PROCESS
• Design
• Test
• Production

DESIGNER INTERFACE
• Key Areas of Designer windows
• Toolbar
• Local Object Library
• Project Area
• Tool palette
• Workspace
KEY AREAS OF DESIGNER WINDOW

DEFINING SOURCE AND TARGET METADATA
Using Data store 
• Importing Metadata
• Defining a File Format
DATASTORES
• Connection or multiple connections to data sources
• Import metadata from the data source
• Metadata consists of:
–Table name
–Column names
–Column data types
–Primary key columns
–Table attributes
–RDBMS functions
–Application specific data structure
• Data Integrator uses datastores to read data from source table or load data to target table

TYPES OF DATASTORES
• Database Datastores:
–Provides a simple way to import way to import metadata
• Application Datastores:
–Easily import metadata from most ERP systems
• Adapter Datastores:
–Provides access to an application’s data and metadata or just metadata
Datastores represent connection configurations between Data Services and data bases or applications.
These configurations can be direct or through adapters.
Datastore configurations allow Data Services to access metadata from a database or application and read from or write to that database or application while Data Services executes a job.
Datastores:
vProvide a logical channel (connection) to a database
vMust be specified for each source and target database
Are used to import metadata for source and target databases into the repository.
vAre used by Data Integrator to read data from source tables and load data to target tables

Different Database Connection through DS
The databases to which Data Integrator datastores can connect:
• OracleàMicrosoft SQL Server
• IBM DB2àSybase ASE
• InformixàODBC
• Sybase IQ
Metadata consists of:

• Database tablesàPrimary key columns
• Table nameàTable attributes
• Column namesàColumn data types
RDBMS functions

Connection metadata is defined in the object library as datastores (for tables) and file formats (for flat files). While we are designating the datastores as sources or targets datastores only function as connections. The actual source and target objects are defined when we define data flows

To define a Datastore

  • Explain Datastore
  • Create a database datastore
  • Change a datastore definition


Advanced 
Importing 
Properties


Import Metadata Through Datastage


With Data Services, we can import meta data for individual tables using a data store. Data Services stores meta data information for all imported objects in a data store. Importing meta data through a data base data store


To view imported objects in the object library


To view imported objects in the object library
If we have already imported an object such as a datastore, function, or table, we
can reimport it, which updates the object’s metadata from our database
(reimporting overwrites any changes we might have made to the object in Data
Services).
File Format Properties
  • A file format is a set of properties describing the structure of a flat file. File formats describe the metadata structure.
  • A file format defines a connection to a file. Therefore, we use a file format to connect Data Integrator to source or target data when the data is stored in a file rather than a database table.File format objects can describe files in:
  • Delimited format —Characters such as commas or tabs separate each Field
  • Fixed width format —The column width is specified by the user
  • SAP ERP and R/3 format

To create a file format from an existing file format
We must enter a new name for the replicated file. Data Services does not allow to save the replicated file with the same name as the original (or any other existing File Format object).


Editing file formats
  • We can modify existing file format templates to match changes in the format or structure of a file.
  • We cannot change the name of a file format template. Any changes effect every source or target file that is based on this file format template.
File format features
  1. Data Integrator offers several capabilities for processing files:
  2. Reading multiple files at one time
  3. Number formats
  4. Ignoring rows with specified markers
  5. Date formats at the field level
  6. Replicating and renaming file formats


HANDLING ERRORS IN FILE FORMAT
• When you enable error handling in File Format
Editor, Data Integrator:
–Checks for two types of Flat-file source error:
• Data-type conversion error
• Row-format error
–Stop processing the source file after reaching a
specified number of invalid rows
–Logs data-type conversion or row-format warning to
the Data Integrator error log

+VV Satyanarayana G
+Sairam Jalluru 


Print this post

No comments: