Overview of the Graph
A graph is a data flow diagram that defines the various processing stages of a task and the streams of data as they move from one stage to another. In a graph, a component represents a stage and a flow represents a data stream. In addition, parameters specify various aspects of graph behavior.
You build a graph in the GDE (Graphical Development Environment) by dragging and dropping components, connecting them with flows, and then defining values for parameters. You also run, debug, and tune your graph in the GDE. In the process of building a graph, you are developing an Ab Initio application, and thus graph development is referred to as graph programming. When you are ready to deploy your application, you save the graph as a script that you can run from the command line. A sample graph is shown above.
A graph is completely defined by the totality of its parameter values. A parameter is a name-value pair, with a number of additional associated attributes that describe when and how to interpret or resolve the value. Parameters are used to change the behavior of graphs and projects (groups of graphs in a sandbox) in a controlled and uniform way.
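The idea of a parameter as a name-value pair with an attribute that says how to interpret the value can be illustrated with a small Python sketch. This is a toy model only, not Ab Initio's own parameter machinery: the Parameter class, the "interpretation" attribute with its two values, the resolve function, and the parameter names and paths are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Parameter:
    name: str
    value: str
    # Hypothetical attribute: "constant" keeps the value as written,
    # "substitution" expands ${NAME} references to earlier parameters.
    interpretation: str = "constant"


def resolve(parameters):
    """Resolve parameters in declaration order and return name -> value."""
    resolved = {}
    for parameter in parameters:
        if parameter.interpretation == "substitution":
            # Template.substitute raises KeyError for an undefined name.
            resolved[parameter.name] = Template(parameter.value).substitute(resolved)
        else:
            resolved[parameter.name] = parameter.value
    return resolved


params = [
    Parameter("DATA_DIR", "/tmp/data"),
    Parameter("INPUT_FILE", "${DATA_DIR}/input.dat", interpretation="substitution"),
]
settings = resolve(params)
print(settings["INPUT_FILE"])
```

Changing only DATA_DIR changes every value that refers to it, which is the "controlled and uniform" effect described above.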
In the sample graph, the input file and the output file are dataset components; components are explained in the following section. The Input File and Partition by Round-robin components are connected by a flow, which is the path that data takes as it passes through the graph.
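To make the component and flow vocabulary concrete, here is a minimal Python sketch that models a graph as a set of named components joined by directed flows. It is an illustrative data structure only and says nothing about how the GDE stores graphs; the Graph class and its methods are assumptions, and only the one flow mentioned in the text is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy model: components are stages, flows are directed data streams."""
    components: set[str] = field(default_factory=set)
    flows: list[tuple[str, str]] = field(default_factory=list)

    def add_component(self, name: str) -> None:
        self.components.add(name)

    def connect(self, source: str, target: str) -> None:
        # A flow may only join components that are already part of the graph.
        for name in (source, target):
            if name not in self.components:
                raise ValueError(f"unknown component: {name}")
        self.flows.append((source, target))

    def downstream(self, name: str) -> list[str]:
        """Return the components that receive data directly from `name`."""
        return [target for source, target in self.flows if source == name]


graph = Graph()
for name in ("Input File", "Partition by Round-robin", "Output File"):
    graph.add_component(name)

# Only the flow described in the text is modeled here.
graph.connect("Input File", "Partition by Round-robin")
print(graph.downstream("Input File"))
```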
Parts of a Graph
Metadata: Metadata is any information about data or how to process it. There are two broad categories of metadata:
1. Technical metadata: Metadata associated with graphs. This includes the information needed to build a graph, such as record formats, key specifiers, and transform functions, as well as the graphs themselves, tracking information from the running of graphs, job histories, versioning, and so on. You can store technical metadata as part of a graph, in a file, or in a datastore in the EME.
2. Enterprise metadata: Metadata associated with the business using Ab Initio software. This includes user-defined documentation of job functions, roles, categories, and so on.
Dataset: A dataset is a component that represents the tables or files holding the input and output data.
Component: Components are the building blocks of a graph. The Component Organizer contains all the available components.
Flows: A flow connects two components and carries data from one to the other.
Layout:
1. A layout determines the location of a resource.
2. A layout is either serial or parallel.
3. A serial layout specifies one node and one directory.
4. A parallel layout specifies multiple nodes and multiple directories. It is permissible for the same node to be repeated.
5. The location of a dataset is one or more places on one or more disks.
6. The location of a computing component is one or more directories on one or more nodes. By default, the node and directory are unknown.
7. Computing components propagate their layouts from neighbors, unless specifically given a layout by the user.
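The layout rules above can be sketched in Python. This is a simplified model under stated assumptions: a layout is a list of (node, directory) pairs, a component without a layout holds None, and propagation is a breadth-first pass that copies a layout from any already-resolved neighbor. The real propagation rules are likely more detailed, and the names propagate_layouts, is_parallel, "Reformat", "node1", and the directory paths are illustrative only.

```python
from collections import deque


def is_parallel(layout):
    """A serial layout has one (node, directory) pair; a parallel one has several."""
    return len(layout) > 1


def propagate_layouts(flows, layouts):
    """Fill in unknown (None) layouts from neighbors; explicit layouts are kept."""
    neighbors = {name: set() for name in layouts}
    for a, b in flows:
        neighbors[a].add(b)
        neighbors[b].add(a)

    resolved = dict(layouts)
    # Start from every component whose layout is already known.
    queue = deque(name for name, layout in layouts.items() if layout is not None)
    while queue:
        current = queue.popleft()
        for other in sorted(neighbors[current]):
            if resolved[other] is None:
                resolved[other] = resolved[current]
                queue.append(other)
    return resolved


layouts = {
    "Input File": [("node1", "/data/in")],    # serial: one node, one directory
    "Reformat": None,                         # computing component, unknown by default
    "Output File": [("node1", "/data/out")],
}
flows = [("Input File", "Reformat"), ("Reformat", "Output File")]
resolved = propagate_layouts(flows, layouts)
print(resolved["Reformat"])
```

In this sketch the computing component picks up the layout of the first resolved neighbor that reaches it, and a layout given explicitly by the user is never overwritten.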
Phases of the Graph
Phases are used to break up a graph into blocks for performance tuning. Phasing manages resources by limiting the number of simultaneous processes: the graph is broken into phases, only one of which is running at any given time. One common use of phasing is to avoid deadlocks. The temporary files created because of phase breaks are deleted at the end of the phase, regardless of whether the run was successful.
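The two properties described above (one phase at a time, and temporary files removed at the end of a phase whether or not it succeeded) can be sketched in Python. This is a rough analogy, not how the Ab Initio runtime is implemented: run_phases, extract, and load are hypothetical names, and each phase simply gets a scratch directory that is removed when the phase ends.

```python
import tempfile
from pathlib import Path


def run_phases(phases):
    """Run phases strictly one after another; stop at the first failure."""
    results = []
    for number, components in enumerate(phases):
        # The scratch directory is removed when the `with` block exits,
        # on success and on failure alike.
        with tempfile.TemporaryDirectory(prefix=f"phase{number}_") as scratch:
            try:
                for component in components:
                    component(Path(scratch))
            except Exception as error:
                results.append((number, f"failed: {error}"))
                return results
            results.append((number, "ok"))
    return results


seen = []


def extract(scratch):
    (scratch / "extract.tmp").write_text("rows")
    seen.append(scratch)


def load(scratch):
    seen.append(scratch)
    raise RuntimeError("load failed")


results = run_phases([[extract], [load]])
print(results)
```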
Checkpoints:
1. Checkpoints are used for the purpose of recovery.
2. The main aim of checkpoints is to provide the means to restart a failed graph from some intermediate state.
3. With checkpoints, the temporary files from the last successful checkpoint are retained so that the graph can be restarted from this point in the event of a failure. Only as each new checkpoint is completed successfully are the temporary files corresponding to the previous checkpoint deleted.
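The retention rule for checkpoints can be sketched in Python. This is a minimal illustration under assumptions, not the actual recovery mechanism: each step is a plain function, a checkpoint is a JSON file named checkpoint_N.json (a hypothetical naming scheme), and the previous checkpoint file is deleted only after the new one has been written.

```python
import json
import tempfile
from pathlib import Path


def latest_checkpoint(workdir):
    """Return the newest checkpoint file in workdir, or None if there is none."""
    files = sorted(workdir.glob("checkpoint_*.json"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return files[-1] if files else None


def run_with_checkpoints(steps, workdir, initial):
    last = latest_checkpoint(workdir)
    if last is None:
        start, value = 0, initial
    else:
        saved = json.loads(last.read_text())
        start, value = saved["next_step"], saved["value"]

    for index in range(start, len(steps)):
        value = steps[index](value)  # if this raises, the old checkpoint stays on disk
        new = workdir / f"checkpoint_{index}.json"
        new.write_text(json.dumps({"next_step": index + 1, "value": value}))
        if last is not None:
            last.unlink()  # previous checkpoint removed only after the new one exists
        last = new
    return value


attempts = {"count": 0}


def flaky_double(x):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated failure")
    return x * 2


steps = [lambda x: x + 1, flaky_double, lambda x: x * 10]

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    try:
        run_with_checkpoints(steps, workdir, 1)
    except RuntimeError:
        pass
    kept = [p.name for p in workdir.glob("checkpoint_*.json")]
    final = run_with_checkpoints(steps, workdir, 1)  # resumes after step 0

print(kept, final)
```

The second call does not repeat the first step; it reloads the retained checkpoint and continues from the step that failed.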
Graph Runtime Behavior:
1. A graph can be executed from the GDE itself or from the back-end.
2. A graph can be deployed to the back-end server as a Unix shell script or a Windows NT batch file.
3. The deployed shell script or batch file can then be executed at the back-end.
A sandbox is a user's personal workspace. A sandbox is a collection of graphs and related files that are stored in a single directory tree and treated as a group for purposes of version control, navigation, and migration. A sandbox can be a file system copy of a datastore project.
Sandbox Structure:
A sandbox has five folders, organized by the type of metadata they hold:
• db - database-related
• dml - record formats
• mp - graphs
• run - deployed scripts
• xfr - transforms
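As a small illustration of this directory tree, the following Python sketch creates the five folders under a root directory. It only mirrors the layout listed above; a real sandbox is created through the Ab Initio tools and may contain additional files, and the function name create_sandbox_skeleton and the "my_sandbox" path are hypothetical.

```python
import tempfile
from pathlib import Path

# Folder name -> kind of content, as listed above.
SANDBOX_FOLDERS = {
    "db": "database-related",
    "dml": "record formats",
    "mp": "graphs",
    "run": "deployed scripts",
    "xfr": "transforms",
}


def create_sandbox_skeleton(root):
    """Create the five folders under root and return their names, sorted."""
    root = Path(root)
    for folder in SANDBOX_FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    created = create_sandbox_skeleton(Path(tmp) / "my_sandbox")

print(created)
```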