Project Description
View-Plug-Ins Programming applies interface-oriented programming to database development. This methodology has been practiced widely in many fields (e.g. OOP) for a long time. This project focuses on database development and introduces a few meta-tables and helper packages (for Oracle; a set of stored procedures for SQL Server). The purpose is to simplify the design of complex batch processing, which is useful for ETL, reporting and DWH solutions.
The central motif behind View-Plug-Ins is a Branches-Leaves model of database development: branches are the stored procedures that form the flowchart, while leaves are the views that hold the particular implementations. It is reminiscent of pruning fruit trees during the growing season (the early stages of the software lifecycle, from system requirement analysis through design). Later, in the maintenance stage, it is much easier to see and organize the whole picture from root to branches when the program structure of the SQL scripts looks like a deciduous tree in winter.
Interface Design
The process of interface design favors reverse thinking. Especially in the realization phase, the programming steps run from the Target (output) Model back to the Source (input) Models, as follows:
Once we have a clear and clean data requirement, abstracting, inducing and simplifying it into a unified target model is the prerequisite of interface-oriented design.
Bringing in the Interface and the Plug-ins serves two basic aims:
Isolation
Isolate the differences between data models: transform the differing models from many sources into one common target model, followed by classification, labeling/tagging, etc. This keeps the code loosely coupled while the data stays tightly coupled. Testing each independent view is much clearer than testing a tangle of steps inside a stored procedure, which keeps test-driven database development simple.
Unification
A complex system can always be divided into a few clear-cut sub-systems integrated with loose logical coupling. For database system design, the principle of division depends on the induction of the data model. Each target sub-system can then be treated as one data model, with the differences between sources reduced to a few attributes of that same model.
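As an illustration of isolation, a plug-in view adapts one source model onto the common target model, so every source looks the same downstream. This is only a sketch: the source table and column names below are assumptions, not part of the project; only the target columns (BATCH_ID, SEC_ID, PROD_NAME) and the view name come from the metadata samples later in this page.

```sql
-- Hypothetical plug-in view: adapts the SRM source model to the
-- common target model (BATCH_ID, SEC_ID, PROD_NAME).
-- SRM.SECURITY_MASTER and XYZ.POSITION_STAGE are illustrative names only.
CREATE OR REPLACE VIEW XYZ.VIEW_PROD_CLS_SRM
AS
SELECT  p.BATCH_ID,
        s.SECURITY_ID     AS SEC_ID,
        s.INSTRUMENT_NAME AS PROD_NAME
FROM    SRM.SECURITY_MASTER s
JOIN    XYZ.POSITION_STAGE  p ON p.SRC_KEY = s.SECURITY_ID;
```

Each plug-in view can be tested on its own against its source system, independently of the others.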
Meta Tables
Only four metadata tables need to be maintained for the model designed above:
1. EXTRACT_SERVICE
2. EXTRACT_INTERFACE
INTERFACE_ID | SERVICE_ID | UNION_VIEW | SELECT_LIST | DESCRIPTION_ |
PRD_CLS | APP_DOMAIN | XYZ.VIEW_ALL_PROD | BATCH_ID, SEC_ID, PROD_NAME | Prod Identification |
... | ... | ... | ... | ... |
3. EXTRACT_PLUGIN
PLUGIN_ID | INTERFACE_ID | PLUGIN_VIEW | PLUGIN_ORDER | DESCRIPTION_ |
PRD_CLS_SRM | PRD_CLS | XYZ.VIEW_PROD_CLS_SRM | 1 | Identify prods from Srm |
PRD_CLS_SCM | PRD_CLS | XYZ.VIEW_PROD_CLS_SCM | 2 | Identify prods from Scm |
PRD_CLS_DEA | PRD_CLS | XYZ.VIEW_PROD_CLS_DEA | 3 | Identify prods from Dea |
PRD_CLS_BSK | PRD_CLS | XYZ.VIEW_PROD_CLS_BSK | 4 | Identify prods from Bsk |
... | ... | ... | ... | ... |
4. EXTRACT_RULE
RULE_ID | PLUGIN_ID | TAG$01 | TAG$02 | TAG$03 | TAG$04 | ... |
1 | PRD_CLS_BSK | PC | Flat Code A | SWAP | BASKET | |
2 | PRD_CLS_SCM | PC | Flat Code B | LST_OPT | INDEX | |
3 | PRD_CLS_SCM | PC | Flat Code C | OTC_OPT | INDEX | |
4 | PRD_CLS_SCM | PC | Flat Code D | FUTURE | DEBT | |
5 | PRD_CLS_DEA | PC | Flat Code E | SWAP | INDEX | |
6 | PRD_CLS_DEA | PC | Flat Code F | FORWARD | INDEX | |
7 | PRD_CLS_DEA | PC | Flat Code G | FUTURE | INDEX | |
8 | PRD_CLS_SRM | PC | Flat Code H | E | ADR | |
9 | PRD_CLS_SRM | PC | Flat Code I | E | COM | |
10 | PRD_CLS_SRM | PC | Flat Code J | E | GDR | |
... | ... | ... | ... | ... | ... | ... |
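Reading the metadata above, the hub-view registered for the PRD_CLS interface would plausibly be generated as a UNION ALL over its plug-in views, in PLUGIN_ORDER, projecting the interface's SELECT_LIST. The following is a sketch of the generated SQL, not the generator's actual output:

```sql
-- Sketch of the hub-view generated for interface PRD_CLS:
-- SELECT_LIST comes from EXTRACT_INTERFACE, the branches from
-- EXTRACT_PLUGIN ordered by PLUGIN_ORDER.
CREATE OR REPLACE VIEW XYZ.VIEW_ALL_PROD
AS
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_SRM   -- PLUGIN_ORDER 1
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_SCM   -- PLUGIN_ORDER 2
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_DEA   -- PLUGIN_ORDER 3
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_BSK;  -- PLUGIN_ORDER 4
```

Adding a new source then means adding one plug-in view and one EXTRACT_PLUGIN row; the callers of the hub-view stay untouched.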
These TAG$## columns look a bit ugly in practice. From the viewpoint of the applied business, all rules under one interface belong to the same category, so creating a dedicated rule-view for each interface resolves the ugliness.
When a new interface is added to the EXTRACT_INTERFACE table, a trigger creates a new declaration of column aliases in the EXTRACT_RULE_TAG_ALIAS table; please assign intelligible aliases to the applicable columns.
INTERFACE_ID | RULE_VIEW | TAG$01 | TAG$02 | TAG$03 | TAG$04 | ... |
PRD_CLS | XYZ.VIEW_CLASS_RULE | PROD_NAME | PLATFORM | CLS_TYPE | SEC_TYPE | |
... | ... | ... | ... | ... | ... | ... |
Then a rule-view named XYZ.VIEW_CLASS_RULE (per the sample above) will be generated. It is an updatable view and can be used as a substitute for the EXTRACT_RULE table within a specific area (interface).
The example above extracts and classifies hundreds of different product types from various source systems.
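A plausible shape for the generated rule-view, inferred from the alias table above, is a simple projection that renames the generic TAG$## columns and restricts the rows to the plug-ins of one interface. The exact generated SQL is an assumption:

```sql
-- Sketch of the generated rule-view for interface PRD_CLS: renames the
-- generic TAG$## columns using the EXTRACT_RULE_TAG_ALIAS declarations.
-- A plain projection/selection like this stays updatable in Oracle.
CREATE OR REPLACE VIEW XYZ.VIEW_CLASS_RULE
AS
SELECT  r.RULE_ID,
        r.PLUGIN_ID,
        r.TAG$01 AS PROD_NAME,
        r.TAG$02 AS PLATFORM,
        r.TAG$03 AS CLS_TYPE,
        r.TAG$04 AS SEC_TYPE
FROM    EXTRACT_RULE r
WHERE   r.PLUGIN_ID IN (SELECT p.PLUGIN_ID
                        FROM   EXTRACT_PLUGIN p
                        WHERE  p.INTERFACE_ID = 'PRD_CLS');
```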
Session Context
In order to pass parameters into a batch of view processing (a view is like an OOP class with only a parameterless constructor), a session context is introduced for each batch of extraction processing. The package VPI.EXTRACT_UTILITY provides functions/procedures to operate on the session context.
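As a hypothetical illustration of why this matters: a view cannot take arguments, but it can call a context accessor, so the batch parameters set by SET_PARAMS become visible inside every plug-in view of the session. The accessor name GET_PARAM and the tables below are assumptions; check the VPI.EXTRACT_UTILITY package specification for the real API.

```sql
-- Hypothetical: a plug-in view reading the batch's as-of date from the
-- session context instead of receiving it as an argument.
-- VPI.EXTRACT_UTILITY.GET_PARAM and SRC.POSITIONS are assumed names.
CREATE OR REPLACE VIEW XYZ.VIEW_POSITIONS_OF_DAY
AS
SELECT  pos.SEC_ID,
        pos.QUANTITY
FROM    SRC.POSITIONS pos
WHERE   pos.AS_OF_DATE =
        TO_DATE(VPI.EXTRACT_UTILITY.GET_PARAM(1), 'YYYY-MM-DD');
```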
For most batch processes, people want to trace progress during the run: elapsed time, current status, the percentage for a progress bar, etc.
PROCEDURE ETL_MAIN ( inDate DATE )
IS
    tBatch_ID  PLS_INTEGER := VPI.EXTRACT_UTILITY.CREATE_BATCH('CO.GRP.PRD.ETL');
BEGIN
    VPI.EXTRACT_UTILITY.SET_PARAMS(tBatch_ID, '2012-07-31');
    VPI.EXTRACT_UTILITY.PROGRESS_START(12, 1, 'Preloading(cleaning) some crucial slow sources ...');
    PRELOAD_CACHE_SCM(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading positions ...');
    LOAD_POSITIONS(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading top level securities and classifying product types ...');
    LOAD_TOP_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading middle level securities ...');
    LOAD_MID_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading bottom level securities ...');
    LOAD_BTM_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading issuers ...');
    LOAD_ISSUERS(tBatch_ID);
    ...
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Done.');
END ETL_MAIN;
Then the view VIEW_EXTRACT_BATCH_LOG or VIEW_LAST_BATCH_LOG can be used to monitor the whole batch's progress in the background.
Alternatively, a UI can call the procedure EXTRACT_UTILITY.POLLING_PROGRESS to display a progress bar.
Query Optimization
Avoid or minimize dynamic SQL and temp tables in extraction; SQL expression errors should be discovered at compile time. Database views provide exactly this facility: the execution plan is prepared when the query is compiled, so every query can be examined and optimized clearly in advance.
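The contrast can be sketched as follows (table and column names here are illustrative assumptions): a query assembled as a string fails only when it is executed, while the same query defined as a view fails at creation/deployment time and carries a plan that can be inspected up front.

```sql
-- Dynamic SQL: a typo in a column name surfaces only at run time,
-- when EXECUTE IMMEDIATE finally parses the string.
--   EXECUTE IMMEDIATE 'SELECT TRADE_ID, NOTIONAL FROM SRC.TRADES WHERE STATUS = ''OPEN''';

-- View: the same typo breaks compilation/deployment immediately,
-- and EXPLAIN PLAN can be run against the view before any batch runs.
CREATE OR REPLACE VIEW XYZ.VIEW_OPEN_TRADES
AS
SELECT  t.TRADE_ID,
        t.SEC_ID,
        t.NOTIONAL
FROM    SRC.TRADES t
WHERE   t.STATUS = 'OPEN';
```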
Data Quality
The quality of the source data always affects query performance, sometimes disastrously.
Bad data: missing, duplicate or impossible values ...
For instance,
- Missing data forces you to use an OUTER JOIN instead of an INNER JOIN;
- Duplicated data forces you to add an extra MIN/MAX ... GROUP BY ...;
...
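Both effects can be seen in one query (all names below are illustrative): what should have been a plain inner join picks up an outer join, a NULL fallback, and a deduplicating subquery, purely to defend against the source data.

```sql
-- Illustrative only: how missing and duplicated reference data distort
-- a lookup that should have been a simple inner join.
SELECT  p.SEC_ID,
        p.QUANTITY,
        NVL(s.PROD_NAME, 'UNKNOWN') AS PROD_NAME      -- missing rows force the outer join
FROM    SRC.POSITIONS p
LEFT OUTER JOIN
        (SELECT SEC_ID, MAX(PROD_NAME) AS PROD_NAME   -- duplicates force MAX ... GROUP BY
         FROM   SRC.SECURITY_MASTER
         GROUP BY SEC_ID) s
ON      s.SEC_ID = p.SEC_ID;
```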
You cannot write straightforward SQL in these situations; instead, queries everywhere become painfully complicated and slow. Some romantic business requirements even ask you to split a regular JOIN into several broken steps, check for potential data errors at every step, mark error flags, try this side, try that side, and continue... Imagine land mines laid along a highway: suddenly every vehicle is required to be a mine-clearing vehicle!
This is utterly opposed to the principle of using the Structured Query Language efficiently.
Everyone knows the theory: "there is no bad data, only bad applications (producers)". But in some companies, for whatever reasons, once a new data exception is exposed, the requirements always ask the downstream applications/reports to handle the new error logic, instead of putting constraints in the database and fixing the upstream data-producing applications to prevent the case from ever happening again. Isn't that the Emperor's New Clothes, where every eyewitness has to draw new clothes onto their own glasses?
As a downstream application/report, it is normally also difficult to ask the source systems to add proper indexes to their tables for optimizing the data consumers' high-frequency or bottleneck queries.
All these practical limitations prompt us to own an isolated, clean data environment; we should no longer have to drive mine-clearing vehicles on the highway. A preprocessing step that cleans up the source data therefore becomes very helpful for the performance of all follow-up batch extractions. Based on the view-plug-ins thinking, a Relationship Clean-up Engine has been introduced at http://datawashroom.codeplex.com/, since in most cases of bad data, a bad relationship is the worst headache of all.
Pre-deployment
According to the metadata above, all hub-views (unions of plug-in views) and rule-views need to be generated and pre-deployed in the early stages of the development cycle.
The package VPI.PRE_DEPLOY provides the following utilities:
Metadata Deployment
In essence, production deployment is metadata deployment. See also the [Build and Deploy] section at http://datawashroom.codeplex.com/ for the detailed deployment mechanism.
The view VIEW_EXTRACT_METADATA presents the Metadata Manifest for each service.
Supported Databases