Project Description
View-Plug-Ins Programming applies interface-oriented programming to database development. This methodology has been practiced widely in many fields (e.g. OOP) for a long time. This project focuses on database development and introduces a few meta-tables and helper packages (for Oracle; a set of stored procedures for SQL Server). The purpose is to simplify the design of complex batch processing, which is useful for ETL, reporting and DWH solutions.
The central motif behind View-Plug-Ins is a Branches-Leaves model of database development: branches are the stored procedures that form the flowchart, while leaves are the views that hold the particular implementations. It is reminiscent of pruning fruit trees during the growing season (the early stages of the software lifecycle, from system requirement analysis through design). Later, in the maintenance stage, it is much easier to see and organize the whole picture from root to branches when the program structure of the SQL scripts looks like a deciduous tree in winter.
Interface Design
The process of interface design favors reverse thinking. Especially in the realization phase, the programming steps run from the Target (output) Model back to the Source (input) Models, as follows:
Once we have a clear and clean data requirement, abstracting, inducing and simplifying it into a unified target model is the prerequisite of interface-oriented design.
Bringing in the Interface and the Plug-ins serves two basic aims:
Isolation
Isolate the differences between data models: transform the differing models from many sources into one common target model, followed by classification, labeling/tagging, etc. This keeps the code loosely coupled while the data stays tightly coupled. Testing each independent view is much clearer than testing a tangle of steps inside a stored procedure, which keeps test-driven database development simple.
Unification
A complex system can always be divided into a few clear-cut sub-systems integrated with loose logical coupling. For database system design, the principle of division depends on the induction of the data model. Each target sub-system can then be treated as one data model, with the differences between sources reduced to a few attributes of that same model.
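As an illustration of isolation, a plug-in view adapts one source model onto the common target model, so every source looks the same downstream. This is only a sketch: the source table and column names below are assumptions, not part of the project; only the target columns (BATCH_ID, SEC_ID, PROD_NAME) and the view name come from the metadata samples later in this page.

```sql
-- Hypothetical plug-in view: adapts the SRM source model to the
-- common target model (BATCH_ID, SEC_ID, PROD_NAME).
-- SRM.SECURITY_MASTER and XYZ.POSITION_STAGE are illustrative names only.
CREATE OR REPLACE VIEW XYZ.VIEW_PROD_CLS_SRM
AS
SELECT  p.BATCH_ID,
        s.SECURITY_ID     AS SEC_ID,
        s.INSTRUMENT_NAME AS PROD_NAME
FROM    SRM.SECURITY_MASTER s
JOIN    XYZ.POSITION_STAGE  p ON p.SRC_KEY = s.SECURITY_ID;
```

Each plug-in view can be tested on its own against its source system, independently of the others.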
Meta Tables
Only four metadata tables need to be maintained for the model designed above:
1. EXTRACT_SERVICE
2. EXTRACT_INTERFACE
INTERFACE_ID | SERVICE_ID | UNION_VIEW | SELECT_LIST | DESCRIPTION_ |
PRD_CLS | APP_DOMAIN | XYZ.VIEW_ALL_PROD | BATCH_ID, SEC_ID, PROD_NAME | Prod Identification |
... | ... | ... | ... | ... |
3. EXTRACT_PLUGIN
PLUGIN_ID | INTERFACE_ID | PLUGIN_VIEW | PLUGIN_ORDER | DESCRIPTION_ |
PRD_CLS_SRM | PRD_CLS | XYZ.VIEW_PROD_CLS_SRM | 1 | Identify prods from Srm |
PRD_CLS_SCM | PRD_CLS | XYZ.VIEW_PROD_CLS_SCM | 2 | Identify prods from Scm |
PRD_CLS_DEA | PRD_CLS | XYZ.VIEW_PROD_CLS_DEA | 3 | Identify prods from Dea |
PRD_CLS_BSK | PRD_CLS | XYZ.VIEW_PROD_CLS_BSK | 4 | Identify prods from Bsk |
... | ... | ... | ... | ... |
4. EXTRACT_RULE
RULE_ID | PLUGIN_ID | TAG$01 | TAG$02 | TAG$03 | TAG$04 | ... |
1 | PRD_CLS_BSK | PC | Flat Code A | SWAP | BASKET | |
2 | PRD_CLS_SCM | PC | Flat Code B | LST_OPT | INDEX | |
3 | PRD_CLS_SCM | PC | Flat Code C | OTC_OPT | INDEX | |
4 | PRD_CLS_SCM | PC | Flat Code D | FUTURE | DEBT | |
5 | PRD_CLS_DEA | PC | Flat Code E | SWAP | INDEX | |
6 | PRD_CLS_DEA | PC | Flat Code F | FORWARD | INDEX | |
7 | PRD_CLS_DEA | PC | Flat Code G | FUTURE | INDEX | |
8 | PRD_CLS_SRM | PC | Flat Code H | E | ADR | |
9 | PRD_CLS_SRM | PC | Flat Code I | E | COM | |
10 | PRD_CLS_SRM | PC | Flat Code J | E | GDR | |
... | ... | ... | ... | ... | ... | ... |
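Reading the metadata above, the hub-view registered for the PRD_CLS interface would plausibly be generated as a UNION ALL over its plug-in views, in PLUGIN_ORDER, projecting the interface's SELECT_LIST. The following is a sketch of the generated SQL, not the generator's actual output:

```sql
-- Sketch of the hub-view generated for interface PRD_CLS:
-- SELECT_LIST comes from EXTRACT_INTERFACE, the branches from
-- EXTRACT_PLUGIN ordered by PLUGIN_ORDER.
CREATE OR REPLACE VIEW XYZ.VIEW_ALL_PROD
AS
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_SRM   -- PLUGIN_ORDER 1
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_SCM   -- PLUGIN_ORDER 2
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_DEA   -- PLUGIN_ORDER 3
UNION ALL
SELECT BATCH_ID, SEC_ID, PROD_NAME FROM XYZ.VIEW_PROD_CLS_BSK;  -- PLUGIN_ORDER 4
```

Adding a new source then means adding one plug-in view and one EXTRACT_PLUGIN row; the callers of the hub-view stay untouched.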
These TAG$## columns look a bit ugly in practice. From the viewpoint of the applied business, all rules under one interface belong to the same category, so creating a dedicated rule-view for each interface resolves the ugliness.
When a new interface is added to the EXTRACT_INTERFACE table, a trigger creates a new declaration of column aliases in the EXTRACT_RULE_TAG_ALIAS table; please assign intelligible aliases to the applicable columns.
INTERFACE_ID | RULE_VIEW | TAG$01 | TAG$02 | TAG$03 | TAG$04 | ... |
PRD_CLS | XYZ.VIEW_CLASS_RULE | PROD_NAME | PLATFORM | CLS_TYPE | SEC_TYPE | |
... | ... | ... | ... | ... | ... | ... |
Then a rule-view named XYZ.VIEW_CLASS_RULE (per the sample above) will be generated. It is an updatable view and can be used as a substitute for the EXTRACT_RULE table within a specific area (interface).
The example above extracts and classifies hundreds of different product types from various source systems.
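A plausible shape for the generated rule-view, inferred from the alias table above, is a simple projection that renames the generic TAG$## columns and restricts the rows to the plug-ins of one interface. The exact generated SQL is an assumption:

```sql
-- Sketch of the generated rule-view for interface PRD_CLS: renames the
-- generic TAG$## columns using the EXTRACT_RULE_TAG_ALIAS declarations.
-- A plain projection/selection like this stays updatable in Oracle.
CREATE OR REPLACE VIEW XYZ.VIEW_CLASS_RULE
AS
SELECT  r.RULE_ID,
        r.PLUGIN_ID,
        r.TAG$01 AS PROD_NAME,
        r.TAG$02 AS PLATFORM,
        r.TAG$03 AS CLS_TYPE,
        r.TAG$04 AS SEC_TYPE
FROM    EXTRACT_RULE r
WHERE   r.PLUGIN_ID IN (SELECT p.PLUGIN_ID
                        FROM   EXTRACT_PLUGIN p
                        WHERE  p.INTERFACE_ID = 'PRD_CLS');
```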
Session Context
In order to pass parameters into a batch of view processing (a view is like an OOP class with only a parameterless constructor), a session context is introduced for each batch of extraction processing. The package VPI.EXTRACT_UTILITY provides functions/procedures to operate on the session context.
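As a hypothetical illustration of why this matters: a view cannot take arguments, but it can call a context accessor, so the batch parameters set by SET_PARAMS become visible inside every plug-in view of the session. The accessor name GET_PARAM and the tables below are assumptions; check the VPI.EXTRACT_UTILITY package specification for the real API.

```sql
-- Hypothetical: a plug-in view reading the batch's as-of date from the
-- session context instead of receiving it as an argument.
-- VPI.EXTRACT_UTILITY.GET_PARAM and SRC.POSITIONS are assumed names.
CREATE OR REPLACE VIEW XYZ.VIEW_POSITIONS_OF_DAY
AS
SELECT  pos.SEC_ID,
        pos.QUANTITY
FROM    SRC.POSITIONS pos
WHERE   pos.AS_OF_DATE =
        TO_DATE(VPI.EXTRACT_UTILITY.GET_PARAM(1), 'YYYY-MM-DD');
```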
For most batch processes, people want to trace progress during the run: elapsed time, current status, the percentage for a progress bar, etc.
PROCEDURE ETL_MAIN ( inDate DATE )
IS
    tBatch_ID  PLS_INTEGER := VPI.EXTRACT_UTILITY.CREATE_BATCH('CO.GRP.PRD.ETL');
BEGIN
    VPI.EXTRACT_UTILITY.SET_PARAMS(tBatch_ID, '2012-07-31');
    VPI.EXTRACT_UTILITY.PROGRESS_START(12, 1, 'Preloading(cleaning) some crucial slow sources ...');
    PRELOAD_CACHE_SCM(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading positions ...');
    LOAD_POSITIONS(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading top level securities and classifying product types ...');
    LOAD_TOP_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading middle level securities ...');
    LOAD_MID_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading bottom level securities ...');
    LOAD_BTM_LEVEL_SECURITIES(tBatch_ID);
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Loading issuers ...');
    LOAD_ISSUERS(tBatch_ID);
    ...
    VPI.EXTRACT_UTILITY.PROGRESS_UPDATE(inProgress_Description => 'Done.');
END ETL_MAIN;
Then the view VIEW_EXTRACT_BATCH_LOG or VIEW_LAST_BATCH_LOG can be used to monitor the whole batch's progress in the background.
Alternatively, a UI can call the procedure EXTRACT_UTILITY.POLLING_PROGRESS to display a progress bar.
Query Optimization
Avoid or minimize dynamic SQL and temp tables in extraction; SQL expression errors should be discovered at compile time. Database views provide exactly this facility: the execution plan is prepared when the query is compiled, so every query can be examined and optimized clearly in advance.
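The contrast can be sketched as follows (table and column names here are illustrative assumptions): a query assembled as a string fails only when it is executed, while the same query defined as a view fails at creation/deployment time and carries a plan that can be inspected up front.

```sql
-- Dynamic SQL: a typo in a column name surfaces only at run time,
-- when EXECUTE IMMEDIATE finally parses the string.
--   EXECUTE IMMEDIATE 'SELECT TRADE_ID, NOTIONAL FROM SRC.TRADES WHERE STATUS = ''OPEN''';

-- View: the same typo breaks compilation/deployment immediately,
-- and EXPLAIN PLAN can be run against the view before any batch runs.
CREATE OR REPLACE VIEW XYZ.VIEW_OPEN_TRADES
AS
SELECT  t.TRADE_ID,
        t.SEC_ID,
        t.NOTIONAL
FROM    SRC.TRADES t
WHERE   t.STATUS = 'OPEN';
```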
Data Quality
The quality of the source data always affects query performance, sometimes disastrously.
Bad data: missing, duplicate or impossible values ...
For instance,
- Missing data forces you to use an OUTER JOIN instead of an INNER JOIN;
- Duplicated data forces you to add an extra MIN/MAX ... GROUP BY ...;
...
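Both effects can be seen in one query (all names below are illustrative): what should have been a plain inner join picks up an outer join, a NULL fallback, and a deduplicating subquery, purely to defend against the source data.

```sql
-- Illustrative only: how missing and duplicated reference data distort
-- a lookup that should have been a simple inner join.
SELECT  p.SEC_ID,
        p.QUANTITY,
        NVL(s.PROD_NAME, 'UNKNOWN') AS PROD_NAME      -- missing rows force the outer join
FROM    SRC.POSITIONS p
LEFT OUTER JOIN
        (SELECT SEC_ID, MAX(PROD_NAME) AS PROD_NAME   -- duplicates force MAX ... GROUP BY
         FROM   SRC.SECURITY_MASTER
         GROUP BY SEC_ID) s
ON      s.SEC_ID = p.SEC_ID;
```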
You cannot write straightforward SQL in these situations; instead, queries everywhere become painfully complicated and slow. Some romantic business requirements even ask you to split a regular JOIN into several broken steps, check for potential data errors at every step, mark error flags, try this side, try that side, and continue... Imagine land mines laid along a highway: suddenly every vehicle is required to be a mine-clearing vehicle!
This is utterly opposed to the principle of using the Structured Query Language efficiently.
Everyone knows the theory: "there is no bad data, only bad applications (producers)". But in some companies, for whatever reasons, once a new data exception is exposed, the requirements always ask the downstream applications/reports to handle the new error logic, instead of putting constraints in the database and fixing the upstream data-producing applications to prevent the case from ever happening again. Isn't that the Emperor's New Clothes, where every eyewitness has to draw new clothes onto their own glasses?
As a downstream application/report, it is normally also difficult to ask the source systems to add proper indexes to their tables for optimizing the data consumers' high-frequency or bottleneck queries.
All these practical limitations prompt us to own an isolated, clean data environment; we should no longer have to drive mine-clearing vehicles on the highway. A preprocessing step that cleans up the source data therefore becomes very helpful for the performance of all follow-up batch extractions. Based on the view-plug-ins thinking, a Relationship Clean-up Engine has been introduced at http://datawashroom.codeplex.com/, since in most cases of bad data, a bad relationship is the worst headache of all.
Pre-deployment
According to the metadata above, all hub-views (unions of plug-in views) and rule-views need to be generated and pre-deployed in the early stages of the development cycle.
The package VPI.PRE_DEPLOY provides the following utilities:
Metadata Deployment
In essence, production deployment is metadata deployment. See also the [Build and Deploy] section at http://datawashroom.codeplex.com/ for the detailed deployment mechanism.
The view VIEW_EXTRACT_METADATA presents the Metadata Manifest for each service.
Supported Databases