How to build Agile Business Intelligence Data Warehouse ? on top of that it need to be Real Time...
A good modeling approach is the answer.The quality of the approach and the model is what make the difference and stabilize maintenance cost.
What does Data Vault 2.0 bring to the table ?
- Capability to leverage unstructured database ( Hadoop ) as traditionnl RBMS
- Datavault 2.0 is a lego that let build your DWH brick by brick.
Why does it's interesting for RealTime BI ?
- Within a stream of data is hard and costly to join reference data and doing look-up for surrogate key is a expensive operation especially if you dealing with millions of event. By hashing Business Key it's easier to create unique key that will replace Surrogate key and avoid intensive lookup.
Let's imagine you have a flow of event representing a RFID system where a truck full of good going throw antenna that will detect them. The information that the sensor send will be a tag code,a gate code and a time stamp.
There is couple of Business Key we can find from this flow of data :
- First a transaction, a tag (representing a product) and a time hash together are unique ( cannot have a tag a two different place at the exact same time). and it will represent a transaction.
- The gate code can be hash because it represent a Business Key
- The tag code will be hash for the same reason
- By hashing the concatenation of the tag_code,time-stamp,gate_code we represent a link between the concept of a transaction and the gate.
The second thing interesting for Real Time Business Intelligence is you don't have to do lookup to get surrogate key.We can have a flow of transaction and in the same time a flow of Master data. The master data will go on satellite of a hub ( a hub represent a Business Concept ) and the PK of the satellite will be a hash of the Business Key+InsertTime
For example the gate will have some information about the location of the gate.
We can have a flow of data that will be :
{
gate_code,
gate_location_name,
gate_longitude,
gate_latitude
} we will hash the gate_code and obtain a gate_sk. We will be able to feed directly to the SAT_GATE. The Pk of the SAT_GATE will be gate_sk and an insert timestamp so you can have slowly changing dimension, a flag will let you know for each Business Key which row is the latest as also a start validity and the end of the validity.
Conclusion:
We can see how DataVault fit in a Real Time BI solution,by suppressing the constraint of dependency between data load.In Real Time it's harder to predict when data will arrive and you want to speed up the process as much as possible and DV2 is a great help for that.