Friday, May 8, 2015

Data Vault and Real-Time Business Intelligence

How do you build an agile Business Intelligence data warehouse? And on top of that, it needs to be real time...


A good modeling approach is the answer. The quality of the approach and of the model is what makes the difference and keeps the maintenance cost under control.

What does Data Vault 2.0 bring to the table?


- The capability to leverage unstructured storage (Hadoop) just like a traditional RDBMS.
- Data Vault 2.0 is a Lego set that lets you build your DWH brick by brick.

Why is it interesting for real-time BI?


- Within a stream of data it is hard and costly to join reference data, and looking up surrogate keys is an expensive operation, especially when you are dealing with millions of events. By hashing the business key, it is easy to create a unique key that replaces the surrogate key and avoids those intensive lookups.

Let's imagine you have a flow of events coming from an RFID system: a truck full of goods drives through an antenna gate that detects each tag. The information the sensor sends is a tag code, a gate code and a timestamp.

There are a couple of business keys we can find in this flow of data:

- First, a transaction: a tag (representing a product) and a timestamp hashed together are unique (a tag cannot be in two different places at the exact same time), and together they represent a transaction.

- The gate code can be hashed, because it represents a business key.

- The tag code will be hashed for the same reason.

- By hashing the concatenation of tag_code, timestamp and gate_code, we represent a link between the concept of a transaction and the gate (see the sketch just after this list).
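To make this concrete, here is a minimal C# sketch of these hash keys, assuming MD5 (the hash function commonly used in Data Vault 2.0), a pipe delimiter between the key parts and upper-case normalization; the helper name HashBusinessKey and the sample values are mine, purely for illustration.

using System;
using System.Security.Cryptography;
using System.Text;

class HashKeyDemo
{
    // Deterministic hash of a business key: the same input always gives
    // the same key, so no surrogate-key lookup is needed at load time.
    static string HashBusinessKey(params string[] parts)
    {
        // Normalize and concatenate the key parts with a fixed delimiter.
        string input = string.Join("|", parts).Trim().ToUpperInvariant();
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    static void Main()
    {
        string tagCode = "TAG-0042";
        string gateCode = "GATE-07";
        string timestamp = "2015-05-08T10:15:30.000Z";

        // Hub hash keys: one per business concept.
        Console.WriteLine("HUB_TAG:  " + HashBusinessKey(tagCode));
        Console.WriteLine("HUB_GATE: " + HashBusinessKey(gateCode));

        // Link hash key: the concatenation tag + timestamp + gate.
        Console.WriteLine("LNK_TRANSACTION: " + HashBusinessKey(tagCode, timestamp, gateCode));
    }
}

Because the hash is deterministic, any process that sees the business key can compute the same key independently, without any lookup against a shared sequence or dimension table.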


The second thing that is interesting for Real-Time Business Intelligence is that you don't have to do lookups to get surrogate keys. We can have a flow of transactions and, at the same time, a flow of master data. The master data will go into a satellite of a hub (a hub represents a business concept), and the PK of the satellite will be a hash of the business key plus the insert time.

For example, the gate will have some information about its location.
We can have a flow of data that looks like this:
{
   gate_code,
   gate_location_name,
   gate_longitude,
   gate_latitude
}

We will hash the gate_code and obtain a gate_sk, which we can feed directly into SAT_GATE. The PK of SAT_GATE will be gate_sk plus an insert timestamp, so you can have a slowly changing dimension; a flag will tell you, for each business key, which row is the latest, along with the start and the end of its validity period.
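As a sketch, here is what the shape of a SAT_GATE row could look like in C#; the class and property names are my own illustration, not a standard.

using System;

class SatGateRow
{
    // The PK is (GateSk, LoadTimestamp), which is what enables the
    // slowly-changing-dimension behaviour described above.
    public string    GateSk        { get; set; }  // hash of gate_code
    public DateTime  LoadTimestamp { get; set; }  // insert timestamp
    public bool      IsLatest      { get; set; }  // flags the current row
    public DateTime  ValidFrom     { get; set; }  // start of validity
    public DateTime? ValidTo       { get; set; }  // end of validity, null while current

    // Descriptive attributes carried by the master data flow.
    public string GateLocationName { get; set; }
    public double GateLongitude    { get; set; }
    public double GateLatitude     { get; set; }
}

class SatGateDemo
{
    static void Main()
    {
        var row = new SatGateRow
        {
            GateSk = "0B1F3A",            // hash of the gate_code (shortened here)
            LoadTimestamp = DateTime.UtcNow,
            IsLatest = true,
            ValidFrom = DateTime.UtcNow,
            GateLocationName = "Dock 3 - North Entrance",
            GateLongitude = 2.3522,
            GateLatitude = 48.8566
        };
        Console.WriteLine(row.GateSk + " loaded at " + row.LoadTimestamp);
    }
}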

Conclusion:

We can see how Data Vault fits in a Real-Time BI solution by removing the dependency constraints between data loads. In real time it is harder to predict when data will arrive, and you want to speed up the process as much as possible; DV2 is a great help for that.

Thursday, May 7, 2015

Example of using time windows within Stream Analytics

Stream Analytics allows you to analyse your data in real time. The really interesting feature of this engine is its capability to operate on your data over time and to create time windows.

One really interesting window function is the sliding window.
Why? Because, for each event, it creates a window of a given size ending at that event. This is really useful for sensor data, where the data are raw and can be duplicated across many sensors.

In this case we are interested in an RFID sensor that catches each tag going through a gate. Even a low-speed microcontroller runs at up to 16 MHz, which means you get a refresh every 62.5 nanoseconds. So imagine a truck full of goods tagged with RFID taking 10 seconds to move through the door: the same information can be sent 160 million times through the stream of data. Instead of scaling up the component and putting strong processing power on a microcontroller just for those 10 seconds, we can simply send this data to the cloud, have the Event Hub scale up automatically, and let Stream Analytics analyse it.

To achieve that, we will set a window function of, let's say, 2 minutes (the maximum time the truck may spend going through the gate).

We want to discard any tag that already went through this gate in the last 2 minutes.


The code in Stream Analytics will be:

SELECT gate_uid, card_uid, MAX(CAST([date] AS datetime)) AS max_datetime
INTO outputHadoop
FROM input
GROUP BY gate_uid, card_uid, SlidingWindow(minute, 2)
HAVING COUNT(*) = 1;

So we are grouping the data in sliding windows of 2 minutes and sending to the output only the events that did not already happen in the last 2 minutes (same tag at the same gate): HAVING COUNT(*) = 1 keeps an event only when it is the sole occurrence of its (gate_uid, card_uid) pair within the window.


Conclusion: We can see that it is easy to process events using Stream Analytics. In the case of an RFID warehouse management system that scans goods in and out, it is not cost-efficient to purchase and operate a strong server just for a couple of runs a day; leveraging the power of the cloud instead is far more interesting and offers high scalability and speed. This point is valid for many other systems that need ad-hoc power.

Connect to Event-Hub

This post explains how to create and connect to a Microsoft Event Hub in order to implement a real-time Business Intelligence solution.

Create an Event-Hub on Microsoft Azure.



You will need an Azure account with a subscription. On the Azure portal, hit the plus button at the bottom left of the screen, then APP SERVICES -> SERVICE BUS -> EVENT HUB.


Then click on Custom-create.

You will have to pick a unique name for the event hub, and for the service bus namespace if you don't already have one.


Once the creation is finished, go to the Service Bus tab, then click on the arrow after the name of your new service bus.


Now, it is important that you configure the security inside the hub itself, and not at the service namespace level.


Now go to the Configure tab and create a new policy with Send and Listen permissions.


Then hit Save.
Under the shared access key generator, select your policy from the drop-down list and copy the primary key. We will need it later.


Create a C# application to send data to an Event Hub.


Now we will create a C# application that sends data to an Event Hub.

Create a new C# console application in Visual Studio.

Use NuGet to install the Active Directory Authentication Library, Microsoft Azure Service Bus and Newtonsoft.Json.
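From the Package Manager Console, the installs would look something like this (these were the package IDs at the time; check NuGet for the current names):

Install-Package WindowsAzure.ServiceBus
Install-Package Microsoft.IdentityModel.Clients.ActiveDirectory
Install-Package Newtonsoft.Json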

These lines of code will help you initialize your connection to the Service Bus:

string eventHubName = "demohub";
string eventHubNamespace = "demohub-namespace";
string sharedAccessPolicyName = "DemoPolicy";
string sharedAccessPolicyKey = "YourKey";

// Authenticate with the shared access policy and use AMQP as the transport.
var settings = new MessagingFactorySettings()
{
    TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(sharedAccessPolicyName, sharedAccessPolicyKey),
    TransportType = TransportType.Amqp
};

// Build the service URI (sb://<namespace>.servicebus.windows.net/) and create the client.
var factory = MessagingFactory.Create(ServiceBusEnvironment.CreateServiceUri("sb", eventHubNamespace, ""), settings);
EventHubClient client = factory.CreateEventHubClient(eventHubName);



Then, to send data, you will need to serialize a JSON object and send it.

// e is the object you want to send
string eventdata = JsonConvert.SerializeObject(e);

// You will need to partition the data you send with a key
EventData data = new EventData(Encoding.UTF8.GetBytes(eventdata))
{
    PartitionKey = e.card_uid
};

Then you can send your data with the synchronous method:

client.Send(data);

Or the asynchronous one:

tasks.Add(client.SendAsync(data));
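Putting it all together, a minimal send loop could look like the sketch below. The RfidEvent class and its sample values are my own illustration, shaped after the fields used in the Stream Analytics query of the previous post (card_uid, gate_uid, date); client is the EventHubClient created earlier.

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;
using Newtonsoft.Json;

// Illustrative event shape, matching the fields used by the Stream
// Analytics query (card_uid, gate_uid, date).
class RfidEvent
{
    public string card_uid { get; set; }
    public string gate_uid { get; set; }
    public string date     { get; set; }
}

class SendDemo
{
    // client is the EventHubClient created in the previous snippet.
    static void SendBatch(EventHubClient client)
    {
        var tasks = new List<Task>();

        for (int i = 0; i < 10; i++)
        {
            var e = new RfidEvent
            {
                card_uid = "TAG-" + i,
                gate_uid = "GATE-07",
                date = DateTime.UtcNow.ToString("o")
            };

            string eventdata = JsonConvert.SerializeObject(e);
            var data = new EventData(Encoding.UTF8.GetBytes(eventdata))
            {
                // The same partition key always lands on the same partition,
                // which keeps the events of one tag ordered.
                PartitionKey = e.card_uid
            };

            tasks.Add(client.SendAsync(data));
        }

        // Block until every send has completed.
        Task.WaitAll(tasks.ToArray());
    }
}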


Real-Time BI with the Microsoft Stack

Microsoft is providing more and more tools to report on and analyse your data in real time and near real time through its cloud platform, Azure.

What are the different technologies involved?

- Event Hub is the event ingestion service from Microsoft; it can scale and deal with millions of events per minute, and you pay depending on how much capacity you use. It's that simple.

- Stream Analytics is a streaming engine that lets you process your flow of data with a language that looks like SQL.

- Storage: Blob Storage is the HDFS implementation of Microsoft; SQL Server and Power BI datasets are other possible targets.


How do you send data to the real-time stack?

The Event Hub is the entry point; it offers authentication as well as an API to connect and send event data in JSON format.
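For the RFID example of the previous posts, the JSON payload sent to the Event Hub could look like this (the field names are the ones used in the Stream Analytics query; the values are made up):

{
    "card_uid": "TAG-0042",
    "gate_uid": "GATE-07",
    "date": "2015-05-08T10:15:30.000Z"
}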

Overview of the solution