lundi 4 septembre 2017

A more resilient data export approach

Introduction

Business intelligence and data-warehouse are becoming a hub for communicating data to third party. Especially for regulation and legal purpose as these data need to be aggregate first before being send.These export need to be resilient in a way that an incident in the ETL or unavailability of the data warehouse should not create a gap in the data being send.

A common approach to that is subscription in SSRS or Sharepoint so the business can verify these data and the same report is send to third party. Even if some trick can let you only trigger the subscription only when the data is ready it's still lack the ability to catch up the days the ETL didn't ran or when a subscription failed to successfully send the data end to end.


Overview of the software


The software  goal is to be able to be trigger at the end of an ETL job,and create a task for each instance of a subscription for each days that need to be run. Each time a failure happen the catch up need to run as if it was when the failure happen. As a result the logic of non volatility of the data-warehouse apply to data export as well.
The software is able to send subscription via email and ftp.
The software can be found at this address here 


Detail of program behaviour:


For each active subscription the program,It will create one task per subscription for the current day.In addiction to that it also create for each subscription a task for the missing days ( So if we are the 29 and i haven't run for the 26,27,28 it will create 3 more task) .It will also create a task for each days that failed and didn't have success later one.
Each task will be execute in parallel with a max degree of parallelism set in conf file.
The flow of program is the following :
  1.  Generate the list of task
  2. For each task connect to Reporting services using web services,pass parameters and generate the report
  3. Gather the report in memory and send it by email or ftp
  4. Log the result
The catch up list all the missing day and Generate the report by passing a date to a report parameter flag as a CatchUp parameter,this parameter will receive the date of the missing/failed day with an offset.This offset is configurable.


Configure the program

Two place let you configure the program.

Application xml configuration


The first one is the SubscriptionController.exe.config file that let you configure the connection string to the report server. Endpoint address need to reflect your SSRS server. This endpoint address can be fount using SSRS Configuration manager.


The connection strings section Let you configure where your subscription database reside. The user that connect will need read and write permissions.

Within the file SubscriptionController.exe.config you will need to change the value for <web_service_ssrs_url>  <host database name> <sender_email_address(ex me@mycorp.com)> <email host (ex smtp.mycrop.com>




Database configuration



There is 8 tables :
  • dbo.email - Email address for each recipient
  • dbo.ftp - Host, login,password for the ftp connection
  • dbo.log - Time executed,result,message,time of the report, time of the parameter
  • dbo.parameter parameter name,parameter value
  • dbo.report, report name,report description,report address
  • dbo.subscription All subscription information
  • dbo.bridge_ftp_email_subscription Link subscription to a list of email or ftp address
  • dbo.bridge_parameter_subscription Link subscription to a list of parameter

More details on the subscription table :

  • frequency : DAILY or WEEKLY, the weekly one will start on each Monday of the week
  • type FTP or EMAIL
  • active_flag, let you deactivate the subscription without dropping it, 1 mean activate 0 is deactivate
  • id_report is is a FK to the report table
  • type_file_export possible type are XML,csv,TIFF,PDF,MHTML,Excel,PowerPoint,Word
  • title is the Subject of the email in case of an email subscription and the body as well.
  • flagAddDateEndOfFile a is to add the date of the report at the end of the file
  • DateFormatEndOfile is the date format for example yyyyMMdd will give four year digit two month digit and two day digits
  • baseFileName is the name of the file
  • dateAndOfFileOffset is the offset for the date at the end of the file, for example an offset of -1 will give set the date back by one day.
  • flagEncodeASCII will encode the file in ASCII if the flag is 1
  • flagCatchUpLastSuccessfullLoad if 1 it will catch on missing or failing day
  • dateCatchUpOffset is the offset of the date pass to the report during a catchUp

More details on the log table :

date_execution is the date when the task have been executed
result : FAILED or SUCCESS
message : Let you know the exception in case of a failure and in case of success if it was a catch up or not
id_subscription FK to the subscription table
date_report is in case of a normal run the same as the execution date.In case of a catch up it's the date that haven't been processed  ( on earlier example 26,27,28 ).
param_date . In case of a catch up it's the actual date pass to the report for example if the date to re-process is the 26,27,28 and the offset is 2 the date pass to the report parameters will be 24,25,26.





Tracking the subscription


The report of tracking let user follow the subscription they managed with third party.This report have 5 part.
The first two,side by side on the top are a list of report parameters that are pass to the report as well as if they are catch up parameters. On the right the List of destination tell us the list of receiver,ftp or emails.

The third panel "List of subscriptions parameters" give all the parameters of the subscriptions.
Right after the fourth panel "Fail without later success".List all the instance of a subscription that failed without any later successful catch up.

Lastly the fifth panel list all the successful run letting business user know which data have been send to third party.