SPC – Using statistics to get insight from BI

There is a well-known adage that if you keep doing the same thing and expect different results, that is a sure sign of idiocy. In the BI world, too, we come across many instances where people take it for granted that the ‘BI tool’ will magically generate insight and spur ‘intelligence’ rather than ‘idiocy’. Yet the practice of reporting the same measures, or of creating reports for metrics simply because the tool now makes them available, without sparing any ‘intelligence’ for what will actually generate insight, is a major cause of BI failures. Most leading commercial BI products are expensive to license, maintain and support, so it is important to design the right metrics and KPIs (key performance indicators) — ones that will generate insight. Even more important is a process focus and a grasp of the basics of statistical process control, to ensure that the right decisions are made and resources are spent on the processes and strategies where they will have the most impact.

Statistical Process Control (SPC) is well known in the manufacturing industry and also in software engineering. In effect, it applies statistical rules to a process to determine whether the process is stable (and therefore in control) and its output predictable, to identify out-of-control processes, and to guide corrective measures. Quality aids such as causal analysis using brainstorming, nominal group techniques or Ishikawa (fishbone) diagrams are helpful in analyzing outliers and the reasons for deviation beyond control limits. A substantive discussion of SPC and quality process areas is not possible in this post, so I’ll just touch upon some concepts concisely.

PDCA – The Plan-Do-Check-Act cycle, originated by physicist and statistician Walter Shewhart and later popularized by quality guru W. Edwards Deming. This is the foundation of the management and feedback cycle underlying any software engineering process.

Control limits – A process whose measurements follow the Gaussian (normal) distribution produces the familiar bell-shaped curve, and control limits — typically set at three standard deviations on either side of the process mean — can be drawn around it. The stability of the process can be gauged from the outliers: the number and pattern of data points falling outside the control limits.
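As a minimal sketch of the idea (the sample values below are made up for illustration): compute the center line and 3-sigma limits from a baseline period of a process metric, then flag any new observations that fall outside those limits.

```python
import statistics

# Hypothetical baseline readings of a process metric (e.g. daily defect
# counts pulled from a BI report) taken while the process was stable.
baseline = [12.1, 11.8, 12.4, 11.9, 12.0, 12.3, 11.7, 12.2]

# New observations to check against the established control limits.
new_points = [15.6, 12.0]

center = statistics.mean(baseline)     # center line (process average)
sigma = statistics.stdev(baseline)     # sample standard deviation
ucl = center + 3 * sigma               # upper control limit
lcl = center - 3 * sigma               # lower control limit

# Any point beyond the limits is a candidate out-of-control signal.
outliers = [x for x in new_points if x < lcl or x > ucl]
print(f"center={center:.2f}, LCL={lcl:.2f}, UCL={ucl:.2f}")
print(f"outliers: {outliers}")  # flags 15.6
```

Note that the limits are estimated from a stable baseline rather than from the data being tested; including a suspect point in the estimate would inflate sigma and mask the very signal you are looking for.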

Causes of deviation – Outliers indicate deviation from a stable, predictable process. Deviation may arise from special causes or common causes. Common causes are like background noise and may be present even in stable processes. Special causes must be removed, and steps taken to prevent their recurrence, to bring a process under control. Common causes can be reduced to yield a sharper curve with a narrower band of control limits, and hence greater control over the process.
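Special causes do not only show up as points beyond the limits; patterns within the limits can also signal them. A classic run rule (one of the Western Electric-style rules) flags a long run of consecutive points on the same side of the center line. The helper below is an illustrative sketch; the function name and the 8-point threshold are my choices, not from any particular tool.

```python
def runs_rule(points, center, run_length=8):
    """Return indices where `run_length` consecutive points fall on the
    same side of the center line - a classic special-cause signal even
    when every point is within the control limits."""
    signals = []
    run_side, run_len = 0, 0
    for i, x in enumerate(points):
        side = 1 if x > center else (-1 if x < center else 0)
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side != 0 else 0)
        if run_len == run_length:
            signals.append(i)
    return signals
```

For example, `runs_rule([4.9] + [5.1] * 8, 5.0)` flags the index where the eighth consecutive above-center point occurs, while an alternating sequence raises no signal.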


Control Chart (Image courtesy: Wikipedia)

Users of BI tools haven’t tapped into the power of SPC to gain insight and control operational processes to the extent possible. There is even a danger of damaging a stable, in-control process by tinkering with it in response to common-cause variation observed in operational reports. Part of the reason SPC has not gained sufficient currency is that business analysts are not trained in the basics of SPC or quality processes like DAR (defect analysis and resolution), but mostly it is because, until recently, no BI product on the market allowed easy use of SPC analysis. It is only of late that vendors like SAP BusinessObjects have brought specific SPC modules and predictive analytics into the BI product marketplace.

BI is a specialized discipline that demands significant investment from customers in pre-sale evaluation (proofs of concept and comparisons), implementation, maintenance and support. However, the returns from BI implementations are not easy to quantify, and ROI (return on investment) calculations can be vague or incorrect. Using SPC within the right quality process framework helps maximize the value of BI implementations, and also provides a ready reckoner for calculating ROI from process improvements projected against statistical control limits.


6 responses to “SPC – Using statistics to get insight from BI”

  1. Thanks for posting this, I was looking for a few ROI methods for a BI/DW project!

    I have got to check out your blog more often.

    -Will Banks


  2. Thanks for posting. Only just come across your blog, really interesting. Will subscribe to you now….cheers


  3. Hi Maloy,

    I also work in BI and I am very interested in the addition of SPC to BI tools. I think it’s very important and will be helpful. I have worked with Universe Designer/Web Intelligence as well as Xcelsius and have not come across the SPC aspect. Is it a new addition or a new tool?


    • Alex,
      SPC (Statistical Process Control) is a set of techniques that apply probability and statistics to measure and, in general, to “control” a process. Most BI vendors do not have this in their toolset, with the notable exceptions of SAS and WPC. There is a new wave of adding what is called “predictive analytics” to the existing product offerings by most vendors – this would probably include some SPC tools, e.g. control charts.



  4. You alluded to SAP BusinessObjects having some SPC modules in their products. I have not bumped into them. Could you point me in the right direction, please?


  5. Maloy,

    My company, Sight Software, exists to solve the problem with BI that you have written about.

    We call it Process Intelligence, which is a hybrid of BI and statistical software.

    A number of thoughts come to mind:

    (1) Analytic Relativism: What BI vendors would like users to believe is that data analysis does not require methods. I call it analytic relativism because BI tells the user: “if you see a signal, then it exists.” At its heart Six Sigma (SPC wrapped in a deployment methodology) is a method of data analysis. The method rests on universally-accepted standards, which tell the user “if you can measure a signal, then it exists.” (I call this analytic universalism).

    (2) Distributions. Proper data analysis is an exercise in separating signals from noise. You can’t separate signals from noise without distributions. The big problem in adapting BI to good methods of data analysis is that BI is poorly equipped to handle data distributions. This is a fundamental problem with the OLAP cube. We had to build a new analytic engine (“Cubeless OLAP”) which is equipped to handle data distributions.

    (3) Error-proof Analysis. The problem with stat packages (JMP, Minitab, etc.) is two-fold: (a) users need to move data into them, so a user must know about the data warehouse and SQL; (b) the user must also know statistics, because stat tools are essentially very dumb. They require the user to be smart enough not only to reformat the data, but to understand which tool among hundreds applies in the situation at hand. It should be obvious that very few people have both these skills. So what we’ve tried to accomplish in Process Intelligence is to error-proof the analysis. In short, when a user asks a business question, we’ve made the software smart enough to go get the appropriate statistical tool. And we’ve made the presentation visual and easy. If BI tries to do this without changing the architecture, it must bolt on a stat-package-like interface. So maybe the user doesn’t have to move the data around so much, but they still must be smart enough to run a stat package.

    (4) Process Flow. Where BI really falls down is when the underlying system in question is one of process flow. The quick and dirty definition of process flow is “where there are log files there is flow.” Process improvement–which is after all the goal of analysis–requires a detailed understanding of flow. This requires handling log files, which represent time. BI has trouble with time because it wants to store & analyze discrete summary values in the OLAP cube. This is not the proper way to analyze movement.

    OK, enough said.

    Thanks for raising this issue. I have been discouraged that there is not greater awareness of the problem of the lack of good data analysis methods in BI. I believe it is the reason that BI is fairly ineffective. Without good methods, users can’t really tell what needs attention.

