The Business Intelligence Blog

SPC – Using statistics to get insight from BI

July 8, 2008 · 2 Comments

There is a well known adage that if you keep doing the same thing and expect different results, that is a sure sign of idiocy.  In the BI world too, we come across several instances where people take it for granted that the ‘BI tool’ will magically generate insight and spur ‘intelligence’ rather than ‘idiocy’. Yet the very practices of reporting the same measures, or of creating reports for metrics just because they are now made available by the tool, without sparing any ‘intelligence’ into what will generate insight is a major cause  of failures of BI.  Most of the leading commercial BI products are expensive and cost a lot of money in maintenance and support, so it is rather important to understand how to design the proper metrics and KPIs (key process indicators) which would generate insight. Even more important is to have a process focus and a general idea of the basics of statistical process control, in order to make sure that the right decisions are made and resources are spent on processes and strategies where they would have the most impact.

Statistical Process Control (SPC) is quite well known in the manufacturing industry and also in software engineering. In effect, it applies rules of statistics to the processes that are followed to predict whether a process is stable (and therefore in control) and its output is predictable or not and how to identify out-of-control processes and take corrective measures. Quality aids like causal analysis done using brainstorming/ nominal group techniques/ Ishikawa diagrams or fish-bone analysis are helpful in analyzing outliers and reasons of deviation from control limits. A substantive discussion of SPC and quality process areas is not possible in this post so I’ll just touch upon some concepts concisely.

PDCA – Plan-Do-Check-Act cycle, proposed by economist William Shewhart and later by quality guru Dr. Edward Deming. This is the foundation of the management and feedback cycle underlying any software engineering process.

Control limits – Any process which follows the Gaussian normal distribution would have a normal bell-shaped curve and be subject to control limits. The stability of the process can be gauged by the outliers (number and pattern of data points falling outside the control limits).

Causes of deviation: Outliers indicate deviation from a stable and predictable process. Causes of deviation could be due to special causes or common causes. Common causes are like background noise and may be present in stable processes. Special causes must be removed and steps taken to prevent their occurrence to bring a process under control. Common causes may be reduced to have a sharper curve with a narrower band of control limits and have greater control on the process.

 

 

Wikipedia)

Control Chart (Image courtesy: Wikipedia)

Users of BI tools haven’t tapped into the power of SPC to gain insight and control operational processes to the extent possible. There is even danger of damaging with a stable and in-control process due to tinkering with the process based on common-cause variation observed in operational reports. Part of the reason for SPC not gaining sufficient currency is that business analysts are not trained in the basics of SPC or quality processes like DAR (defect analysis and resolution) but mostly it is due to there not being any BI product in the market so far which allows easy use of SPC analysis. It is only of late that vendors like SAP-Business Objects have come out with specific SPC modules and predictive analytics in the BI product marketplace.

BI is a specialized discipline which involves a lot of investment on the part of customers in terms of pre-sale-evaluation (proof-of-concepts / comparisons), implementations, maintenance and support. However the returns from BI implementations are not easy to quantify and ROI (return on investment) figure calculations could be vague and incorrect. Using SPC along with the right quality process framework allows in maximizing the value of BI implementations, as well as provides a ready-reckoner for calculating ROI based on projected process improvements based on statistical control limits.

 

→ 2 CommentsCategories: Uncategorized
Tagged: , , , , , , , , ,

Thrive or Survive – the changing rules for databases

May 7, 2008 · Leave a Comment

Not since the late seventies, when Larry Ellison’s Relational Software Inc. (RSI) turned out the first commerically available RDBMS – Oracle, has there been such rapid changing of the rules (read disruption) in the database industry.
With Web 2.0 pushing enterprise adoption, and the ensuing information explosion in the maze of audio, video, data and ever-growing data warehouses, it seems that the conventional relational database systems are growing tired. With estimates of unstructured data being anywhere between 80% to 95% of all business data, and the ever changing requirements imposed by Web 2.0 – storage of pictures, audio and video, the demands being made on conventional RDBMS technology are monstrous. With the load window available being fixed due to availability and uptime requirements, the ever increasing data to be loaded into data warehouses, the bulking-up of the data due to usage of XML based formats, conflicting requirements of SQL and XQuery, the database is also being challenged by the demands of business intelligence.

And so the time has never been better for start-ups with innovative technologies. From the self-tuning column databases of Vertica to the data compression technologies of Infobright and the lock-free database from Ants Software , there is renewed interest in reinventing the RDBMS to optimize performance.

The competition has already hotted up in the high-end data warehousing segment with the introduction of appliances. With both Netezza and DATAllegro gaining traction rapidly, incumbent Teradata is feeling the heat.

Another entrant to the Web 2.0 database race is Database as a service (DBaaS). With Google opening up its cloud computing platform and making available Bigtable , and Microsoft offering a beta release of SQL Server Data Services (SSDS), the incumbent Amazon’s S3 and SimpleDB is getting some competition.

The incumbent conventional RDBMS vendors like Oracle are not resting on their laurels.
With its aggresive acquisition strategy, Oracle has acquired Sleepycat’s open-source embedded database BerkeleyDB in early 2006 and now licenses it commercially. With its other acquisition (2005) – TimesTen, now integrated as an in-memory database, Oracle has been targeting SaaS ISVs to sell its Oracle SaaS platform.
IBM has already moved into the BI infrastructure segment since 2005 with its pSeries Data Warehousing Balanced Configuration Units and evolved on to the Infosphere Balanced Warehouse applicances. Apart from SSDS, Microsoft has designed SQL Server 2008 with data warehouse features like the star join query, SSIS persistent lookups or the MERGE SQL statement. Most of the biggies (e.g. BOBJ acquiring Inxight) have acquired small companies with technologies to search and analyze unstructured data.

Yet with all this “new and improved”, the newer and innovative technologies are gaining a lot of traction at least in the data warehousing/ETL space. It remains to be seen if the trend catches on in the OLTP segment as well. But for now, the staid and bland database segment is on fire.

→ Leave a CommentCategories: Uncategorized
Tagged: , , , , , , , , , , , , , , , , , , , , , , , ,

It’s all in the universe – handling chasm and fan traps

May 1, 2008 · 4 Comments

Recently I worked on an assignment where the Business Objects universe was designed by a business analyst. Nothing wrong with that, as long as we get the correct results in reports or queries on the universe. But it was not to be.
We encountered incorrect results primarily due to two reasons:
1. Mixing of grains in fact tables by not using proper contexts or the aggregate navigator
2. Unresolved chasm traps
It was obvious that the business analyst was not trained in data modeling, nor did the universe go through a proper QA cycle where it is reviewed for data modeling errors.

What is a chasm trap?

The Chasm trap occurs when 2 “many-to-one” joins converge on a single table, which has not been resolved by any context. The most common problem caused by a chasm trap is fetching more data than expected.
Below is an example showing this many-to-one-to-many relationship:

Chasm Trap
How does a chasm trap inflate results?
The chasm trap causes the query to return every possible combination of rows for one measure (fact table 1) with every possible combination of rows for the other measure (fact table 2), thus effectively creating a Cartesian product. Under normal circumstances we would receive as many records as fetched from the tables. However due to the way fact tables are designed, measures have aggregate functions like SUM() or COUNT() applied on them, and this leads to an aggregated and inflated value for either measure.

Solution
The best way to resolve a Chasm Trap is to define a context for each fact table at the “many” end of the joins. This creates two SQL statements and two separate microcubes which Business Objects synchronize to return the correct output.

Another way to resolve a Chasm Trap which aplies only to measure type of objects is to select the option “Multiple SQL Statements for Each Measure” from the Universe Parameters dialog box. However this does not generate multiple SQL statements for dimensions and doesn’t resolve chasm traps in such cases.

Apart from chasm traps and the very familiar loops, there is another possible join problem encountered while designing Business Objects universes – the fan trap. This is however much infrequent compared to the other two.

What is a fan trap?
A Fan Trap is a join between 3 tables where a one-to-many join links the first table to the second and another one-to-many join links the second table to the third. Inflated results are obtained when fields from all 3 tables are included in the query.
Below is an example:

Fan Trap

How does a fan trap inflate results?

A fan trap “fans-out” the one-to-many relationships thus multiplying the result by the details at one end of the one-to-many relationship. An incorrect result is returned as you are performing an aggregate function on the table at the “one” end of the join, while still joining to the “many” end.
Solution
The best way to solve a fan trap using dimension objects in the query is to use an alias and contexts. Create an alias for the table producing the initial aggregation. Create a join between the alias table and the original table.
Next modify the object that is causing aggregation by using the alias table. Then, create separate contexts – one each for the alias table and the original table. Select “Multiple SQL Statements for Each Context” in the universe parameters.

Another way to resolve a fan trap only for measure objects is to use the universe parameters
option “Multiple SQL Statements for Each Measure”. This forces the generation of separate queries for each measure, but doesn’t ensure the same for dimensions.

Potential chasm and fan traps can be detected by visually inspecting the schema by suitably arranging the tables in the Structure pane, so that the “many” ends of the many-to-one joins are to one side of the pane, and the “one” ends to the other.

If you wish to read in more detail about SQL traps in Business Objects Designer and don’t mind a little drama or suspense, I highly recommend checking out Dave Rathbun’s posts on this topic.

→ 4 CommentsCategories: Uncategorized
Tagged: , , , ,

Export data out of Xcelsius dashboards

April 26, 2008 · 9 Comments

Accessing data in Xcelsius is pretty easy. You can have static data loaded from Excel, or dynamic data loaded through Web Services, Excel XML Maps, QaaWS (Query as a Web Service), Crystal Reports, Web Intelligence Reports or queries using Live Office.

There are however only a few ways to export data from Xcelsius. Either we could use Flash variables or use the XML Data Button to export data out of the compiled .swf file.

Here I present a simple method to export data from an Xcelsius .swf to a csv/Excel file. This method utilizes the XML Data Button’s capability to send and receive XML data.

The source code is provided at the end of this post as a download. Copy the code from the PDF and save as export2csv.aspx on the web server.

Note that this is an ASP.Net (.aspx extension) file, which works with an IIS Web Server. If you wish to use this code, please remember to change the links to point to your webserver and the appropriate directories.
For being able to write the exported data to a directory, you may need to grant Write access to the IUSR_<MachineName> account to allow anonymous access.

After you’ve placed the export2csv.aspx file on your webserver, perform the following steps in your XLF file.
1. Add an XML data button [ Xcelsius 4.5 ]/Connection Refresh button [ Xcelsius 2008 ] and check the “Enable Send” box. Make sure that in the MIME Type dropdown, “application/x-www-form-urlencoded” is selected.

2. Type out the URL of the export2csv.aspx file in the XML Data URL and add the ranges of data that you need to export. The screenshot above shows 2 ranges being exported.
3. Check the “Enable Load” box and add a single cell range (Sheet1!$J$1 in the example) to receive the URL of the CSV file that will be returned by the export2csv.aspx file.
Note that the XML variable name returned by the export2csv.aspx is exportdataurl, so you need to name the range the same. If you wish to change the name, you’ll need to update the same in the export2csv.aspx file.


4. Add a URL link button and set its URL to the cell defined in the previous step (Sheet1!$J$1)
5. Export the .swf file and try out exporting the data by clicking on the Connection Refresh Button and then opening the file by clicking on the URL link Button.

Now you can easily export data from any .swf that you’ve created in Xcelsius.

How does this work?
1. The XML Data Button exports the ranges you’ve selected in an XML file to our export2csv.aspx file.
2. Our export2csv.aspx file parses and saves this XML in a .csv format on the web server.
3. The path of the .csv file is returned to the .SWF file through the “Enable Load” range.
4. The URL Link component retrieves the URL from this cell and opens the .CSV file when clicked.

Source code:

<%
'-----------------------------------
'Name: export2csv.aspx
'Author: Maloy Manna
'Date: 04.26.2008
'Purpose: To export data out of Xcelsius .swf in csv format
'Input: Xcelsius data manager Send ranges in XML format
'Output: CSV output file URI
'Last Modified:
'Modification:
'-----------------------------------
'Variable declarations
Dim sXMLString as String
Dim sFormitem as String
Dim sFileName as String
Dim sRPath as String
Dim sPath as String
Dim sXML as String
Dim iRangeLength As Integer
Dim iRowLength As Integer
Dim iRangeCounter As Integer
Dim iCounter As Integer
Dim oFile as File
Dim oWrite as StreamWriter
Dim oXML as Object
Dim oRNodeList as Object
Dim oRNode as Object
Dim oNodeList as Object
Dim oNode as Object
'Get XML that was sent from SWF file
For Each sFormItem In Request.Form
sXMLString = sXMLString & sFormItem & " = " & Request.Form(sFormItem)
Next
'Load XML into object
oXML = Server.CreateObject("MSXML2.DOMDocument")
oXML.async = False
oXML.validateOnParse = False
oXML.loadXML(sXMLString)
'Create output file with timestamp in the name
sFileName = "Exported_Data_" & DatePart("m",Now()) & DatePart("d",Now()) & DatePart("yyyy",Now()) & DatePart("h",Now()) & DatePart("n",Now()) & DatePart("s",Now()) & ".csv"
oWrite= oFile.CreateText("C:\Inetpub\wwwroot\" & sFileName)
'Use XPath to get the data
'Find the number of ranges or variables to be exported
sRPath ="/data/variable"
oRNodeList = oXML.documentElement.selectNodes(sRPath)
iRangeLength = oRNodeList.Length
For iRangeCounter = 0 to iRangeLength - 1
oRNode = oXML.documentElement.selectSingleNode("/data/variable["+CType(iRangeCounter,String)+"]/row")
'Write out the data set for the range
sPath = "/data/variable["+CType(iRangeCounter,String)+"]/row"
'Find the number of rows for the range
oNodeList = oXML.documentElement.selectNodes(sPath)
iRowLength = oNodeList.Length
For iCounter =0 to iRowLength - 1
oNode = oXML.documentElement.selectSingleNode("/data/variable["+CType(iRangeCounter,String)+"]/row["+CType(iCounter,String)+"]/column")
oWrite.Write(oNode.text)
While Not(oNode.nextSibling Is Nothing)
oWrite.Write("," +oNode.nextSibling.text)
oNode = oNode.nextSibling
End While
oWrite.WriteLine
Next
'Write a blank line before the next range starts
oWrite.WriteLine
Next
oWrite.Close()
'Build XML data to return
sXML = ""
'exportdataurl
sXML = sXML & ""
sXML = sXML & ""
sXML = sXML & "" & "http://localhost/" & sFileName & ""
sXML = sXML & ""
sXML = sXML & ""
'close out the XML data string
sXML = sXML & ""
'Set the Response type
Response.Buffer = True
Response.ContentType = "application/vnd.csv"
Response.Write(sXML)
%>

Download code as PDF: Code for export2csv.aspx

→ 9 CommentsCategories: Uncategorized
Tagged: , , , , ,

Agile Development for BI

April 6, 2008 · 8 Comments

How can you reduce development costs and improve software reliability and accuracy at the same time? How can you make IT work together with Business while architect-ing your BI applications? If these goals sound contradictory and difficult to achieve, then Agile development may well fit the bill. Indeed in numerous BI projects, one or the other flavor of Agile is used to attain these very goals.

Defining Agile
There are several Agile development methodologies available:
• eXtreme Programming (XP)
• SCRUM
• Feature-Driven Development (FDD)
• Crystal Clear
• Dynamic Systems Development Method (DSDM)
• Adaptive Software Development (ASD) and more…

At the core of any flavor of Agile development methodology is the iteration, which may last from 1 to 4 weeks (one unit of time) to develop a piece of the software. Each iteration is treated as an entire software project with its associated planning, design, coding, testing and documentation tasks.

What is it about Agile development which makes it particularly suitable for data warehousing and business intelligence projects?

* Agile emphasizes on communication be it through meetings (be it through the phone, VOIP, web or IM) over written documents. The idea is to get the user involved much early in the development process and incorporate their feedback, so as to minimize the risk of developing faulty software. For organizations adopting BI, very often users are clueless about the systems to build, the technology to use or even the range of analysis they require. Products are often bought after effective sales pitches from vendors and left to IT to deploy and architect. In such cases, IT can use Agile methodologies like DSDM, SCRUM or ASD to flesh out the requirements and deliver BI which actually provides insight rather than building a monolithic and unreliable data warehouse difficult to query and administer.

* Agile gels well with the evolutionary approach required for a data warehousing / BI lifecycle. Requirements change over time, and the iterations of the Agile methodology (with database re-factoring and evolutionary data modeling ) is more efficient in capturing these changes than the classical waterfall approach.

* Proof of the concept, technology and architecture is crucial to justify continued investment in DW/BI projects, especially on the enterprise scale. This is simpler and easier to do with Agile.

* Agile imbibes every member of the project team with extra responsibilities, making them owners of discrete functions and helps the project manager overcome the ‘taskmaster‘ stereotype and concentrate on being a leader or a visionary.

BI is essentially gaining competitive edge by insight into your business through lagging (measures) and leading (predictive model-based) metrics, which allows feedback cycles and restructuring of processes (Plan-Do-Check-Act Deming cycle). This essentially involves cooperation and teamwork across functions to model and understand the multi-faceted perspectives. Teamwork being the foundation of Agile, it is a natural fit for projects in BI and data warehousing.

~biguru

→ 8 CommentsCategories: Uncategorized
Tagged: , , , , , , , , , , ,