Monthly Archives: May 2008

Thrive or Survive – the changing rules for databases

Not since the late seventies, when Larry Ellison’s Relational Software Inc. (RSI) turned out the first commerically available RDBMS – Oracle, has there been such rapid changing of the rules (read disruption) in the database industry.
With Web 2.0 pushing enterprise adoption, and the ensuing information explosion in the maze of audio, video, data and ever-growing data warehouses, it seems that the conventional relational database systems are growing tired. With estimates of unstructured data being anywhere between 80% to 95% of all business data, and the ever changing requirements imposed by Web 2.0 – storage of pictures, audio and video, the demands being made on conventional RDBMS technology are monstrous. With the load window available being fixed due to availability and uptime requirements, the ever increasing data to be loaded into data warehouses, the bulking-up of the data due to usage of XML based formats, conflicting requirements of SQL and XQuery, the database is also being challenged by the demands of business intelligence.

And so the time has never been better for start-ups with innovative technologies. From the self-tuning column databases of Vertica to the data compression technologies of Infobright and the lock-free database from Ants Software , there is renewed interest in reinventing the RDBMS to optimize performance.

The competition has already hotted up in the high-end data warehousing segment with the introduction of appliances. With both Netezza and DATAllegro gaining traction rapidly, incumbent Teradata is feeling the heat.

Another entrant to the Web 2.0 database race is Database as a service (DBaaS). With Google opening up its cloud computing platform and making available Bigtable , and Microsoft offering a beta release of SQL Server Data Services (SSDS), the incumbent Amazon’s S3 and SimpleDB is getting some competition.

The incumbent conventional RDBMS vendors like Oracle are not resting on their laurels.
With its aggresive acquisition strategy, Oracle has acquired Sleepycat’s open-source embedded database BerkeleyDB in early 2006 and now licenses it commercially. With its other acquisition (2005) – TimesTen, now integrated as an in-memory database, Oracle has been targeting SaaS ISVs to sell its Oracle SaaS platform.
IBM has already moved into the BI infrastructure segment since 2005 with its pSeries Data Warehousing Balanced Configuration Units and evolved on to the Infosphere Balanced Warehouse applicances. Apart from SSDS, Microsoft has designed SQL Server 2008 with data warehouse features like the star join query, SSIS persistent lookups or the MERGE SQL statement. Most of the biggies (e.g. BOBJ acquiring Inxight) have acquired small companies with technologies to search and analyze unstructured data.

Yet with all this “new and improved”, the newer and innovative technologies are gaining a lot of traction at least in the data warehousing/ETL space. It remains to be seen if the trend catches on in the OLTP segment as well. But for now, the staid and bland database segment is on fire.

It’s all in the universe – handling chasm and fan traps

Recently I worked on an assignment where the Business Objects universe was designed by a business analyst. Nothing wrong with that, as long as we get the correct results in reports or queries on the universe. But it was not to be.
We encountered incorrect results primarily due to two reasons:
1. Mixing of grains in fact tables by not using proper contexts or the aggregate navigator
2. Unresolved chasm traps
It was obvious that the business analyst was not trained in data modeling, nor did the universe go through a proper QA cycle where it is reviewed for data modeling errors.

What is a chasm trap?

The Chasm trap occurs when 2 “many-to-one” joins converge on a single table, which has not been resolved by any context. The most common problem caused by a chasm trap is fetching more data than expected.
Below is an example showing this many-to-one-to-many relationship:

Chasm Trap
How does a chasm trap inflate results?
The chasm trap causes the query to return every possible combination of rows for one measure (fact table 1) with every possible combination of rows for the other measure (fact table 2), thus effectively creating a Cartesian product. Under normal circumstances we would receive as many records as fetched from the tables. However due to the way fact tables are designed, measures have aggregate functions like SUM() or COUNT() applied on them, and this leads to an aggregated and inflated value for either measure.

Solution
The best way to resolve a Chasm Trap is to define a context for each fact table at the “many” end of the joins. This creates two SQL statements and two separate microcubes which Business Objects synchronize to return the correct output.

Another way to resolve a Chasm Trap which aplies only to measure type of objects is to select the option “Multiple SQL Statements for Each Measure” from the Universe Parameters dialog box. However this does not generate multiple SQL statements for dimensions and doesn’t resolve chasm traps in such cases.

Apart from chasm traps and the very familiar loops, there is another possible join problem encountered while designing Business Objects universes – the fan trap. This is however much infrequent compared to the other two.

What is a fan trap?
A Fan Trap is a join between 3 tables where a one-to-many join links the first table to the second and another one-to-many join links the second table to the third. Inflated results are obtained when fields from all 3 tables are included in the query.
Below is an example:

Fan Trap

How does a fan trap inflate results?

A fan trap “fans-out” the one-to-many relationships thus multiplying the result by the details at one end of the one-to-many relationship. An incorrect result is returned as you are performing an aggregate function on the table at the “one” end of the join, while still joining to the “many” end.
Solution
The best way to solve a fan trap using dimension objects in the query is to use an alias and contexts. Create an alias for the table producing the initial aggregation. Create a join between the alias table and the original table.
Next modify the object that is causing aggregation by using the alias table. Then, create separate contexts – one each for the alias table and the original table. Select “Multiple SQL Statements for Each Context” in the universe parameters.

Another way to resolve a fan trap only for measure objects is to use the universe parameters
option “Multiple SQL Statements for Each Measure”. This forces the generation of separate queries for each measure, but doesn’t ensure the same for dimensions.

Potential chasm and fan traps can be detected by visually inspecting the schema by suitably arranging the tables in the Structure pane, so that the “many” ends of the many-to-one joins are to one side of the pane, and the “one” ends to the other.

If you wish to read in more detail about SQL traps in Business Objects Designer and don’t mind a little drama or suspense, I highly recommend checking out Dave Rathbun’s posts on this topic.