The Business Intelligence Blog

Review of the BT Summit – Cloud computing, SOA and BI tracks

November 13, 2009

I attended the Business Technology Summit in Bangalore last week – 3rd and 4th November. There were 3 tracks on cloud computing, Service Oriented Architecture and Business Intelligence, and I chose a mix of sessions across each.

Overall impression: The BT Summit was heavily focused on cloud computing, with several keynotes on the subject and half of the second day given over to a deep dive into Amazon’s EC2 cloud offering. Sessions on SOA, web services, REST and similar architectural topics were interspersed, but were definitely not first-class citizens. BI came a poor third, with a weak choice of sessions that rehashed what is already out there for everyone rather than covering the cutting edge: appliances, columnar and in-memory databases, and the use of Flash and AJAX for interactive BI front-ends.

Session-wise review: (Speaker profiles available here). I was able to speak to and ask questions of Vinod Kumar, Vijay Doddavaram, Abhinav Agarwal and Dr. Bob Marcus.

Keynotes:

Probably the highlight of the keynotes, this was a pep-talk about the inevitable interconnected future with smart products and services and for good measure Charney threw out some statistics on broadband growth and bandwidth usage and India’s readiness and potential in the scheme of things.

The worst of the lot – this started by comparing the spectrum of offerings in the cloud, from Amazon’s DIY EC2 and AWS through Google App Engine and Apps to Microsoft’s Azure, and ended up as a promo touting Azure as the best buy among them all.

A very good keynote, focusing on what makes sense to migrate to the cloud and what doesn’t, what are the hidden costs, the myth of unlimited elasticity in the cloud and what Yahoo is doing to use open source software like Hadoop and Hive for cloud computing. In the short time span, Shouvick also tried to address some of the other considerations – including re-architecting existing applications, availability, data storage and movement considerations.

This post-lunch keynote by Sharma was a rambling talk on how technology keeps redefining our lives, and why it is important to think outside-the-box. He used the example of the iPhone to illustrate how such thinking has the potential to alter the established rules of the industry and redefine it as we know it.

Puhlmann provided the security perspective: how easy it is to break into or hack enterprise systems, how anti-virus and anti-spyware products are always playing catch-up, the entire economy spawned by the “bad guys” in technology, and why our systems need to be smart and built from the ground up for security rather than having it added as an afterthought. He provided valuable insights into the questions we should ask ourselves as we embrace cloud computing, and into how the changing technology landscape makes it easy to consume information but easier still to breach security. Puhlmann concluded by suggesting that it may be worthwhile to include a level of risk assessment and mitigation, and to collaborate with ethical hackers, rather than attempt the impossible task of removing all security threats.

Barely managed to stay awake through this one, which talked about moving towards a virtual enterprise, with a focus on virtualized architecture, including cloud computing. As boring as they get.

Other sessions:

  • SOA, Composite Applications, and Cloud Computing: Three pillars of a modern technology solution by Robert Schneider

Robert Schneider presented the different facets of SOA, composite applications (a superset of mash-ups) and cloud computing, and contrasted them in terms of the time they take to yield benefits, the maturity of the vision, the involvement and buy-in needed from business, and where they lie on the tactical-strategic plane. There wasn’t anything on why we are stuck with these three for a modern technology solution, or on what other paradigms exist beyond the old-world enterprise computing framework, possibly due to time constraints.

  • Self-service analysis and the future of Business Intelligence by Vinod Kumar

A lot of the BI folks were waiting for this, as Vinod performed the Project Gemini (Office 2010 Excel and PowerPivot) demo live for the first time in India, with several folks, including yours truly, sitting on the stairs. [We had had to rely on YouTube videos and MS Office 2010 preview videos earlier.] The demo was impressive, fetching over 13 million records into Excel on a standard DDR laptop using compression and in-memory technologies. The bigger question of whether this unleashes another round of Excel hell went unanswered due to time constraints. However, the presentation hinted at Microsoft’s vision of “self-service BI”, the so-called “underground BI” that Excel power users (estimated at 2M worldwide, or 4% of the Excel user base) have been doing all along. Microsoft’s strategy of pushing SharePoint adoption in the enterprise was made tacitly clear, with SharePoint being the only “portal” for publishing and sharing BI analysis (these Excel spreadsheets are typically upwards of 200MB) with other users in the enterprise.

  • Designing and Implementing RESTful web services by Eben Hewitt

Eben Hewitt started off with a very brief comparison between SOAP (Simple Object Access Protocol), which is modeled more along the lines of RPC (Remote Procedure Call), and REST (Representational State Transfer), and clarified that REST is an architectural style rather than a specification. The remainder of the talk delved into the details of implementing REST: simple ‘verbs’ with the complexity pushed into the ‘nouns’, a uniform interface, named resources, Java REST frameworks like Jersey, MIME types (JSON, XML, YAML) and the HTTP operations supported (POST, GET, PUT and DELETE).
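To make the “simple verbs, complex nouns” idea concrete, here is a minimal sketch in Python. It is not from the talk: the `handle` function, the in-memory `store` and the `/reports` path are all hypothetical names, and no real HTTP server or framework is involved. It only shows the four verbs applied uniformly to named resources.

```python
import itertools
import json

# Hypothetical in-memory store: resource path (the "noun") -> representation.
store = {}
_ids = itertools.count(1)  # server-chosen ids for resources created via POST


def handle(method, path, body=None):
    """Apply one HTTP verb to a named resource; return (status, JSON text)."""
    if method == "GET":
        if path not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[path])
    if method == "PUT":
        # PUT stores the representation at exactly the path the client names.
        created = path not in store
        store[path] = body
        return (201 if created else 200), json.dumps(body)
    if method == "POST":
        # POST to a collection creates a child resource under a new name.
        new_path = f"{path}/{next(_ids)}"
        store[new_path] = body
        return 201, json.dumps({"location": new_path})
    if method == "DELETE":
        if path not in store:
            return 404, json.dumps({"error": "not found"})
        del store[path]
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})
```

Calling `handle("POST", "/reports", {"name": "sales"})` creates a new report resource, and the same four verbs then work on whatever path comes back, which is the uniform interface in miniature.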

I attended with some expectations of learning how a BI project can be executed, possibly with open-source or free software like MySQL/Postgres, Pentaho/Talend, the Jaspersoft/MicroStrategy reporting suite etc., but was highly disappointed by the presentation. Ramaswamy spoke on BI usage, barriers to BI adoption and the costs of BI implementation, and spewed statistics like m&m’s with cursory references to Forrester, Gartner and “research studies”. There wasn’t anything tangible on how to go about executing a project, except for some common-sense talk on “evaluating options”: open-source vs. licensing costs, offshoring vs. outsourcing, RDBMS vs. analytical databases and appliances etc.

  • Business Intelligence – Leveraging and Navigating during current challenging times by Vijay Doddavaram

Vijay spoke of the current global economic downturn and how it had caught everyone unawares, both when the downturn hit and in the current quarter, when the tide seems to have turned. With the example of a fictitious company in China, he illustrated the importance of the trade-off between tactical and strategic decision making, and whether and how business intelligence can make a difference in either a downturn or an upswing (whether the curve is a U, a V or a W). The talk was thought-provoking: one couldn’t help feeling that BI software has not yet replaced the “intelligence” that people bring to the table, and Vijay made a distinct point about “human analysis/intelligence” as against the out-of-the-box actionable intelligence marketed by the BI vendors. It would have been interesting to prolong the discussion with a focus on the “predictive analytics” offerings in the market (from SAP, WPC, SPSS, the open-source R etc.), but we had once again run out of time, and it was the last session of the day as well.

  • Towards a unified Business Intelligence and Enterprise Performance Management Strategy by Abhinav Agarwal

Abhinav is from Oracle and he used this session to basically present the BI and EPM strategy of Oracle. Refreshing when contrasted with the usual Oracle marketing hype, Abhinav made it a point to stress the difficulty of delivering best-in-breed products due to numerous acquisitions and the inevitable integrations compared to the disruptive start-ups which could be one-trick ponies but nevertheless manage to push the technology envelope. Most of the session focused on Oracle BI server offering and the roadmap of integrating with the Fusion middleware, and brief touchpoints on the capabilities of the Oracle BI server: federated queries (acquired from nQuire, which Siebel systems had acquired, prior to being bought by Oracle), and real-time updates, including Oracle RTD (Real-time Decisions) and the segregation of the BI and EPM software offerings.

  • 10 Things software architects should know by Eben Hewitt

I was able to attend only part of it. The bottom line of this talk was the trade-offs architects need to make, and the understanding that there may not be a “solution” to a problem; it may just be “moving the problem”, since each “solution” brings its own issues and trade-offs into the picture. The talk focused on Java APIs and cloud computing frameworks, and would have been better with something on networks and database architecture in general for the audience to relate to (for most of the time, I couldn’t relate it to BI applications and data-warehousing infrastructure).

Being late from an overcrowded dining hall, I was able to attend part of this. Bob spoke of the various public and private initiatives including those from the federal government, NASA Nebula and made the distinction early on between the types of offerings on the cloud: SaaS (Software as a service), IaaS (Infrastructure as a service) and PaaS (Platform as a Service). He mentioned in passing the data.gov and apps.gov initiatives of the Obama administration as also about RACE (Rapid Access Computing Environment) from the Dept. of Defense – Defense Information Systems Agency.

Vivek Khurana did a very short presentation to an overflowing hall on clichéd but nevertheless important aspects of information visualization while designing dashboards: clutter vs. simplicity, proper designing of KPIs, importance of delivery to mobile devices, and learning from news aggregation sites and portals on presentation.

  • Implementing Enterprise 2.0 using Open Source products by Udayan Banerjee

Banerjee did a great job of presenting what his vision of implementing Enterprise 2.0 in NIIT was – implementing SLATES (coined by Andrew McAfee) – Search, Links, Authoring, Tags, Extensions and Signals. Within half-an-hour he navigated us through using open-source products for collaboration using blogs and wiki (MediaWiki), using single-sign-on with enterprise databases, using links and tag clouds and integrating Search as well as implementing a text-based instant messenger.

I had missed Alan’s earlier session on lessons learnt using SharePoint, so I made it a point to attend this last one at the summit, even though it meant I sometimes had no clue what was being talked about! Alan spoke of the emergence of the multi-vendor CMIS standard for Enterprise Content Management, the various facets of ECM (digital and media assets, email archiving, Internet content, web analytics, document types, rich media) and the problems with earlier Java standards like JSR 170, most notably the absence of support from Microsoft. He also spoke about the vendor landscape and a 9-block rating similar to Gartner’s magic quadrant, plus various other important standards, including XAM (eXtensible Access Method), a storage standard developed by SNIA (Storage Networking Industry Association).

Presentation files: Most presentation files are available here. You’ll need to register though to download.

- Maloy


Evolution of the BO XI platform – from XI R2 to XI 3.1 SP2

September 14, 2009

With BO XI 3.1 SP2 out in July this year, it is probably time to make a trip down the years to find out how the XI platform has evolved and matured.

The timeline:

  • XI R2 SP2 – service pack release in March 2007 with productivity pack – QaaWS and LiveOffice connectors
  • XI 3.0 – new major release in February 2008 – the first release after SAP acquired BOBJ in October 2007
  • XI 3.1 – upgrade release in September 2008
  • XI 3.1 SP2 – service pack release on 24 July 2009 – with enhanced SAP integration


Where were we with XI R2:

  • Change to Crystal service-oriented platform (Crystal 10 architecture)
  • Ability to plug Crystal Reports, Web Intelligence, Desktop Intelligence, OLAP Intelligence, Dashboard Manager, Performance Manager directly into the framework
  • Single repository, security, system management, publishing, portal
  • Infoview (Replaced old BO Infoview and Crystal ePortfolio)
  • Central Management Console (CMC)
  • Import Wizard (upgrades from BO 5, 6, XI, Crystal 8.5, 9, 10)
  • Desktop Intelligence (new name for BO full client + ability to query and display Unicode data)
  • Publishing, Encyclopedia, Discussions, OLAP Intelligence, Performance Management
  • Changes to Data Integrator, Composer, Metadata Manager

XI 3.0 (Titan)

  • All administration moved to the Central Management Console – CMC – with new GUI
  • Bulk action support in CMC
  • Central Configuration Manager – CCM is still there (to manage multiple nodes) with 2 entries : Tomcat & SIA
  • Server Intelligence Agent (SIA) – handles service dependencies
  • Server Intelligence in CMC – clone server deployments
  • Repository Federation – replicate repository on other BO cluster
  • Repository Diagnostic Tool (Infostore vs FileStore – repair inconsistencies between CMS database entries and files in FRS)
  • Improved Import Wizard
  • Web Intelligence Rich Client (offline viewing of WebI reports, no session timeout)
  • Data change tracking in Web Intelligence
  • Designer – “Database delegated” projection on measures
  • Universe based on stored procedures
  • Prompt syntax extension (persistent/primary_key undocumented features, finally!)
  • Personal data provider – combine data from Excel, text, csv and get into a single report
  • Smart cubes – support for non-additive measures (percentages, ratios) and RDBMS analytical functions
  • Multi language support – dimensions, measures, prompts automatically localized to report viewer’s language
  • Native Web Intelligence printing (without PDF)
  • Embed image in Web Intelligence report
  • Hyperlinks dialog box makes links easy to create – syntax generated by WebIntelligence (remember opendocument()?)

What’s new in XI 3.1

  • Support for multi-forest Active Directory authentication
  • IP v6 support
  • Lifecycle Management Tool (LCMBIAR files, replace Import Wizard)
  • Saving Web Intelligence documents as CSV (data-only files) – new sheets for every 65K rows of data
  • Web Intelligence Autosave
  • “Begin_SQL” SQL prefix variable
  • Prompt syntax extension (support for key-value pairs!)
  • Business Objects Voyager enhancements
  • Live Office enhancements
  • WebIntelligence – Automatic loading of cached LOVs, interactive drag-drop, report filter bar, cancel refresh-on-open

What’s new in XI 3.1 SP2

In one of my next posts, I’ll cover selected new features in detail.

-Maloy


Developing a Business Objects security model – BO XI 3.1

August 28, 2009

While developing a Business Objects security model, you need to focus on the different types of security:

Functional Security – this would govern access to specific application features, e.g. editing reports, drilling down, ability to schedule reports etc.

Data Security – this governs access to specific data – rows or columns or cells as per authorization

Infrastructure Security – governs physical and electronic access to systems

The infrastructure security is the first to be designed. This typically happens when the architecture is being drawn up. It is important to get as much early visibility into the various ways the system is likely to be used, not only in the present but also in the foreseeable future, so that adjustments and capacity for future planning can be done to the extent possible. This also helps in deciding on the type of data security that would be required initially, though this can change over time.

The various security considerations for access control include:

Identification - whether it is a valid user? Usually taken care of by password management

Authentication - whether the user is allowed to use the system? This can be done by BO or externally with a third party tool, including but not limited to LDAP / Active Directory etc.

Authorization - governs fine grained entitlements or access – which parts of the application and data can the user access?

Let us look at the security approaches to authorization. (I will cover the various approaches to authentication and single-sign-on in a separate post).

Security policies can be held in the BO repository (functional + data security)

  • Authentication can be performed by BO or externally
  • Incorporates security policies in the BO repository
  • Supports row-level and column-level security
  • Data security can be controlled at application, connection, universe and report level

Custom security utilizing security tables, and joins forced in Universe Designer  (functional + data security)

  • Includes custom-built security tables to store users, groups, privileges etc. The joins to these are forced in report queries.
  • BO users are mapped to data in these tables – the data can be maintained with ETL processes
  • The BOUSER variable (referenced as @Variable('BOUSER')) returns the user’s login and can be used to implement row/column level security
  • Allows both user-centric and object-centric views by querying the security tables
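The custom security table approach can be sketched outside Business Objects with a plain SQL join. The example below is only an illustration using Python's built-in sqlite3: the `sales` and `user_region_security` tables and the user names are hypothetical, and the bound parameter stands in for the BOUSER login variable that the universe would inject into the forced join.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical fact table plus a custom security table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount INTEGER);
        CREATE TABLE user_region_security (bo_user TEXT, region TEXT);
        INSERT INTO sales VALUES ('North', 100), ('South', 250), ('East', 75);
        INSERT INTO user_region_security VALUES
            ('alice', 'North'), ('alice', 'East'), ('bob', 'South');
    """)
    return conn


def sales_for_user(conn, bo_user):
    """Return only the rows the given login is entitled to see.

    The join to the security table is always present (the "forced join"),
    so a user with no entries in the table gets no rows at all.
    """
    rows = conn.execute(
        """
        SELECT s.region, s.amount
        FROM sales s
        JOIN user_region_security sec ON sec.region = s.region
        WHERE sec.bo_user = ?
        ORDER BY s.region
        """,
        (bo_user,),
    )
    return rows.fetchall()
```

Querying the security table by user gives the user-centric view; querying it by region would give the object-centric one.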

Table mapping or virtual private views – can be implemented with Oracle VPD and label security

  • Allows fine grained access control with airtight cell-level security if required
  • Policies setup in Oracle VPD, labels control column access, multiple views for multiple users
  • Works for ad-hoc queries also
  • Requires thorough testing to prevent SQL injection attacks; can lead to performance problems due to the additional predicates
  • Can easily become overly complex; however a must-have where airtight security is required

Third party authorization using SiteMinder or LDAP or Active Directory

  • Authorization is based on directory entries in LDAP or Active Directory (people/group/role/IP address or rule)
  • Fine grained access control still requires some form of usage of BO or the database for auxiliary authorization.

What should be the preferred approach? The answer is “Well, it depends!” The approach depends on what is actually required and is feasible at your particular organization. In all cases however (except for VPD), there are a few best practices to be followed, if BO is used and CMC is used to configure security:

  • Grant rights to groups on folders, rather than individual objects to minimize complexity
  • Use pre-defined rights wherever possible, and Custom Access Levels instead of Advanced Rights
  • Avoid breaking inheritance to minimize complexity and simplify maintenance
  • Add multiple users to the Administrators group, rather than sharing the administrator account, for better traceability
  • Set up an audit policy and periodically review your deployment
  • Document and maintain the security structure outside the CMC - a spreadsheet can be a good choice.
  • Use Permissions Explorer, Check Relationships and Security Query to diagnose and correct security issues. These are also useful to verify tasks are completed without issues, while adding/deleting/modifying principals/objects/rights.
  • Allocate time and document the process for the administrators and support staff and prepare for their training on new workflows in CMC in BO XI 3.1

- Maloy


Change the location Google Desktop Search indexes your data

July 26, 2009

Desktop search has become an important component of our everyday work. With the information explosion, it is only natural that users and enterprises move towards enabling desktop (and enterprise) search, subject of course to appropriate security and access controls. BI vendors have moved into the new business space that has opened up, which seems to be one of the most promising. While Business Objects had announced support for the Google Search Appliance and Google Desktop back in 2006, their most important announcement lately has been the launch of the Business Objects Explorer (formerly known as Polestar) product. More about that in a later post…

Google Desktop Search is one of the most widely used desktop search tools. One would expect it to have an intelligent installer as well. Unfortunately, it doesn’t allow you to choose either the installation directory or the location for the search index. It installs on your system drive and provides no means to change this from the Options settings. This can be quite frustrating if your system drive is short of space, as the Google Desktop search index will soon grow and hog a lot of space (up to 2 GB) on it. Here is a tip on how to get around this by moving the Google Desktop search index off the default system drive without having to rebuild the index.

1. Right click and exit Google Desktop.

2. Open Windows Explorer and navigate to C:\Documents and Settings\<username>\Local Settings\Application Data\Google\<google desktop search>

Note: If you’re unable to see “Local Settings” – (it’s a hidden folder) – change your folder options from Tools – View – Show hidden files and folders.

3. Move the <google desktop search> folder to a different drive, e.g. D:\Google Desktop\<google desktop search>

4. Open the Windows registry editor from Start – Run – typing regedit – Hit Enter.

5. Navigate to HKEY_CURRENT_USER\Software\Google\Google Desktop.

6. Select the “data_dir” entry in the right pane and double-click it to change its value to the new location of the <google desktop search> folder.

7. Exit the registry editor.

8. Restart Google Desktop Search.
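If you would rather not edit the registry by hand, steps 4 to 7 can also be done by importing a .reg file. The Python sketch below only builds the text of such a file. It assumes that “data_dir” is a plain string value under the key named in step 5; the function name and the example folder are hypothetical, so substitute the real location you moved the folder to.

```python
def data_dir_reg_text(new_location):
    """Build the text of a .reg file that points data_dir at new_location.

    Assumes data_dir is a string (REG_SZ) value. Backslashes and double
    quotes must be escaped inside a quoted .reg string.
    """
    escaped = new_location.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        "[HKEY_CURRENT_USER\\Software\\Google\\Google Desktop]\r\n"
        f'"data_dir"="{escaped}"\r\n'
    )
```

Save the returned text to a file with a .reg extension and double-click it while Google Desktop is not running. As with any registry change, it is worth exporting the original key first so you can roll back.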


BusinessObjects universe design best practices

July 19, 2009

Having relocated from Silicon Valley to Bangalore a year back, I’m now working in an MIS (strategic reporting) role. Part of my role is to evangelize the use of BI best practices and tools, and one of the foremost of these is universe design. As a matter of fact, I’m currently involved in formalizing a BI policy around the tools we use most – Oracle, Informatica and SAP Business Objects (along with a migration from our legacy BO to the XI platform!) – so a lot of my current work is related to best practices, design guidelines and preparing unit test checklists for my team of developers.

So here goes my list of universe design best practices. Being the cornerstone of the Business Objects semantic layer, the universe design becomes one of the most important (next only to the data warehouse design if there is one, and foremost if there is none) aspects of getting the right data out there in time for analysis and decision making.

The best practices are grouped by the reporting area they belong to.

Universe design: object creation

  • Object and class naming should be in business terms – so that it makes sense to the end-user. This also reduces development overhead since reports can use descriptions out-of-the-universe, instead of editing headers or creating report level variables.
  • All objects should have help text or usage information – corollary from above.
  • Object formatting should preferably be done at the universe level.
  • Pre-build condition objects in the universe rather than forcing users to build conditions for reports.
  • Build logic into objects – translate code, common calculations etc rather than forcing users to do it in report variables.
  • Avoid using WHERE clauses in object definitions; use a CASE statement instead. In most cases, a WHERE clause will return incorrect results when similar objects are included in the result set, because the restrictions imposed by the multiple WHERE clauses are combined.
  • Use aggregation in all measure objects to push the aggregation to the database wherever the performance bottleneck is likely to be the BO server and the database performance is optimal. Generally the database is much more powerful at doing aggregation calculations, and this also reduces the volume of data to be transported over the network.
  • All measure objects should include aggregation functions for projection. When this is not included, BO will not automatically roll up the data in the report, which could result in incorrect data and analysis. Note that in the 3.0 version of Designer, a new feature, the “Database delegated” projection function, is available to take care of these anomalies when doing “averages”, for instance.
  • Use Custom LOVs or cascading prompts to display LOVs where hierarchies and numerous values are involved.
  • Use relative date objects for scheduling e.g. Today, Yesterday, Previous Month etc. Create a separate class to contain these reporting objects – this helps in improving maintainability.
  • Use dynamic HTML in objects where required to avoid users having to build it in report variables – end users wouldn’t like to code hyperlinks themselves, but would love to have an object which when clicked can lead them to Google Maps for example.
  • Use contexts in universes having multiple fact tables – this helps in getting your measures (built from multiple fact tables) right.
  • Use derived tables to define measures dependent on multiple fact tables.
  • Use derived tables to reduce complexity of queries to be written by users or in place of views or procedures. A note of caution here: Use derived tables sparingly. If you have access to the database or DBA and can get views or tables created for the same purpose, go with it rather than using derived tables. This is not only to push the logic and work closer to the database, but also to take care of the performance and maintainability aspects. Exceptions to this include cases where your derived table may include a prompt which would restrict the number of rows returned and thus improve performance over a conventional view.
  • Reuse code with @Variable. Reuse interactive objects with @Where (if you use them at all).
  • Use @Prompt syntax for conditions and interactive objects where input values are likely to change, or where the absence of a prompt would lead to inaccurate values or unacceptable query response times. Also make sure that regularly used conditions, e.g. current year / latest date, do not have prompts, to avoid annoying users.
  • To limit the number of objects created and avoid user confusion, build interactive objects with @Prompt syntax followed by an additional OR clause to include an “All” condition.

E.g. 'ALL' IN @Prompt('Enter Value or ALL','A','Class\Object',multi,)

OR

Table.Column IN @Prompt('Enter Value or ALL','A','Class\Object',multi,)
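The advice above to prefer CASE over WHERE in object definitions is easiest to see with data. The sketch below uses Python's built-in sqlite3 and a hypothetical `orders` table. It imitates what happens when two measure objects, each carrying its own WHERE restriction on year, are dropped into the same query: the restrictions are ANDed together and no row can satisfy both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, year INTEGER, amount INTEGER);
    INSERT INTO orders VALUES
        ('North', 2008, 100), ('North', 2009, 150),
        ('South', 2008, 200), ('South', 2009, 50);
""")

# Two WHERE-based objects ("Sales 2008", "Sales 2009") in one query:
# both restrictions apply to every row, so nothing is returned.
where_based = conn.execute("""
    SELECT region, SUM(amount), SUM(amount)
    FROM orders
    WHERE year = 2008 AND year = 2009
    GROUP BY region
""").fetchall()

# The same two objects defined with CASE: each measure filters itself,
# so they can sit side by side in one result set.
case_based = conn.execute("""
    SELECT region,
           SUM(CASE WHEN year = 2008 THEN amount ELSE 0 END) AS sales_2008,
           SUM(CASE WHEN year = 2009 THEN amount ELSE 0 END) AS sales_2009
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()

print(where_based)
print(case_based)
```

The WHERE-based query comes back empty, while the CASE-based one returns one row per region with both yearly totals.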

Universe design: resolving join and performance problems

  • To resolve a chasm trap, define a context for each table at the “many” end of the joins.
  • To resolve a fan trap, create an alias table for the table producing the multiplied aggregation. Create a 1:1 join between the original and the alias tables. Modify the select statement to use the columns from the alias table instead of the original table.
  • Use of contexts should be evaluated w.r.t. use of aliases for resolving join issues, to take care of maintainability of code.
  • Integrity checks on the universe structure (parsing of objects, joins and contexts, detecting loops etc.) are mandatory. If you wish to use Business Objects to help you detect fan traps or chasm traps, you must set the cardinality on the joins. Do not rely on BO to suggest the cardinality; its suggestion is often erroneous, as it is based on the sample of records that BO fetches for each table.
  • Uncheck the “Multiple SQL statements for each measure” option in the universe parameters if it is not required for resolving any join problems. This option should be checked if the measures being retrieved in the same query involve different tables. “Prevent Cartesian product” should be checked, and limits should be placed on the number of records returned and on the execution time for the SQL connection, to prevent runaway queries which can bring the database to its knees and cause an outage for all users.
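To see why fan traps matter, and why splitting measures into separate SQL statements helps, here is a small sketch using Python's built-in sqlite3. The `orders` and `order_lines` tables are hypothetical. A measure from the “one” side of a one-to-many join is summed together with a measure from the “many” side, which multiplies the former by the number of matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_total INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, qty INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    INSERT INTO order_lines VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# Fan trap: order 1 has three lines, so its total is counted three times.
fan_trap = conn.execute("""
    SELECT SUM(o.order_total), SUM(l.qty)
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
""").fetchone()

# One statement per measure, which is the effect of generating
# separate SQL for measures that come from different tables.
total = conn.execute("SELECT SUM(order_total) FROM orders").fetchone()[0]
qty = conn.execute("SELECT SUM(qty) FROM order_lines").fetchone()[0]

print(fan_trap)      # inflated order total alongside the correct quantity
print((total, qty))  # both measures correct
```

The joined query reports an order total of 350 instead of 150, while the quantity (10) is right in both cases. This is the kind of silent inflation that correct cardinalities and integrity checks are meant to catch.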

Universe design: optimization / miscellaneous

  • Use shortcut joins wherever possible to reduce number of tables used in a query
  • Use aggregate tables /materialized views with aggregate awareness set up to improve query performance
  • Use keys instead of labels where possible to take care of index awareness benefits of performance and uniqueness
  • Use the JOIN_BY_SQL parameter to shift process from BO server to database wherever the bottleneck for performance is the BO server and the database performance is optimal.
  • Update the .prm files to enable access to custom SQL functions and improve help text
  • Do not use derived tables instead of aggregate tables.
  • Turn off LOVs for all dimension and detail objects that are redundant or not required. This prevents performance problems when users inadvertently click on the “Values” and the query sets to return all the IDs or other irrelevant data.
  • Consider using linked universes with a master kernel universe to ensure consistent dimensions across multiple universes
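The idea behind aggregate awareness can be sketched in a few lines: given the dimensions a query asks for, use the most aggregated table that can still answer it, and fall back to the detailed fact table otherwise. The Python below is only an illustration of that selection logic, not how Designer implements it; the table names and dimension sets are hypothetical.

```python
# Hypothetical tables, listed from most aggregated to most detailed,
# each with the set of dimensions it can be grouped by.
TABLES = [
    ("agg_sales_year", {"year"}),
    ("agg_sales_month", {"year", "month"}),
    ("fact_sales", {"year", "month", "day", "customer", "product"}),
]


def pick_table(requested_dimensions):
    """Return the first (most aggregated) table covering every dimension."""
    needed = set(requested_dimensions)
    for name, dimensions in TABLES:
        if needed <= dimensions:
            return name
    raise ValueError(f"no table covers dimensions: {sorted(needed)}")
```

A query by year alone resolves to the yearly aggregate, while adding a customer dimension forces the detailed fact table. This is why queries against well-designed aggregates are cheaper without users having to know which table they hit.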

This list is certainly not an exhaustive one – but a work-in-progress. I’d update it as and when I compile more; meanwhile if you feel anything has been left out, drop in a line.
