Data Warehouse and Data Lake Analytics Collaboration – InFocus Blog

By Bill Schmarzo

CTO, Dell EMC Services (aka “Dean of Big Data”)

October 19, 2017

This blog was written with the thoughtful assistance of David Leibowitz, Dell EMC Director of Business Intelligence, Analytics & Big Data.

So data warehousing may not be cool anymore, you say? It’s yesterday’s technology (or 1990’s technology if you’re as old as me) that served yesterday’s business needs. And while it’s true that recent big data and data science technologies, architectures and methodologies seem to have pushed data warehousing to the back burner, it is entirely false that there is not a critical role for the data warehouse and Business Intelligence in digitally transformed organizations.

Maybe the best way to understand today’s role of the data warehouse is with a bit of history. And please excuse us if we take a bit of liberty with history (since we were there for most of this!).

Phase 1: The Data Warehouse Era

In the beginning, Gods (Ralph Kimball and Bill Inmon, depending upon your data warehouse religious beliefs) created the data warehouse. And it was good. The data warehouse, coupled with Business Intelligence (BI) tools, served the management and operational reporting needs of the organization so that executives and line-of-business managers could quickly and easily understand the status of the business, identify opportunities, and highlight potential areas of under-performance (see Figure 1).

Figure 1: The Data Warehouse Era

The data warehouse served as a central integration point, gathering, cleansing and aggregating a variety of data sources, from AS/400 and relational databases to file-based sources (such as EDI). For the first time, data from supply chain, warehouse management, AP/AR, HR and point of sale was available in a “single version of the truth.”
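For readers who never had the pleasure, here is a minimal sketch of that gather, cleanse and aggregate step. Everything in it is an assumption for illustration: the two source extracts, their field names (`sku`, `item`, `units`, `on_hand`) and the helper functions are hypothetical, and a real warehouse load would involve far more sources and rules.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems (invented field names and values).
pos_rows = [
    {"sku": " a-100 ", "units": "3"},
    {"sku": "A-100", "units": "2"},
    {"sku": "b-200", "units": "1"},
]
wms_rows = [
    {"item": "A-100", "on_hand": 40},
    {"item": "B-200", "on_hand": 5},
]


def clean_sku(raw: str) -> str:
    """Cleanse: trim whitespace and normalize case so keys match across systems."""
    return raw.strip().upper()


def build_single_version(pos, wms):
    """Gather both extracts, cleanse the keys, and aggregate to one row per SKU."""
    units_sold = defaultdict(int)
    for row in pos:
        units_sold[clean_sku(row["sku"])] += int(row["units"])

    on_hand = {clean_sku(row["item"]): row["on_hand"] for row in wms}

    all_skus = sorted(set(units_sold) | set(on_hand))
    return {
        sku: {"units_sold": units_sold.get(sku, 0), "on_hand": on_hand.get(sku, 0)}
        for sku in all_skus
    }


if __name__ == "__main__":
    for sku, facts in build_single_version(pos_rows, wms_rows).items():
        print(sku, facts)
```

The point of the sketch is the key cleansing: until `a-100` and `A-100` agree, point of sale and warehouse management cannot share one version of the truth.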

Implementing extract-transform-load (ETL) processing wasn’t always quick, and could require a degree of technical gymnastics to bring together all of these disparate data sources. At one point, the “enterprise service bus” entered the playing field to lighten the load on ETL maintenance, but routines quickly went from proprietary data sources to proprietary (and sometimes arcane) middleware business logic code (anyone remember Monk?).

The data warehouse supported reports and interactive dashboards that enabled business management to have a full grasp on the state of the business. That said, report authoring was static and not really enabled for democratizing data. Typically, the nascent concept of self-service BI was limited to cloning a subset of the data warehouse to smaller data marts, and extracts to Excel for business analysis purposes. This proliferation of additional data silos created reporting environments that were out of sync (remember the heated sales meetings where teams couldn’t agree as to which report figures were correct?), and the analysis paralysis caused by spreadmarts meant that more time was spent working the data rather than driving insight. But we all dealt with it, as it was agreed that some information (no matter the effort it took to get) was more important than no data.

Phase 2: Enhance the Data Warehouse

But IT Man grew frustrated with being held captive by proprietary data warehouse vendors. The costs of proprietary software and expensive hardware (and let’s not even get started on user-defined functions in PL/SQL and proprietary SQL extensions that created architectural lock-in) forced organizations to limit the amount and granularity of data in the data warehouse. IT Man grew restless and looked for ways to reduce the costs associated with operating these proprietary data warehouses while delivering more value to Business Man.

Then Hadoop was born out of the ultra-cool and hip labs of Yahoo. Hadoop provided a low-cost data management platform that leveraged commodity hardware and open source software, and that was estimated to be 20x to 100x cheaper than proprietary data warehouses.

Man quickly realized the financial and operational benefits afforded by a commodity-based, natively parallel, open source Hadoop platform to provide an Operational Data Store (now that’s really going old school!) to off-load those nasty Extract, Transform and Load (ETL) processes from the expensive data warehouse (see Figure 2).

Figure 2: Enhance the Data Warehouse

The Hadoop-based Operational Data Store was deemed very good as it helped IT Man to reduce spending on the data warehouse (guess not so good if you were a vendor of those proprietary data warehouse solutions…and you know who you are, T-man!). Since it’s estimated that ETL consumes 60% to 90% of the data warehouse processing cycles, and since some vendors licensed their products based upon those cycles, this concept of “ETL Offload” could provide substantial cost reductions. So in an environment constrained by Service Level Agreements (since outside of Doc Brown’s DeLorean equipped with a flux capacitor, there are still only 24 hours in a day in which to do all the ETL work), Hadoop provided a low-cost, high-performance environment for dramatically slowing the investment in proprietary data warehouse platforms.
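To put rough numbers on that 60% to 90% estimate, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that all ETL work moves off the warehouse and that nothing else changes; the function name and the 1,000-cycle daily budget are hypothetical and not drawn from any vendor's licensing model.

```python
def cycles_after_offload(total_cycles: float, etl_percent: float) -> float:
    """Return the warehouse cycles still needed once all ETL work moves elsewhere.

    etl_percent is the share of warehouse cycles consumed by ETL, from 0 to 100.
    """
    if not 0 <= etl_percent <= 100:
        raise ValueError("etl_percent must be between 0 and 100")
    return total_cycles * (100 - etl_percent) / 100


if __name__ == "__main__":
    total = 1000  # hypothetical daily processing budget, in arbitrary "cycles"
    for pct in (60, 90):  # the low and high ends of the estimate quoted above
        left = cycles_after_offload(total, pct)
        print(f"ETL at {pct}% of cycles: {left:.0f} of {total} cycles stay on the warehouse")
```

Under those assumptions, a warehouse licensed by processing cycles would need only 10% to 40% of its former capacity for reporting and queries, which is why ETL Offload looked so attractive to IT Man.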

Things were getting better, but still weren’t perfect. While IT Man could shave costs, he couldn’t make the tools easy to use for simple data consumers (like Executive Man). And while Hadoop was great for storing unstructured and semi-structured data, it couldn’t always keep up with the speed required for relational or cube-based reporting from traditional transactional systems.

See the blog “The Data Warehouse Modernization Act” for more details on the role of the Hadoop-based Operational Data Store and how it has helped to “modernize” today’s existing data warehouse environment.

Phase 3: Introducing Data Science

Then God created the Data Scientists, or maybe it was the Devil, depending upon one’s perspective. The data scientists needed an environment where they could rapidly ingest high volumes of granular structured (tables), semi-structured (log files) and unstructured data (text, video, images). They realized that data beyond the firewall was needed in order to drive intelligent insight. Data such as weather, social, sensor and third-party data could be mashed up with the traditional data stores in the EDW and Hadoop to determine customer sentiment, customer behavior and product effectiveness. This made Marketing Man happy. The scientists needed an environment where they could quickly test new data sources, new data transformations and enrichments, and new analytic techniques in search of those variables and metrics that might be better predictors of business and operational performance. Thusly, the analytic sandbox, which also runs on Hadoop, was born (see Figure 3).

Figure 3: Introducing Data Science

The characteristics of a data science “sandbox” couldn’t be more different than the characteristics of a data warehouse:

[Figure: data science sandbox vs. data warehouse characteristics]

Finance Man tried desperately to combine these two environments, but the audiences, responsibilities and business outcomes were just too varied to create cost-effective business reporting and predictive analytics in a single bubble.

Ultimately, the analytic sandbox became one of the drivers for the creation of the data lake that could support both the data science and data warehousing (Operational Data Store) needs.

Data access was getting better for the data scientists, but we were again moving toward proprietary processes and a technical skill reserved for the elite. Still, things were good as IT Man, Finance Man and Marketing Man could work through the data scientists to drive innovation. But they soon wanted more.

See the following blogs for more details on the complementary nature of the data warehouse and the data lake:

Phase 4: Creating Actionable Dashboards

But Executive Man was still unhappy. The Data Scientists were developing wonderful predictions about what was likely to happen and prescriptions about what to do, but the promise of self-service BI was missing. Instead of running to IT Man for reports as in the old days, he was now requesting them of the Data Scientist.

The reports and dashboards built to support executive and front-line management in Phase 1 were the natural channel for rendering the predictive and prescriptive insights, effectively closing the loop between the data warehouse and the data lake. With data visualization tools like Tableau and Power BI, IT Man could finally deliver on the promise of self-service BI by providing interactive descriptive and predictive dashboards that even Executive Man could operate (see Figure 4).

Figure 4: Closing the Analytics Loop


See the blog “Creating Actionable Dashboards” for more details on how to transform existing reports and dashboards into actionable reports and dashboards!

And Man was happy (until the advent of Terminator robots began making decisions for us).
