People were getting lost in data, so they created tools to help them. From 1958, when Hans Peter Luhn coined the term “Business Intelligence”, until the end of the 1980s, the whole industry lived by terms such as data warehouses, OLAP cubes etc. In 1989 Howard Dresner defined BI as a “set of concepts and methods to improve business decision making by using fact-based support systems”.
Over the last half century, BI has kept progressing, and today it finds itself in a bit of a crisis.
The Dashboard Crisis
We are overwhelmed by data - no longer the raw data - but rather the categorized, mathematically processed data represented in what we call “reports”.
Imagine that you have a large amount of data. You know that there is a lot of very interesting information in it. So, you take tools that pull all that data into one place, clean it, polish it and present it back to you - and you start looking at it (that’s what we do with Keboola & GoodData).
Over time, though, one can easily experience the following side effects:
- resignation / the “juicer syndrome”. If you use the system passively, you see the same information in the data day after day. Within the first few weeks, you drill into the data and look at it from all angles. As time goes on, your focus falls away while you keep ingesting more and more data you don’t need to see again (Avast Antivirus now has more than 200M users, they’ll still have more than 200M tomorrow, no one needs to be reminded of that daily). If you bought a new juicer, you probably drank nothing but fresh juice for a week or two, and since then the appliance has been collecting dust somewhere. Something very similar can easily happen in BI.
- drowning in data. If you have a good tool that allows you to drill into your data and you use it, you generate one report after another as you find more and more interesting answers. At one point you’ll have so many reports that you get lost.
Once you have hundreds of reports, all sorting, tagging or naming conventions stop working. You’ll get to the point where no one is able to find what they need. Instead of looking for existing reports, people will start building the same ones again and again. Your sales director knows that there was a report “Margin estimate for the next 4 weeks based on sales managers’ estimate” somewhere, but it is harder to find it than to build it again (which, in the case of GoodData, speaks volumes about its ease of use).
What solutions are being attempted?
- Use of natural language - Microsoft is trying in its “Power BI” to understand queries asked in a similar manner to how we ask a search engine. In that case, natural language needs to be somehow connected to the semantic model leading to the data. It looks pretty (see the Power BI link), but Odin, my colleague, nailed it when he commented after reading one such article:
“I read it and IMHO it’s a bit of BS, because articles like that have been showing up regularly since the 50’s - saying that use of natural language is ‘almost here’. The best generic tool for interactive communication with a computer (asking the computer for something) is so far SQL, which was supposed to be so simple that everyone could write a query as easily as a sentence. Time has shown that reality (and therefore also natural language) is so idiotically complex that any language describing it needs to be complex as well, and you need to study it for 5 years to master it (same as natural language).”
- Use of a visual interface between the system and a human - you can see that nicely in the example of BIRST. It’s a beautifully executed marketing video, but once the data model (a.k.a. the relationships between information) gets sufficiently complex, the interface stops working - either it doesn’t understand what we want from it, or controlling it gets so complicated that its advantages are lost.
What are we doing about it?
It is important to take a bit of everything. It will remain critical that everyone has access to the information they feel they need (to validate a hypothesis, support a decision etc.). Apart from that, the machines need to help a bit with sifting through the data - so you don’t have to generate hundreds of reports trying to find the golden nugget.
At Keboola we’ve been working on a system that attempts to solve exactly that since the summer of 2013. Today it is practically a complete set of functions that can recognize the meaning of data (time, ID, number, attribute - we call that piece the “data profiler”) and the relationships between data (for example, it can figure out how to connect Google Analytics with CRM data), and afterwards run tests to identify “interesting moments”. For example, it can discover seasonality in a particular segment of customers and point to it, without an analyst having to come up with the idea of trying something like that. Our system “guesses” where the data relates to a specific customer and, if it finds something interesting, points it out. Ideally, it creates a report in GoodData on its own, filtered to the given situation.
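To make the “data profiler” idea more concrete, here is a minimal sketch in Python of how a column’s role (time, ID, number, attribute) could be guessed from its values. It is not the actual Keboola implementation: the function name `profile_column`, the heuristics and the assumption that dates arrive as `YYYY-MM-DD` strings are all made up for illustration.

```python
from datetime import datetime


def _is_number(text):
    """True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_time(text):
    """True if the text is a date in YYYY-MM-DD form (assumed format)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile_column(values):
    """Guess the role of a column from its string values.

    Hypothetical heuristic, for illustration only:
    - every value is a date              -> "time"
    - every value is unique and is either
      non-numeric or a whole number      -> "id"
    - every value is numeric             -> "number"
    - anything else                      -> "attribute"
    """
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "unknown"
    if all(_is_time(v) for v in values):
        return "time"
    numeric = all(_is_number(v) for v in values)
    whole = all(v.lstrip("-").isdigit() for v in values)
    unique = len(values) > 1 and len(set(values)) == len(values)
    if unique and (whole or not numeric):
        return "id"
    if numeric:
        return "number"
    return "attribute"
```

A real profiler would need many more formats and would look at column names and value distributions as well; the point here is only that a handful of cheap checks already separates the four roles on clean data.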
As an example, for “on-line transaction” data types we have a set of tests that look for those interesting moments. One of these tests (working title “Wrong Order Test”) creates histograms of all combinations of facts (typically monetary values) and attributes (products / locations / months / user types etc.). Among those, it tests whether the counts of IDs (such as orders) correlate with the values - if some attribute seems outside of “normal” in a particular situation, that is reason enough to bring it up with the business user.
This picture shows how, for a specific time period and product (or user group), the system identified an unexpected drop in profit for a particular payment method - “interesting behaviour”. Unless you somehow get the idea to test for precisely this situation and report setting, you have practically NO CHANCE of discovering this. On top of that, the same anomaly may not present itself a week later, so you need continual detection.
Our goal is to periodically test the various data types sitting in Keboola and inform their owners of those interesting facts in the form of an automated dashboard within their GoodData projects. The last thing we need to do is to define how to configure the tests, as the true power lies in the interaction of various tests over the same data. Everything else - the data profiler, the tests themselves, supporting R functions, API, infrastructure etc. - is ready to go.
This way Keboola will not only help you use data to find answers to your business questions, but also phrase new questions based on gems hidden in the data.
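The “Wrong Order Test” can be illustrated with a rough sketch in Python. This is an assumption about one possible way to do it, not the real test: for a single attribute and a single fact it compares the fact’s value per order across attribute values and flags those that sit far from the median. The function name `wrong_order_candidates`, the field names and the `threshold` of 3 median absolute deviations are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def wrong_order_candidates(rows, attribute, fact,
                           id_field="order_id", threshold=3.0):
    """Flag attribute values whose fact-per-order looks unusual.

    rows      -- list of dicts, one per transaction line
    attribute -- key of the attribute to group by (e.g. payment method)
    fact      -- key of the monetary value to sum (e.g. profit)
    Illustrative only: a robust outlier check based on the median
    absolute deviation (MAD), not Keboola's actual test.
    """
    ids = defaultdict(set)        # distinct order IDs per attribute value
    totals = defaultdict(float)   # summed fact per attribute value
    for row in rows:
        key = row[attribute]
        ids[key].add(row[id_field])
        totals[key] += row[fact]

    # If value followed the order counts, this ratio would be similar
    # for every attribute value.
    per_order = {key: totals[key] / len(ids[key]) for key in totals}

    center = median(per_order.values())
    mad = median(abs(v - center) for v in per_order.values())
    if mad == 0:
        return []  # no spread at all, nothing to compare against

    return sorted(key for key, v in per_order.items()
                  if abs(v - center) / mad > threshold)
```

Running the same function over every attribute and fact pair, and over successive time windows, would give the kind of periodic scan described here; how to combine and configure several such tests is a separate question.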