Today's world is oversaturated with data. Telling stories through data is beginning to be so sexy, that many people are building their career on it. A few semi-experts in the Czech Republic have even changed their colours and started talking about BigData (in a worst-case scenario, they also hold conferences on this topic). However, I'll save this topic for a future blog post, in which I'll ground their Hadoop enthusiasm a bit for you.
People want to know more about the environment in which they operate. It helps them make better decisions, which usually leads to a competitive advantage. Generally, for good decision making we need a combination of three things: proper input parameters (information / data), common sense / experience, and a modicum of luck. However, an idiot will still be stupid and although luck can be occasionally bought in the Czech Republic, there's the threat of being arrested for bribery. Hence why information remains the most influenceable component of success. At my playground, the correct information serves as answers to your most penetrating questions you can think of.
I assume that each of you knows how much money you have in your personal bank account. Most of us will also know how much money we spend per month. Fewer of you will know exactly what it was for. An even smaller group of people will know the structure of all the pleasant cups of coffee, ice cream, wine, lunches and so on (we call it long tail). I would bet that almost no one knows one´s personal annual trend in the cost structure of such long tail. You’ll probably argue that you don't care. If you're a company that wants to succeed, you can't do without such information. As for one's own personal life, the biggest nutcase, as I see it, is Stephen Wolfram, who has been measuring almost everything since 1990. He wrote about almost everything except lint from his bellybutton (unlike Graham Barker :)
Because there’s nothing about the executive summary of your accounting on CRM, Google Analytics or social networks on TV after the evening news, you're forced to build different variants of reports and dashboards yourselves.
I'll try to summarize the tools which I know are available; but in the end I'll tell you that it's all just a toy gun, and whomever wants a proper data gun must reach for GoodData. To be fair, I'll do my best and argue a bit :)
Today, Excel is on every corner. It's a good helper, but quite a lot of people have a strange tendency to make Excel Engineers of themselves, which is the most dangerous expertise you can come across. The Excel Engineer often ends with a contingency table and SUMIF() formula. At the same time, business data processing is interconnected with him and, perhaps unwittingly, he's becoming a brake on progress. The biggest risks of reporting in Excel, in my opinion, are as follows:
- The primary data, from which reports are made, are stored in Excel; these data were imported by someone into Excel at some point - with both a poor/expensive possibility of updating
- Excel sheets tend to travel around Corporate Outlooks, leading to different versions. It often comes in handy to change YTD% a little bit, or it may easily happen that another department has the same Excel, but with different numbers - it undermines confidence in the reports and can easily allow for distortion of reality
- Complicated reports must be created by the reporting department (only they know how to update the data - see point 1), where Excel experts provide answers to business questions they do not always understand. Therefore, it often happens that the ad-hoc responses to your ad-hoc hypotheses have been forming for days (submitter burnout occurs)
- The combination of manual operations and macros made by the one that doesn't work here anymore, introduces errors to Excel, thanks to which the cosmoverse then collapses!
It's probably obvious that Excel reporting should end at the level of a sole trader. Nothing reliable can be created with it efficiently. You can be sure that the Excels on ZEE (network drive, of course!) contain errors, are not up-to-date and were made by people who were assigned to it by someone else, so they knew damn all about the nature of the data they involved into the VLOOKUP! Excel Engineers usually don't have it in their genes to do data discovery, and even if they came across something interesting, they probably wouldn't notice. You will know best what the correct information is at that very moment (and Excel isn't really what you should master in 2013 at the level of VBS macros and dirty hacks)!
Today's market is oversaturated with tools that aim to help you visualize some kind of business information. Imagine business information as the number of orders per today, the net margin for the last hour, the average profit per user, etc. In the majority of cases it works in the following way: you calculate this information on your side and send it automatically via an interface to a service that ensures the given metric is presented. Examples of such services are Mixpanel, KissMetrics, StatHat, GeckoBoard and even KlipFolio. The advantage compared to Excel lies especially in the fact that the reports and dashboards can be easily automated and then shared. Information sharing is quite underrated! An example of such information could be the number of data transformations that are executed in minute granularity at our staging layer:
You can build Dashboards from these reports, and for a while you well feel good. The problem occurs when you find out that any extension of such a dashboard requires intervention from your programmers, and the more complex your questions are, the more complicated the intervention. If you operate in B2C and have transactional data, you can be sure that the clinical death of this form of reporting will be, for example, a query on the number of customers with time that spent at least 20% more than the average order for the previous quarter, and all of whom at the same time bought an ABC product this month for the first time. If your programmers, by luck, manage to implement it, they'll shoot their heads off once you add that you want daily numbers of TOP 10 customers from each city who meet the previous rule. If you have just a few more transactions, it will mean remaking your existing DB on your side, which will eventually lead to a 100% collapse. Even if you try to make it survive at all costs, you can be sure that you won't slay the competition thanks to that zero flexibility - you won’t even be able to gently take the analytical helm because the market will pivot around you.
It's possible you don't have similar questions about your business, and it doesn't bother you. The cruel truth is that your competition is asking about it right now, and you will have to somehow respond to it...
Neither Excel nor visualization tools usually have any sophisticated back-end, which applies similarly to services like Domo or Jolicharts. They look super sexy at first glance, but inside is a masked set of visualization tools, sometimes coated with a few statistical features that you mostly won’t use. The common denominator is the absence of a language with which you could step out of the predefined dashboards, and begin to implement similar services so that they were to your benefit.
Their only advantage is that they can be quickly implemented. Unfortunately, that's it, and after a short intoxication period, sobriety sets in. If by chance you are a little bit more demanding, you haven't got a chance for a very happy life.
Low Level Approach
There are services that allow you to upload data and raise queries. As I see it, nowadays the hottest is Google BigQuery. For us at Keboola, it's a tremendous help with data transformation, denormalization and JOINs of huge tables. It can serve you well if it seems like a good idea to you to write the following:
...to get this...:
It's evident that if you don't make a living as an SQL consultant and don't have any ambition to create your own analytical service, you’d better leave this approach to nerds (like us!) and attend to your own business :)
If you google cloud BI, Google will return names like Birst, GoodData, Indicee, Jaspersoft, Microstrategy, Pentaho, etc. (if you have Zoho Reports among the results, the universe got crazy because that should have remained in Asia :).
From many trends, it's obvious that the Cloud moves the world of today. In the Czech Republic, the most common concern about the concept is a worry about the data and the feeling that my IT can do something better than that of the vendor. If you feel the same concerns, you should know that when any troubles arise in the Cloud, the best people available on this planet are working on it immediately, so that everything will again run like clockwork. Dave Girouard (coincidentally also a board member of GoodData) summed it up nicely in this article.
Except for Microstrategy, which probably discovered the Cloud this morning, the above-mentioned brands are relatively established within the Cloud. However, there are different surprises hiding under the lid. Pentaho requires highly technical knowledge so that you can make the most of it. Jaspersoft is Excel on the web that, in short, failed. Indicee would like to play in the major leagues, but I know at least one large customer from Vancouver who, after trying to implement their solutions for a year, moved to GoodData. When I tried Birsta it was all in Flash, and despite my enormous effort I really didn't understand it :(
As I said in the beginning, everything except GoodData sucks. There are several reasons for this:
- GoodData has a powerful language for the definition of metrics. With this language, it's possible for anyone to generate reports, no matter how complicated. The fact that these reports are created not only by clicking is more than essential - it gives you the flexibility you'll need to fight for first place against your competition. If GoodData satisfies Tomáš Čupr (ex-Slevomat, DámeJídlo.cz), you can be sure it will suit you as well. Also constructs which are complex at first glance can be quickly learned at the Keboola Academy.
- GoodData, unlike its competitors, has fundamentally designed API interface to enable companies such as Keboola to bend the whole analytical platform so that it plays first violin in your environment. Seamless integration with other information systems, white-labeling, single-sign-on and a framework for data extraction and transformation means that there are no compromises during the implementation.
- GoodData aren't just reports in a web browser but an entire set of abstractly separated functional layers (from a physical model representing the data up to a logical model representing the business relationship), thanks to which the implementation doesn't include things like, for example, a feasibility study or technical specification. In comparison with the competition, GoodData can be implemented with tremendous speed (no projects for months).
- GoodData has a phantom lab in Brno where R&D is taking place, the output of which are innovations which I'm not sure I can make public today. Nevertheless, I can honestly say that the others will soon shit their pants from it. I'll definitely add it here in time!
All in all, the quality of GoodData shows, among other things, a lot of connections, such as Zendesk.com (the biggest service to support customers in the world). The ability of such flexibility is, from my point of view, absolutely essential for future success. Any one of you can rent high-performance servers, design super-cool UI or program specific statistical functions (or perhaps borrow them from Google BigQuery), but in the foreseeable future no one will come out with a comprehensive concept that makes sense and is applicable to small dashboards (we have a client who uses GoodData to look at some data from Facebook Insights) as well as gigantic projects with a six-digit $ budget for just the first implementation phase.