GoodData XAE: The BI Game-Changer (1st part)

Putting your data into the right context

At the beginning of last summer, GoodData launched its new analytic engine AQE (Algebraic Query Engine). Its official product name is GoodData XAE. However, since I believe that XAE is Chinese for “underfed chicken”, I will stick with AQE ☺. From the first moment I saw it, I have considered it the concept with the biggest added value. When Michael showed me AQE, I immediately fell in love.

However, before we can truly reveal AQE and the benefits that can be derived from it, we need to begin with an understanding of its position in the market, starting from the foundation on which GoodData’s platform rests. In this three-part series we’ll cover AQE’s impact on contextual data, delivering meaningful insights and, finally, digging for those hidden gems.

First, a bit more comprehensive introduction...

Any system with ambitions to visualize data needs some kind of mathematical device. For instance, if I choose sold items using the names of salespeople as my input, and my goal is to find the median of the salespersons’ turnover, somewhere in the background a total summation of the sold items per month (and per salesperson) must take place. Only after getting that result can we calculate the requested median. Notice the graphic below: the left table is the raw input, while the right table is derived in the course of the process. Most of the time we don’t even realize that these intermediate outputs keep arising. From the right table, we can quickly determine the best salesperson of the month, the average/median salesperson and so on…
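The intermediate-table idea can be sketched in a few lines of Python (the salespeople and amounts below are made up purely for illustration):

```python
from statistics import median

# Hypothetical raw input: one row per sold item (salesperson, month, amount).
sales = [
    ("Alice", "2013-01", 100), ("Alice", "2013-01", 250),
    ("Bob",   "2013-01", 300), ("Bob",   "2013-01", 150),
    ("Carol", "2013-01", 80),
]

# Intermediate table: total turnover per salesperson (per month).
turnover = {}
for person, month, amount in sales:
    turnover[(person, month)] = turnover.get((person, month), 0) + amount

# Only now can the requested median be computed, from the derived totals.
monthly_median = median(turnover.values())
print(turnover)        # {('Alice', '2013-01'): 350, ('Bob', '2013-01'): 450, ('Carol', '2013-01'): 80}
print(monthly_median)  # 350
```

The `turnover` dict is exactly the “right table” from the graphic: a derived artifact the user never asked for explicitly, but without which the median cannot exist.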

And how does this stack up against the competition?

If we don’t have a robust analytic backend, we don’t have the freedom to do whatever we want. We have to tie our users to some pre-prepared “vertical analysis“ (churn analysis of an e-shop’s customers, RFM segmentation, subscription cohorts, etc…). Fiddling with the data is possible in many ways. Besides GoodData, you can find tools such as Birst, Domo, Klipfolio, RJMetrics, Jaspersoft, Pentaho and many, many others. They look really cool, and I have worked with some of them before! A lone data analyst can also reach for R, SPSS, RapidMiner, Weka and other tools. However, these are not BI tools.

Most of the aforementioned BI tools do not have a sophisticated mathematical device. They will simply allow you to count the data, calculate the frequency of components, and find the maximum, minimum and mean. The promo video of RJMetrics is a great example.

Can I just use a calculator instead?

Some of these systems solve the problem of an absent mathematical device in a bit of a bluffy way. They offer their users several mathematical functions – just the same as Excel does. The main difference is that they can only be used on separate tables, not on the whole data model. Someone may think that it does not matter, but quite the contrary – this is the pillar of anything connected to data analytics. I will try to explain why...

The border of our sandbox lies with the application of the law of conservation of “business energy”.

“If we don’t manage to earn our customer more money than our services (and GoodData license) cost him, he won’t collaborate with us.“

Say, for example, we take the listing of invoices from SAP and draw a growth chart; our customers will throw us out of their offices. We need a little bit more. We need to put each data dimension into context (dimension = a thematic data package, usually represented by a data table). A dimension does not have to have any strictly defined linkages; in our analytics project such a table is called a dataset.

But how is it all connected?

The moment we give each dimension its linkages (parents, children … siblings?), we get a logical data model. A logical data model describes the “business” linkages, and most of the time it is not identical to the technical model in which a system saves its data. For example, if Mironet has its own e-shop, the database of the e-shop is optimized for the needs of the e-shop – not for financial, sales and/or subscription analytics. The more complicated the environment whose data we analyze, the fewer similarities the technical and analytical data models share. Handling this low structural similarity between the source data and the data we need for analytics is what divides the other companies from GoodData.

A good example of this is our internal project. I chose the internal project because it contains only the logical model we need for ourselves. Therefore, it is not artificially extended just because we know “the customer will pay for it anyway”.

We upload different kinds of tables into GoodData. These tables are connected through linkages. The linkages define the logical model; the logical model then defines “what can we do with the data”. Our internal project serves to measure our own activity and it connects the data from the Czech accounting system (Pohoda), Canadian accounting system (QuickBooks), the cloud application and some Google Drive documents. In total, our internal project has 18 datasets and 4 date dimensions.

The first image (below) is the general model; select the arrow in the left corner to see what a more detailed model looks like.

In the detailed view (2 of 2), note that the name of the client is marked in red, the name of our analyst in black and the worked hours in blue. What I want to show here is that each individual piece of information spreads widely throughout the project. Thanks to the linkages, GoodData knows what makes sense together.


Using business-driven thinking to force your data to conform to your business model (rather than the other way around) will allow you to report on meaningful and actionable insights. Part 2 of this series on AQE (...or more formally XAE) will uncover the translation of the Logical Data Model into the GoodData environment.

For the next part (2/3), continue here

Aggregation in MongoDB, Oracle, Redshift, BigQuery, VoltDB, Vertica, Elasticsearch, GoodData, Postgres and MySQL

"Executive Summary"

It kinda got out of my hands. It exploded...

I’ve been trying to describe how to do the same procedure in different systems. In the end, I tried to do the same using GoodData and Keboola Connection, and I have attached a screenshot (see below). I know it’s more high-level than just a database, but I believe it shows the beauty and speed of the tool.

Summary table:


At the end of last year I found this blog post: "MongoDB 'Lightning Fast Aggregation' Challenged with Oracle". Lukas Eder did the same aggregation, using Oracle, as Vlad Mihalcea had done a week earlier with MongoDB.

Lukas Eder gave me the source data — 50,000,000 lines of events with time and value:

Below, you’ll find 10,000 lines to get you started, as well as the whole data set. Both have headers, are without enclosures and use commas as delimiters (until mid-September you can download from my S3; then it will go to Glacier):

Two tests are being done:

  • Test A - perform the aggregation in years, and days within the year, where the number of entries, the daily average value and the maximum value are recorded
  • Test B - exactly the same as Test A, but with an hour filter applied
I’ve tried to replicate the conditions as much as possible, using Amazon Redshift, Google BigQuery, VoltDB, HP Vertica, Elasticsearch, GoodData, Postgres and MySQL. The purpose is not really to find out who is the fastest; that’s why I don’t insist on having exactly the same conditions. To be exact, Google BigQuery runs on unknown hardware, so it wouldn’t be possible anyway. I’m more interested in how difficult — and how easy — it is to get the same result on these different platforms. I also tasked Redshift with 10x the data - 500,000,000 lines - but these are 10 repetitions of the same data set. In the GoodData example I’ve added some complications, so you can see how easy it is to work with.
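For readers who want to play along at home, here is a toy version of both tests in Python with SQLite. Table and column names mirror the post; the rows are made up, and SQLite’s strftime() stands in for the EXTRACT() functions of the big engines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RandomData (created_at TEXT, value REAL)")
conn.executemany(
    "INSERT INTO RandomData VALUES (?, ?)",
    [("2012-07-16 00:15:00", 1.5),
     ("2012-07-16 00:45:00", 2.5),
     ("2012-07-16 02:00:00", 9.0),
     ("2012-05-02 10:00:00", 4.0)],
)

# Test A: aggregate per year and day-of-year.
test_a = conn.execute("""
    SELECT strftime('%Y', created_at) AS Year,
           strftime('%j', created_at) AS DayOfYear,
           COUNT(*), AVG(value), MIN(value), MAX(value)
    FROM RandomData
    GROUP BY Year, DayOfYear
    ORDER BY Year, DayOfYear
""").fetchall()

# Test B: the same aggregation, restricted to a single hour.
test_b = conn.execute("""
    SELECT strftime('%Y', created_at) AS Year,
           strftime('%j', created_at) AS DayOfYear,
           strftime('%H', created_at) AS Hour,
           COUNT(*), AVG(value), MIN(value), MAX(value)
    FROM RandomData
    WHERE created_at BETWEEN '2012-07-16 00:00:00' AND '2012-07-16 01:00:00'
    GROUP BY Year, DayOfYear, Hour
""").fetchall()

print(test_a)  # one row per (year, day-of-year) group
print(test_b)  # one row: the two events in hour 00 of day 198
```

The shape of the answer is identical everywhere; what differs between the platforms is purely how much ceremony it takes to get there.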

And here are the results: 


MongoDB

(for details, see Vlad’s blog — link above)

Test A: 129s
Test B: 0.2s


Oracle

(for details, see Lukas’s blog — link above)

Test A: 32s
Test B: 20s first run, 0.5s second run


Amazon Redshift

I’ve uploaded the data to Redshift from S3. I had to do these steps before running the test:
  1. create a table
  2. import the data
  3. the system couldn’t recognise the ISO8601 time format, so I had to alter the table:
    1. add a column for timestamp
    2. set it up according to time stamp in the original data source
    3. delete the original column with date
  4. I have added SortKey, and did the ANALYZE and VACUUM commands
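The timestamp workaround (steps 3a–3c above) can be sketched like this; the sketch uses SQLite and plain string functions rather than Redshift’s types, so treat it as an illustration of the idea, not Redshift syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# After the original import, the ISO8601 string sits in a plain text column.
conn.execute("CREATE TABLE RandomData (created_at_raw TEXT, value REAL)")
conn.execute("INSERT INTO RandomData VALUES ('2012-07-16T00:15:00Z', 1.5)")

# Step 3a: add a proper timestamp column.
conn.execute("ALTER TABLE RandomData ADD COLUMN created_at TEXT")
# Step 3b: fill it from the raw ISO8601 string.
conn.execute(
    "UPDATE RandomData SET created_at = "
    "replace(replace(created_at_raw, 'T', ' '), 'Z', '')"
)
# Step 3c would then drop the original raw column.

print(conn.execute("SELECT created_at FROM RandomData").fetchone())
# ('2012-07-16 00:15:00',)
```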

(here you’ll find the exact list of SQL queries and times)

Test A

Exact query used:
SELECT
     EXTRACT(YEAR FROM created_at),
     EXTRACT(dayofyear FROM created_at),
     COUNT(*),
     AVG(value),
     MIN(value),
     MAX(value)
FROM RandomData
GROUP BY
     EXTRACT(YEAR FROM created_at),
     EXTRACT(dayofyear FROM created_at)
ORDER BY
     EXTRACT(YEAR FROM created_at),
     EXTRACT(dayofyear FROM created_at);
Redshift dw1.xlarge (15s)
Redshift dw2.large (7s)

500,000,000 lines version:

Redshift dw1.xlarge (182s)
Redshift dw2.large (53s)


Test B

Used query:
SELECT
     EXTRACT(YEAR FROM created_at),
     EXTRACT(DAYOFYEAR FROM created_at),
     EXTRACT(HOUR FROM created_at),
     COUNT(*),
     AVG(value),
     MIN(value),
     MAX(value)
FROM RandomData
WHERE
     created_at BETWEEN
     TIMESTAMP '2012-07-16 00:00:00' AND
     TIMESTAMP '2012-07-16 01:00:00'
GROUP BY
     EXTRACT(YEAR FROM created_at),
     EXTRACT(DAYOFYEAR FROM created_at),
     EXTRACT(HOUR FROM created_at)
ORDER BY
     EXTRACT(YEAR FROM created_at),
     EXTRACT(DAYOFYEAR FROM created_at),
     EXTRACT(HOUR FROM created_at);
Redshift dw1.xlarge (1.3s)(first run)
Redshift dw1.xlarge (0.27s)(second run)

500,000,000 lines version:

Redshift dw1.xlarge (46s)(first run)
Redshift dw1.xlarge (0.61s)(second run)


And the query plan, for those who love it :)

Google BigQuery:

You can communicate with BigQuery using the REST API or the web interface. I used the command-line client on a server, so I had to download the sample data onto its drive first.

Necessary steps before the actual query:

  1. Create a project in the console and, under API & auth, authorize BigQuery (a credit card has to be entered)
  2. Create a “project” in BigQuery
  3. Import the data (this took a while, although I didn’t record the exact time)

You can’t really tweak BigQuery too much. There are no indexes, keys, etc.

Project set up from console:
./bq mk rad.randomData

Data import:
./bq load --noallow_quoted_newlines --max_bad_records 500 --skip_leading_rows=1 rad.randomData ./randomData.csv created_on:timestamp,value
-- ~1500s (pity I don't have the exact time)

Test A

The query:

SELECT
  YEAR(created_on) AS Year,
  DAYOFYEAR(created_on) AS DayOfYear,
  COUNT(*) AS Count,
  AVG(value) AS Avg,
  MIN(value) AS Min,
  MAX(value) AS Max
FROM [rad.randomData]
GROUP BY
  Year, DayOfYear
ORDER BY
  Year, DayOfYear;

-- 7s (1.3GB)

Test B

The query:

SELECT
  YEAR(created_on) AS Year,
  DAYOFYEAR(created_on) AS DayOfYear,
  HOUR(created_on) AS Hour,
  COUNT(*) AS Count,
  AVG(value) AS Avg,
  MIN(value) AS Min,
  MAX(value) AS Max
FROM [rad.randomData]
WHERE created_on >= '2012-07-16 00:00:00' AND created_on <= '2012-07-16 01:00:00'
GROUP BY
  Year, DayOfYear, Hour
ORDER BY
  Year, DayOfYear, Hour;

-- 2.9s / 1.6s (cached)


VoltDB

I stumbled upon this database by chance (Q4/2013). They make lots of claims, but I was surprised to find you can’t even extract the day of the year from a timestamp; you have to pre-process the data and prepare it yourself.

My “tour” of VoltDB ended with me calling their support, where a VoltDB Solution Engineer named Dheeraj Remella tried to help me (he was excellent!) and promised he would run the test for me. The actual email exchange took quite a while.

Meanwhile, they managed to release version 4.0, which includes the EXTRACT() function. The results follow:

Data import:
Read 50000001 rows from file and successfully inserted 50.000.000 rows (final)
Elapsed time: 1735.586 seconds

Test A:

SELECT
     EXTRACT(YEAR FROM create_on_ts) AS Year,
     EXTRACT(DAY_OF_YEAR FROM create_on_ts) AS DayOfYear,
     COUNT(*) as groupCount,
     SUM(value) as totalValue,
     MIN(value) as minimumValue,
     MAX(value) as maximumValue
FROM RandomData
GROUP BY
     EXTRACT(YEAR FROM create_on_ts),
     EXTRACT(DAY_OF_YEAR FROM create_on_ts);

-- 70ms

Test B:

-- 330 ms

The times are great! I started to wonder whether he had pre-computed Year, Day and Hour...

UPDATE: Yes, in his test DB, he pre-computed date attributes. Here is his DDL:

The hardware used: MacBook Pro (Intel i5 2.5 GHz processor - 2 cores, Memory 16 GB).

VoltDB looks very interesting. I was just a little puzzled that, for instance, you set up a database using a binary client in the terminal, and it runs some things as Java code:
voltdb compile -o random.jar random.sql
voltdb create catalog random.jar
csvloader randomdata -f randomData10.csv --skip 1

It’s not my cup of tea for now, but I will be watching — in time it might be cool!

HP Vertica

When Jan Císař ran this test for me, Vertica was quite a challenge. It’s changed dramatically since then, and is now quite a nice tool.

CREATE TABLE RandomData_T (
     created_on TIMESTAMP,
     value DECIMAL(22,20)
);

COPY RandomData_T from '/tmp/randomData.csv' delimiter ',' null as '' enclosed by '"' exceptions '/tmp/load.err';
Time: First fetch (1 row): 121s. All rows formatted: 121s

#warming up ... :)
SELECT * FROM RandomData_T LIMIT 10;

Test A

SELECT
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on),
     COUNT(*),
     AVG(value),
     MIN(value),
     MAX(value)
FROM RandomData_T
GROUP BY
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on)
ORDER BY
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on);

--  Time: First fetch (366 rows): 2068ms

Test B

SELECT
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on),
     EXTRACT(HOUR FROM created_on),
     COUNT(*),
     AVG(value),
     MIN(value),
     MAX(value)
FROM RandomData_T
WHERE
     created_on BETWEEN
     TIMESTAMP '2012-07-16 00:00:00' AND
     TIMESTAMP '2012-07-16 01:00:00'
GROUP BY
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on),
     EXTRACT(HOUR FROM created_on)
ORDER BY
     EXTRACT(YEAR FROM created_on),
     EXTRACT(doy FROM created_on),
     EXTRACT(HOUR FROM created_on);

--   Time: First fetch (2 rows): 26ms

Jan used an AWS 4xlarge instance type: approximately 30GB of RAM, SSD disks and 4x Intel Xeon E5-2680.


Elasticsearch

Around New Year I put a teaser about these tests on my Facebook, and Karel Minarik called me to say he would do the test in Elasticsearch. I was very excited, and here is the result. Executive summary — it’s fast as hell! It’s quite complicated to get to the point where you can actually query, and the import takes a while. For me, with all the Ruby code involved, it was actually too difficult to pull off myself.

Karmi's results are here.

Pure aggregation: 16.3s
Aggregation + filter: 10ms


GoodData

Sure, GoodData does not try to tackle differences of milliseconds, but there is no match for its ease of use when producing results.

I uploaded the data from S3 into Keboola Connection (about as difficult as sending an email with an attachment) and told Keboola Connection how to import it into GoodData. The preparation inside the GoodData project is very simple — I appointed the first column to be a date (with time) and the second to be a number.

Click “Upload Table” and Keboola Connection prepares everything else. It creates the physical and logical data model inside GoodData, and it parses and exports the data into a format that GoodData imports. To get you excited, I have included a link to the communication log with the GoodData API. All of this is hidden behind one button for the end user, or one API call.

We hand the data over and GoodData crawls through it, preparing it accordingly and importing it into a BI project. I advise everyone to think of it as the compact table you see on your end — the input — although all of it is actually stored in columns which link to each other and look like “snowflakes”. Loading the data (after parsing inside Keboola Connection, the table is approximately 4GB) takes around one hour.


For the first test — aggregation over 50 million lines — you need to create four metrics:
  1. counts the number of records [ COUNT(Records of randomData) ]
  2. calculates the average value [ AVG(value) ]
  3. calculates the minimal value [ MIN(value) ]
  4. calculates the maximum value [ MAX(value) ]

and you “look at them” through the year, and the day within the year.

To conclude the second test you just add the filters.

Both of the tests are in the screencast below. The footage is not edited on YouTube, so you can see how intuitive and fast it is. Yes, it’s possible to measure the time it takes to calculate the report, but that’s not the point. The point is to show how EASY it is compared to other approaches.

I thought I would make it a bit more complicated and show you how to create “By how many % did the aggregated count of records change day by day”. Again, here you can find the unedited video:
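The day-by-day % change metric boils down to something like this (the daily counts below are invented for illustration):

```python
# "By how many % did the aggregated count of records change day by day" —
# a toy equivalent of the derived metric, computed from made-up daily counts.
daily_counts = {"2012-07-15": 120, "2012-07-16": 150, "2012-07-17": 135}

days = sorted(daily_counts)
change = {
    day: round((daily_counts[day] - daily_counts[prev]) / daily_counts[prev] * 100, 1)
    for prev, day in zip(days, days[1:])
}
print(change)  # {'2012-07-16': 25.0, '2012-07-17': -10.0}
```

In GoodData this is just one more derived metric layered on top of the daily count; the engine works out the intermediate aggregation for you.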

With every report, GoodData offers an “explain” link which shows what needs to be done in the database for the actual query. When you chart it, it looks like this:


PostgreSQL

Jan Winkler ran Tests A and B on PostgreSQL (9.3.4). Details are here:
  • Import took about 2 minutes
  • Test A (no indexes, no vacuum): 33.55 s
  • Test B (no indexes): 4.5 s
  • Test B (with indexes on created_on column, first run): 0.06 s
  • Test B (with indexes on created_on column, second run): 0.015 s


MySQL

Finally, I did the same tests in MySQL. The server has 64GB RAM, SSD disks...

Test A

SELECT
     YEAR(`created_on`),
     DAYOFYEAR(`created_on`),
     COUNT(*),
     AVG(`value`),
     MIN(`value`),
     MAX(`value`)
FROM `randomData`
GROUP BY 1,2
ORDER BY 1,2;

-- 46sec

Most of the time was spent creating the tmp table:

Test B

SELECT
     YEAR(`created_on`),
     DAYOFYEAR(`created_on`),
     HOUR(`created_on`),
     COUNT(*),
     AVG(`value`),
     MIN(`value`),
     MAX(`value`)
FROM `randomData`
WHERE
     `created_on` BETWEEN '2012-07-16 00:00:00' AND '2012-07-16 01:00:00'
GROUP BY 1,2,3
ORDER BY 1,2,3;

-- 0.022sec
Summary for MySQL: its optimizer is crap :) Hynek Vychodil managed to get Test A down to 33s.

A final few words ….

Importing the data always poses different degrees of difficulty, so I don’t include it in my evaluation. But I can say the final aggregation of data for the end user has always been very fast.

If you know how to use SQL and you just need a few queries, BigQuery should be your choice.  

If you want to ask 'zillions' of business questions and not take care of your own DB cluster, then Amazon Redshift is great.

If your data doesn’t have a very firm data structure, ElasticSearch is perfect (Karmi told me there is a guy from Germany who pours more than 1TB of data into Elastic every day and he has no problem with speed).

If you want to process the data and/or you have a more complicated query structure, and somebody will be asking lots of business questions, then I believe you should go with Keboola Connection and GoodData.

Why is GoodData special?

Today's world is oversaturated with data. Telling stories through data is beginning to be so sexy that many people are building their careers on it. A few semi-experts in the Czech Republic have even changed their colours and started talking about BigData (in the worst cases, they also hold conferences on the topic). However, I'll save this for a future blog post, in which I'll ground their Hadoop enthusiasm a bit for you.


People want to know more about the environment in which they operate. It helps them make better decisions, which usually leads to a competitive advantage. Generally, good decision-making needs a combination of three things: proper input parameters (information/data), common sense/experience, and a modicum of luck. However, an idiot will still be stupid, and although luck can occasionally be bought in the Czech Republic, there's the threat of being arrested for bribery. Hence, information remains the most influenceable component of success. In my playground, the correct information serves as the answer to the most penetrating questions you can think of.

I assume that each of you knows how much money you have in your personal bank account. Most of us will also know how much money we spend per month. Fewer of you will know exactly what it was for. An even smaller group will know the structure of all the pleasant cups of coffee, ice cream, wine, lunches and so on (we call it the long tail). I would bet that almost no one knows their personal annual trend in the cost structure of such a long tail. You’ll probably argue that you don't care. But if you're a company that wants to succeed, you can't do without such information. As for one's personal life, the biggest nutcase, as I see it, is Stephen Wolfram, who has been measuring almost everything since 1990. He has written about almost everything except the lint from his bellybutton (unlike Graham Barker :)

Because the executive summary of your accounting, CRM, Google Analytics or social networks isn't shown on TV after the evening news, you're forced to build the various reports and dashboards yourselves.

I'll try to summarize the tools I know are available; but in the end I'll tell you that it's all just a toy gun, and whoever wants a proper data gun must reach for GoodData. To be fair, I'll do my best to argue the case a bit :)


Excel

Today, Excel is on every corner. It's a good helper, but quite a lot of people have a strange tendency to turn themselves into Excel Engineers, which is the most dangerous expertise you can come across. The Excel Engineer often ends with a pivot table and a SUMIF() formula. At the same time, business data processing becomes tied to him and, perhaps unwittingly, he becomes a brake on progress. The biggest risks of reporting in Excel, in my opinion, are as follows:

  1. The primary data, from which reports are made, are stored in Excel; someone imported them into Excel at some point - with a poor/expensive path to updating them
  2. Excel sheets tend to travel around corporate Outlooks, leading to different versions. It often comes in handy to tweak the YTD% a little, or it may easily happen that another department has the same Excel but with different numbers - this undermines confidence in the reports and can easily allow for distortion of reality
  3. Complicated reports must be created by the reporting department (only they know how to update the data - see point 1), where Excel experts provide answers to business questions they do not always understand. Therefore, it often happens that the ad-hoc responses to your ad-hoc hypotheses take days to arrive (submitter burnout occurs)
  4. The combination of manual operations and macros made by someone who doesn't work here anymore introduces errors into Excel, thanks to which the cosmoverse then collapses!

It's probably obvious that Excel reporting should end at the level of a sole trader. Nothing reliable can be created with it efficiently. You can be sure that the Excels on ZEE (a network drive, of course!) contain errors, are not up to date and were made by people who were assigned the task by someone else, so they knew damn all about the nature of the data they fed into the VLOOKUP! Excel Engineers usually don't have data discovery in their genes, and even if they came across something interesting, they probably wouldn't notice. You know best what the correct information is at any given moment (and Excel at the level of VBS macros and dirty hacks isn't really what you should be mastering in 2013)!


Visualization Tools

Today's market is oversaturated with tools that aim to help you visualize some kind of business information. Think of business information as the number of orders today, the net margin for the last hour, the average profit per user, etc. In the majority of cases it works in the following way: you calculate this information on your side and send it automatically, via an interface, to a service that presents the given metric. Examples of such services are Mixpanel, KissMetrics, StatHat, GeckoBoard and even KlipFolio. The advantage over Excel lies especially in the fact that reports and dashboards can be easily automated and then shared. Information sharing is quite underrated! An example of such information could be the number of data transformations executed, at minute granularity, in our staging layer:

You can build dashboards from these reports, and for a while you will feel good. The problem occurs when you find out that any extension of such a dashboard requires intervention from your programmers, and the more complex your questions are, the more complicated the intervention. If you operate in B2C and have transactional data, you can be sure that the clinical death of this form of reporting will be, for example, a query for the number of customers who, over time, spent at least 20% more than the average order for the previous quarter, and who at the same time bought an ABC product this month for the first time. If your programmers, by some luck, manage to implement it, they'll blow their brains out once you add that you want daily numbers of the TOP 10 customers from each city who meet the previous rule. If you have just a few more transactions, it will mean rebuilding your existing DB, which will eventually lead to a 100% collapse. Even if you try to keep it alive at all costs, you can be sure that you won't slay the competition with that zero flexibility - you won't even be able to gently take the analytical helm, because the market will pivot around you.
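Just to show that such a question is not impossible, only painful, here is a rough SQLite sketch of the “customers 20% above the previous quarter's average order who bought an ABC product for the first time this month” query. The table, customers, products and dates are all made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, created_on TEXT, amount REAL, product TEXT);
INSERT INTO orders VALUES
  ('alice', '2013-11-05', 500, 'XYZ'),   -- previous quarter
  ('bob',   '2013-11-06', 100, 'XYZ'),
  ('alice', '2014-01-10', 900, 'ABC'),   -- first ABC purchase, this month
  ('bob',   '2014-01-11', 120, 'XYZ');
""")

# Customers whose previous-quarter spend beat the average order by >= 20%
# AND who bought product 'ABC' for the first time this month.
query = """
WITH prev_q AS (
  SELECT customer, SUM(amount) AS spent
  FROM orders
  WHERE created_on BETWEEN '2013-10-01' AND '2013-12-31'
  GROUP BY customer
),
avg_order AS (
  SELECT AVG(amount) AS a FROM orders
  WHERE created_on BETWEEN '2013-10-01' AND '2013-12-31'
)
SELECT p.customer FROM prev_q p, avg_order
WHERE p.spent >= 1.2 * avg_order.a
  AND p.customer IN (
    SELECT customer FROM orders
    WHERE product = 'ABC'
    GROUP BY customer
    HAVING MIN(created_on) >= '2014-01-01'
  );
"""
print(conn.execute(query).fetchall())  # [('alice',)]
```

Now imagine maintaining dozens of such hand-written queries against a schema that was never designed for them, and the point of the paragraph above should be clear.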

It's possible you don't have similar questions about your business, and it doesn't bother you. The cruel truth is that your competition is asking about it right now, and you will have to somehow respond to it...

Pseudo BI

Neither Excel nor the visualization tools usually have any sophisticated back-end, which applies similarly to services like Domo or Jolicharts. They look super sexy at first glance, but inside is a disguised set of visualization tools, sometimes coated with a few statistical features that you mostly won’t use. The common denominator is the absence of a language with which you could step outside the predefined dashboards and begin to bend such services to your benefit.

Their only advantage is that they can be implemented quickly. Unfortunately, that's it, and after a short intoxication period, sobriety sets in. If by chance you are a little bit more demanding, you don't have a chance at a very happy life.

Low Level Approach

There are services that allow you to upload data and run queries. As I see it, the hottest nowadays is Google BigQuery. For us at Keboola, it's a tremendous help with data transformation, denormalization and JOINs of huge tables. It can serve you well if writing something like the following seems like a good idea to you:

It's evident that if you don't make a living as an SQL consultant and don't have any ambition to create your own analytical service, you’d better leave this approach to nerds (like us!) and attend to your own business :)

Cloud BI

If you google cloud BI, Google will return names like Birst, GoodData, Indicee, Jaspersoft, MicroStrategy, Pentaho, etc. (if you have Zoho Reports among the results, the universe has gone crazy, because that should have stayed in Asia :)

From many trends, it's obvious that the Cloud moves today's world. In the Czech Republic, the most common concern about the concept is a worry about the data and the feeling that my IT can do it better than the vendor's. If you share these concerns, you should know that when any trouble arises in the Cloud, the best people available on this planet start working on it immediately, so that everything runs like clockwork again. Dave Girouard (coincidentally also a board member of GoodData) summed it up nicely in this article.

Except for MicroStrategy, which probably discovered the Cloud this morning, the above-mentioned brands are relatively established in the Cloud. However, different surprises hide under the lid. Pentaho requires highly technical knowledge to make the most of it. Jaspersoft is Excel on the web that, in short, failed. Indicee would like to play in the major leagues, but I know at least one large customer from Vancouver who, after trying to implement their solution for a year, moved to GoodData. When I tried Birst it was all in Flash, and despite my enormous effort I really didn't understand it :(

As I said in the beginning, everything except GoodData sucks. There are several reasons for this:

  1. GoodData has a powerful language for defining metrics. With this language, anyone can generate reports, no matter how complicated. The fact that these reports are created not only by clicking is more than essential - it gives you the flexibility you'll need in the fight for first place against your competition. If GoodData satisfies Tomáš Čupr (ex-Slevomat, DámeJí…), you can be sure it will suit you as well. Even constructs that look complex at first glance can be quickly learned at the Keboola Academy.
  2. GoodData, unlike its competitors, has a fundamentally well-designed API that enables companies such as Keboola to bend the whole analytical platform so that it plays first violin in your environment. Seamless integration with other information systems, white-labeling, single sign-on and a framework for data extraction and transformation mean that there are no compromises during implementation.
  3. GoodData isn't just reports in a web browser but an entire set of abstractly separated functional layers (from a physical model representing the data up to a logical model representing the business relationships), thanks to which the implementation doesn't include things like a feasibility study or a technical specification. In comparison with the competition, GoodData can be implemented with tremendous speed (no months-long projects).
  4. GoodData has a phantom lab in Brno where R&D takes place, the output of which are innovations I'm not sure I can make public today. Nevertheless, I can honestly say that the others will soon shit their pants over them. I'll definitely add more here in time!

All in all, the quality of GoodData is shown by, among other things, its many integrations, such as (the biggest customer-support service in the world). Such flexibility is, from my point of view, absolutely essential for future success. Any one of you can rent high-performance servers, design a super-cool UI or program specific statistical functions (or perhaps borrow them from Google BigQuery), but in the foreseeable future no one will come up with a comprehensive concept that makes sense and is applicable to small dashboards (we have a client who uses GoodData to look at data from Facebook Insights) as well as gigantic projects with a six-digit $ budget for just the first implementation phase.

GoodData Rocks! 



Keboola Stats on Pebble

BI at your fingertips

In 2014 it’s already passé to have your dashboard behind two firewalls and two-factor authorization, full of information with various levels of importance. What our customers need in today’s fast world is to literally have the most critical information at their fingertips - and not even an iPad dashboard can fulfill this promise with the expected level of convenience.


We believe that every one of us, regardless of job or interests, has their ONE number. The one number that captures the essence of what you really do. For instance, as a salesperson you may be very easily motivated if you can see what you will make in commissions and how you compare to the rest of the team. The CEO needs one number from the CFO… this immediate feedback loop is crucial for understanding and connecting actions and outcomes. Alternatively, if you’re a blogger, the number of followers and/or comments is what gets you out of bed and to the keyboard each day.

In the time it takes you to reach for your iPad, another event in your business has taken place.  … And now the only thing you need to do is look at your watch.

There’s a pretty short distance from your fingertips to your wrist, making the smartwatch an obvious choice. We chose the pioneer, Pebble, as our jumping-off point into the “wearables” movement.

Using Pebble, Keboola Stats connects with your data to deliver business insights on the go. With updates as often as once every 10 seconds, we’re serious when we say “real-time updates”.

How does it work?

Simple. Just three easy steps:
  1. If you don’t have it already … install the Pebble app on your phone (Android, iOS)

  2. Within it, install the Keboola Stats app.

  3. Finally, enter the token generated by our “Pebble Writer.”

… Oh, and it does help to have the Pebble watch.

Ok, show me what I can do ...

Keboola Stats can show you a dashboard with your top 2 numbers and their % changes in real time. So, what if you wanted to see your actual revenue from midnight (until now) and its % change compared to (the same point in time) yesterday? Or how about today’s order count and its % change from the beginning of the week?
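The arithmetic behind such a tile is simple enough to sketch. The snippet below is only an illustration (the order log and field names are made up, not Keboola’s actual API): it sums revenue since midnight and compares it with the same window yesterday.

```python
from datetime import datetime, timedelta

def percent_change(current: float, previous: float) -> float:
    """Percent change of `current` relative to `previous`."""
    if previous == 0:
        return float("nan")
    return (current - previous) / previous * 100.0

def revenue_since_midnight(orders, now: datetime) -> float:
    """Sum order amounts between midnight and `now` on the same day."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return sum(amount for ts, amount in orders if midnight <= ts <= now)

# Hypothetical order log: (timestamp, amount)
now = datetime(2014, 5, 20, 9, 30)
orders = [
    (datetime(2014, 5, 19, 8, 0), 120.0),   # yesterday, before 9:30
    (datetime(2014, 5, 19, 14, 0), 80.0),   # yesterday, after 9:30 -> excluded
    (datetime(2014, 5, 20, 7, 45), 150.0),  # today
    (datetime(2014, 5, 20, 9, 0), 30.0),    # today
]

today = revenue_since_midnight(orders, now)
yesterday = revenue_since_midnight(orders, now - timedelta(days=1))
print(today, yesterday, percent_change(today, yesterday))  # 180.0 120.0 50.0
```

The key detail is comparing against the *same point in time* yesterday, not the whole of yesterday - otherwise the % change is meaningless until the day is over.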

You can answer these questions in a matter of seconds, putting you ahead of the game and making you a rockstar in your next morning meeting. A glance at your ONE number tells you what you need to do next - be it nothing, or be it looking in detail at what changed the number.

But that’s only one example. Pretty much anything that fits on the screen and can be derived from your data can be delivered there. We now have 33 data sources + an API ready to accept any type of data from our clients - we routinely process everything from social data to POS transactions to support tickets.

So what’s next?

We have the full back-end stack - data collection, analyses and an API. Now we’re ready to roll it out to LG, Samsung and Motorola :)

If you’re as psyched about this as we are, you can thank Tomas Kacur for making it all happen. Oh, and Martin Karasek for whipping up the Pebble Store icons in record time - something like 45 seconds? :)

If your data is already in our care, tell us the numbers you want to see and we’re pretty much done. We will agree with you on an update frequency depending on the context (there’s no point in frequently updating a number that by its nature changes slowly).

If you’re new to our services, let us know and let’s talk about how to get to your data in the most sensible way. GoodData clients have an advantage because we can connect directly to a report within the platform.

P.S. For the tech savvy, the Pebble phone app (JS) and the watch app itself (vanilla C) are published as open source. You can get them from our GitHub (backend API is in

The Beginner’s Guide To Keboola III: We ♥ Your Third Party Data Sources

You’re certainly using them, you probably like them, and perhaps they even help you save some money. However, you’ll find the real treasure of third party data sources the moment you interconnect them and find the answers to your business questions.

In this edition of the Beginner’s Guide you’ll find out how to use data from external services and databases to better understand your data. You will also begin to recognize the importance of getting to know your data (actually it’s time to become best friends), and how to ask the right questions to get the right results (or buckle up for one bumpy ride!).

What data sources does Keboola use?

The short answer… lots. At Keboola, we are able to connect to most modern systems. We simply need to find the API and it’s ready, set, go. We like to think of APIs as magical translators that allow programs to exchange data and thus make it more meaningful to you.

These are the 9 nominees for “most used source in a Keboola project” (in no particular order):

Although these are the most common, the potential for new sources is limitless (and that is why we love our dev team).

If it is readable, we can use any kind of data.

Along with connections to services and applications via API, you can send us your data in almost any format. We are able to read data in everything from CSV to JSON to unstructured text in a notepad.

We can even go beyond text data and bring in pictures (bless the magic of OCR) if you so desire. The most important thing to remember when bringing data in is that it needs to be readable.

Once we have established the readability of your data we can start building out your project. Our process is generally top secret but usually involves locking ourselves in the office, relying solely on food delivery trucks for survival. We think through the logic of the connections, carry out tests, and write documentation. We are then ready to upload your data and start building reports for your viewing pleasure.

Sounds great, except I have no idea where to start and what to do!

Don’t panic. Data can seem overwhelming but it is all about asking a few simple questions and then doing a few simple things.

Start by asking yourself some questions like:

  • What exactly do you want to assess?
  • How can data help you with that?
  • What indicators do you need to watch?
  • What information is missing from the tools you already have?

Next gather the data. 

For external sources, begin investigating how information is communicated - the magical translators known as APIs are a great place to start. For internal sources, just keep doing what you are doing and update the information you already have. If you haven’t started yet, think of ways to capture that internal information and initiate the process.

By doing some strategic thinking and then organizing your data you are well on your way to creating the right results. This process also helps to explain why more expensive data services are not necessarily better than those that are free. What matters most is the relevance of your data to answering your business questions.

It’s sort of like buying an S-Class Mercedes for a ride through the rough and rocky Rubicon Trail. Arguably Mercedes makes one heck of a car, but if you don’t ask where you are going, it might be a rather unpleasant ride for you and the car. That’s why it is important to ask questions first and then collect, collect, collect until you are able to cruise through to the right results.

We have to drive off into the sunset for now, but stay tuned as we build on this idea in our next article featuring an interview with Tomáš from Czech Keboola.

The Beginner’s Guide To Keboola II: How It Works

You already know that Keboola can process your data in such a way that it makes sense and that it is of value to you. This time we shall go a little deeper and show you how it is done in practice.

Let’s say you're the owner of a chain of coffee shops. You wish to expand your business and at the same time you would like to figure out where you are losing money. You have lots of data from your POS system and of course your accounting software.

This is where Keboola steps in.
  • Together you will identify and gather your KPIs - the parameters you want to monitor. Maybe the average spending by cafe and waiter. Or customer loyalty. Or anything else.
  • Together you can come up with reports you wish to follow - what they should look like and what they should compare.
  • You can start looking forward to a return on your investment.

Now it is time for the "IT stuff"

We will create a model for your data with a clear structure in the Keboola Connection tool. It is thanks to this model that the whole system will later run quickly, flexibly and accurately. Using the model we will be able to find relationships between the data.

But the model wants to eat – the model wants to be fed data, which will come mostly from these four main sources:

  1. If you run your own database, we will connect to it remotely and process all the necessary data. 
  2. If your data is scattered in multiple systems or locations, we will tell you exactly how to connect the dots with our interface. 
  3. Do you wish to relate your café sales data to your website traffic from Google Analytics? Or to population figures from your city hall’s open data, by city and neighbourhood? We can do it for you! 
  4. Historical data is not a problem either. (Yeah, we’re talking about the 10-year-old Excel sheet with sales data). All you have to do is keep its structure. 

A short wait for the first report

Once we have fed the model with data, we will send the processed data into an application called GoodData, after which you almost immediately gain access to your reports. Rest assured that the first contact will feel a bit like magic.

Once you’ve had your first dose of satisfaction, we guarantee that you will want more: "I don’t want this report, and I want that one to take the weather into account." Ok. Post your requirements, wait a couple of days (not two months!) and then you are looking at your new reports.

Or even better - access our know-how in Keboola Academy to learn how to work the system, and then you will be able to modify the reports yourself. After that, no one will ever be able to tear you away from your data. 

A boss with GoodData, who is lounging on a beach half a world away, knows more than any boss present at work without it.

Now, if you wish you can sit under a beach umbrella in Honolulu with a tablet and every five minutes you can check just how much money you are making. 

You will notice that the people who were served by Olivier never came back to your cafe.

You will see that customers in Vancouver are spending roughly twice as much as customers in Quebec, since you just launched an advertising campaign there.

You will observe that when it rains your sales of pour over coffee rise sharply – unless the manager forgets to stock up on the filters.

You will clearly see how the purchasing behaviour of your customers changes in time, so you will spot new trends early and take full advantage of them.

As you sip your Mai Tai slowly, you’ll then start to write your first email: "Mary, please order extra thin filters for our coffee machines and also tall glasses for Vancouver. It seems like there's a new fad..."

BI Dashboard Crisis

People were getting lost in data - so they created tools to help them. From 1958, when Hans Peter Luhn coined the term “Business Intelligence”, until the end of the ’80s, the whole industry lived by terms such as data warehouses, OLAP cubes, etc. In 1989 Howard Dresner defined BI as a “set of concepts and methods to improve business decision making by using fact-based support systems”.

Over the last half century, BI has kept progressing until today, when it finds itself in a bit of a crisis.

The Dashboard Crisis

We are overwhelmed by data - no longer the raw data - but rather the categorized, mathematically processed data represented in what we call “reports”.

Imagine that you have a large amount of data. You know that there is a lot of very interesting information in it. So, you take tools that pull all that data into one place, clean it, polish it and present it back to you - and you start looking at it (that’s what we do with Keboola & GoodData).

Over time, though, one can easily experience the following side effects:

  1. resignation / the “juicer syndrome”. You see (if you use the system passively) the same information in the data day after day. In the first few weeks, you drill into the data and look at it from all angles. As time goes on, your focus falls away while you continually ingest more and more data you don’t need to see again (Avast Antivirus now has more than 200M users; they’ll still have more than 200M tomorrow, and no one needs to be reminded of that daily). If you bought a new juicer, you probably drank nothing but fresh juice for a week or two, and since then the appliance has been collecting dust somewhere. Something very similar can easily happen in BI.
  2. drowning in data. If you have a good tool that allows you to drill into your data and you use it, you generate one report after another as you find more and more interesting answers. At one point you’ll have so many reports that you get lost.

Once you have hundreds of reports, all sorting, tagging or naming conventions stop working. You’ll get to the point where no one can find what they need. Instead of looking for existing reports, people will start building the same ones again and again. Your sales director knows that there was a report “Margin estimate for the next 4 weeks based on sales managers’ estimates” somewhere, but it is harder to find it than to build it again (which, in the case of GoodData, speaks volumes about the ease of its use).

What are the attempted solutions?

  • Use of natural language - Microsoft is trying, in its “Power BI”, to understand queries asked in a manner similar to how we ask a search engine. In that case, natural language needs to be somehow connected to the semantic model leading to the data. It looks pretty (see the Power BI link), but my colleague Odin nailed it when he commented after reading one such article:

“I read it and IMHO it’s a bit of BS, because articles like that have been showing up regularly since the ’50s - saying that use of natural language is “almost here”. The best generic tool for interactive communication with a computer (asking the computer for something) is so far SQL, which was supposed to be so simple that everyone could write a query as easily as a sentence. Time has shown that reality (and therefore also natural language) is so idiotically complex that any language describing it needs to be complex too, and you need to study for 5 years to master it (same as natural language).”

  • Use of a visual interface between the system and a human - you can see that nicely in the example of Birst. It’s a beautifully executed marketing video, but once the data model (a.k.a. the relationships between the information) gets sufficiently complex, the interface stops working - either it doesn’t understand what we want from it, or controlling it gets so complicated that its advantages are lost.

What are we doing about it?

It is important to take a bit of everything. It will remain critical that everyone has access to the information they feel they need (to validate a hypothesis, support a decision, etc.). Apart from that, the machines need to help a bit with sifting through the data - so you don’t have to generate hundreds of reports trying to find the golden nugget.

At Keboola we’ve been working on a system that attempts to solve exactly that since the summer of 2013. Today it is practically a complete set of functions that can recognize the meaning of data (time, ID, number, attribute - we call that piece the “data profiler”), relationships between data (for example, it can figure out how to connect Google Analytics with CRM data) and afterwards run tests to identify “interesting moments”. For example, it can discover seasonality in a particular segment of customers and point to it, without an analyst needing to get the idea to try something like that out. Our system “guesses” where the data relates to a specific customer, and if it finds something interesting, it points it out. Ideally it creates, by itself, a report in GoodData filtered to the given situation.
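To give a flavour of what such a “data profiler” does, here is a toy heuristic - entirely an illustration of the idea, not our actual implementation - that guesses whether a column is a time, an ID, a number (fact) or an attribute:

```python
import re

def profile_column(values):
    """Guess a column's role: 'time', 'id', 'number' (fact) or 'attribute'.

    Toy heuristic: date-shaped strings look like timestamps, all-distinct
    integer columns look like keys, other numeric columns are facts, and
    everything else is an attribute (dimension).
    """
    vals = [str(v) for v in values if v not in (None, "")]
    if vals and all(re.match(r"\d{4}-\d{2}-\d{2}", v) for v in vals):
        return "time"

    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if vals and all(is_number(v) for v in vals):
        # all-distinct integer values suggest a key rather than a measure
        if len(set(vals)) == len(vals) and all(float(v).is_integer() for v in vals):
            return "id"
        return "number"
    return "attribute"

print(profile_column(["2014-01-02", "2014-01-03"]))   # time
print(profile_column([1001, 1002, 1003]))             # id
print(profile_column([19.9, 5.0, 19.9]))              # number
print(profile_column(["card", "cash", "card"]))       # attribute
```

A real profiler would of course look at many more signals (formats, cardinality, relationships across tables), but the classification into time / ID / fact / attribute is the foundation the automated tests build on.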

As an example, for “online transaction” data types we have a set of tests that look for those interesting moments. One of these tests (working title “Wrong Order Test”) creates histograms of all combinations of facts (typically monetary values) and attributes (products / locations / months / user types, etc.). Among those it tests whether the counts of IDs (such as orders) correlate with the values - if some attribute falls outside of “normal” in a particular situation, that is reason enough to bring it up with the business user.
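A stripped-down sketch of the idea behind such a test might look like this - note this is an illustrative toy, not the actual Wrong Order Test. It flags attribute values whose average fact-per-order drifts far from the norm across all slices:

```python
from collections import defaultdict
from statistics import mean, stdev

def wrong_order_test(rows, attribute, fact, threshold=1.5):
    """Flag attribute values whose average fact-per-order deviates from
    the overall norm by more than `threshold` standard deviations.

    `rows` is a list of dicts, each with an order ID, one attribute and
    one fact (monetary value) - a toy version of testing whether order
    counts track order values for each attribute slice.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in rows:
        counts[row[attribute]] += 1
        totals[row[attribute]] += row[fact]
    avg = {k: totals[k] / counts[k] for k in counts}  # value per order, per slice
    m, s = mean(avg.values()), stdev(avg.values())
    return [k for k, v in avg.items() if s > 0 and abs(v - m) > threshold * s]

# Hypothetical order data: payment method vs. revenue per order
rows = [
    {"id": 1, "payment": "card",    "revenue": 100.0},
    {"id": 2, "payment": "cash",    "revenue": 105.0},
    {"id": 3, "payment": "paypal",  "revenue": 95.0},
    {"id": 4, "payment": "invoice", "revenue": 98.0},
    {"id": 5, "payment": "voucher", "revenue": 20.0},
    {"id": 6, "payment": "voucher", "revenue": 20.0},
]
print(wrong_order_test(rows, "payment", "revenue"))  # ['voucher']
```

Run across every fact × attribute combination, this is exactly the kind of mechanical sweep no analyst would think to do by hand - which is the point.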

This picture shows how, for a specific time period and product (or user group), the system identified an unexpected drop in profit for a particular payment method - “interesting behaviour”. Unless you somehow get the idea to test for precisely this situation and report setting, you have practically NO CHANCE of discovering this. On top of that, the same anomaly may not present itself a week later, so you need continual detection.

Our goal is to periodically test the various data types sitting in Keboola and inform their owners of interesting facts in the form of an automated dashboard within their GoodData projects. The last thing we need to do is define how to configure the tests, as the true power lies in the interaction of various tests over the same data. Everything else - the data profiler, the tests themselves, supporting R functions, the API, the infrastructure and so on - is ready to go.

This way Keboola will not only help use data to find answers to your business questions, but also phrase new questions based on gems hidden in the data.

GoodData Open Analytics Platform - a Category of One

(originally published as a guest post on the GoodData blog)

In the world of big data and analytics, what is the definition of a platform? What belongs in the category and what doesn't? If Tableau is on the list, why not Excel? If Excel, why not Numbers or GoogleSheets? (Hey, that one's even cloud based!) The whole thing is somewhat silly to me. It is trying to compare the incomparable.

Over the years of Keboola's existence and focus on business intelligence, we've been closely monitoring the tools available. We are an independent company, and while we partner with GoodData, our ultimate focus is to do what's best for our customers. There are many tools out there. Some mediocre, some brilliant in what they do. It never ceases to amaze me how solutions built on Cognos, MicroStrategy or Business Objects can cost so much while so little value seems to be actually delivered. Similarly, how Domo took a simple dashboarding tool and, with some serious marketing dollars, made it appear almost like a BI product. Conversely, looking at a product like Tableau, its visualizations are unparalleled. And yet, simply put, nothing comes close to fulfilling our vision of a BI platform as well as GoodData does.

If you disagree, start asking questions - Which BI tools have a robust API that can connect to and push data from any data source? Do they allow you to filter data based on the user who is looking at it? Can you automatically build scripts that generate reports relevant to the current situation of your business? Does a BI tool allow you to analyze hundreds of millions of rows of data in seconds? Does it have a front-end interface that anyone who has come near a medium-complexity spreadsheet and knows how to drag and drop can use? Does the platform allow you to build a product that you deploy to hundreds of customers at the touch of a button? And which tool allows you to do ALL these things? Right.

GoodData is more than a tool, it is a true open platform. For some this comes as news, for us at Keboola, it has always been that way. We have always treated GoodData as a platform.

A true platform gives you tools and space at the same time. The tools allow you to do things, and the space lets you imagine and create new ways of doing them. Your imagination, not the tool, is the limit. We built, using GoodData itself, a training tool to teach people how to use GoodData, called Keboola Academy. We built AI that modifies not only the data in the reports, but also the layout of the dashboard to pinpoint what is important. We completely integrated with GoodData so that deployment of dashboards and analytics over our own business data warehouse product is seamless and largely automatic. We built whole data products, deeply embedded into our customers' interfaces, all using an open analytics platform called GoodData.

Keboola is about helping companies make more money using data. Whether it is for internal reporting and analytics, or to create new revenue streams by monetizing data-as-a-product, GoodData gave us the freedom to build amazing things and continuously grow our business (so far 200% or more year over year) and that is why I consider it the only true BI platform on the market today. "BI Platforms" is a category of one.

Seznam's Return on BI, Part Two

No one has been fired because of the data yet – Michal Buzek reveals the backstage of Seznam’s Business Intelligence

When we last spoke with Michal Buzek, we uncovered that Seznam’s investment in business intelligence paid itself off more than 10 times in only three months. However, we wanted to dig deeper into the specific changes GoodData made in the work of business teams, how it affects the running of their company and how Seznam will leverage data to evolve.

Does GoodData give you any answers outside the scope of the regular reports?

What exactly do you mean by regular reports? The fact is our salesmen know their clients much better thanks to GoodData and this, in my opinion, is priceless. Today we know that our client invests tens of thousands with us, but at the same time spends millions on billboards. We therefore work with the seasonality, and we are approaching people according to the branches of their business. We can also analyze the portfolios of businessmen and business teams, and find clients who we can provide with better care. We were able to do all this before (and we did) but it’s incomparably easier and quicker today with the use of GoodData.

I think the key is not one all-encompassing report that opens our eyes – it is having a set of practical reports that are always on hand and are used to help in our everyday work.

Are the business people easily getting used to it?

A few times in Seznam I have heard that a business person is supposed to do business - not rummage in tables. That is why we debugged the reports and made sure the graphs are clear and understandable at first glance. Thanks to that, GoodData became an everyday tool for a business person, a tool that they use before every meeting with a client. This allows them to find out the client’s media strategy, advertising development, customs, and many other types of data before the meeting even begins.

Has analytics ever helped those business people that you thought wouldn’t have been as receptive to the technology?

Some have discovered GoodData already; others still have a way to go. But this year I have noticed a greater hunger for data among the business people - from some of them, I absolutely didn’t expect that. It’s a pleasure to see that GoodData is not just a tool for top and senior management. All employees can profit from using this tool and gain useful information from GoodData.

Have you let anyone go based on the data?

Not yet, or not that I know of.  But I must say, the business projects in GoodData allow the managers to control their people much more effectively.

Can you be more specific?

The manager is able to see the content of contracted advertisements and the number of visits for every single client. If, for example, a client contracted orders of 50k and the business people came to visit 15 times, that tells you something is probably wrong. Conversely, it’s also a problem when you see a client with advertising spend of over 2 million and only one record in the history of communication. Either the business person is extremely effective or he doesn’t care about completing the records from the meetings. 

Do the managers use this information or are they just aware of it?

They use it. Based on detailed statistics they are able to talk to their people in a much more specific way. You can see immediately where a business person is wasting their time and what they should be focusing on instead. Of course you can’t read too much into these statistics, but I hear a lot of positive feedback from the managers.

Has data analytics changed how Seznam functions?

Most of the people in our company know that the only correct and important numbers are found in GoodData. The top management gathers the information necessary to manage the whole firm, the Product Managers follow KPIs and task fulfillment, the Controller sees the expenses and incomes, the Business Managers follow the results of their teams and clients, and the PR department sees the statistics from social networks, etc.

What about people from other departments? Do they contact you with report demands?

Yes, they do. It’s great that the grapevine works in Seznam. For example, last month the guys from Sklik came to me, as well as the Manager of Mobile Advertisement and the Planner of Media Space. They saw some useful reports that other people had and wanted their own - they saw it could actually help them with their work.

How many people use GoodData at your company?

About 75, with 50 of those working in sales. Some managers even use GoodData to make XLS or PDF documents which they share with others, so in the end there are many more people working with GoodData.

Have the outputs from GoodData found their way to your office meetings?

In top management meetings, reports from GoodData have been the standard for the past couple of years. We have used these reports to review expenses, incomes, and the fulfillment of indicators compared to plan, along with other factors. In the meetings of business teams, they mostly discuss reports with advertisement monitoring, the history of communication, and the advertisement outputs of concrete advertising products. Also, people from services, sales, and marketing regularly meet and watch their KPIs in Seznam.

What do you see as the biggest benefit of the GoodData/Keboola solution?

I have mentioned many benefits already, so I’ll share some of the other added ones. The greatest pleasure for me personally is when I see the people in the company actually using GoodData and Keboola. I get to see them absolutely excited about the dashboards that we’ve made for them, even though we are just showing them data we have always had at Seznam. Before, it just wasn’t so easily accessible; now it is all in one place. There also wasn’t the possibility of examining information in the detail there is now.

Is there any report that you prefer?

I think the strength is in simplicity. The business people prefer tables loaded with useful information. For example, rows show information on clients, and columns show their price-list expenses at Seznam and outside Seznam. The business people are thrilled when they’re able to find where our client advertised and exactly what they promoted, all in one simple click.

I personally like the tab with two flow charts. On the left, this tab shows the extent of a client’s investments into advertising at list prices, from the advertisement monitoring database. The one on the right shows the extent in real prices from Seznam’s sales system. 

You can therefore see things like: this client advertised heavily in 2013 – but not at Seznam. Thanks to the data, that fact was uncovered and our business people managed to put Seznam back in the game in October.

So what’s next? Have you got any ideas where to go next with this whole thing?

My goal is to help Seznam build extremely efficient business teams. As our sales director once said – “we should be able to send our business people to the right clients, in the right time, and with the right offer”.

How can data help that goal in your opinion?

I want the business people to see themselves in GoodData. Based on the data, I want them to see which clients they should approach first. Expanding to more data sources would also help with this, as would more detailed client segmentation. What’s important in all this is education. Right now we are preparing a workshop for business managers so they can get the most out of GoodData for their work.

Do you think you’d manage to work without GoodData and Keboola Connection today?

Well, if you offered me a tool equipped at least as well as GoodData, then I would. But I’m a realist - if there was anything better on the market, I would be aware of it. And I certainly wouldn’t want to return to the Excel and Access times.

Do you feel you are more beneficial to your company thanks to data analytics?

That’s a very difficult question. I try to be that way. But the benefit doesn’t depend entirely on me or the guys who work with GoodData. What’s important is support for the data from the sales department and from the whole company. In other words – it’s important how data is acquired and used in practice. I think that we have been able to convince more people at Seznam about the value of data. And hopefully, that’s how the influence of the analytics department grows.

You’re not getting out of it so easily – has your value on the labour market grown?

When it’s needed, I’ll put it in my CV. I have been here since the very beginning of implementing GoodData at Seznam. I’m able to arrange datasets and build a data model in Keboola Connection. I’m no star, but I have gathered some experience, and even if I never needed it again, the last two years with GoodData and Keboola were really fun for me.

Would you go into the whole thing again?

Definitely. If I ever finish at Seznam and some other company wants me to go through their data, I’ll totally go across Karlín in Prague.

Seznam's Return on BI, Part One


The investment in Business Intelligence returned 10 times in three months, says Michal Buzek, the chief analyst of Seznam

The Czech Republic’s biggest web portal, Seznam, has not only built a search engine to rival Google, but has also founded an empire of prospering services. From an email platform to a growing network of contextual advertisements (Sklik), Seznam has excelled at building a portfolio of complementary business ventures.

So how does this giant, with thousands of employees, manage and understand the enormous amounts of data at their disposal? One word: GoodData. We sat down with Seznam’s Head Analyst, Michal Buzek, to dig deeper into this trade secret. The following two-part interview will uncover how an investment in their data has paid off with more than 5 million in profit.

How did it all start?

Some time around 2009, a decision was made to implement a Business Intelligence tool and an open competition took place. The former CEO of Seznam, Pavel Zima (now a deputy chairman of the managing board), invited GoodData to bid. At the time, I was part of the team that compared offers and provided recommendations to management. We met with Zdeněk Svoboda (co-founder of GoodData) a few times and he showed us GoodData’s capabilities using a sample of our business data. Compared to on-premise licensed BI tools, GoodData was extremely simple and quick; and on top of that, Mr. Svoboda was very smooth and natural in selling it to us. 

Why exactly did you search for a Business Intelligence tool?

We needed to escape from Excel – everybody was bringing their own report to meetings, and the quality of the data was inconsistent. What’s more, we had about four different business systems at that time. Long story short, we were looking for an integrated reporting tool that would let us get all the data we needed under one unified dashboard. 

And how did you encounter Keboola?

We’d been using GoodData for about two years, but hadn’t launched any big initiatives in that area. From time to time we asked for a modification of the data model, but it wasn’t until our PR department found out that GoodData had the capability to interact with social networks that we were introduced to Keboola. We were told that they had developed the best connectors for integrating Facebook and Twitter data with GoodData.

What was your first impression?

Finally someone who resembles the types of people you can find here at Seznam. No suits.

So it all started with the project of getting the data from social networks?

Yes, but I had actually wanted to try something new in GoodData even before that. I wanted to expand the data models and play with other views of our data to see whether I could get someone else excited as well. I also wanted to speed up the addition of new items without the need to consult GoodData each time.

Meanwhile, Keboola came and showed me some ways to improve the dashboard in GoodData, and they also had their own tool, Keboola Connection. I won’t lie – I had also read the post by Tomáš Čupr (a well-known Czech businessman, founder of the most successful Czech variation of Groupon) about the way they changed his life.

So what happened next?

In March 2013 we started building a new project for the sales department. I wanted to give the salesmen a fundamental reason to use GoodData. We had been buying market research data for some years by that time – specifically data on expenditures on large-format advertising – but we hadn’t yet had a chance to maximize its potential.

Recently, we connected this third-party data with our business system. In doing so, our knowledge of our current and potential client base took not one, but five steps forward. We gave our salesmen a simple tool to track what types of advertising are purchased, how often, and from whom, so that they have a better understanding of our clients’ buying behaviour.

Was it difficult to learn how to work with Keboola Connection?

No, there wasn’t much extra to learn. The data transformations in Keboola Connection are written in SQL, which our team was already using. I personally got the hang of it after a few weeks. My favourite toy is Sandbox, a “training environment” into which I can load input tables and play with my questions until I get the result I want.

What have you already managed to create?

The sales department of Seznam is quite big and its teams are diverse, so the demands for statistics vary. People from Sklik need one kind of report; the team specializing in serving large clients needs another. This is why we are continuously developing the project – we can’t just set things up once and be done with it. That being said, I have yet to see a request that we couldn’t solve with Keboola Connection’s help.

And what specific projects have you launched?

In GoodData we have taken on several projects, beginning with the social networks and ending with the buying behaviour of clients. We divide clients according to their industries, we watch their seasonality according to category attendance, and we try to approach them proactively based on this insight. A salesman picks a category on his dashboard and can then see the listed clients, their solvency and their spending outside Seznam. From this, he knows exactly whom to call and when.

How is the sales team responding to Keboola Connection and GoodData?

The sales department has their numbers 100% under control thanks to Keboola and GoodData, so their response is of course very positive. When you hear a sales manager with more than eight years of experience saying that he can no longer imagine his work without GoodData, it’s certainly something you like to hear.

Does it pay off financially?

After three months of the project running, I could easily see the results (in dollars) through the business managers’ performance – revenue which we knew for certain was earned thanks to information from GoodData. I can’t talk in exact numbers, but the investment in the database and BI consulting was in the hundreds of thousands, and it paid for itself with more than 5 million in profit.

So it does really pay off?

Sure it does. Not only does GoodData help us generate more money, it also helps us find the areas where we can keep from losing it. A businessman can then spend his time and energy only where it’s worthwhile. Decisions are no longer driven by gut feeling; they are based on hard data. We see costs drilled down to the tiniest of details. We can find the causes of growth, and we can see exactly what impacts our profit and how.

Seznam's Return on BI, Part Two