Putting your data into the right context
At the beginning of last summer, GoodData launched its new analytic engine, AQE (Algebraic Query Engine). Its official product name is GoodData XAE. However, since I believe that XAE is Chinese for “underfed chicken”, I will stick with AQE ☺. From the first moment I saw it, I considered it the concept with the biggest added value, and when Michael showed me AQE I immediately fell in love.
However, before we can truly reveal AQE and the benefits that can be derived from it, we need to begin with an understanding of its position in the market - starting from the foundation on which GoodData’s platform rests. In this three-part series we’ll cover AQE’s impact on contextual data, delivering meaningful insights and, finally, digging for those hidden gems.
First, a more comprehensive introduction...
Any system with ambitions to visualize data needs some kind of mathematical device. For instance, if my input is the items sold by each salesperson and my goal is to find the median of the salespersons’ turnover, somewhere in the background a total summation of the sold items per month (and per salesperson) must take place first. Only after getting that result can we calculate the requested median. Notice the graphic below - the left table is the raw input, while the right table is derived in the course of the process; most of the time we don’t even realize that these intermediate outputs keep arising. From the right table, we can quickly determine the best salesperson of the month, the average salesperson, the median and so on…
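To make the two-step nature of this concrete, here is a minimal sketch of what such an engine does behind the scenes (the names and figures are made up for illustration):

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw input, like the left table: one row per sold item.
sales = [
    ("Alice", "2013-06", 120.0),
    ("Alice", "2013-06", 80.0),
    ("Bob",   "2013-06", 200.0),
    ("Carol", "2013-06", 50.0),
]

# Intermediate output, like the derived right table: total turnover
# per (salesperson, month). This step happens whether we notice or not.
totals = defaultdict(float)
for person, month, amount in sales:
    totals[(person, month)] += amount

# Only now can the requested median be computed from the aggregates.
monthly_median = median(totals.values())
print(monthly_median)  # 200.0  (Alice 200, Bob 200, Carol 50)
```

The point is that the median is a function of the derived table, not of the raw rows - skip the aggregation step and you get a different (wrong) answer.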
And how does this stack up against the competition?
Without a robust analytic backend, we don’t have the freedom to do whatever we want. We have to tie our users to some already prepared “vertical analysis” (churn analysis of an e-shop’s customers, RFM segmentation, subscription cohorts, etc…). There are many ways to fiddle with data. Besides GoodData, you can find tools such as Birst, Domo, Klipfolio, RJMetrics, Jaspersoft, Pentaho and many, many others. They look really cool, and I have worked with some of them before! A lone data analyst can also reach for R, SPSS, RapidMiner, Weka and other tools. However, these are not BI tools.
Most of the aforementioned BI tools do not have a sophisticated mathematical device. They will simply allow you to count the data, calculate the frequency of components, and find the maximum, minimum and mean. The promo video of RJMetrics is a great example.
Can I just use a calculator instead?
Systems such as Domo.com or KlipFolio.com solve the problem of a missing mathematical device in a bit of a bluffing way. They offer their users a set of mathematical functions – just the same as Excel does. The main difference is that these functions work only on separate tables, not on the whole data model. Someone may think that this does not matter but, quite the contrary – it is the pillar of anything connected to data analytics. I will try to explain why...
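A tiny sketch of the difference, with entirely hypothetical tables: a single-table function can sum one table fine, but a question that spans tables needs the linkage between them - i.e. a data model:

```python
# Two separately stored, hypothetical tables.
orders = [("o1", "c1", 100), ("o2", "c2", 50), ("o3", "c1", 25)]  # (order, customer, amount)
segments = {"c1": "enterprise", "c2": "smb"}                      # customer -> segment

# An Excel-style, per-table function: easy, no model needed.
total = sum(amount for _, _, amount in orders)  # 175

# "Revenue per customer segment" crosses the table boundary: it only
# works because we know how orders link to segments.
revenue_by_segment = {}
for _, customer, amount in orders:
    seg = segments[customer]
    revenue_by_segment[seg] = revenue_by_segment.get(seg, 0) + amount

print(revenue_by_segment)  # {'enterprise': 125, 'smb': 50}
```

Tools that only offer per-table math stop at the first computation; the second one is exactly what a model-aware engine exists for.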
The border of our sandbox lies in the application of the law of conservation of “business energy”:
“If we don’t manage to earn our customer more money than our services (and GoodData license) cost him, he won’t collaborate with us.”
Say, for example, we take the list of invoices from SAP and draw a graph of growth - our customers will throw us out of their offices. We need a little bit more. We need to put each data dimension into context (dimension = a thematic data package, usually represented by a data table). A dimension does not have to have any strictly defined linkages; in our analytics project, such a table is called a dataset.
But how is it all connected?
The moment we give each dimension its linkages (parents, children … siblings?), we get a logical data model. A logical data model describes the “business” linkages, and most of the time it is not identical to the technical model in which a system stores its data. For example, if Mironet has its own e-shop, the database of the e-shop is optimized for the needs of the e-shop – not for financial, sales and/or subscription analytics. The more complicated the environment whose data we analyze, the fewer similarities the technical and analytical data models share. Handling this low structural similarity between the source data and the data we need for analytics is what divides GoodData from the other companies.
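As a rough illustration (the dataset names here are invented, not GoodData's actual schema), a logical data model can be thought of as datasets plus directed linkages, and the linkages determine which roll-ups make sense:

```python
# A hypothetical, simplified logical data model: each dataset lists the
# datasets it links down to ("business" linkages, not physical tables).
logical_model = {
    "Client":       {"children": ["Project"]},
    "Analyst":      {"children": ["Worked Hours"]},
    "Project":      {"children": ["Worked Hours", "Invoice"]},
    "Worked Hours": {"children": []},
    "Invoice":      {"children": []},
}

def reachable(model, start):
    """Datasets whose facts can be rolled up to `start` via linkages."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in model[node]["children"]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# Worked hours and invoices can be aggregated per client, because a
# linkage path exists; the model, not the raw tables, makes this legal.
print(sorted(reachable(logical_model, "Client")))
```

This is the sense in which the logical model defines “what we can do with the data”: a metric is only meaningful along an existing linkage path.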
A good example of this is our internal project. I chose the internal project because it contains only the logical model we need for ourselves. Therefore, it is not artificially extended just because we know “the customer will pay for it anyway”.
We upload different kinds of tables into GoodData. These tables are connected through linkages. The linkages define the logical model; the logical model then defines what we can do with the data. Our internal project serves to measure our own activity, and it connects data from the Czech accounting system (Pohoda), the Canadian accounting system (QuickBooks), the cloud application Paymo.biz and some Google Drive documents. In total, our internal project has 18 datasets and 4 date dimensions.
The first image (below) is a general model, select the arrow in the left corner to see what a more detailed model looks like.
In the detailed view (2 of 2), note that the name of the client is marked in red, the name of our analyst in black and the worked hours in blue. What I want to show here is that each individual piece of information is spread widely throughout the project. Thanks to the linkages, GoodData knows what belongs together.
Using business-driven thinking to make your data conform to your business model (rather than the other way around) will allow you to report on meaningful and actionable insights. Part 2 of this series on AQE (...or more formally XAE) will uncover the translation of the Logical Data Model into the GoodData environment.