Recommendation Engine in 27 lines of (SQL) code

In the data preparation space, the focus very often lies on BI as the ultimate destination of data. But more and more often, we see how data enrichment can loop straight back into the primary systems and processes.

Take recommendation. Just the basic type (“customers who bought this also bought…”). That, in its simplest form, is an outcome of basket analysis.

We recently had a customer who asked for a basic recommendation as part of a proof of concept, to see whether Keboola Connection is the right fit for them. The dataset we got to work with came from a CRM system and contained a few thousand anonymized rows (4600-ish, actually) of won opportunities, which effectively represented product-customer relations (which customers had purchased which products). So, pretty much ready-to-go data for the Basket Analysis app, which has been waiting for just this opportunity in the Keboola App Store. Sounded like a good challenge - how can we turn this into a basic recommendation engine?
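To make the idea concrete, here is a minimal sketch of "customers who bought this also bought…" built from product-customer pairs. It is written in Python for illustration only. It is not the 27 lines of SQL from the title, nor the Basket Analysis app itself. The sample data, the `build_recommendations` function and the `top_n` parameter are all made up for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical sample of (customer, product) pairs, standing in for
# the anonymized "won opportunities" rows described above.
purchases = [
    ("c1", "A"), ("c1", "B"), ("c1", "C"),
    ("c2", "A"), ("c2", "B"),
    ("c3", "B"), ("c3", "C"),
    ("c4", "A"), ("c4", "B"),
]


def build_recommendations(pairs, top_n=3):
    """Rank, for each product, the products most often bought by the same customers."""
    # Group products into one "basket" per customer.
    baskets = defaultdict(set)
    for customer, product in pairs:
        baskets[customer].add(product)

    # Count how many customers bought each ordered pair of distinct products.
    co_counts = defaultdict(Counter)
    for products in baskets.values():
        for bought, also_bought in permutations(sorted(products), 2):
            co_counts[bought][also_bought] += 1

    # Keep the top_n co-purchased products, ties broken alphabetically.
    return {
        product: [
            other
            for other, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        ]
        for product, counts in co_counts.items()
    }


recommendations = build_recommendations(purchases)
print(recommendations["A"])
```

With this toy data, three customers bought both A and B, so B ranks first among the recommendations for A. A real implementation would likely also weigh how popular each product is overall, which this sketch ignores.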

The value of text (data) and Geneea NLP app

Just last week, a client let out a sigh: “We have all this text data (mostly customer reviews) and we know there is tremendous value in that set, but aside from reading it all and manually sorting through it, what can we do with it?”

With text becoming a bigger and bigger chunk of a company’s data intake, we hear those questions more and more often. A few years ago, the “number of followers” was about the only metric people would get from their Twitter accounts. Today, we want to (and can) know much more: What are people talking about? How do we escalate their complaints? What about the topics trending across data sources and platforms? Those are just some examples of the questions we’re asking of the NLP (Natural Language Processing) applications at our disposal.

Besides the more obvious social media stuff, there are many areas where text analytics can play an extremely valuable role. Areas like customer support (think of all the ticket descriptions and comments), surveys (most have open-ended questions and their answers often contain the most valuable insights), e-mail marketing (whether it is analyzing outbound campaigns and using text analytics to better understand what works and what doesn’t, or compiling inbound e-mails) and lead-gen (what do people mention when reaching out to you) to name a few. From time to time we even come across more obscure requests like text descriptions of deals made in the past that need critical information extracted (for example contract expiration dates) or comparisons of bodies of text to determine “likeness” (when comparing things like product or job descriptions).

KBC as a Data Science Brain Interface

The Keboola Data App Store has a fresh new addition. That brings us to a total of 16 currently available apps, three of which are provided by development partners.

This new one is called “aLook Analytics”, and technically it is a clone of our development project, a “Custom Science” app (not available yet, but soon!). It facilitates connection to a GitHub/Bitbucket repository of a specific data science shop, which you can “hire” via the app and enable them to safely work on your project.

This first instance is connected to Adam Votava’s company aLook Analytics (check them out at

How does it work?

Let’s imagine you want to build something data-science-complex in your project. You get in touch with aLook and agree on what it is you want them to do for you. You exchange some data, the boys there will do some testing on their side and set up the environment, and once they’re done, they’ll give you a short configuration script that you will enter into their app in KBC. Any business agreement regarding their work is to be made directly between you and aLook; Keboola stays on the sidelines for this one.
When you run the app, your data gets served to aLook’s prepared model, and the scripts saved in aLook’s repository get executed on Keboola servers. All the complex stuff happens and the resulting data gets returned into your project. The app can be (like any other) included in your Orchestrations, which means it can run automatically as a part of your regular workflow.

The user of KBC does not have direct access to the script, protecting aLook’s IP (of course, if you agree with them otherwise, we do not put up any barriers).

Very soon we will enable the generic “Custom Science” app mentioned above. That means that any data science pro can connect their GitHub/Bitbucket themselves - that gives you, our user, the freedom to find the best brain in the world for your job.

Why people and not just machines?

No “Machine Learning Drag&Drop” app provides the same quality as a bit of thought by a seasoned data scientist. We’re talking business analytics here! People can put things in context and be creative, while all machines can do is adjust (sometimes thousands of) parameters and test the results against a training set. That may be awesome for facial recognition or self-driving car AI, but in any specific business application, a trained brain will beat the machine. Often you don’t even have enough of a test sample, so a bit of abstract thinking is critical and irreplaceable.

Why I'm not a Data Scientist

During my tenure at Keboola, and for some time before that, I’ve helped to design successful BI implementations for numerous companies, big and small.

In my role, I taught others and helped them to achieve the same. Together, we build solutions that amaze me daily with their ability, the value they bring to the users, and their potential for the future. We process billions of rows of data, tens of millions of text entries of all kinds, millions of deals and billions of dollars in business transactions. We perform some serious analytics over all that, helping to draw out business value for our clients every day. We innovate and help to redefine what it means to do BI. Our own company runs on data.

Yet, I would not call myself a Data Scientist.

I rarely code. I suck at stats. I definitely need to freshen up on my math skills. I avoid fancy terms like OLAP cube and Linear Regression. I prefer simple language. With my resume, I wouldn’t fit the bill for 80% of the data analyst job postings out there.

I don’t hold a PhD.

For me, Big Data is not a category of its own. It is something too big to handle using the tools at hand. So you get a bigger hammer and move on.

I’m a user, in all senses of the word. I’m addicted to data. I look for it everywhere, behind every question and problem. I love great business ideas and using data to make them fly. I love to work with people who think the same way.

How do I pull it off? Sometimes I wonder. For the most part, I believe it’s about the right tools. Tools that are conducive to this kind of thinking. I mostly use just two of them - Keboola Connection to bring the data together and put it where and how I need it, and GoodData to extract the meanings and answers to business questions.

Petr Olmer, Director of Expert Services at GoodData, once tweeted that the most underused tool in BI is the human brain, and the most underrated method is asking questions. I believe it, and would add that the term “Data Scientist” ranks up there with the most over- (and mis-) used.

At Keboola we are trying to change that. Consultants at Keboola are people who understand the business and speak its language. They use their brains and ask a lot of questions.

Both Keboola and GoodData have some brilliant people that you could call serious scientists, data or otherwise. But their talents are being applied to making the tools smarter and more useful for us, the common folks. What they do keeps things simple for us. It allows us to focus on the business objective of the task at hand rather than the “how” of it all. Thanks to them, you don’t need to hire a scientist (or be one) to find the wealth in your data.

But you might want to talk to - or become - one of us.