Recommendation Engine in 27 lines of (SQL) code

In the data preparation space, the focus very often lies on BI as the ultimate destination of data. But we see, more and more often, how data enrichment can loop straight back into the primary systems and processes.

Take recommendation. Just the basic type (“customers who bought this also bought…”). That, in its simplest form, is an outcome of basket analysis.

We recently had a customer who asked for a basic recommendation as part of a proof of concept to see whether Keboola Connection is the right fit for them. The dataset we got to work with came from a CRM system, and contained a few thousand anonymized rows (4600-ish, actually) of won opportunities which effectively represented product-customer relations (which customers had purchased which products). So, pretty much ready-to-go data for the Basket Analysis app, which has been waiting for just this opportunity in the Keboola App Store. Sounded like a good challenge - how can we turn this into a basic recommendation engine?
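To make the idea concrete, here is a minimal sketch of the "customers who bought this also bought" logic in Python. It is not the Basket Analysis app or the SQL the title refers to; it only counts how often two products show up with the same customer. The sample rows and the function names (build_recommendations, recommend) are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical sample of won opportunities: (customer, product) pairs.
purchases = [
    ("c1", "A"), ("c1", "B"), ("c1", "C"),
    ("c2", "A"), ("c2", "B"),
    ("c3", "A"), ("c3", "C"),
    ("c4", "B"), ("c4", "D"),
]


def build_recommendations(rows):
    """Count, for every product, how many customers also bought each other product."""
    # Group products into one "basket" per customer.
    baskets = defaultdict(set)
    for customer, product in rows:
        baskets[customer].add(product)

    # For every ordered pair of distinct products in a basket, add one co-purchase.
    co_counts = defaultdict(Counter)
    for products in baskets.values():
        for bought, also_bought in permutations(sorted(products), 2):
            co_counts[bought][also_bought] += 1
    return co_counts


def recommend(co_counts, product, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    counts = co_counts.get(product, Counter())
    # Sort by descending co-purchase count, then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


co_counts = build_recommendations(purchases)
print(recommend(co_counts, "A"))
print(recommend(co_counts, "B"))
```

Raw counts like these favor products that are popular overall; a real basket analysis would typically normalize them (support, confidence, lift) before ranking.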

How we "hacked" Vizable

Tableau unveiled their new Vizable app on the first full day of the Tableau User Conference 2015 (A.K.A. TC-15) to many oohs and aahs. Vizable is a tablet app that allows users to take data from an .xls or .csv file and easily interact with it right on their tablet. It is unparalleled in its ease of use and intuitiveness, providing an exciting new way to consume data and drive insights. More information here: http://vizable.tableau.com/

As soon as we saw it, the Keboola team thought, “What an exciting way to use data from Keboola Connection - if only we could send data to it immediately to test it!” The app is built to accept .xls and .csv files that are physically present on the iPad it runs from, so at a glance, it is completely and utterly off-line. We immediately wondered if Keboola Connection - due to its integration with Dropbox and Google Drive - could make Vizable the ultimate, on-the-go data visualization app.

(a little bit of frantic testing later)

Yeah! We can easily schedule pushing data into the iPad using our existing integrations. We didn't have to write a single line of code, and while the conference was still going on we were able to play with #data15 mentions we’d pulled in through Keboola Connection, with fresh data being automatically pushed into the iPad every 30 minutes.

We eagerly shared our success with the Vizable team and started showing conference attendees and members of the Tableau team just how we’d made it all happen! It was great to receive a string of visits from the whole Vizable crew all the way up to Dave Story, VP of Mobile and Strategic Growth, and Chris Stolte, the Chief Development Officer. What a thrilling way to educate the Tableau folks on all the cool stuff Keboola does with their tool and for their customers.

Get in touch with us if you want to know more!

GoodData Open Analytics Platform - a Category of One

(originally published as a guest post on the GoodData blog)

In the world of big data and analytics, what is the definition of a platform? What belongs in the category and what doesn't? If Tableau is on the list, why not Excel? If Excel, why not Numbers or Google Sheets? (Hey, that one's even cloud based!) The whole thing is somewhat silly to me. It is trying to compare the incomparable.

Over the years of Keboola's existence and focus on business intelligence, we've been closely monitoring the tools available. We are an independent company and while we partner with GoodData, our ultimate focus is to do what's best for our customers. There are many tools out there. Some mediocre, some brilliant in what they do. It never ceases to amaze me how solutions built on Cognos, MicroStrategy or BusinessObjects can cost so much while so little value seems to be actually delivered. Similarly, how Domo took a simple dashboarding tool and, with some serious marketing dollars, made it appear almost like a BI product. Conversely, looking at a product like Tableau, its visualizations are unparalleled. And yet, simply put, nothing comes close to fulfilling our vision of a BI platform as well as GoodData does.

If you disagree, start asking questions - Which BI tools have a robust API that can connect to and push data from any data source? Do they allow you to filter data based on the user that is looking at it? Can you automatically build scripts that generate reports relevant to the current situation of your business? Does a BI tool allow you to analyze hundreds of millions of rows of data in seconds? Does it have a front end interface that anyone who came near a medium-complex spreadsheet and knows how to drag and drop can use? Does the platform allow you to build a product that you deploy to hundreds of customers at the touch of a button? And which tool allows you to do ALL these things? Right.

GoodData is more than a tool, it is a true open platform. For some this comes as news, for us at Keboola, it has always been that way. We have always treated GoodData as a platform.

A true platform gives you tools and space at the same time. The tools allow you to do things, and the space lets you imagine and create new ways of doing them. Your imagination, not the tool, is the limit. We built, using GoodData itself, a training tool to teach people how to use GoodData called Keboola Academy. We built AI that modifies not only the data in the reports, but also the layout of the dashboard to pinpoint what is important. We completely integrated with GoodData so deployment of dashboards and analytics over our own business data warehouse product is seamless and largely automatic. We built whole data products, deeply embedded into our customers' interfaces, all using an open analytics platform called GoodData.

Keboola is about helping companies make more money using data. Whether it is for internal reporting and analytics, or to create new revenue streams by monetizing data-as-a-product, GoodData gave us the freedom to build amazing things and continuously grow our business (so far 200% or more year over year) and that is why I consider it the only true BI platform on the market today. "BI Platforms" is a category of one.

Why I'm not a Data Scientist

During my tenure at Keboola, and for some time before that, I’ve helped to design successful BI implementations for numerous companies, big and small.

In my role I taught others and helped them to achieve the same. Together, we build solutions that amaze me daily with their ability, the value they bring to the users, and their potential for the future. We process billions of rows of data, tens of millions of text entries of all kinds, millions of deals and billions of dollars in business transactions. We perform some serious analytics over all that, helping to draw out business value for our clients every day. We innovate and help to redefine what it means to do BI. Our own company runs on data.

Yet, I would not call myself a Data Scientist.

I rarely code. I suck at stats. I definitely need to freshen up on my math skills. I avoid fancy terms like OLAP cube and Linear Regression. I prefer simple language. With my resume, I wouldn’t fit the bill for 80% of data analyst job postings out there.

I don’t hold a PhD.

For me, Big Data is not a category of its own. It is something too big to handle using the tools at hand. So you get a bigger hammer and move on.

I’m a user, in all senses of the word. I’m addicted to data. I look for it everywhere, behind every question and problem. I love great business ideas and using data to make them fly. I love to work with people who think the same way.

How do I pull it off? Sometimes I wonder. For the most part, I believe it’s about the right tools. Tools that are conducive to this kind of thinking. I mostly use just two of them - Keboola Connection to bring the data together and put it where and how I need it, and GoodData to extract the meanings and answers to business questions.

Petr Olmer, Director of Expert Services at GoodData once tweeted that the most underused tool in BI is the human brain, and the most underrated method is asking questions. I believe it, and would add that the term “Data Scientist” ranks up there with the most over- (and mis-) used.

At Keboola we are trying to change that. Consultants at Keboola are people who understand the business and speak its language. They use their brains and ask a lot of questions.

Both Keboola and GoodData have some brilliant people that you could call serious scientists, data or otherwise. But their talents are being applied to making the tools smarter and more useful for us, the common folks. What they do keeps things simple for us. It allows us to focus on the business objective of the task at hand rather than the “how” of it all. Thanks to them, you don’t need to hire a scientist (or be one) to find the wealth in your data.

But you might want to talk to - or become - one of us.

Amazon's anticipatory shipping

Amazon is just making the next logical step against traditional retail

I’ve now read about a dozen various reactions to Amazon’s patent on their “anticipatory shipping”. (You probably saw, or even read, USA Today’s article recommended on LinkedIn.) While I don’t dispute the brilliance of the idea, I think it’s worth mentioning that we’re looking at a fairly predictable extension of Amazon’s logistical model, with a bit of excellent lateral thinking thrown into the mix.

When you think about it, brick-and-mortar retail chains have been doing the same thing since their inception. “Shipping into a general geographic area” is in their language called “putting goods on shelves” in a particular store. They use exactly the same data analytic techniques to estimate how much to put there to avoid both mark-downs and run-outs. Walmart built much of its success on the ability to know exactly how much to have where and when. Replace the words “store location” (which serves people in a particular area) with “general geographic area” (state, county, zip-code) and you are back in Amazon’s world.

With the data it has, Amazon can of course predict orders in a particular area with at least the same accuracy with which Walmart can determine how many boxes of a particular toothpaste to put on a truck in their DC today so it hits the store just at the right time. It’s not perfect, but it does work very well indeed. Now if you imagine that the “general geographical area” happens to be an area served by a particular UPS depot, then all you need to do is to send the stuff there and then just collect the orders by the time the local delivery vans are being loaded. The better your “prediction”, the fewer items will be left at the depot without an addressee that day (which may even be, up to a point, just fine with UPS, given how much business they see from Amazon), and the fewer people will have to wait an additional day to get their items. Amazon is effectively distributing their DC closer to the user, using the trucks and planes as their warehouse.

Adrian Gonzales in his post looks at the whole thing from an additional, very interesting angle. The shipments to a particular area can become, in a way, self-fulfilling prophecies. The one thing traditional retail still holds over the on-line business is the ability to possess the wished-for item right there and then. While Amazon won’t be shipping you shelves of items to pick from and send back what you don’t want any time soon, with the package already on the way at the time of your order, they’re coming pretty close. With this (on average) shorter time between order and delivery, and with the cost of shipping staying standard, unlike with same-day deliveries, Amazon is further strengthening its offering and giving people more reasons to buy online rather than go to a store. In addition to that, they’re opening doors to impulse purchasing (“we think you probably want this, we have an extra on a truck near you, click yes/no”). Or imagine a Dutch auction for those not-yet-spoken-for items. The price keeps dropping until someone says yes and “outbids” the others.

At Keboola, we are working with clients on both sides of this online v. brick-and-mortar struggle. Both principles have a place in our future shopping habits, but each of them needs to work hard to balance the advantages of the other. Data happens to be the weapon of choice on both sides. While online retailers are trying to eliminate the time-to-value gap of purchases against retail, traditionals are learning more and more about us, individual shoppers, and our patterns. So what will be Bentonville’s answer to Amazon’s challenge? Maybe a shopping cart, waiting for you at the entrance of Walmart, already pre-loaded with the items you are almost certainly planning to buy today. You then just pick up the few unusual pieces and off you go.


Milan Veverka