Why I'm not a Data Scientist

During my tenure at Keboola, and for some time before that, I’ve helped to design successful BI implementations for numerous companies, big and small.

In my role, I have taught others and helped them achieve the same. Together, we build solutions that amaze me daily with their capabilities, the value they bring to users, and their potential for the future. We process billions of rows of data, tens of millions of text entries of all kinds, millions of deals, and billions of dollars in business transactions. We perform some serious analytics on all of that, helping our clients draw out business value every day. We innovate and help redefine what it means to do BI. Our own company runs on data.

Yet, I would not call myself a Data Scientist.

I rarely code. I suck at stats. I definitely need to brush up on my math. I avoid fancy terms like OLAP cube and linear regression. I prefer simple language. With my resume, I wouldn’t fit the bill for 80% of the data analyst job postings out there.

I don’t hold a PhD.

For me, Big Data is not a category of its own. It is something too big to handle using the tools at hand. So you get a bigger hammer and move on.

I’m a user, in all senses of the word. I’m addicted to data. I look for it everywhere, behind every question and problem. I love great business ideas and using data to make them fly. I love to work with people who think the same way.

How do I pull it off? Sometimes I wonder. For the most part, I believe it’s about the right tools, ones that are conducive to this kind of thinking. I mostly use just two of them: Keboola Connection, to bring the data together and put it where and how I need it, and GoodData, to extract meaning and answer business questions.

Petr Olmer, Director of Expert Services at GoodData, once tweeted that the most underused tool in BI is the human brain, and the most underrated method is asking questions. I believe it, and I would add that the term “Data Scientist” ranks among the most over- (and mis-) used.

At Keboola we are trying to change that. Consultants at Keboola are people who understand the business and speak its language. They use their brains and ask a lot of questions.

Both Keboola and GoodData have some brilliant people whom you could call serious scientists, data or otherwise. But their talents are applied to making the tools smarter and more useful for us, the common folk. What they do keeps things simple for us. It allows us to focus on the business objective of the task at hand rather than on the “how” of it all. Thanks to them, you don’t need to hire a scientist (or be one) to find the wealth in your data.

But you might want to talk to, or become, one of us.

The Beginner’s Guide To Keboola

"The whole thing is a bit complicated…" began Vojta, one of Keboola’s consultants, over an English breakfast at the coffee shop with the best coffee in Prague. He was right. It was complicated. But a few hours (and a pint of coffee) later, I had a pretty good idea of what was going on. Here, I will try to relay it to you.

Intro: Companies today often have so much data that they get completely lost in it, and putting it into context to extract any useful meaning can seem impossible. Even when they manage it, it comes at a high cost in time and money.

Finding the gold in the data

Keboola does something called ETL (Extract, Transform, Load). Like many other fancy terms in this field, it sounds more complicated than it is.

Keboola helps you:

  1. Identify, locate, and pull together all the data relevant to your business from both your own and third-party sources: anything from accounting and ERP systems to related government open-data initiatives to comments on your Facebook pages. This is the Extract stage.
  2. Manage the whole load and organize it into a structure that you can meaningfully work with. That’s Transform.
  3. Push the data into the system or application selected for final consumption. That’s Load.
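To make the three stages a little more concrete, here is a minimal sketch in Python. It is not Keboola Connection or its API: the two sources (an accounting CSV export and a list of Facebook comments), the `customer_summary` table, and the in-memory SQLite database standing in for the final destination are all hypothetical, chosen only to show data moving through Extract, Transform, and Load.

```python
import csv
import io
import sqlite3

# Hypothetical raw sources: an accounting export and comments from a Facebook page.
ACCOUNTING_CSV = """invoice_id,customer,amount_usd
1001,Acme,250.00
1002,Globex,99.50
1003,Acme,120.25
"""

FACEBOOK_COMMENTS = [
    {"user": "Acme", "text": "Great service!"},
    {"user": "Initech", "text": "When is the next release?"},
]


def extract():
    """Extract: pull rows from each source into plain Python structures."""
    invoices = list(csv.DictReader(io.StringIO(ACCOUNTING_CSV)))
    return invoices, FACEBOOK_COMMENTS


def transform(invoices, comments):
    """Transform: combine both sources into one row per customer (revenue, comment count)."""
    summary = {}
    for row in invoices:
        entry = summary.setdefault(row["customer"], {"revenue": 0.0, "comments": 0})
        entry["revenue"] += float(row["amount_usd"])
    for comment in comments:
        entry = summary.setdefault(comment["user"], {"revenue": 0.0, "comments": 0})
        entry["comments"] += 1
    return [(name, round(v["revenue"], 2), v["comments"]) for name, v in sorted(summary.items())]


def load(rows, conn):
    """Load: write the structured rows into the target store (here, an SQLite table)."""
    conn.execute("CREATE TABLE customer_summary (customer TEXT, revenue REAL, comments INTEGER)")
    conn.executemany("INSERT INTO customer_summary VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    for row in conn.execute("SELECT * FROM customer_summary ORDER BY customer"):
        print(row)
```

In a real setup each stage is far bigger, but the shape stays the same: gather, restructure, deliver.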

The toolset Keboola uses to perform the ETL tasks (among other things) is its own Keboola Connection.

The platform Keboola uses for analytics and for producing all those wondrous charts and dashboards is GoodData.

So what is it all good for?

You’ve got data. Lots of it.

To give it meaning, the data needs to be pre-processed: the pieces put in order and given the right context, so that GoodData can give you the results you need. That is what Keboola is for:

  • Helping you find meaning in your data.
  • Continuously processing your data using Keboola Connection.
  • Setting up GoodData so you can find the answers you need. Answers to questions like “How much revenue did we get from customers brought to us by last fall’s expensive marketing campaign?” or “What impact does weather have on our salespeople’s performance?” Or whatever else comes to mind.

Keboola can do all of that pretty fast and with practically no limitations. But that’s a topic for next time.

If anything here doesn’t make sense to you, please ask! I’ll reply and explain better in the article.

Amazon's anticipatory shipping

Amazon is just making the next logical step against traditional retail

I’ve now read about a dozen reactions to Amazon’s patent on “anticipatory shipping” (you probably saw, or even read, the USA Today article recommended on LinkedIn). While I don’t dispute the brilliance of the idea, I think it’s worth pointing out that we’re looking at a fairly predictable extension of Amazon’s logistics model, with a bit of excellent lateral thinking thrown into the mix.

When you think about it, brick-and-mortar retail chains have been doing the same thing since their inception. In their language, “shipping into a general geographic area” is called “putting goods on shelves” in a particular store. They use exactly the same data analysis techniques to estimate how much to put there to avoid both markdowns and run-outs. Walmart built much of its success on knowing exactly how much to have where and when. Replace the term “store location” (which serves people in a particular area) with “general geographic area” (state, county, ZIP code) and you are back in Amazon’s world.

With the data it has, Amazon can of course predict orders in a particular area at least as accurately as Walmart can determine how many boxes of a particular toothpaste to put on a truck at its distribution center (DC) today so that they hit the store at just the right time. It’s not perfect, but it works very well indeed. Now, if you imagine that the “general geographic area” happens to be the area served by a particular UPS depot, then all you need to do is send the goods there and collect the orders by the time the local delivery vans are loaded. The better your “prediction”, the fewer items will be left at the depot without an addressee that day (which may even be fine with UPS, up to a point, given how much business they get from Amazon), and the fewer people will have to wait an additional day for their items. Amazon is effectively moving its DCs closer to the user, using trucks and planes as its warehouse.
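The trade-off above (items left unclaimed at the depot vs. customers waiting an extra day) can be sketched as a simple newsvendor-style calculation. This is not Amazon’s method; the daily order history and the two per-unit costs are made-up numbers, and picking the quantity by replaying past days is just one simple assumption about how a forecast might be used.

```python
from statistics import mean

# Hypothetical daily order counts for one item in the area served by one depot.
HISTORY = [3, 5, 4, 6, 2, 5, 4, 7, 3, 5]

# Hypothetical per-unit costs: an unclaimed unit left at the depot vs. a customer waiting an extra day.
LEFTOVER_COST = 1.0
WAIT_COST = 3.0


def expected_outcomes(quantity, history):
    """Average unclaimed units and average waiting customers if `quantity` units are pre-shipped daily."""
    leftovers = mean(max(quantity - demand, 0) for demand in history)
    waiting = mean(max(demand - quantity, 0) for demand in history)
    return leftovers, waiting


def best_quantity(history, leftover_cost, wait_cost):
    """Pick the daily pre-shipped quantity with the lowest average combined cost over past days."""
    def cost(quantity):
        leftovers, waiting = expected_outcomes(quantity, history)
        return leftover_cost * leftovers + wait_cost * waiting

    # Shipping more than the busiest day seen only adds leftovers, so stop there.
    return min(range(max(history) + 1), key=cost)


if __name__ == "__main__":
    q = best_quantity(HISTORY, LEFTOVER_COST, WAIT_COST)
    print(q, expected_outcomes(q, HISTORY))  # prints: 5 (0.9, 0.3)
```

With waiting three times as costly as a leftover unit, the sketch settles on pre-shipping slightly more than the median day’s demand; make waiting cheaper and it tends to ship less. Less uncertain demand shrinks both averages at once, which is why a better prediction means fewer unclaimed items and fewer waiting customers.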

Adrian Gonzales looks at the whole thing from an additional, very interesting angle in his post. In a way, shipments to a particular area can become self-fulfilling prophecies. The one thing traditional retail still holds over online business is the ability to possess the desired item then and there. While Amazon won’t be shipping you shelves of items to pick from (and send back what you don’t want) any time soon, with the package already on its way when you place your order, they’re coming pretty close. With this shorter (on average) time between order and delivery, and with shipping costs staying standard, unlike with same-day delivery, Amazon is further strengthening its offering and giving people more reason to buy online rather than go to a store. On top of that, they’re opening the door to impulse purchases (“we think you probably want this, we have an extra on a truck near you, click yes/no”). Or imagine a Dutch auction for those not-yet-spoken-for items: the price keeps dropping until someone says yes and “outbids” the others.
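The Dutch auction idea is easy to sketch, too. In a Dutch (descending-price) auction, the price starts high and steps down until someone accepts it. The minimal Python version below assumes hypothetical bidders with private price limits, prices in whole cents, and a floor price below which the item simply isn’t sold; none of it comes from Amazon.

```python
def dutch_auction(start_price, step, floor_price, bidders):
    """Run a descending-price (Dutch) auction over hypothetical bidders.

    Prices are integers in cents. `bidders` maps each bidder's name to the
    highest price they would accept. Returns (winner, price), or (None, None)
    if the price falls below `floor_price` without anyone accepting.
    """
    if step <= 0:
        raise ValueError("step must be positive, or the price never drops")
    price = start_price
    while price >= floor_price:
        # Everyone whose limit meets the current price would say "yes" now.
        takers = [name for name, limit in bidders.items() if limit >= price]
        if takers:
            # Tie-break for this sketch: the bidder with the highest limit wins.
            return max(takers, key=lambda name: bidders[name]), price
        price -= step
    return None, None


if __name__ == "__main__":
    offers = {"alice": 3900, "bob": 4300, "carol": 3000}
    print(dutch_auction(5000, 250, 2000, offers))
```

Here the price falls from $50.00 in $2.50 steps, and Bob is the first to accept, at $42.50, so the call returns `("bob", 4250)`.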

At Keboola, we are working with clients on both sides of this online vs. brick-and-mortar struggle. Both models have a place in our future shopping habits, but each needs to work hard to offset the advantages of the other. Data happens to be the weapon of choice on both sides. While online retailers are trying to close the time-to-value gap between their purchases and in-store ones, traditional retailers are learning more and more about us as individual shoppers and about our patterns. So what will Bentonville’s answer to Amazon’s challenge be? Maybe a shopping cart waiting for you at the entrance of Walmart, preloaded with the items you are almost certainly planning to buy today. You then just pick up the few unusual pieces and off you go.


Milan Veverka

Behold. The Official Keboola Blog has started

Hi! I am Martin, official Keboola Data Rookie, here to share the first post of our new blog!

My goal is to explain what it is we actually do here at Keboola… which can be pretty hard to sum up in one sentence. Every side of Keboola tells a different story, so each month we’ll give you insight into our business from a different perspective. With every post I hope to tell you something new and exciting, but if you’re still left with burning questions please let me know so I can answer them! 

Here’s a taste of what you can expect to see:

  • Keboola Basics. Kindergarten for data analysts: simply what Keboola does, for whom, and how. (Successfully tested on my grandmother.)
  • Data for Business. We know you want to see numbers, but we also want to give you a comprehensive overview of what Keboola does, and how, to help companies earn big money through Big Data. We’ll share these stories through interviews, case studies, and our cultivated best practices.
  • Nerd Zone. For the tech-savvy and future innovators, this is the space where analysts can embrace their inner geeks. Look forward to detailed articles full of pure know-how.
  • From Keboola With Love. We have offices in two time zones with very different cultures, so get a behind-the-scenes look at what happens on the other side of the screen.

From one data enthusiast to another, thanks for taking the time to hear what I have to say. Stay tuned for our next post, where we sit down with Michal Buzek, head of the analytics department at Seznam.cz (a respectable Czech rival of Google), to find out the results of his project with Keboola.