Pavel Dolezal (Keboola): Companies want to become data-driven. It takes four steps to get there.

What’s going on with machine learning, artificial intelligence, and "data-driven" businesses in practice?

Written by Jiri Vicherek

May 15, 2019

Pavel Dolezal is one of the shareholders of Keboola, a company which enables customers to work with data using its cloud-based platform. Now he is in charge of the company’s new Chicago office. As businesses grow more reliant on data, Keboola has been successful in attracting a large number of customers in the US.
Mr. Dolezal talks about the steps that companies and organizations need to take to become data-driven and about the role of machine learning (ML) and artificial intelligence (AI) in this process.

Pavel Dolezal, CEO Keboola

I’ve recently heard a good joke: What is the difference between machine learning and artificial intelligence? If it’s written in Python, it’s probably machine learning. If it’s written in PowerPoint, it’s artificial intelligence. Does it match what you see in the market?

It reminds me of how all investor activity has focused on AI over the past three years. Everything was AI. That was funny as well.

Keboola works with data, and helps others work with data, that both AI and ML need. Looking at your customers and partners, can you tell whether anything is really happening in these areas?

We see extensive use of machine learning, but you must follow a certain path to get there. First, you need to help people with data – put it together, clean it, and so on. Second, you need insights into the company, so that everyone sees the same things. Actions are the third step: doing something concrete with those insights. Actions can be physical, such as changes in the company, or maybe just creating a machine learning model. We are helping companies on this path. It’s never enough to put data together and create insights. We need to create new processes.

Years ago, Keboola was one of the first companies in the Czech Republic that saw the emerging need for data and predicted the major role of the cloud. What do you see as another such thing today?

Wherever possible, we work hard to automate processes, so people can stop performing unnecessary tasks. However, this can only be done when you have data available on how people and things behave. Many company employees can no longer work using only basic data from Excel. They need data and tools similar to those used by hardcore data scientists. They need to analyze raw data.

We’ve also seen a growing diversity of places where data is generated. And companies are becoming more and more interconnected. This is called data sharing, and it’s becoming very interesting. We have several customers in the United States who work in the same segment and want to share their data. We make it happen for them at the technological level.

Before the cloud, legacy IT systems were robust, closed, and inflexible. However, the cloud caused another problem; everything was suddenly scattered and terribly chaotic. It had to be brought back together. For example, someone downloads data and stores it somewhere in one part of the company, and someone’s doing basically the same thing, but using different tools, in another department of the same company. We have created a software layer that manages the infrastructure and does so on-demand.

In addition to AI and ML, the term data-driven company keeps coming up at all the conferences I attend. Does it mean there are already companies and organizations that use data in their management?

Well, reality tends to lag behind conferences by four to ten years. But there already are companies that use data in day-to-day decision making. In general, the problem is that the larger the company, the more people work in silos.

We see four stages that each company has to go through to become data-driven. First, it is necessary to agree on what metrics are relevant, and this must take place across the whole organization. Second, KPI reports must be available, and we need to be able to influence the KPIs. As one of our retail customers puts it: “We have to know how we can influence KPIs, and we must see a person before us as well as behind us in the process.” This is how, years ago, Slevomat, the largest daily deal/bulk discount website in the Czech Republic, started cooperating with traders they benchmarked well. Third, sharing data across departments is a necessity. The fourth step is integrating machine learning and process control. Many companies are still not ready for that.

Harnessing AI and ML is not a technical problem. Getting data takes one click on our platform, but up to eight months in legacy systems. Companies must start using the final model and make it part of their processes. Even though they want to become data-driven, they think it’s possible to skip the four evolutionary phases. And it just doesn’t work like that.

Some believe that the so-called HiPPOs – the people with the highest pay – should stop making decisions based on their feelings, and that decisions should be based only on data. But in practice, it is probably somewhere in between, isn’t it?

It is not possible to follow the data alone. The world is more complicated, and data provides a baseline on which people can agree. But deciding what to do based on the data is something completely different. I can have the perfect data output that tells me something, but I can also have a long-term strategy and intentionally base my decisions on that. Unfortunately, it is still quite common to see people in charge ignore data completely even though it is available to them. That’s why there’s an established four-step path for creating a data-driven culture.

Where does the pressure to use data in company management come from – from below or from above?

I don’t have an answer to this, but I can tell you one thing. Even though changes often come from below, they can only reach a certain point. If there is no support from above, it’s all for nothing and it can’t be done. You can do a lot of interesting things, but without the necessary support, they just end up in some silos.

But there is another driver, one we see most often – the fear that the company will stop innovating and its competition will leave it in the dust. To give an example, on seeing what Amazon has been doing in different fields over the past year and a half, companies have become so worried that they have put data on their directors’ agenda. They are beginning to realize that no single technology can change the situation and that several things must be done. Three years ago, it wasn’t like that – data was the problem of the IT department, not of the leadership.

It’s no secret that the trend of companies and organizations sharing their data is on the rise. How exactly does it work?

Another retail customer of ours answered this question well. When asked if he was afraid that someone would steal their shared data, he replied: “Any of my competitors only needs to stand in front of one of our stores for one day and ask a random sample of people what they bought. Then they will know exactly what’s going on.” After this customer democratized the data, their management changed. If someone gets a snapshot of data as it is today, it’s useless. It will take two to three years to make the necessary organizational changes, and the company they copy will be way ahead of them again. Heureka.cz, one of the top price-comparison shopping sites and shopping advisors in Central and Eastern Europe, has made a nice move: it not only supplies e-shops with traffic but also benchmarks them against each other, provides recommendations, and so on. They have fresh data every day and make sure to have it online.

Big brands have never been able to track their advertising campaigns to the point where the goods are bought. They know the goods are sent somewhere, and then what is meant to be is meant to be. However, with Heureka’s approach, they will be able to perfectly track the influence of the campaign on different types of sales outlets. These are the first steps. Robots trading with each other will come next. Businesses will share their data – and why should people be in charge of everything?

So, in theory, is it possible for companies to share their data and connect with each other?

Even though we still need to wait and see where this will go, there are already first signs of this happening. I believe the future of data lies in its sharing. It’s the logical next step. As there is more information, more complex processes, and fewer but more expensive people, we try to use machines to handle all that. Algorithms can do a lot of things better.

Will the Internet of Things and completely new data sources play any role here?

In industry, that’s certain. We are working on a project with SimpleCell, the first IoT mobile operator in the Czech Republic, where there are tens of thousands of devices. It is a paradox, but this is not about the data volume. That’s already been solved. The complexity is the problem. If each department in a company needs between 15 and 20 data sources to make good decisions, the company as a whole uses tens to hundreds of different data sources. Some of this data is changing every second, and a number of people need to work with it in different ways. This is where we see the main challenge.

Have you ever seen a deployed functional blockchain?

No, I haven’t. We do not encounter blockchain in our world. There are certainly many applications that we do not know of or cannot describe today. IoT has also been talked about for many years, but we are only now starting to see analytics in this area.

Should education reflect the growing importance of data?

I believe it’s important to teach how to break down problems into smaller, more easily solvable subproblems. Schools often teach a lot of outdated things, and the world is already far ahead. But there are a few constants that have been used for many years. One of them is SQL. Learning something like that in today’s world will definitely not hurt anyone, and it opens many doors. SQL has survived several clinical deaths and keeps coming back. As a foundation, it is the best that is available.

The problem is with the “mental framework” of how to approach problem analysis and how to use data when it’s available. I also believe that it is essential to learn to work together and not in silos. The importance of data and its analysis will grow and affect different fields. For example, marketing today is far from simply writing a nice Google AdWords text. There is a strong influence of data along with the complexity of interconnection and machine learning. The question is how to teach this. As textbooks are not much help here, I find offline workshops to be the best option.

Are companies today working with unstructured data more than before?

More and more. Fifteen years ago, companies had data that was locked in databases, or they received outputs such as reports from Nielsen. With the world moving into the cloud, these gluttonous giants were dismantled. Today, a lot of data “sits” outside, plus there are services like Yelp or social networks that hold large amounts of unstructured data. It is often impossible to achieve outputs today without using unstructured data.

Given the direction in which work with data is heading, do you think other fields will attempt innovations similar to Amazon Go, which has fully automated retail stores?

Amazon Go has three to five times the turnover in the same area as regular stores. It’s the same principle Amazon dealt with years ago – a certain increase in the loading speed of their websites increased the sales by a few percent. A retail customer of ours once said that every sector that has more than four forms can be disrupted.

Last year, we played a lot with the Amazon Go concepts. There are several possibilities. You can do everything “in-house,” but only very few companies in the world can manage that. Or, you have to put it back into the ecosystem again, and you won’t have the best data scientists with you. You need an environment that allows you to welcome external data-science companies to help you. That’s what we specifically focus on and what opens new possibilities. Today, reverse engineering is being applied to Amazon Go, but we need to think about what’s next, not just copy it. Amazon Go is just the first real example.

How does this data-sharing environment cope with those several large companies, such as Amazon and Google, that are sitting on data no one else has? It obviously puts the rest of the market in a disadvantageous position.

It depends. For certain projects, you might indeed need data that only Google has. But it does not apply to most business cases. Business management is important, and innovations are essential. There are many areas and disciplines where companies can differ. Not everyone has to follow Amazon Go. And if so, they must find something that’s their own – something that sets them apart – and specialize in it. Amazon just proved it is possible.

Recently, people have started talking about the hybrid cloud again. Does it make any sense in the current data environment?

I don’t think so. And if it does make sense, it’s just because of regulatory requirements. This approach is often caused by the fact that regulation is lagging behind the reality of the world. But that is changing too. Even cloud providers like Amazon and Microsoft are very well prepared for this. They know what big businesses want and why, and they can provide that for them. Today, people are discussing edge computing, where things will need to be done in geographically remote locations, and some models will have to run there. These are more interesting approaches.

But even a server can be considered edge in certain cases.

It depends on what it’s used for. If it works as a device for pre-processing sensor data and running basic models, it’s fine. In some cases, it will be necessary to pre-process data on site. The problem arises when someone says: “I’m scared, I don’t want it, so I’ll have it under my desk.” In my opinion, a hybrid solution is just a transition phase for most companies.

Keboola has recently announced the opening of its new branch in Chicago. Why there?

The largest part of B2B business in the enterprise and upper middle market is happening in the United States. Having succeeded in attracting a larger and larger customer base, we need to provide them with services, introduce them to other people, and build the environment and the ecosystem. And that must be done locally.

In Chicago, we are organizing a series of networking events. People are overwhelmed by digital technology in today’s world and enjoy going offline. To these events, we also bring customers who work on interesting data projects. This way we are gradually building a community.

The Chicago area alone has about three times the GDP of the Czech Republic. There are also plenty of retail, logistics and other companies located there, and they are a very interesting target group for us.

This article was originally published on Lupa.cz (author: Jan Sedlák)

