Keboola: Data Monetization Series - How data science can help


Having access to the right data in a clean and accessible format is the first step (or series of steps) leading up to actually extracting business value from your data.  As much as 80% of the time spent on data science projects involves data integration and preparation.  Once we get there, the real fun begins.  With the continued focus on big data and analytics to drive competitive advantage, data science has been spending a lot of time in the headlines.  (Can we fit a few more buzzwords into one sentence?)

Let’s take a look at a few data science apps available on our platform and how they can help us in our data monetization efforts.

Basket analysis

One of the most popular algorithms is market basket analysis. It provides the power behind things like Amazon’s product recommendation engine and identifies that if someone buys product A, they are likely to buy product B. More specifically, it’s not identifying products placed next to each other on the site that get bought together, but rather products that aren’t placed next to each other. This can be useful in improving in-store and on-site customer experience, targeted marketing and even the placement of content items on media sites.
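To make the "buys A, likely buys B" idea concrete, here is a minimal sketch of the three measures usually reported for a basket rule: support, confidence and lift. The transactions are invented for illustration, and pair_rules is a hypothetical helper, not part of any Keboola app.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets):
    """Return (support, confidence, lift) for every ordered item pair A -> B."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        for antecedent, consequent in ((a, b), (b, a)):
            support = count / n                          # share of baskets with both items
            confidence = count / item_counts[antecedent]  # P(consequent | antecedent)
            lift = confidence / (item_counts[consequent] / n)
            rules[(antecedent, consequent)] = (support, confidence, lift)
    return rules


rules = pair_rules(transactions)
support, confidence, lift = rules[("butter", "bread")]
print(f"butter -> bread: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than chance alone would predict. On real data you would normally also filter out rules with very low support.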

Anomaly detection

Anomaly detection refers to identifying specific events that don’t conform to the expected pattern in the data. This could take the form of fraud detection, identifying medical problems or even detecting subtle changes in consumer buying behavior. If we look at the last example, this could help us identify new buying trends early and take advantage of them. Using the example of an eCommerce company, you could identify anomalies in carts created per minute, a high number of abandoned carts, an odd shift in orders per minute or a significant variance in any number of other metrics.
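As a rough illustration of the carts-per-minute example, the sketch below flags values that sit far from the median, using the modified z-score (based on the median absolute deviation). This is one simple technique among many and not necessarily what any particular app uses. The data and the find_anomalies helper are hypothetical.

```python
from statistics import median

# Hypothetical "carts created per minute" readings, invented for illustration.
carts_per_minute = [52, 48, 50, 51, 49, 53, 47, 50, 120, 51, 49, 50]


def find_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


anomalies = find_anomalies(carts_per_minute)
print(anomalies)
```

The median-based score is used here instead of a plain mean and standard deviation because a single large spike inflates the standard deviation and can hide itself.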


Natural language processing

The desire to extract business value out of data isn’t only focused on transactional data. About 80% of the data generated is unstructured (social media, customer service, call center, etc.) and natural language processing (NLP) is being applied to harness it, essentially enabling computers to extract meaning from human language. The most common application for NLP is sentiment or text analysis. This can be very effective for understanding positive and negative trends in social media interactions.
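To show the basic idea behind sentiment analysis, here is a deliberately tiny, lexicon-based sketch: it counts positive and negative words and labels the text by the difference. The word lists and function names are assumptions made up for this example. Production sentiment tools handle negation, context and sarcasm far better than this.

```python
import re

# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE = {"love", "great", "fast", "helpful", "happy"}
NEGATIVE = {"slow", "broken", "terrible", "rude", "late"}


def sentiment_score(text):
    """Positive-word count minus negative-word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(text):
    """Map the score to a coarse positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "Love the new app, support was great and helpful!",
    "Delivery was late and the box arrived broken.",
    "Order arrived on Tuesday.",
]
for post in posts:
    print(sentiment_label(post), "|", post)
```

Tracking the share of positive and negative labels over time is one simple way to spot the social media trends mentioned above.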

Decision Tree

Decision trees are mathematical models to aid in, you got it, decision making. Using estimates and probabilities to calculate likely outcomes, they’re designed to help you decide whether the net gain of a decision is worthwhile. Looking at the example of a retailer, a decision tree could help you decide whether the best route for increasing revenue across your store chain would be launching a new marketing campaign or cutting costs. On the upside, it helps you get an overview of all the options at the same time in a logical way, considering both risks and rewards. On the downside, it ignores qualitative factors, and estimates can be prone to error.
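The retailer example can be sketched as a one-level decision tree: each option has an upfront cost and a set of outcomes with estimated probabilities and payoffs, and the expected net gain is the probability-weighted payoff minus the cost. All figures and names below are hypothetical, chosen only to show the arithmetic.

```python
import math

# Hypothetical estimates, invented for illustration: (probability, payoff) pairs.
options = {
    "launch marketing campaign": {
        "cost": 50_000,
        "outcomes": [(0.6, 150_000), (0.4, 20_000)],
    },
    "cut costs": {
        "cost": 10_000,
        "outcomes": [(0.8, 60_000), (0.2, 10_000)],
    },
}


def expected_net_gain(option):
    """Probability-weighted payoff minus the upfront cost of the option."""
    total_probability = sum(p for p, _ in option["outcomes"])
    if not math.isclose(total_probability, 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    expected_payoff = sum(p * payoff for p, payoff in option["outcomes"])
    return expected_payoff - option["cost"]


gains = {name: expected_net_gain(option) for name, option in options.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: expected net gain {gain:,.0f}")
print("Best option under these estimates:", best)
```

The result is only as good as the estimates fed in, which is the downside noted above: change one probability and the preferred branch can flip.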

Correlation / Grouped Histogram

A grouped histogram is a way of understanding the distribution of quantitative data. It gives analysts information presented in a compact and organized fashion, allowing them to analyze a large data set without having to dive into descriptions to distinguish each variable and its frequency for a given set of intervals. This is a great way to use data analysis to make informed decisions by simply conveying information about each variable, including its values and occurrences. This could be a good way of visualising, in buckets, something like the number of minutes visitors spend on a particular page on your site.
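For the time-on-page example, a minimal sketch of the bucketing might look like the following. Grouping by device type, the two-minute bucket width and the sample visits are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (group, minutes on page) records, invented for illustration.
visits = [
    ("mobile", 0.5), ("mobile", 1.2), ("mobile", 2.5), ("mobile", 7.0),
    ("desktop", 3.7), ("desktop", 4.1), ("desktop", 5.9), ("desktop", 0.8),
]


def bucket_label(minutes, width=2, cap=6):
    """Name the bucket a value falls into, e.g. '0-2', '2-4', or '6+'."""
    if minutes >= cap:
        return f"{cap}+"
    low = int(minutes // width) * width
    return f"{low}-{low + width}"


def grouped_histogram(rows, width=2, cap=6):
    """Count rows per bucket, separately for each group."""
    counts = defaultdict(Counter)
    for group, minutes in rows:
        counts[group][bucket_label(minutes, width, cap)] += 1
    return counts


histogram = grouped_histogram(visits)
for group, buckets in histogram.items():
    print(group)
    for label in ("0-2", "2-4", "4-6", "6+"):
        print(f"  {label:>4} min | {'#' * buckets[label]}")
```

Printing one row of bars per bucket and group gives a quick text version of the chart. A plotting library would render the same counts as side-by-side bars.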

As more companies continue to focus on data science, acquiring and retaining that talent can be a real challenge. Additionally, building out these capabilities is typically costly and time-consuming. I recently read a great article on outsourcing data science; another great option is to use Keboola’s data science apps. We make it simple for business users to bring together their data and perform data science tasks. Contact us to learn more.



I got some great insight around market basket analysis from this article; check it out!