tag:blog.keboola.com,2013:/posts The Official Keboola Blog 2018-07-06T23:08:19Z Keboola Blog tag:blog.keboola.com,2013:Post/1279332 2018-05-01T22:18:02Z 2018-05-01T23:20:56Z Uncover New Growth Opportunities in eCommerce with our Updated Magento 2.0 Extractor

Magento is an open source eCommerce platform which provides merchants with a flexible solution to control their online stores’ contents, appearance and functions. With our updated extractor, merchants will be able to extract data from Magento into Keboola's platform to:

  1. Sync inventory across multiple platforms

  2. Construct a cost structure by combining the merchant’s financial data with suppliers’ data

  3. Observe and predict user engagement statistics by analyzing customer information

The Magento extractor is built on top of our Generic Extractor. Multiple templates have been configured to extract data from the most popular endpoints, giving users clean data extraction out of the box.

This link explains how the extractor should be configured and includes some useful links to the relevant Magento API settings.

If a user wants to extract data from other endpoints, they are not bound to the pre-configured templates. The extractor UI can switch to a JSON editor, where users have the freedom to alter endpoints and mappings to their liking. This process does, however, require Keboola users to have advanced knowledge of the Generic Extractor.
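Switching to the JSON editor exposes a Generic Extractor-style configuration. A minimal sketch, shown here as a Python dict for readability (the base URL, auth type and endpoint names are illustrative assumptions, not the shipped Magento templates):

```python
import json

# Illustrative Generic Extractor-style configuration; the URL, auth
# type and endpoint names are assumptions, not the exact templates.
config = {
    "api": {
        "baseUrl": "https://my-store.example.com/rest/V1/",
        "authentication": {"type": "bearer"},
    },
    "config": {
        "jobs": [
            {"endpoint": "orders", "dataField": "items"},
            {"endpoint": "products", "dataField": "items"},
        ],
    },
}

# The JSON editor expects plain JSON, so dump the dict:
print(json.dumps(config, indent=2))
```

Each job pulls one endpoint; the `dataField` names the array inside the response that becomes rows of the output table.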

Are you interested in learning more about our Magento integration? Contact us.


Leo Chan

tag:blog.keboola.com,2013:Post/1266170 2018-03-28T15:53:44Z 2018-03-28T18:56:24Z Understanding project management workflow by integrating Keboola + Asana


Asana is on a mission to help humanity thrive by enabling all teams to work together effortlessly, improve the productivity of teams, and increase the potential output of every team’s effort. They provide a great web-based project management tool which allows users across teams to keep track of their work. 

Although Asana offers a fantastic UI and features that help managers gauge a project’s progress, it lacks a simple way to create a dashboard reporting on that progress: for example, the number of tasks completed last week, the number of tasks tagged 1st priority, or the number of tasks assigned to each user. With the Asana extractor, users can transform and enrich the data with Keboola to gain better insight into the projects contained in Asana. The integration will enhance collaboration and build an actionable, 360-degree view of project management and usage as well as customer experience.
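As a sketch of the kind of metrics this unlocks, here is how those example counts might be computed once the Asana data lands in Keboola. The field names and sample rows are hypothetical, not Asana's actual schema:

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical rows as they might land in Storage after the Asana
# extractor runs; field names here are illustrative.
tasks = [
    {"assignee": "ana", "completed_on": "2018-03-20", "tags": "1st priority"},
    {"assignee": "ben", "completed_on": "2018-03-26", "tags": ""},
    {"assignee": "ana", "completed_on": None, "tags": "1st priority"},
]

today = date(2018, 3, 28)
week_ago = today - timedelta(days=7)

# Tasks completed within the last seven days
completed_last_week = sum(
    1 for t in tasks
    if t["completed_on"]
    and week_ago <= date.fromisoformat(t["completed_on"]) <= today
)
# Tasks tagged as first priority
first_priority = sum(1 for t in tasks if "1st priority" in t["tags"])
# Open tasks per assignee
open_per_user = Counter(t["assignee"] for t in tasks if not t["completed_on"])

print(completed_last_week, first_priority, dict(open_per_user))
```

In practice these aggregations would run as a SQL or Python transformation feeding a dashboard table.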

Colin McGrew
tag:blog.keboola.com,2013:Post/1247559 2018-02-14T17:08:48Z 2018-03-11T15:42:10Z See How Keboola Automated Personalized MailChimp + SurveyMonkey Campaigns

Our client is in the event business, organizing regular meetups for CEOs in the Vancouver area, specifically for technology companies. To organize these ad-hoc conferences for hundreds of people, they use Excel, Salesforce as their CRM, MailChimp for emails and SurveyMonkey, well, for surveys. For those of you who have been using Keboola for some time, you might know this is the optimal setup for us.

After an initial discussion to understand their current in-house processes, we narrowed down the biggest pain to be the e-mailing component. Do you remember when I mentioned those meetups are for CEOs?

Well, they have 14 groups and each group meets once a month. For each event, they need to contact the host two weeks before in order to remind them. Then there is another reminder a week before the event that is sent to all the guests. Finally, after each meeting, there is a survey sent out to those who attended.

All those emails are being prepared and sent manually by one person. You can do the math, but believe me, it is almost a full-time job just to check every day which email has to be sent out to whom.

This is where Keboola stepped in…

Let’s skip the part where we moved the Excel into Google Sheets, replaced nicely formatted bar charts and roadmaps with simple data tables so we could crunch all the data in Keboola.


After each meeting, the organization collects feedback using SurveyMonkey. They measure several KPIs, plus they allow users to comment in each section. These survey results are distributed the next month as part of the Mailchimp campaign reminding guests that the next event is coming up soon.

Then we wrote a short API call (a JSON config) using our almighty Generic Extractor to get the data from SurveyMonkey. They have good documentation, so it’s not difficult to obtain the survey results.

The more complicated part of the process was joining together all the output tables from their API, because the endpoint created around 20 tables full of parent and child records.
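The stitching itself is just key-based joins. A toy two-table version (a column name like `surveys_pk` is our illustration; the Generic Extractor's actual generated key columns depend on the configuration):

```python
# Toy illustration of joining a parent table to one of its child tables
# on a generated parent-key column; column names are assumptions.
surveys = [
    {"id": "s1", "title": "March CEO meetup"},
]
answers = [  # child rows carrying a pointer back to their parent survey
    {"surveys_pk": "s1", "question": "Overall rating", "value": "9"},
    {"surveys_pk": "s1", "question": "Comments", "value": "Great host"},
]

# Index parents by primary key, then merge each child with its parent
by_id = {s["id"]: s for s in surveys}
joined = [
    {**by_id[a["surveys_pk"]], **a}
    for a in answers
    if a["surveys_pk"] in by_id
]
for row in joined:
    print(row["title"], row["question"], row["value"])
```

With ~20 such tables, the real work was mapping which child keys point at which parents, then repeating this join down the hierarchy.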

Mailchimp Part 1

The most important part of the puzzle is that all emails are distributed from MailChimp.

We had to figure out how to automatically feed each Mailchimp template with personalized data. It might sound easy at first, but by default Mailchimp offers only two variable fields connected to the recipient. Unsurprisingly, these are the first and last name.

However, for each template we needed much more than that! We needed to personalize the location, time and date of the event, and even add a personal note. There were special sections for the survey results and user comments from the previous meeting, so having just the two out-of-the-box fields was not going to cut it.

This was exactly the moment when we started asking the “what if” type of questions. Quite soon after opening the API documentation, the right question hit us: Could we push the whole content via API call and not merely trigger the campaign remotely?

Keboola Python Part

This is where Leo, our Python ninja, stepped in. He will briefly go over how Keboola “hacked” the MailChimp API.

With the use of a Custom Science component written in Python and the MailChimp API, users can ease the pain of manually creating multiple campaigns and entering the “variables” (e.g. time of the event, location of the event, etc.) every time a reminder or email needs to be sent out. Users are only required to maintain the templates within MailChimp and the Google documents that contain detailed descriptions of each event and the list of participants. Combined with automation via Keboola orchestration, the Custom Science component can run periodically, fetching event and participant details. With the fetched data, the component then uses the configured templates in MailChimp and creates a new campaign for every event it can list. If repeated event names are found within the same sheet/run, the component injects the details into the first encountered row with that event name. Upon triggering all the campaigns, the component outputs a CSV to Keboola Storage indicating whether or not each campaign was successfully executed. “Sent” campaigns are kept in the user’s MailChimp account as a record, where users can find what content was sent and to which participants. Stats and activities of the campaigns can also be found under the Reports tab in the user’s MailChimp account.
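The API interaction boils down to three calls per event: create a campaign, fill the template's editable sections, and send. A sketch of the requests against the MailChimp v3 API (list, template and section names are placeholders, and actually sending them with an HTTP client is omitted):

```python
# Requests the component issues against the MailChimp v3 API, built as
# (method, url, body) tuples; names and IDs below are placeholders.
API = "https://us1.api.mailchimp.com/3.0"

def create_campaign(list_id, subject, template_id):
    return ("POST", f"{API}/campaigns", {
        "type": "regular",
        "recipients": {"list_id": list_id},
        "settings": {"subject_line": subject, "template_id": template_id},
    })

def set_content(campaign_id, template_id, sections):
    # Each key in `sections` fills one editable region of the template;
    # this is how event "variables" (time, location, notes) get injected.
    return ("PUT", f"{API}/campaigns/{campaign_id}/content", {
        "template": {"id": template_id, "sections": sections},
    })

def send_campaign(campaign_id):
    return ("POST", f"{API}/campaigns/{campaign_id}/actions/send", None)

event = {"name": "April CEO dinner", "time": "6 pm", "location": "Vancouver"}
calls = [
    create_campaign("list123", f"Reminder: {event['name']}", 42),
    set_content("camp1", 42, {"event_time": event["time"],
                              "event_location": event["location"]}),
    send_campaign("camp1"),
]
for method, url, _ in calls:
    print(method, url)
```

Pushing whole sections through the content endpoint is what lets one template serve every event with fully personalized details.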

The last step was to create a different CSV output for each scenario. But since we could use as many variable fields in the Mailchimp template as we wished, it was just a matter of a couple of SQL queries and a few Python transformations to get the results with the right numbers in them.

Mailchimp Part 2

Now we had a functioning writer that was able to replace any predefined variable field with data from the corresponding column in the output file.

All we had to do was to adjust the existing templates and we were ready to go!


I’m not sure how this solution would withstand thousands of recipients, but in our case, with tens of emails going out on the busiest day, it works well.

The event manager’s job is now to maintain the master sheet, where all they need to do is add a new event. Keboola downloads the sheet every day, a transformation detects whether any kind of email campaign is needed, and the custom writer pushes personalized campaigns through Mailchimp.
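The daily decision step can be sketched as a small rule table. The real logic lives in the SQL transformation; the offsets below come straight from the schedule described above:

```python
from datetime import date, timedelta

# Which campaigns are due today for a given event? Offsets follow the
# schedule above: host reminder two weeks out, guest reminder one week
# out, survey the day after the meeting.
def emails_due(event_date, today):
    due = []
    if today == event_date - timedelta(days=14):
        due.append("host_reminder")
    if today == event_date - timedelta(days=7):
        due.append("guest_reminder")
    if today == event_date + timedelta(days=1):
        due.append("survey")
    return due

print(emails_due(date(2018, 3, 15), date(2018, 3, 1)))  # ['host_reminder']
print(emails_due(date(2018, 3, 15), date(2018, 3, 8)))  # ['guest_reminder']
```

Running this against every row of the master sheet each morning yields exactly the list of campaigns the writer must push that day.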



Data Geek / BI Developer

Colin McGrew
tag:blog.keboola.com,2013:Post/1220008 2018-01-02T19:28:51Z 2018-03-11T15:42:10Z Taking a data-driven approach to pricing to optimize profit

Research around pricing has consistently shown its importance. Deloitte found that, on average, a 1 percent price increase translates into an 8.7 percent increase in operating profits (assuming no loss of volume, of course). Yet they also estimated that up to 30 percent of the thousands of pricing decisions companies make each year fall short of delivering the best price. That’s a lot of money left on the table!

Too often, pricing is an afterthought, and pricing decisions are made by 'gut feeling' based on a quick look at competitor websites. Ideally, pricing decisions involve determining the value to the customer relative to the competitor, factoring in pricing power and pricing strategy. For more on this, see Pricing Power. What it is. How to get it. from our partners at Ibbaka. Understanding value requires data, and that data can change over time, so even the best-thought-out pricing model can soon be out of date. The solution is data science plus data integration. Data from many different sources can be connected to value and pricing models, and when these get out of alignment due to changes in the market and competitor actions, alerts can be triggered.

When we think about B2B sales analysis, the things that initially come to mind usually involve reporting on CRM data to understand sales by product, region, deal velocity, and the like. Whereas most innovative B2C companies take greater advantage of the mountains of valuable data they have, B2B companies have been a bit slower to adopt this approach. After identifying customer segments, data from not only CRM, but ERP, third party economic data sources and others can be used to understand past purchases & prices, preferences  and more to determine the optimal price. As mentioned, even a 1% increase can have a huge impact.

Automation is king

One of the critical keys to getting more out of your data is automating or even better, eliminating, as many mundane processes as possible. This allows organizations to focus on the test & evaluate part of pricing process, not configuring infrastructure and monitoring and maintaining data flows. It also allows the people doing analysis to take advantage of larger and more diverse data sets and make adjustments without a huge headache.

Sell pricing internally

As someone who has spent years in sales, I can tell you that pricing is so much more than just a number. It’s led to heated discussions around many tables in innumerable offices. It’s one of the big “hows” sales reps have to keep in mind when trying to close deals. Providing B2B sales organizations with information and context to help reps understand the factors behind pricing is mission critical. Keeping them educated on this topic will build confidence and translate to clients having a better understanding of how the cost of the product translates to business value. Introducing optimal pricing earlier in sales cycles can also speed up deal closure and increase win rates.

Test and evaluate

It’s important to remember that pricing is not a static, set-it-once type of event. It’s important to try, test and learn. This process is so much better when we can use data to influence and validate the decisions made.

Pricing is an important part of the foundation for strong sales and marketing organizations and using data effectively as part of your strategy can reveal key insights to make sure you get it right.

Want to learn more?

Join us for our upcoming webinar with Ibbaka, where we’ll dive deeper into data-driven pricing strategies!

Colin McGrew
tag:blog.keboola.com,2013:Post/1215623 2017-12-12T17:54:16Z 2018-03-11T15:42:10Z Holiday Gift Ideas for the Data Geek in Your Life

I can’t believe it’s already been a year since we covered some great gift ideas for data people!  We’re back with some more last minute ideas, some may look familiar albeit bigger (or smaller) and better while others are new arrivals. Whatever you’re looking for, we hope at least one of these ideas will help you find something that really excites the techie / data lover in your life this holiday season!

Amazon Echo Second Gen (Dot)

Colin McGrew
tag:blog.keboola.com,2013:Post/1205527 2017-11-15T17:49:08Z 2018-03-11T15:42:10Z How to take an agile (Minimum Viable Product) approach to analytics projects

By now, the idea of agile development and a Minimum Viable Product or MVP is prevalent. The problem is, while most people have the minimum part down, they often haven't mastered the viable… especially when it comes to analytics.

To quickly recap, a Minimum Viable Product is an approach where you focus on creating a product with a sufficient level of features to solve a particular problem. This first iteration is used to collect user feedback and develop the complete set of features for the final product to be delivered.

That’s all nice and well, but you may be wondering what the benefits to this approach are as it concerns analytics projects...

Learning, and learning quickly

Is your solution actually delivering the value that you are trying to create? In a typical project, you may be months down the road before what you’re building is actually in front of users. This makes it difficult to determine its viability for solving the business case. The whole point is to prove or disprove your initial assumptions sooner.

  • What part of your users’ current process is really frustrating them?

  • Are the analytics we designed actually guiding them through their workflow and making their life better?

By getting a usable set of features in front of users earlier in the process, we can collect feedback and determine if we are in fact on the right track.

Colin McGrew
tag:blog.keboola.com,2013:Post/1192622 2017-09-20T18:59:53Z 2018-03-11T15:42:09Z What is data health and why is it critical to your business?


Every great data insight or data visualization starts with good, clean data. Whether you want to understand lifetime value, design upsell and cross-sell strategies, define personas, or develop sophisticated data models, having clean and consolidated data will enable better analytics, improve the performance of your marketing campaigns and maximize your marketing ROI.

Clean data is particularly crucial for CRM, ERP, sales and IT systems holding customer data. For example, proper planning and cleansing of your customer data from the beginning will keep you from falling behind on your CRM implementation. Your data needs to be reviewed, filtered and cleaned to ensure that bogus data is not transferred. The cost of processing errors to the business can be measured in time spent on manual troubleshooting, forced ETL re-runs and, at worst, presenting incorrect or invalid data to the customers or employees who use it to drive their business decisions.

  • How do you ensure your data is not wrong or incomplete when you ingest data from various third-party sources, especially sources like FTP and AWS S3 which (unlike an API) do not always have a given structure?

  • How do you successfully migrate data from an old system to a new one?

It is safe to say that the majority of data flows have a set of expected data types defined, and very often a value range as well.

One option is to use SQL or Python transformations, but such a hard-coded approach can be very time-consuming and lacks the flexibility and simplicity to be reused. Additionally, it would not be obvious which rows and columns include rogue values until these transformations run into an error (or you would have to design a specific workflow to off-load them).

Another option is to describe the data and set up value and type conditions for it in the form of rules. Once that’s done, all you need to do is make sure your data flows run these rule checks every time you run the orchestration (ETL process). The KBC Data Health App has been designed to help you automate this data check process.

Typical use cases:

  • Ensuring data quality from systems with data collected by users (internal IT systems, CRM, user forms, etc.)

  • Migrating data from legacy systems - data migration assumptions vs. reality check

  • Validating crucial fields for report buildup

KBC Data Health Application

The Data Health Application is an app designed to help users produce a clean data file. To boost productivity, it provides a simple and convenient way to cleanse or filter data instead of writing multiple long queries in a transformation to obtain the same results. Some primary features include:

  • Filtering data based on user configured rules to match business needs

  • Can be triggered to run on a scheduled basis

  • Generate a report with descriptions and reasons why rows are rejected

Since many users have no prior knowledge of SQL, the application reproduces basic SQL functionality through simple user-interface inputs. The application does not have any pre-configured rules; users have the freedom to create rules tailored to their needs and wants. Combined with KBC orchestration, the application can be triggered on a daily or weekly basis depending on the user’s business requirements. As a result, users get an automated process that generates “clean” data for in-depth analysis without worrying about handling corrupted data or outliers.

Supported Rules:

  1. List Comparison

  2. Digit Count

  3. Numeric Comparison (Value comparison)

  4. Regular Expression (Regex)

  5. Column Type (Applicable value type)
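To make the rule types concrete, here is a toy re-implementation of three of them (list comparison, numeric comparison, and a regex check for non-empty values). The real app is configured through the UI, so the rule and report format below is only an assumption:

```python
import re

# Toy versions of three supported rule types; the real app's rule and
# report formats are configured through the UI, so this is a sketch.
RULES = [
    ("Region", "in_list", {"Western Europe"}),
    ("Happiness Rank", "numeric_max", 10),
    ("Happiness Score", "regex", r".+"),  # must not be empty
]

def check(row):
    """Return the list of reasons a row is rejected (empty = clean)."""
    reasons = []
    for column, kind, arg in RULES:
        value = row.get(column, "")
        if kind == "in_list" and value not in arg:
            reasons.append(f"{column} not in allowed list")
        elif kind == "numeric_max" and float(value) > arg:
            reasons.append(f"{column} above {arg}")
        elif kind == "regex" and not re.fullmatch(arg, value):
            reasons.append(f"{column} fails pattern")
    return reasons

rows = [
    {"Region": "Western Europe", "Happiness Rank": "4", "Happiness Score": "7.49"},
    {"Region": "North America", "Happiness Rank": "14", "Happiness Score": ""},
]
clean = [r for r in rows if not check(r)]
rejected = [(r, check(r)) for r in rows if check(r)]
print(len(clean), rejected[0][1])
```

The rejected rows and their reasons feed the report described above, while the clean rows flow on to the output table.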


Input Table:



  • User wants anything within the “Western Europe” Region

  • User is only interested in countries placed within the top 10 happiness rank

  • Happiness score cannot be empty

Output table:


If you’re already a KBC user you can find the Data Health app alongside the rest of our data applications. Not yet a user and want to learn more? Contact us to discuss.


Leo Chan
tag:blog.keboola.com,2013:Post/1191251 2017-09-14T19:59:12Z 2018-03-11T15:42:09Z Is there untapped value in your data?


Embedded analytics, data products, data monetization, big data… there are plenty of buzzwords we can use to “categorize” the idea.

IDC reports that the big data and business analytics market grew at a rate of over 11% in 2016 and will grow at a compound annual growth rate of 11.7% through 2020. This rapidly growing area of investment can’t be for naught… can it?

Let’s look beyond the hype at some specific approaches for extracting additional value (and ultimately dollars) from your data.

According to Gartner, Data Monetization refers to using data for quantifiable economic benefit.

The first thing that may come to mind is outright selling of data (via a data broker or independently.)  Although a potentially viable option, with increased data privacy policies and the sheer amount of data needed to be successful with this approach, it can be quite limiting.

There are many other approaches to monetizing your data, such as:

Colin McGrew
tag:blog.keboola.com,2013:Post/1188142 2017-09-01T21:07:37Z 2018-03-11T15:42:09Z How Keboola Switched to Automatic Invoicing

We’ve been assisting people with data and automation for a while now, helping them become “data-driven.” Several months ago, we had an exciting opportunity to automate within Keboola itself. After we lost our two main finance wizards to the joys of childcare, we decided to automate our internal business processes.

There was a lot of work ahead of us. For years, the two of them had been issuing invoices one by one. Manually. Hating unnecessary manual tasks, we were eager to put the power of our platform, Keboola Connection, to work and eliminate the manual invoicing.

We expected to cut down approximately 2-3 mandays per month. We also wanted to get much better data about what’s going on.

As our sales activities have been taking off around the globe, we would need to automate this process anyway. Otherwise soon we would have to hire a new employee just for invoicing and that is a no-go for us. Plus we didn’t want to overload Tereza, our new colleague, with this tedious work and take away her weekends from her. 

When it comes to data, we often preach the agile methodology: Start small, build quick, fail fast and have the results in production from day one - slow Kaizen style improvement. This is exactly what we did with our invoicing automation project. We didn’t want to have someone write a custom app for us. We wanted to hack our current systems, prototype, fail fast and see where it would lead us. We wanted to save Tereza’s time but didn’t want to waste it 10x in the development of the system. :-)

Our “budget” was 3-4 mandays max!

Step 1  —  Looking for a tool to use for the invoicing

We were looking for a tool which can handle all the basic things we need: different currencies (it’s Europe!), different bank accounts, with or without tax, paid or unpaid, and a handful of other features. Last but not least, the tool HAD to have a nice RESTful API. After some trials we opted for a Czech system – Fakturoid. They have great support, by the way. That’s a big plus.

Step 2  — Getting data about customers from Fakturoid into Keboola Connection

First, Padak took all clients we already had in Flexibee, our accounting tool, and exported them to Fakturoid. Then we added all the necessary info to the contacts.

Great. Now we had all the customers’ records ready and needed to get the data into Keboola Connection. It was time to set up our Generic Extractor. It literally took me half an hour to do it! Check it out here:

Keboola Generic extractor config for getting clients’ info from Fakturoid into Keboola Connection
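The configuration behind that half-hour setup looked roughly like this, shown as a Python dict. The account slug and credentials are placeholders, and the `subjects.json` endpoint name is our reading of the Fakturoid v2 API, so verify it against their docs:

```python
# Sketch of a Generic Extractor config for Fakturoid; the account slug,
# credentials and endpoint name are placeholders/assumptions.
config = {
    "api": {
        "baseUrl": "https://app.fakturoid.cz/api/v2/accounts/acme/",
        "authentication": {"type": "basic"},
    },
    "config": {
        "username": "invoicing@example.com",
        "#password": "fakturoid-api-key",  # the "#" prefix marks the value as encrypted in KBC
        "jobs": [{"endpoint": "subjects.json"}],
    },
}
print(config["api"]["baseUrl"] + config["config"]["jobs"][0]["endpoint"])
```

One job pulling the client list is all it takes; the extractor handles pagination and lands the response as a table in Storage.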

Step 3  —  Creating two tables with invoices and their items for uploading into Fakturoid

There was only one more thing to know. Who is supposed to pay for what and when? We store this info in our Google Spreadsheet. It contains basic info about our clients, the services they use, the price they pay for them, the invoicing period (yearly, quarterly, monthly), and the time period for which the info is valid (when their contract is valid; new contract/change = new row). To be able to pair the tables easily, we added a new column with the Fakturoid client ID.

Finally, we set up our Google Drive Extractor and loaded the data into Keboola Connection. Once we had all the data there, we used SQL to create a Transformation that took everything necessary from the tables (who we bill this month, how much, if out of country = don’t put VAT, add info about current exchange rate, etc.) and created clean output tables.

Part of the SQL transformation which creates an output table with items to pay for Fakturoid.
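The transformation itself is SQL, but its central billing rule reduces to a few lines. In Python, with an illustrative home country and VAT rate:

```python
# Core of the invoicing rule: domestic clients get VAT added, foreign
# clients don't. The 21% rate and home country "CZ" are illustrative.
HOME_COUNTRY = "CZ"
VAT_RATE = 0.21

def invoice_item(client_country, amount_czk):
    vat = round(amount_czk * VAT_RATE, 2) if client_country == HOME_COUNTRY else 0.0
    return {"net": amount_czk, "vat": vat, "total": amount_czk + vat}

print(invoice_item("CZ", 1000))  # domestic: VAT added
print(invoice_item("DE", 1000))  # out of country = no VAT
```

The real query also joins in the current exchange rate and the invoicing period from the master sheet before writing the clean output tables.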

Step 4  — Sending the tables into Fakturoid and letting it create the invoices

This step was not as easy as exporting data from Fakturoid. We couldn’t use any preprogrammed services. Thankfully, Keboola Connection is an open environment and any developer can augment it and add new code to extend its functionality; just wrap it up in a Docker container. We asked Vlado to write a new writer for Fakturoid which would take the output tables from our Transformation (see Step 3) and create invoices in Fakturoid from the data in those tables.

It took Vlado only 2 hours to have the writer up and running!

Now that the writer is completed, Keboola Connection has one more component available to all its users.

Step 5 — Automating  the whole process

It was the easiest part. We used our Orchestration services inside Keboola Connection and created an orchestration which automatically starts on the first day of each month. Five minutes later, all the invoices are done and sent out. #easypeasy


It is not a complicated solution. No rocket science. We believe in splitting big problems into smaller pieces, solving the small parts and putting them back together just like Lego bricks. The process should be easy, fast, open and put together from self-contained components. So when you have a problem in one part, it doesn’t affect the whole solution and you can easily fix it.

Besides saving Tereza’s time, this is the springboard for automating other parts of her job. We want her to spend more time doing more interesting things. And the process scales as we grow.

It took us:

  • 4 hours to analyse and understand the problem and how things are connected,
  • 1 hour to export clients from the accounting system,
  • 1/2 hour to write a Generic Extractor from Fakturoid,
  • 2 hours to write a Transformation preparing clean output data,
  • 2 hours to develop a new Writer for Fakturoid, and
  • 1-2 hours to do everything else related to the whole process.

Total = circa 11 hours

Spoiler alert: I’m already working on further articles from the automation series. Look forward to reading how we implemented automatic distribution of invoices to clients and the accounting company, or how we let the systems handle our invoicing for implementation work.

tag:blog.keboola.com,2013:Post/1185777 2017-08-22T19:50:06Z 2018-07-06T23:08:19Z How to build data products that increase user engagement

Think about all the social media platforms out there, which ones do you use the most (and why)? I’m not talking about giving your LinkedIn profile a face lift before you put in a job application or searching for a long lost friend on Facebook; which of these apps are actually driving user engagement? For me, it’s Instagram; the interface is easy to navigate and more than once I’ve found myself re-opening it after I’ve just closed it. Have you thought about why many of these platforms have exploded in user engagement with many people posting to their Twitter or Facebook accounts multiple times per day? According to a recent Gartner blog, adoption rates for some of the BI tools in their Magic Quadrant are at a low but not too surprising 21%. Are people sick and tired of “all that data” or is there something more sinister at work…

We’ve thought a lot about social media platforms (and other apps) that seem to drive such high user engagement and put together a few thoughts on how you can do the same within your data product to ensure you keep users coming back for more. Before we reveal the secret sauce for building engagement in your data products, let’s take a quick look at how many analytics teams approach the problem.

Too often, teams building an analytics product for their customers approach the project in the wrong way; the story is oh so familiar. As we covered in a recent blog, this means taking the reports existing in an Excel spreadsheet and web-ifying them in a cloud BI tool. It’s essentially surfacing the exact same information as before, but now with shiny new charts and graphs, more color choices, and some interactivity. After the initial excitement over the new toy in the room, the latter solution isn’t doing any better than the former at driving engagement, let alone delivering “insights” or creating a new revenue stream.

One of the big reasons customers aren’t lining up to write a check for the latest, greatest data product a vendor has rolled out is that the analytics team failed to make it engaging. Simply put, product teams need to let users know “hey—check this out,” “hey—we’ve got some important information for you”, and “hey—you should come back and see us.” Most teams do the second part, the “we’ve got insights” piece, but they fail to inform users why they need to keep coming back for more. These are essential elements of establishing engagement; not building these in is like skipping the foundation of a new skyscraper. "It's like when you see a skyscraper; you're impressed by the height, but nobody is impressed by the foundation. But make no mistake, it's important," said Akshay Tandon, Head of Strategy & Analytics at LendingTree.

Want to avoid the killer mistakes of failing to build engagement into your data product? Here’s how:

Colin McGrew
tag:blog.keboola.com,2013:Post/1179818 2017-08-09T16:13:24Z 2018-03-11T15:42:09Z Creating Intelligent Narratives with Narrative Science & Keboola

Intelligent Narratives are the data-driven stories of the enterprise. They are automated, insightful communications packed with the information that matters most to you—specific to your role or industry—written in conversational language, and at machine scale. By giving your employees and your customers a richer, more nuanced understanding of your business, they can make more informed decisions and realize their greatest potential.

Narrative Science is the leader in advanced natural language generation (Advanced NLG) for the enterprise. Quill™, its Advanced NLG platform, learns and writes like a human, automatically transforming data into Intelligent Narratives—insightful, conversational communications packed with audience-relevant information that provide complete transparency into how analytic decisions are made.

As we all know, one of the biggest barriers to successful data projects is having the right data in the right place; that's why Narrative Science and Keboola have partnered to bring the next generation of analytics to you faster. Automate data workflows, reduce time and complexity of implementations and start gaining new insights now! Leverage this app, powered by Narrative Science, to produce machine-generated narratives of data ingested by Keboola. 

Colin McGrew
tag:blog.keboola.com,2013:Post/1177314 2017-07-26T16:45:10Z 2018-03-11T15:42:09Z Freethink + Keboola: Understanding cross-channel video analytics

Video is one of the hottest trends in digital marketing. YouTube, which has expanded more than 40 percent since last year, reaches more 18-49 year-old viewers than any of the cable networks and has a billion users watching hundreds of millions of hours every day. 

Freethink, a modern media publisher, uses online video to tell the stories of passionate innovators who are solving some of humanity’s biggest challenges by thinking differently. While telling important stories is their primary focus, data underlies all of their decisions. As a publisher, they need to understand how well each piece of content performs, as well as how that content performs across platforms (they currently publish videos on their website, YouTube and Facebook.)

Prior to working with Keboola, collecting and combining data for cross-channel video analysis was a time consuming, manual effort (particularly because Facebook has separate APIs to track page content and promoted content.) In addition, this process made performing time-over-time analyses a real challenge.

The goal was to provide a dashboard solution giving the team better visibility into their data. Keboola Connection (KBC) overcame this by leveraging existing API connections to get data from Facebook and YouTube. In addition, Keboola utilized its partnership with Quintly (social media analytics) to pick up cleaned and verified data from their API. All this data is combined with additional data sources, including Google Sheets, which provide additional metadata for advanced reporting and segmentation. This blended data enables universal reporting across platforms to get a 360-degree picture of each piece of content.


Freethink now has all their data populated in Redshift, where Chartio is able to connect to create beautiful dashboards for reporting. They are able to go into the Keboola platform and manually adjust and run configurations to get exactly the data they need. The biggest gains have been in time saved, being able to show change over time and freeing the team up to focus on more complicated analyses. This also opened up data access to the broader team, promoting collaboration and data driven decision making.

"Keboola really helped simplify and automate the process of collecting and combining data. Working together, Chartio and Keboola Connection deliver a full stack solution for modern analytics, taking full advantage of the cloud. I’m able to give my team better insights into our performance and make better decisions, quicker."

-Brandon Stewart, Executive Editor at Freethink



Colin McGrew
tag:blog.keboola.com,2013:Post/1168513 2017-06-28T16:07:44Z 2017-07-05T16:32:51Z The Best Tool for Your Data Product Journey? A Good Map


For anyone creating an analytics product, the pressures of engaging customers and generating revenue while protecting your core product and brand can be overwhelming, especially when aiming to hit so many goals on the horizon:

  • Does it target users effectively?

  • Will it guide users to a solution to their business problem?

  • Can it scale to many customers?

  • Will it deliver real results that customers are willing to pay for?

Fortunately, we've been there, done that, and understand what it takes to build a great data product. That's why we've created a map to help you navigate your way to success, built on the experience of countless voyagers who have sailed the same seas before you: the Data Product Readiness Assessment.

Colin McGrew
tag:blog.keboola.com,2013:Post/1163897 2017-06-14T17:08:18Z 2017-06-15T17:12:49Z Why your data product needs a good elevator pitch


In recent years, a term started appearing across the technology world: “data monetization,” or turning your data into dollars (as we mentioned in a previous post, you can Find Gold in Your Data!). Businesses reacted to the hype, started spending on every solution under the sun and then… Nothing. Nada. Zilch. In many cases the revenues never materialized; buyers became frustrated with the lack of results and blamed the whole concept of data monetization. The problem is, you’ve got to avoid certain mistakes... and they’re silent killers.

In truth, data products are a great opportunity for most businesses to engage customers and create new streams of revenue. Untapped, dormant data can, when refined properly, become a crucial resource for your company. Fortunately, we’ve worked on many analytics projects ourselves, have seen these mistakes made and have put together a guide to help you avoid making them yourself.

To provide some quick insight, we thought we’d share one of the tips we’ve found most helpful when starting to create an analytics product.

Creating an elevator pitch

Colin McGrew
tag:blog.keboola.com,2013:Post/1151352 2017-05-03T16:44:02Z 2017-08-22T22:13:25Z What is "modern" business intelligence anyway...?


Last week, Tableau hosted a session on the evolution of Business Intelligence in Portland that I had the chance to attend. Although I did review their Top 10 trends in BI when they released them earlier this year, the presentation and discussion ended up being pretty interesting. A few of the topics really resonated with me and I thought we could dig into them a bit more.  

For starters:

Modern BI becomes the new normal

The session (and report) kick off by highlighting Gartner’s Business Intelligence Magic Quadrant and the shift away from IT-centric BI over the last 10 years. Regardless of who’s discussing the trends (Gartner, Tableau or otherwise) and if or when they come to fruition, it’s important to dig deeper. Reports like those by Gartner are good guideposts for trends and technologies to examine. (I saw that mentioned somewhere recently; comment for credit.)

That said, I think we can agree that the overall landscape of technology and the way that organizations of all sizes are taking advantage of it in the domain of business intelligence has improved over the last decade.

So does that mean modern BI has truly arrived?

Although some ideas come to mind when I hear the phrase…

What is modern business intelligence?  

And do we all think of the same things when we discuss it…?


Colin McGrew
tag:blog.keboola.com,2013:Post/1147456 2017-04-19T16:41:44Z 2017-04-19T18:50:34Z Find Gold in Your Data


"Data Monetization" is a term you might have heard a lot lately. But what does it really mean for you and your business? There is gold in your data, but how can you extract it and gain its benefits without adding resource burdens to your business? We collected the main approaches successful companies are using to give you inspiration and insight into how you can use the data you already have to improve efficiencies, create new revenue streams, or increase value (and hence your wallet share) from your current customer base.

Use data to make better decisions

It is not always about the big, earth-shaking decisions. What if we could empower our employees to choose better paths in an incremental fashion? Which ad to place in an available space? How to utilize remaining capacity on a shipment? Those items may each mean just $50.00, or $1,000.00. But people can easily make 50 decisions like that per day.

Colin McGrew
tag:blog.keboola.com,2013:Post/1143970 2017-04-04T21:47:09Z 2017-04-06T16:29:38Z First Principles: The Foundation of a Great Data Product

To kick off our new series about creating data products, we decided to write a white paper. This sounds simple, but this time it was a little more difficult than we expected.

Specifically, where do we start when we want to explain the difficulties data product teams face and how to overcome the critical obstacles? Should we begin with user personas and how to design data products that engage users? Do we kick things off with a piece about pricing data products and the finer points of ensuring future up-sell paths? How about a few words explaining why data products that don’t use Keboola are doomed to fail and bring shame upon their product teams and ultimately their entire company? Hmmm... All possibilities, but none of these seemed the best way to start our series.

After much thought and coffee, we decided to start at the beginning with “first principles”—those foundational attributes which distinguish successful analytical applications from those that don’t quite meet their objectives. Our white paper would discuss these principles that make a data product truly great.

Wait—isn’t that a little vague, a little “fluffy”? Not at all. We felt compelled to start with these principles because, while they are not as mathematical as pricing or as black and white as dashboard design, it can be hard to know where to begin when you’re part of a product team charged with building an analytics product. First principles act as guideposts to help you stay on the right path.

These guideposts are essential for product teams because it isn’t easy to create analytics that have a positive impact both for users and on your company’s bottom line. Do you start by setting revenue targets and determining the cost structure that needs to be achieved in order for the data product to be profitable? Maybe you start by defining the various reports and information that you need to put in the hands of your customers to solve their problems and reduce the deluge of “more data” requests. Or perhaps you could start by brainstorming a list of all of the features that might make users engage with the analytics—requests you’ve received or functionality that is present in your competitors’ products.

Each of these paths is a reasonable starting point, but are any of them the best way to begin the process of building a great data product? That's where first principles come into play.

First principles don’t have anything to do with bar charts versus pie charts or even technology selection. Instead, they are a set of guiding beliefs about what makes a data product great. They are foundational truths, and from them everything else—features, pricing, and product strategy—follows.

As we start our series on creating data products, we felt that our first principles were a great place to begin, and so we’d like to share them with you in a white paper. Before you start to think that these principles will be a rehash of all the modern catchphrases such as “embrace the change” or “empower each other”—these are directly targeted at creating successful data products. They are a collection of elements that we’ve seen in great, successful analytics-based products and are the place where we always begin when considering each project.

We hope that you find these elements of a great data product useful in your journey to deliver analytics to your customers and, as always, we’re here to help if you’d like to build a data product together.

Please enjoy your free copy of Elements of a Successful Data Product!

tag:blog.keboola.com,2013:Post/1135345 2017-03-02T18:43:13Z 2017-03-02T18:43:13Z Facebook Prophet - Forecasting library

It all started yesterday morning, on my way to work, when I saw multiple tweets mentioning a newly published forecasting library:


Sounds interesting, I thought. I bookmarked the link for “weekend fun with code” and moved on. The minute I stepped into the office, Amazon S3 had an outage (coincidence?) which impacted half of the internet, and KBC as well. OK, so what could I do now?

I opened the link to the Facebook engineering page and started reading about the forecasting module. They supplied quite simple instructions, and I was tempted to test it out. Wouldn't it be great to use it in some KBC projects?

Since the code needed for forecasting is pretty simple, I mocked up a script suitable for KBC use before lunch, and when Amazon (us-east-1) got back up, I could implement the code as a Custom Science app.

The algorithm requires two columns: a date column and a value column. The current script gets the source and result tables’ information from the input and output mapping, along with the parameters specified by the user. Those parameters define:

  • Date column name

  • Value column name

  • Required prediction length (period)
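The data-prep side of such a script can be sketched in a few lines of pandas. This is an illustrative reconstruction, not the actual Custom Science app: `prepare_for_prophet` and the sample column names are hypothetical, while the `ds`/`y` column schema and the calls in the trailing comments follow Prophet's documented API.

```python
import pandas as pd

def prepare_for_prophet(df, date_col, value_col):
    """Rename the user-specified columns to the ds/y schema Prophet expects."""
    out = df[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out

# Toy data standing in for a table delivered via the input mapping
sales = pd.DataFrame({"day": ["2017-01-01", "2017-01-02", "2017-01-03"],
                      "revenue": [120.0, 135.5, 128.0]})
ready = prepare_for_prophet(sales, "day", "revenue")

# With the fbprophet package installed, the forecast itself would then be:
# m = Prophet()
# m.fit(ready)
# future = m.make_future_dataframe(periods=period)  # "period" from user parameters
# forecast = m.predict(future)
```

The script only needs to write `forecast` back to the table named in the output mapping for KBC to pick it up.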

This is how it looks in Keboola:


To see the output in visual form, I used Jupyter, which has recently been integrated into KBC. Not bad for a day’s work, what do you say?


Just imagine how easy it would be for our users to orchestrate the forecasting process:

  1. Extract sales data

  2. Run forecasting

  3. Enrich data by forecasted values

  4. Publish them to sales and marketing teams



  • The sample data I used sucks. I bet yours will be better!
  • Here is the link for the Jupyter notebook.
  • Feel free to check some other custom science apps I did: https://bitbucket.org/VFisa/

Where Prophet shines (from Facebook page)

Not all forecasting problems can be solved by the same procedure. Prophet is optimized for the business forecast tasks we have encountered at Facebook, which typically have any of the following characteristics:
  • hourly, daily, or weekly observations with at least a few months (preferably a year) of history
  • strong multiple “human-scale” seasonalities: day of week and time of year
  • important holidays that occur at irregular intervals that are known in advance (e.g. the Super Bowl)
  • a reasonable number of missing observations or large outliers
  • historical trend changes, for instance due to product launches or logging changes
  • trends that are non-linear growth curves, where a trend hits a natural limit or saturates

Martin Fiser (Fisa)

Keboola, Vancouver, Canada

Twitter: @VFisa

Martin Fiser
tag:blog.keboola.com,2013:Post/1127740 2017-02-01T17:38:10Z 2017-03-02T23:30:47Z Webhooks and KBC - How to trigger orchestration by form submission (Typeform)

Triggering KBC orchestration with webhook

How to trigger orchestration by form submission

Use case

Keboola just implemented a product assessment tool dedicated to OEM partners. The form's results will show how submitters fare in the various dimensions of data product readiness, areas on which to focus, and specific next steps to undertake.

We wanted to trigger the orchestration that extracts the responses (have you noticed the new Typeform extractor?), processes the data, and updates our GoodData dashboard with the answers. There was no option to use the "Magic Button" to do so, because there is no guarantee the respondent would click on it at the end of the form.
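Under the hood, triggering an orchestration from a webhook handler boils down to a single authenticated POST. Here is a minimal sketch; the orchestration ID and token are placeholders, and the endpoint URL should be verified against the current Keboola Orchestrator API documentation:

```python
import json
import urllib.request

# Placeholders -- substitute your own orchestration ID and Storage API token,
# and confirm the endpoint in the Keboola Orchestrator API docs.
ORCHESTRATION_URL = "https://syrup.keboola.com/orchestrator/orchestrations/123/jobs"
STORAGE_TOKEN = "your-storage-api-token"

def build_trigger_request(url, token):
    """Build the POST request a webhook handler would send to start the orchestration."""
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={"X-StorageApi-Token": token, "Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request(ORCHESTRATION_URL, STORAGE_TOKEN)
# urllib.request.urlopen(req)  # uncomment to actually fire the trigger
```

A service like Typeform can then point its webhook at a tiny endpoint (or a serverless function) that issues this request on every form submission.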

Martin Fiser
tag:blog.keboola.com,2013:Post/1127717 2017-01-31T23:57:44Z 2017-02-01T17:50:56Z Keboola + InterWorks Partnership Offers End-to-End Solutions for Tableau


We’re always keeping an eye out for BI and analytics experts to add to our fast growing network of partners and we are thrilled to add a long-standing favorite in the Tableau ecosystem! InterWorks, who holds multiple Tableau Partner Awards, is a full spectrum IT and data consulting firm that leverages their experienced talent and powerful partners to deliver maximum value for their clients. (Original announcement from InterWorks here.)  This partnership is focused on enabling consolidated end-to-end data analysis in Tableau.

Whether we’re talking Tableau BI services, data management or infrastructure, InterWorks can deliver everything from quick-strikes (to help get a project going or keep it moving) to longer-term engagements with a focus on enablement and adoption. Their team has a ton of expertise and is also just generally great to work with.

InterWorks will provide professional services to Keboola customers, with the focus on projects using Tableau alongside Keboola Connection, both in North America and in Europe, in collaboration with our respective teams.  “We actually first got into Keboola by using it ourselves,” said InterWorks Principal and Data Practice Lead Brian Bickell. “After seeing how easy it was to connect to multiple sources and then integrate that data into Tableau, we knew it had immediate value for our clients.”

What does this mean for Keboola customers?

InterWorks brings world-class Tableau expertise into the Keboola ecosystem. Our clients using Tableau can have a one-stop shop for professional services, leveraging both platforms to fully utilize their respective strengths. InterWorks will also utilize Keboola Connection as the backbone of their white-glove offering for a fully managed, Tableau-crowned BI stack.

Shared philosophy

Whether working on projects with customers or partners, we both believe that aligning people and philosophy is even more critical than the technology behind it. To that end, we’ve found a kindred spirit in InterWorks: we believe in being ourselves and having fun, while ensuring we deliver the best results for our shared clients. The notion of continuous learning and trying new things was one of the driving factors behind the partnership.

Have a project you want to discuss with InterWorks?

Contact InterWorks or if you want to learn a bit more about the types of projects they work on, check out their blog!

Please contact us if you have questions or want to learn more about Keboola.

Colin McGrew
tag:blog.keboola.com,2013:Post/1117312 2016-12-21T20:12:10Z 2016-12-29T23:02:36Z Keboola #YearInReview: Customer & Partner Highlights

It’s been quite an exciting year for us here at Keboola and the biggest reason for that is our fantastic network of partners and customers -- and of course a huge thanks to our team!  In the spirit of the season, we wanted to take a quick stroll down memory lane and give thanks for some of the big things we were able to be a part of and the people that helped us make them happen!


Probably the biggest news from a platform perspective this year came about two years after we first announced support for the “next” data warehouse, Amazon Redshift. At the time, it was a huge step in the right direction. We still use Redshift for some of our projects (typically due to data residency or tool choice), but this year we were thrilled to announce a partnership born in the cloud when we officially made the lightning-fast and flexible Snowflake the database of choice behind our Storage API and the primary option for our transformation engine. Not to get too far into the technical weeds (you can read the full post here), but it has helped us deliver a ton of value to our clients: better elasticity and scale, huge performance improvements for concurrent data flows, better “raw” performance from our platform, more competitive pricing for our customers and, best of all, some great friends! Since our initial announcement, Snowflake joined us in better supporting our European customers by offering a cloud deployment hosted in the EU (Frankfurt!). We’re very excited to see how this relationship will continue to grow over the next year and beyond!


One of our favorite things to do as a team is participate in field events so we can get out in the data world and learn about the types of projects people work on, challenges they run into, and find out what’s new and exciting.  It’s also a great chance for our team to spend some time together as we span the globe - sometimes Slack and Goto Meeting isn’t enough!

SeaTug in May

We had the privilege of teaming up with Slalom Consulting to co-host the Seattle Tableau User Group back in May.  Anthony Gould was a gracious host, Frank Blau provided some great perspective on IoT data and of course Keboola’s own Milan Veverka dazzled the crowd with his demonstration focused on NLP and text analysis.  Afterwards, we had the chance to grab a few cocktails, chat with some very interesting people and make a lot of new friends.  This event spawned quite a few conversations around analytics projects; one of the coolest came from a group of University of Washington students who analyzed the sentiment of popular music using Keboola + Tableau Public (check it out).


Colin McGrew
tag:blog.keboola.com,2013:Post/1113387 2016-12-06T18:31:30Z 2016-12-09T05:47:17Z Why Avast Hasn’t Migrated into Full Cloud DWH

How breaking up with Snowflake.net is like breaking up with the girl you love

At the beginning of May, I got a WhatsApp message from Eda Kucera:

“Cheers, how much would it cost to have Peta in Snowflake? Eda”

There are companies that rely only on “slide ware” presentations. Other companies are afraid to open the door to the unknown and not have their results guaranteed. Avast is not one of them. I am glad I can share with you this authentic description of Avast’s effort to balance a low-level benchmark, fundamental shift in their employees’ thinking and the no-nonsense financial aspect of all that.

Let’s get back to May. Just minutes after receiving Eda’s WhatsApp message, almost 6 months of deep testing began in our own Keboola instance of Snowflake. Avast tested Snowflake with their own data. (At this point I handed it to them, the rest was entirely in their hands.)

They dumped approximately 1.5TB a day from Apache Kafka into Keboola’s Snowflake environment, and assessed the speed of the whole process, along with its other uses and its costs.

With a heavy heart, I deleted Avast’s environment within our Snowflake on October 13. Eda and Pavel Chocholous then prepared the following “post mortem”:

Pavel’s Feedback on Their Snowflake Testing

“It’s like breaking up with your girl…”

This sentence is summing it all up. And our last phone call was not a happy one. It did not work out. Avast will not migrate into Snowflake. We would love it, but it can’t be done at this very moment. But I’m jumping ahead. Let’s go back to the beginning.

The first time we saw Snowflake was probably just before DataHackathon in Hradec Kralove. It surely looked like a wet dream for anyone managing any BI infrastructure. Completely unique features like cloning the whole DWH within minutes, linear scalability while running, “undrop table”, “select … at point in time”, etc.

How well did it work for us? The beginning was not so rosy, but after some time, it was obvious that the problem was on our side. Data was filled with things like “null” as text value, and as the dataset was fairly “thin”, it had a crushing impact. See my mental notes after the first couple of days:

“Long story short — overpromised. 4 billion are too much for it. The query has been parsing that json for hours, saving the data as a copy. I’ve already increased the size twice. I’m wondering if it will ever finish. The goal was to measure how much space it will take while flattened; json is several times larger than avro/parquet…."

Let me add that not only our data was bad. I also didn’t know that scaling while running a query would affect only the consequently run queries and not the currently running one. So I was massively overcharging my credit card having the megainstance of Snowflake ready in the background, while my query was still running on the smallest possible node. Well, you learn from your mistakes :). This might be the one thing I expected to “be there”, but I’m a spoiled brat. It really was too good to be true.

Okay, after ironing out all the rookie errors and bugs, here is a comparison of one month of real JSON format data:

(Data size within SNFLK vs. Cloudera parquet file 3.7TB (hadoop) vs. 4.2TB (Snowflake))

Tabulated results… You know, it is hard to understand what our datasets look like and how complicated the queries they hold are. The main thing is that benchmarks really suck :). I personally find the overall results much more interesting:
  • Data in Snowflake is roughly the same size as in our Hadoop, yet I would suggest expecting a 10%–20% difference.

  • Performance: we didn’t find any blockers.

  • Security/roles/privileges: SNFLK is much more mature than Hadoop platform, yet it cannot be integrated with on-premise LDAP.

  • Stability: SNFLK is far more stable than Hadoop. We didn’t encounter a single error/warning/outage so far. Working with Snowflake is nearly the opposite of hive/impala, where cryptic and misleading error messages are part of the ecosystem culture ;).

  • Concept of caching in SNFLK cannot be fully tested, but we have proved that it affects performance in a pleasant yet a bit unpredictable way.

  • Resource governance in SNFLK is a mature feature: beast-type queries are queued behind the active ones while small queries sneak through, etc.

  • Architecture of separated 'computing nodes' can stop inter-team collisions easily. Sounds like marketing bullshit, but yes, not all teams do love each other and are willing to share resources.

  • SNFLK can consume data from various sources across most cloud and on-premise services (Kafka, RabbitMQ, flat files, ODBC, JDBC — practically any source can be pushed there). Its DWH-as-a-service architecture is unique and compelling (Redshift/Google BigQuery/GreenPlum could possibly reach this state in the near future).

  • Migration of 500+ TB data? Another story  —  one of the points that undermine our willingness to adopt Snowflake.

  • SNFLK provides limited partitioning abilities; it can bring even more performance, once enabled at full scale.

  • SNFLK would allow platform abuse with all of its 'create database as a copy', 'create warehouse as a copy', 'pay more, perform more'. And costs can grow through the roof. Hadoop is a bit harder to scale which somehow guarantees only reasonable upgrades ;).

  • SNFLK can be easily integrated into any scheduler. Its command line client is the best one I’ve seen in last couple of years.

Notes from Eda

“If we did not have Jumpshot in the house, I would throw everything into Snowflake…”

If I was to build a Hadoop cluster of the size 100TB-200TB from scratch, I would definitely start with Snowflake…Today, however, we would have to pour everything in it, and that is really hard to do while you’re fully on-premise… It would be a huge step forward for us. We would become a full-scale cloud company. That would be amazing!

If I had to pay the people in charge of Hadoop US wages instead of Czech wages, I would get Snowflake right away. That’s a no brainer #ROI.

Unfortunately, we will not go for it right now. Migrating everything is just too expensive for us at the moment and using Snowflake only partially just doesn’t make sense.

Our decision was also affected by our strong integration with Spark; we’ve been using our Hadoop cluster as compute nodes for it. In SNFLK’s case, this setup would mean pushing our data out of SNFLK into EC2 instances where the Spark jobs would be running. That would cost an additional 20-30% (the data would stay inside AWS, but the EC2s cost something as well). I know Snowflake is currently working on a solution for this setup, but I haven’t found out what it is.

In our last phone call with SNFLK, we learned that storage prices were going down again. So, I assume that we will meet within a reasonable time frame, and reopen our discussion. (In November, Snowflake has started privately testing their EU datacenter and will open it publicly in January 2017.) In the meantime, we’ll have an on-demand account for practicing :).

Petr Šimeček
tag:blog.keboola.com,2013:Post/1097598 2016-10-24T22:44:33Z 2016-10-24T23:45:36Z Keboola’s Solutions for Agencies

We would like to show you how some of our clients redefined their businesses by routinely using data in their daily activities. Despite the fact that each company’s situation is different, we hope to give you some ideas to explore in your own business.

If you work in a service agency, as a customer care manager or in a similar position, you are all about efficiency. Any idle time spent on non-revenue-generating activities means wasted time and manpower and, more importantly, a net loss for your organization.

To ensure optimal operation, you may be asking yourself questions like this:

  • Is your team correctly prioritizing clients with a higher profit margin?

  • How are individual team members performing compared to each other?

  • Are team members doing the work they are best suited for?


Sometimes the simplest graphs show the most relevant information. The graph you see below (generally known as a "bullet chart") has been dubbed the “earthworm” by our clients. Provided by one of our clients, this particular graph eloquently shows agent performance overall, as well as in comparison to the team average.

As a manager, imagine having one of these for each of your agents. In mere seconds you can distinguish your top performers from your poor ones and take the actions needed to reinforce or improve their behavior.

Customer Care

Diving deeper into individual performance, you can then examine why each agent is performing the way they are. Looking at the next client example, you will see that this series of earthworms tracks agent performance in different areas.

tag:blog.keboola.com,2013:Post/1097594 2016-10-24T22:31:54Z 2016-10-24T23:39:57Z Keboola’s Marketing Solutions

Even though we understand that every company and each department within it have very different BI needs, we also believe in sharing inspiration from our clients about how they make relevant business decisions using data in their daily routines. You might find this helpful in shaping your own solution.

When planning a new product launch and deciding where to spend your marketing budget, you probably have questions regarding the impact of your campaign:

  • How long will it take to turn marketing leads into faithful customers?

  • Did I target the correct customer group?

  • Do my potential customers respond to the advertisement as expected?

  • What is the return on investment for my campaign based on different target groups and products?

Check out similar questions our clients have asked. Combine them with an analytical mindset, and create the reports your company needs to make better marketing investments and generate a higher return on investment.

Roman Novacek from Gorilla Mobile says: “When looking at our marketing model, everything seemed to be going according to plan. But when we looked deeper into what we thought were well-performing campaigns, we found out that while some ads and channels were performing extraordinarily well, others were draining the overall average leading to mediocre results.”

sales funnel

tag:blog.keboola.com,2013:Post/1097593 2016-10-24T22:21:25Z 2016-10-24T23:40:22Z McPen: Built and Run on Data

McPen is a European chain distributor of stationery goods. They are one of the first small to mid-sized retailers who use a data-driven approach to business and enable equal access to data to all of their employees.

Initial situation

Embarking on their data-driven business journey, McPen realized that to excel in the stationery goods space, they would need to create a competitive advantage with a unique operational management system. In order to identify retail solutions specific to their business, they wanted to combine many previously unconnected data sources, and upgrade and speed up their reporting process.

Where Keboola came in

Assisted by the Ascoria team, our partner, McPen’s CEO Milan Petr configured the new system from scratch and without the help of a single developer. McPen began to pull data from sources like their POS, Frames and other retail sources, allowing everybody in the company to use this compiled and easily accessible data to find solutions to their real retail problems.

Focusing on lean operations and adding new features, Milan created a system that benefitted the entire organization. He knew that to effectively manage shifts in business, he had to involve every part of the organization in making decisions based on data. Leading by example, he developed and studied the system in detail to understand its impact on daily operations. He then provided access and support directly to the people on the floor to empower them to make necessary strategic decisions and improve their daily results.


Surprising benefits and results

Examined data showed that in order to maximize profitability, McPen needed to upsell customers. And while their biggest income comes from customers who spend between 200 and 500 CZK (around 8 to 20 USD), it is the 42% of all McPen customers spending up to 50 CZK (around 2 USD) who have the biggest upsell potential.

tag:blog.keboola.com,2013:Post/1097126 2016-10-08T19:13:17Z 2016-12-15T15:21:32Z Please hold, your call is important to us

We’ve recently experienced two fairly large system problems that have affected approximately 35% of our clients.

The first issue took 50 minutes to resolve and the other approximately 10 hours. The root cause in both cases was the way we handled the provisioning of adhoc sandboxes on top of our SnowflakeDB (a few words about "how we started w/ them").

We managed to find a workaround for the first problem, but the second one was out of our hands. All we could do was file a support ticket with Snowflake and wait. Our communication channels were flooded with questions from our clients and there was nothing we could do. Pretty close to what you would call a worst-case scenario! Fire! Panic in Keboola!

My first thoughts were something like: “Sh..t! If we ran the whole system on our own infrastructure, we could do something now. We could try to solve the issue and not have to just wait…”

But, we were forced to just wait and rely on Snowflake. This is the account of what happened since:

Petr Šimeček
tag:blog.keboola.com,2013:Post/1090764 2016-09-19T16:32:23Z 2017-03-02T23:31:40Z Snowflake vs. Redshift backend speed comparison


At the same time as the announcement about the default backend in KBC shifting to Snowflake, I started working on a new project. The customer sent us an initial dump of two main tables (10M rows each) and some other small attribute tables.

Martin Fiser
tag:blog.keboola.com,2013:Post/1089387 2016-09-12T21:06:29Z 2016-10-24T23:41:29Z New dose of steroids in the Keboola backend

More than two years after we announced support for Amazon Redshift in Keboola Connection, it’s about friggin’ time to bring something new to the table. Something that will propel us further along. Voila, welcome Snowflake.

About 10 months ago we presented Snowflake at a meetup hosted at the GoodData office for the first time.

Today, we use Snowflake both behind the Storage API (it is now the standard backend for our data storage) and the Transformations Engine (you can utilize the power of Snowflake for your ETL-type processes). Snowflake’s SQL documentation can be found here.
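To make the Transformations Engine idea concrete, here is a minimal sketch (not Keboola's actual transformation code) of running an ETL-style SQL step against Snowflake with the official snowflake-connector-python package. The table names, column names, and connection parameters are illustrative placeholders:

```python
def build_transformation_sql(src: str, dst: str) -> str:
    """Build an ETL-style Snowflake statement: materialize `dst` from `src`,
    keeping only the latest row per `id` (Snowflake's QUALIFY clause)."""
    return (
        f"CREATE OR REPLACE TABLE {dst} AS "
        f"SELECT * FROM {src} "
        f"QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1"
    )


if __name__ == "__main__":
    # Requires `pip install snowflake-connector-python`; credentials are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345",      # placeholder account identifier
        user="etl_user",        # placeholder user
        password="...",         # placeholder secret
        warehouse="ETL_WH",     # placeholder virtual warehouse
    )
    with conn.cursor() as cur:
        cur.execute(build_transformation_sql("raw.orders", "stage.orders"))
    conn.close()
```

Because Snowflake separates storage from compute, a heavy transformation like this can run on its own virtual warehouse without slowing down other workloads.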

What on Earth is Snowflake?

It’s a new database, built from scratch to run in the cloud. Something different from a legacy vendor taking an old DB and hosting it for you (MSSQL on Azure, Oracle in Rackspace or PostgreSQL in AWS).

Petr Šimeček
tag:blog.keboola.com,2013:Post/1088412 2016-09-09T17:04:55Z 2016-10-24T23:41:46Z Guiding project requirements for analytics

In a recent post, we started scoping our executive-level dashboards and reporting project by mapping out who the primary consumers of the data will be, what their top priorities / challenges are, which data we need and what we are trying to measure.  It might seem like we are ready to start evaluating vendors and building out the project, but we still have a few more requirements to gather.

What data can we exclude?

With our initial focus around sales analytics, the secondary data sources we would want to include (NetProspex, Marketo and ToutApp) all integrate fairly seamlessly with Salesforce, so they won't require as much effort on the data prep side.  If we pivot over to our marketing function, however, things get a bit murkier.  On the low end this could mean a dozen or so data sources, but then there are our social channels, Google Ads and the like, as well as various spreadsheets.  In more and more instances, particularly for a team managing multiple brands or channels, the number of potential data sources can easily shoot into the dozens.

Knowing what data we should include is important, but so is asking what data we can exclude. Unlike the data lake philosophy (Forbes: Why Data Lakes Are Evil), when we are creating operational-level reporting, it's important to focus on creating value, not to overcomplicate our project with additional data sources that don't actually yield additional value.

Who's going to manage it?

Just as critical to the project as the what and the how is the who: who’s going to be managing it? What skills do we have at our disposal, and how many hours can we allocate for the initial setup as well as ongoing maintenance and change requests?  Will this project be managed by IT, our marketing analytics team, or both? Perhaps IT will manage data warehousing and data integration while the analysts focus on capturing end-user requirements and creating the dashboards and reports.  Depending on who's involved, the functionality of the tools and the languages used will vary. As mentioned in a recent CMS Wire post, Buy and Build Your Way to a Modern Business Analytics Platform, it's important to take an analytical inventory of the skills we have, as well as the tools and resources we already have that we may be able to take advantage of.


Colin McGrew
tag:blog.keboola.com,2013:Post/1081540 2016-08-16T07:46:03Z 2016-10-24T23:42:02Z When Salesforce Met Keboola: Why Is This So Great?


How can I get more out of my Salesforce data?

Along with being the world’s #1 CRM, Salesforce provides an end-to-end platform to connect with your customers, including Marketing Cloud to personalize experiences across email, mobile, social, and the web; Service Cloud to support customer success; Community Cloud to connect customers, partners and employees; and Wave Analytics, designed to unlock the data within.

After going through many Salesforce implementations, I’ve found that although companies store their primary customer data there, there is a big opportunity to enrich it further by bringing in related data stored in other systems, such as invoices in an ERP or contracts in a dedicated DMS.  For example, I’ve seen clients run into inconsistent data across multiple source systems when a customer changes their billing address.  In a nutshell, Salesforce makes it easy to report on the data stored within it, but it can’t provide a complete picture of the customer unless we broaden our view.

Martin Humpolec