Uncover New Growth Opportunities in eCommerce with our Updated Magento 2.0 Extractor

Magento is an open source eCommerce platform that gives merchants a flexible way to control their online stores’ content, appearance and functions. With our updated extractor, merchants can extract data from Magento into Keboola's platform to:

  1. Sync inventory across multiple platforms

  2. Build a cost structure by combining the merchant company’s financial data with suppliers’ data

  3. Observe or predict user engagement statistics by analyzing customer information

The Magento Extractor is built on the Generic Extractor. Several templates have been configured for a few popular endpoints, so users get clean data extraction out of the box.

This link contains information on how the extractor should be configured and some useful links connecting the users to Magento API settings.

If users want to extract data from multiple endpoints, they are not bound to the pre-configured templates. The extractor UI can switch to a JSON editor, where users are free to alter endpoints and mappings to their liking. This does, however, require Keboola users to have advanced knowledge of the Generic Extractor.
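As a rough illustration of what a hand-edited configuration might contain, the sketch below builds a Generic Extractor style job list in Python and prints it as JSON. The `api.baseUrl` and `config.jobs` layout follows the general Generic Extractor structure; the store URL, the `products` endpoint, the paging parameter and the `items` data field are assumptions for illustration and should be checked against the Magento API settings for your store.

```python
import json

def build_config(base_url, endpoints):
    """Build a Generic Extractor style configuration (illustrative sketch).

    `endpoints` maps an output table name to a dict with the endpoint
    path, the query parameters and the response field holding the rows.
    """
    jobs = [
        {
            "endpoint": spec["path"],
            "params": spec["params"],
            "dataType": table,          # name of the resulting table
            "dataField": spec["field"],  # where the records sit in the response
        }
        for table, spec in endpoints.items()
    ]
    return {"parameters": {"api": {"baseUrl": base_url}, "config": {"jobs": jobs}}}

# Hypothetical store URL and endpoint details, for illustration only.
config = build_config(
    "https://shop.example.com/rest/V1/",
    {
        "products": {
            "path": "products",
            "params": {"searchCriteria[pageSize]": 100},
            "field": "items",
        }
    },
)
print(json.dumps(config, indent=2))
```

Adding another endpoint is then a matter of adding one more entry to the dictionary, which is roughly what editing the JSON by hand amounts to.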

Are you interested in learning more about our Magento integration? Contact us.


Leo Chan

Understanding project management workflow by integrating Keboola + Asana


Asana is on a mission to help humanity thrive by enabling all teams to work together effortlessly, improve the productivity of teams, and increase the potential output of every team’s effort. They provide a great web-based project management tool which allows users across teams to keep track of their work. 

Although Asana offers a fantastic UI and features that help managers and project managers gauge the progress of a project, it does not make it simple to create a dashboard that reports on that progress: for example, the number of tasks completed last week, the number of tasks tagged 1st priority, the number of tasks each user has, etc. With the Asana extractor, users can transform and enrich the data in Keboola to gain better insight into the projects held in Asana. The integration will enhance collaboration and build an actionable, 360-degree view of project management and usage as well as customer experience.
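To make the reporting examples concrete, here is a minimal Python sketch that computes those three numbers from a list of task records. The record layout (`assignee`, `tags`, `completed_on`) is hypothetical and is not the Asana API schema; "last week" is taken to mean the seven days before `today`, which is also an assumption.

```python
from collections import Counter
from datetime import date, timedelta

def weekly_summary(tasks, today):
    """Summarize task records for a simple progress report."""
    week_ago = today - timedelta(days=7)
    completed_last_week = sum(
        1
        for task in tasks
        if task["completed_on"] is not None
        and week_ago <= task["completed_on"] < today
    )
    first_priority = sum(1 for task in tasks if "1st priority" in task["tags"])
    tasks_per_user = Counter(task["assignee"] for task in tasks)
    return {
        "completed_last_week": completed_last_week,
        "first_priority": first_priority,
        "tasks_per_user": dict(tasks_per_user),
    }

# Made-up sample data.
tasks = [
    {"assignee": "ana", "tags": ["1st priority"], "completed_on": date(2017, 5, 3)},
    {"assignee": "ana", "tags": [], "completed_on": None},
    {"assignee": "ben", "tags": ["1st priority"], "completed_on": date(2017, 4, 1)},
]
summary = weekly_summary(tasks, today=date(2017, 5, 8))
print(summary)
```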

See How Keboola Automated Personalized MailChimp + SurveyMonkey Campaigns

Our client is in the event business, organizing regular meetups for CEOs in the Vancouver area, specifically for technology companies. To organize these ad-hoc conferences for hundreds of people, they use Excel, Salesforce as CRM, MailChimp for the emails and SurveyMonkey, well, for surveys. For those of you who have been using Keboola for some time, you might know this is the optimal setup for us.

After an initial discussion to understand their current in-house processes, we narrowed down the biggest pain to be the e-mailing component. Do you remember when I mentioned those meetups are for CEOs?

Well, they have 14 groups and each group meets once a month. For each event, they need to contact the host two weeks before in order to remind them. Then there is another reminder a week before the event that is sent to all the guests. Finally, after each meeting, there is a survey sent out to those who attended.

All those emails are being prepared and sent manually by one person. You can do the math (14 groups, each with a host reminder, a guest reminder and a survey every month), but believe me, it is almost a full-time job just to check every day which email has to be sent out to whom.

This is where Keboola stepped in...

Let’s skip the part where we moved the Excel into Google Sheets, replaced nicely formatted bar charts and roadmaps with simple data tables so we could crunch all the data in Keboola.


After each meeting, the organization collects feedback using SurveyMonkey. They measure several KPIs, plus they allow users to comment in each section. These survey results are distributed the next month as part of the Mailchimp campaign reminding guests that the next event is coming up soon.

Then we wrote a short JSON configuration for our almighty Generic Extractor to get the data from SurveyMonkey. They have good documentation, so it’s not difficult to obtain the survey results.

The more complicated part of the process was joining together all the output tables from their API, because the endpoint created around 20 tables full of parent and child records.
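Joining one parent table to one child table can be sketched in a few lines of Python. This is only an illustration of the idea, not the transformation we actually ran; the table and column names (`response_id`, `rating`, and so on) are hypothetical.

```python
def join_tables(parents, children, parent_key, child_fk):
    """Inner-join child rows to their parent rows, like a SQL JOIN.

    Child rows without a matching parent are dropped. If both rows share
    a column name, the child's value wins.
    """
    index = {parent[parent_key]: parent for parent in parents}
    joined = []
    for child in children:
        parent = index.get(child[child_fk])
        if parent is not None:
            joined.append({**parent, **child})
    return joined

# Made-up rows standing in for two of the extracted tables.
responses = [{"response_id": "r1", "event": "March meetup"}]
answers = [
    {"answer_id": "a1", "response_id": "r1", "rating": 5},
    {"answer_id": "a2", "response_id": "r1", "rating": 4},
    {"answer_id": "a3", "response_id": "r9", "rating": 1},  # orphan row
]
rows = join_tables(responses, answers, "response_id", "response_id")
print(rows)
```

With around 20 tables, the same step is repeated level by level, from the top-most parent down to the deepest child.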

Mailchimp Part 1

The most important part of the puzzle is that all emails are distributed from MailChimp.

We had to figure out how to automatically feed each Mailchimp template with personalized data. It might sound easy at first, but by default Mailchimp offers only two variable fields connected to the recipient. Unsurprisingly, it is the first and the last name.

However, for each template we needed much more than that! We needed to personalize the location, time and date of the event and even add a personal note. There are also special sections for the survey results and user comments from the previous meeting, so just the two out-of-the-box fields were not going to cut it.

This was exactly the moment when we started asking the “what if” type of questions. Quite soon after opening the API documentation, the right question hit us: Could we push the whole content via API call and not merely trigger the campaign remotely?

Keboola Python Part

This is where Leo, our Python ninja, stepped in. He will briefly go over how Keboola “hacked” the MailChimp API.

With a custom science component written in Python and the MailChimp API integration, users no longer have to manually create multiple campaigns and enter the “variables” (e.g. time of the event, location of the event, etc.) every time a reminder or email needs to be sent out. Users only need to maintain the templates within MailChimp and the Google documents that hold detailed descriptions of the events and the lists of participants.

Combined with automation via Keboola orchestration, the custom science component can run periodically, fetching event and participant details. With the fetched data, the component uses the configured templates in MailChimp to create a new campaign for every event it can list. If duplicate event names are found within the same sheet or run, the component uses the first encountered row with that event name.

Once all the campaigns have been triggered, the component outputs a CSV to Keboola storage indicating whether each campaign was executed successfully. “Sent” campaigns are kept in the user’s MailChimp account as a record, where users can see the content of each campaign and the participants it was sent to. Stats and activities of the campaigns can also be found under the Report tab in the user’s MailChimp account.
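The flow described above can be sketched roughly as follows. This is not the component's actual code: the `send` callable is a hypothetical stand-in for the MailChimp API calls, the `*|NAME|*` placeholder style merely mirrors MailChimp merge tags, and the duplicate handling is read here as "the first row for an event wins".

```python
import csv
import io

def render(template, fields):
    """Replace *|NAME|* placeholders with the matching values from `fields`."""
    for name, value in fields.items():
        template = template.replace("*|" + name.upper() + "|*", str(value))
    return template

def first_row_per_event(rows):
    """Keep only the first encountered row for each event name."""
    seen = {}
    for row in rows:
        seen.setdefault(row["event"], row)
    return list(seen.values())

def run(rows, template, send):
    """Create one campaign per event and return a CSV status report."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["event", "status"])
    for row in first_row_per_event(rows):
        try:
            send(row["event"], render(template, row))
            status = "sent"
        except Exception as exc:  # record the failure instead of stopping the run
            status = "failed: " + str(exc)
        writer.writerow([row["event"], status])
    return out.getvalue()

# Made-up rows; a list stands in for the real API.
sent = []
report = run(
    [
        {"event": "A", "location": "X"},
        {"event": "A", "location": "Y"},  # duplicate event name, ignored
        {"event": "B", "location": "Z"},
    ],
    "See you at *|LOCATION|*",
    lambda event, body: sent.append((event, body)),
)
print(report)
```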

The last step was to create different CSV outputs for each scenario. But since we could use as many variable fields in the Mailchimp template as we wished, it was just a matter of a couple of SQL queries and a few Python transformations to get the results with the right numbers in.

Mailchimp Part 2

Now we had a functioning writer that was able to replace any predefined variable field with the data in the corresponding column of the output file.

All we had to do was to adjust the existing templates and we were ready to go!


I’m not sure how this solution would withstand thousands of recipients, but in our case, with tens of e-mails at most on the busiest day, it works well.

The event manager’s job is now just to maintain the master sheet, where all they need to do is add a new event. Keboola downloads the sheet every day, a transformation detects whether any kind of email campaign is needed, and the custom writer pushes personalized campaigns through Mailchimp.
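The daily detection step can be pictured with a small Python sketch. The two reminder offsets come from the schedule described earlier (host two weeks before, guests one week before); sending the survey one day after the event is an assumption, as are the field names.

```python
from datetime import date, timedelta

# Days relative to the event date on which each email goes out.
OFFSETS = {
    "host_reminder": timedelta(days=-14),
    "guest_reminder": timedelta(days=-7),
    "survey": timedelta(days=1),  # assumption: the day after the event
}

def campaigns_due(events, today):
    """Return (event name, campaign kind) pairs that must go out today."""
    due = []
    for event in events:
        for kind, offset in OFFSETS.items():
            if event["date"] + offset == today:
                due.append((event["name"], kind))
    return due

# Made-up master sheet rows.
events = [{"name": "Group 1 meetup", "date": date(2017, 3, 15)}]
print(campaigns_due(events, today=date(2017, 3, 1)))
```

Running this once a day against the master sheet is enough to decide which of the three templates, if any, has to be filled and sent.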



Data Geek / BI Developer

Taking a data-driven approach to pricing to optimize profit

Research around pricing has consistently shown its importance. Deloitte found that, on average, a 1 percent price increase translates into an 8.7 percent increase in operating profits (assuming no loss of volume, of course). Yet it is also estimated that up to 30 percent of the thousands of pricing decisions companies make each year fall short of delivering the best price. That’s a lot of money left on the table!
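The arithmetic behind a figure like this is simple leverage: if volume and costs stay constant, every extra dollar of price flows straight to operating profit. The sketch below assumes an operating margin of 11.5%, a value chosen only because it roughly reproduces the 8.7 percent figure quoted above; it is not taken from the research.

```python
def profit_uplift(operating_margin, price_increase):
    """Relative change in operating profit for a relative price increase,
    assuming volume and costs do not change."""
    return price_increase / operating_margin

# Assumed margin of 11.5%, for illustration only.
uplift = profit_uplift(operating_margin=0.115, price_increase=0.01)
print(f"{uplift:.1%}")  # prints 8.7%
```

The thinner the margin, the larger the effect: at a 5% margin the same 1 percent price increase would lift operating profit by 20 percent.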

Too often, pricing is an afterthought, and pricing decisions are made by 'gut feeling,' based on a quick look at competitor websites. Ideally, pricing decisions involve determining the value to the customer relative to the competitor, factoring in pricing power and pricing strategy. For more on this, see “Pricing Power. What it is. How to get it.” from our partners at Ibbaka. Understanding value requires data, and that data can change over time, so even the best thought-out pricing model can soon be out of date. The solution is data science plus data integration. Data from many different sources can be connected to value and pricing models, and when these get out of alignment due to changes in the market and competitor actions, alerts can be triggered.

When we think about B2B sales analysis, the things that initially come to mind usually involve reporting on CRM data to understand sales by product, region, deal velocity, and the like. Whereas most innovative B2C companies take greater advantage of the mountains of valuable data they have, B2B companies have been a bit slower to adopt this approach. After identifying customer segments, data from not only CRM but also ERP, third-party economic data sources and others can be used to understand past purchases and prices, preferences and more to determine the optimal price. As mentioned, even a 1% increase can have a huge impact.

Automation is king

One of the critical keys to getting more out of your data is automating or, even better, eliminating as many mundane processes as possible. This allows organizations to focus on the test-and-evaluate part of the pricing process, not on configuring infrastructure and monitoring and maintaining data flows. It also allows the people doing analysis to take advantage of larger and more diverse data sets and make adjustments without a huge headache.

Sell pricing internally

As someone who has spent years in sales, I can tell you that pricing is so much more than just a number. It has led to heated discussions around many tables in innumerable offices. It’s one of the big “hows” sales reps have to keep in mind when trying to close deals. Providing B2B sales organizations with information and context to help reps understand the factors behind pricing is mission critical. Keeping them educated on this topic will build confidence and translate into clients who better understand how the cost of the product translates to business value. Introducing optimal pricing earlier in sales cycles can also speed up deal closing and increase the win rate.

Test and evaluate

It’s important to remember that pricing is not a static, set-it-once type of event. It’s important to try, test and learn. This process is so much better when we can use data to influence and validate the decisions made.

Pricing is an important part of the foundation for strong sales and marketing organizations and using data effectively as part of your strategy can reveal key insights to make sure you get it right.

Want to learn more?

Join us for our upcoming webinar with Ibbaka, where we will dive deeper into data-driven pricing strategies!

Holiday Gift Ideas for the Data Geek in Your Life

I can’t believe it’s already been a year since we covered some great gift ideas for data people! We’re back with some more last-minute ideas; some may look familiar, albeit bigger (or smaller) and better, while others are new arrivals. Whatever you’re looking for, we hope at least one of these ideas will help you find something that really excites the techie / data lover in your life this holiday season!

Amazon Echo Second Gen (Dot)

How to take an agile (Minimum Viable Product) approach to analytics projects

By now, the idea of agile development and a Minimum Viable Product or MVP is prevalent. The problem is, while most people have the minimum part down, they often haven't mastered the viable... especially when it comes to analytics.

To quickly recap, a Minimum Viable Product is an approach where you focus on creating a product with just enough features to solve a particular problem. This first iteration is used to collect user feedback and develop the complete set of features for the final product to be delivered.

That’s all nice and well, but you may be wondering what the benefits to this approach are as it concerns analytics projects...

Learning, and learning quickly

Is your solution actually delivering the value that you are trying to create? In a typical project, you may be months down the road before what you’re building is actually in front of users. This makes it difficult to determine its viability for solving the business case. The whole point is to prove or disprove your initial assumptions sooner.

  • What part of your users’ current process is really frustrating them?

  • Are the analytics we designed actually guiding them through their workflow and making their life better?

By getting a usable set of features in front of users earlier in the process, we can collect feedback and determine if we are in fact on the right track.

What is data health and why is it critical to your business?


Every great data insight or data visualization starts with good, clean data. Whether you want to understand lifetime value, design upsell and cross-sell strategies, define personas, or develop sophisticated data models, having clean and consolidated data will enable better analytics, improve the performance of your marketing campaigns and maximize your marketing ROI.

Clean data is particularly crucial for CRM, ERP, sales and IT systems with customer data. For example, proper planning and cleansing of your customer data from the beginning will keep you from falling behind on your CRM implementation. Your data needs to be reviewed, filtered and cleaned to ensure that bogus data is not transferred. The cost of processing errors to the business can be measured in the time spent on manual troubleshooting, forced ETL re-runs and, at worst, presenting incorrect or invalid data to the customers or employees who use it to drive their business decisions.

  • How do you ensure your data is not wrong or incomplete when you ingest data from various third-party sources, especially sources like FTP and AWS S3 which (unlike an API) do not always have a fixed structure?

  • How do you successfully migrate data from an old system to a new one?

It is safe to say that the majority of data flows have a set of expected data types defined, and very often a value range as well.

One option is to use SQL or Python transformations, but such a hard-coded approach can be very time-consuming and lacks the flexibility and simplicity needed for reuse. Additionally, it would not be obvious which rows and columns include rogue values until these transformations run into an error (or you would have to design a specific workflow to off-load them).

Another option is to describe the data and set up value and type conditions for it in the form of rules. Once that’s done, all you need to do is make sure data flows include rules that check every time you run the orchestration (ETL process). KBC Data Health App has been designed to help you automate this data check process.

Typical use cases:

  • Ensuring data quality from systems with data collected by users (internal IT systems, CRM, user forms, etc.)

  • Migrating data from legacy systems - data migration assumptions vs. reality check

  • Validating crucial fields for report buildup

KBC Data Health Application

The Data Health Application is an app designed to help users produce a clean data file. To boost productivity, it gives users a simple and convenient way to cleanse or filter data instead of writing multiple long queries in a transformation to obtain the same results. Some primary features include:

  • Filtering data based on user-configured rules to match business needs

  • Running on a scheduled basis

  • Generating a report with descriptions and reasons why rows are rejected

As many users do not have any prior knowledge of SQL, this application provides basic SQL functionality through simple user interface inputs. The application does not have any pre-configured rules; users are free to create rules tailored to their needs. Combined with KBC orchestration, the application can be triggered on a daily or weekly basis, depending on the user’s business requirements. As a result, users get an automated process that generates “clean” data for in-depth analysis without worrying about handling corrupted data or outliers.

Supported Rules:

  1. List Comparison

  2. Digit Count

  3. Numeric Comparison (Value comparison)

  4. Regular Expression (Regex)

  5. Column Type (Applicable value type)


Input Table:

[Screenshot: input table]


  • User wants anything within the “Western Europe” Region

  • User is only interested in countries placed within the top 10 happiness rank

  • Happiness score cannot be empty
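The three requirements above can be pictured as rules over rows. The Python sketch below is not how the Data Health app is implemented; it only illustrates the idea of rule-based filtering with a rejection report. The column names are guesses based on the requirements, and the sample rows are made up.

```python
# Each rule: (column, check function, reason reported when the check fails).
RULES = [
    ("Region", lambda v: v == "Western Europe", "region is not Western Europe"),
    ("Happiness Rank", lambda v: v.isdigit() and int(v) <= 10, "rank is outside the top 10"),
    ("Happiness Score", lambda v: v.strip() != "", "score is empty"),
]

def check(rows, rules=RULES):
    """Split rows into clean rows and rejected rows with reasons."""
    clean, rejected = [], []
    for row in rows:
        reasons = [msg for col, ok, msg in rules if not ok(row.get(col, ""))]
        if reasons:
            rejected.append({**row, "reasons": "; ".join(reasons)})
        else:
            clean.append(row)
    return clean, rejected

# Values are strings, as they would be when read from a CSV file.
rows = [
    {"Country": "A", "Region": "Western Europe", "Happiness Rank": "1", "Happiness Score": "7.5"},
    {"Country": "B", "Region": "Western Europe", "Happiness Rank": "29", "Happiness Score": "6.5"},
    {"Country": "C", "Region": "North America", "Happiness Rank": "5", "Happiness Score": ""},
]
clean, rejected = check(rows)
print(clean)
print(rejected)
```

Keeping the rules as data rather than as hand-written queries is what makes them easy to reuse across tables and orchestrations.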

Output table:

[Screenshot: output table]

If you’re already a KBC user you can find the Data Health app alongside the rest of our data applications. Not yet a user and want to learn more? Contact us to discuss.


Is there untapped value in your data?


Embedded analytics, data products, data monetization, big data... there are plenty of buzzwords we can use to “categorize” the idea.

IDC reports that the big data and business analytics market is growing at a rate of over 11% in 2016 and at a compound annual growth rate of 11.7% through 2020. This rapidly growing area of investment can’t be for naught... can it?

Let’s look beyond the hype at some specific approaches for extracting additional value (and ultimately dollars) from your data.

According to Gartner, Data Monetization refers to using data for quantifiable economic benefit.

The first thing that may come to mind is outright selling of data (via a data broker or independently). Although potentially viable, this approach can be quite limiting given increased data privacy policies and the sheer amount of data needed to be successful with it.

There are many other approaches to monetizing your data, such as:

How Keboola Switched to Automatic Invoicing

We’ve been assisting people with data and automation for a while now, helping them become “data-driven.” Several months ago, we had an exciting opportunity to automate within Keboola itself. After we lost our two main finance wizards to the joys of childcare, we decided to automate our internal business processes.

There was a lot of work ahead of us. For years, the two of them had been issuing invoices one by one. Manually. Hating unnecessary manual tasks, we were eager to put the power of our platform, Keboola Connection, to work and eliminate the manual invoicing.

We expected approximately 2-3 mandays per month to be cut down. We also wanted to get much better data about what’s going on.

As our sales activities have been taking off around the globe, we would need to automate this process anyway. Otherwise soon we would have to hire a new employee just for invoicing and that is a no-go for us. Plus we didn’t want to overload Tereza, our new colleague, with this tedious work and take away her weekends from her. 

When it comes to data, we often preach the agile methodology: Start small, build quick, fail fast and have the results in production from day one - slow Kaizen style improvement. This is exactly what we did with our invoicing automation project. We didn’t want to have someone write a custom app for us. We wanted to hack our current systems, prototype, fail fast and see where it would lead us. We wanted to save Tereza’s time but didn’t want to waste it 10x in the development of the system. :-)

Our “budget” was 3-4 mandays max!

Step 1  —  Looking for a tool to use for the invoicing

We were looking for a tool which can handle all the basic things we need: different currencies (it’s Europe!), different bank accounts, with or without tax, paid or unpaid, and a handful of other features. Last but not least, the tool HAD to have a nice RESTful API. After some trials we opted for a Czech system – Fakturoid. They have great support, by the way. That’s a big plus.

Step 2  — Getting data about customers from Fakturoid into Keboola Connection

First, Padak took all clients we already had in Flexibee, our accounting tool, and exported them to Fakturoid. Then we added all the necessary info to the contacts.

Great. Now we had all the customers’ records ready and needed to get the data into Keboola Connection. It was time to set up our Generic Extractor. It literally took me half an hour to do it! Check it out here:

Keboola Generic extractor config for getting clients’ info from Fakturoid into Keboola Connection

Step 3  —  Creating two tables with invoices and their items for uploading into Fakturoid

There was only one more thing to know. Who is supposed to pay for what and when? We store this info in our Google Spreadsheet. It contains basic info about our clients, the services they use, the price they pay for them, the invoicing period (yearly, quarterly, monthly), and the time period for which the info is valid (when their contract is valid; new contract/change = new row). To be able to pair the tables easily, we added a new column with the Fakturoid client ID.

Finally, we set up our Google Drive Extractor and loaded the data into Keboola Connection. Once we had all the data there, we used SQL to create a Transformation that took everything necessary from the tables (who we bill this month, how much, if out of country = don’t put VAT, add info about current exchange rate, etc.) and created clean output tables.
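We did this step in SQL, but the core logic is easy to picture. The following Python sketch is only an illustration under stated assumptions: the column names are hypothetical, invoices are assumed to fall due every 1, 3 or 12 months counted from the contract start, and the home country and 21% VAT rate are placeholder parameters. Exchange rates are left out.

```python
from datetime import date

# Months between invoices for each invoicing period named in the sheet.
PERIOD_MONTHS = {"monthly": 1, "quarterly": 3, "yearly": 12}

def is_due(contract, today):
    """Return True if the contract should be invoiced in today's month."""
    start, end = contract["valid_from"], contract["valid_to"]
    if today < start or (end is not None and today > end):
        return False  # contract not valid on this date
    months_elapsed = (today.year - start.year) * 12 + (today.month - start.month)
    return months_elapsed % PERIOD_MONTHS[contract["period"]] == 0

def invoice_items(contracts, today, home_country="CZ", vat_rate=21):
    """Build the item rows for this month's invoices."""
    items = []
    for contract in contracts:
        if is_due(contract, today):
            items.append({
                "client_id": contract["fakturoid_client_id"],
                "name": contract["service"],
                "unit_price": contract["price"],
                # Out of country = no VAT.
                "vat_rate": vat_rate if contract["country"] == home_country else 0,
            })
    return items

# Made-up contract rows.
contracts = [
    {"fakturoid_client_id": 1, "service": "Platform", "price": 1000, "period": "monthly",
     "country": "CZ", "valid_from": date(2017, 1, 1), "valid_to": None},
    {"fakturoid_client_id": 2, "service": "Platform", "price": 2500, "period": "quarterly",
     "country": "US", "valid_from": date(2017, 1, 1), "valid_to": None},
    {"fakturoid_client_id": 3, "service": "Support", "price": 9000, "period": "yearly",
     "country": "CZ", "valid_from": date(2016, 6, 1), "valid_to": None},
]
items = invoice_items(contracts, today=date(2017, 4, 1))
print(items)
```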

Part of the SQL transformation which creates an output table with items to pay for Fakturoid.

Step 4  — Sending the tables into Fakturoid and letting it create the invoices

This step was not as easy as exporting data from Fakturoid. We couldn’t use any preprogrammed services. Thankfully, Keboola Connection is an open environment and any developer can augment it and add new code to extend its functionality. Just wrap it up in a Docker container. We asked Vlado to write a new writer for Fakturoid which would take the output tables from our Transformation (see Step 3) and create invoices in Fakturoid from the data in those tables.

It took Vlado only 2 hours to have the writer up and running!

Now that the writer is complete, Keboola Connection has one more component available to all its users.
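The heart of such a writer is turning the two flat tables (invoices and their items) into one nested payload per invoice before sending it to the API. The sketch below shows only that grouping step and is not Vlado's code; the column and field names are hypothetical and have not been checked against the Fakturoid API.

```python
from collections import defaultdict

def build_invoices(invoice_rows, item_rows):
    """Combine the invoice table and the item table into one payload per invoice."""
    lines = defaultdict(list)
    for item in item_rows:
        lines[item["invoice_id"]].append({
            "name": item["name"],
            "quantity": item["quantity"],
            "unit_price": item["unit_price"],
            "vat_rate": item["vat_rate"],
        })
    return [
        {
            "subject_id": invoice["client_id"],
            "currency": invoice["currency"],
            "lines": lines[invoice["invoice_id"]],
        }
        for invoice in invoice_rows
    ]

# Made-up rows standing in for the two output tables from Step 3.
invoice_rows = [{"invoice_id": "2017-04-1", "client_id": 1, "currency": "CZK"}]
item_rows = [
    {"invoice_id": "2017-04-1", "name": "Platform", "quantity": 1, "unit_price": 1000, "vat_rate": 21},
    {"invoice_id": "2017-04-1", "name": "Support", "quantity": 2, "unit_price": 500, "vat_rate": 21},
]
payloads = build_invoices(invoice_rows, item_rows)
print(payloads)
```

Each payload would then be posted to the invoicing API, one request per invoice.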

Step 5 — Automating  the whole process

It was the easiest part. We used our Orchestration services inside Keboola Connection and created an orchestration which automatically starts on the first day of each month. Five minutes later, all the invoices are done and sent out. #easypeasy


It is not a complicated solution. No rocket science. We believe in splitting big problems into smaller pieces, solving the small parts and putting them back together just like Lego bricks. The process should be easy, fast, open and put together from self-contained components. So when you have a problem in one part, it doesn’t affect the whole solution and you can easily fix it.

Besides saving Tereza’s time, this is the springboard for automating other parts of her job. We want her to spend more time doing more interesting things. And the process scales as we grow.

It took us:

  • 4 hours to analyse and understand the problem and how things are connected,
  • 1 hour to export clients from the accounting system,
  • 1/2 hour to write a Generic Extractor from Fakturoid,
  • 2 hours to write a Transformation preparing clean output data,
  • 2 hours to develop a new Writer for Fakturoid, and
  • 1-2 hours to do everything else related to the whole process.

Total = circa 11 hours

Spoiler alert: I’m already working on further articles from the automation series. Look forward to reading how we implemented automatic distribution of invoices to clients and the accounting company, or how we let the systems handle our invoicing for implementation work.

How to build data products that increase user engagement

Think about all the social media platforms out there: which ones do you use the most (and why)? I’m not talking about giving your LinkedIn profile a face lift before you put in a job application or searching for a long lost friend on Facebook; which of these apps are actually driving user engagement? For me, it’s Instagram; the interface is easy to navigate and more than once I’ve found myself re-opening it after I’ve just closed it. Have you thought about why many of these platforms have exploded in user engagement, with many people posting to their Twitter or Facebook accounts multiple times per day? According to a recent Gartner blog, the adoption rate for some of the BI tools in their Magic Quadrant is at a low but not too surprising 21%. Are people sick and tired of “all that data” or is there something more sinister at work…

We’ve thought a lot about social media platforms (and other apps) that seem to drive such high user engagement and put together a few thoughts on how you can do the same within your data product to ensure you keep users coming back for more. Before we reveal the secret sauce for building engagement in your data products, let’s take a quick look at how many analytics teams approach the problem.

Too often, teams building an analytics product for their customers approach the project in the wrong way; the story is oh so familiar. As we covered in a recent blog, this means taking the reports existing in an Excel spreadsheet and web-ifying them in a cloud BI tool. It’s essentially surfacing the exact same information as before, but now with shiny new charts and graphs, more color choices, and some interactivity. After the initial excitement over the new toy in the room, the latter solution isn’t doing any better than the former at driving engagement, let alone delivering “insights” or creating a new revenue stream.

One of the big reasons customers aren’t lining up to write a check for the latest, greatest data product a vendor has rolled out is that the analytics team failed to make it engaging. Simply put, product teams need to let users know “hey—check this out,” “hey—we’ve got some important information for you”, and “hey—you should come back and see us.” Most teams do the second part, the “we’ve got insights” piece, but they fail to inform users why they need to keep coming back for more. These are essential elements of establishing engagement; not building these in is like skipping the foundation of a new skyscraper. "It's like when you see a skyscraper; you're impressed by the height, but nobody is impressed by the foundation. But make no mistake, it's important," said Akshay Tandon, Head of Strategy & Analytics at LendingTree.

Want to avoid the killer mistakes of failing to build engagement into your data product? Here’s how: