Snowflake vs. Redshift backend speed comparison

Intro  

Around the time the default backend in KBC was announced to be shifting to Snowflake, I started working on a new project. The customer sent us an initial dump of two main tables (10M rows each) and a handful of other small attribute tables.

The main transformation handles the initial join of one of the big tables with a handful of the small ones...

... and then it is meshed with the other big table, on which we have to calculate a duration...

... and apply multiple CASE statements.
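The actual queries lived in screenshots, so here is just an illustrative sketch of the shape of that transformation - all table and column names below are made up, not the customer's actual schema:

  -- Hypothetical sketch only: names are illustrative, not the real schema.
  CREATE TABLE events_enriched AS
  SELECT e.id,
         e.customer_id,
         a.attribute_a,
         b.attribute_b
  FROM big_table_1 e
  JOIN small_attr_a a ON a.id = e.attr_a_id
  JOIN small_attr_b b ON b.id = e.attr_b_id;

  CREATE TABLE events_final AS
  SELECT x.*,
         DATEDIFF(minute, y.started_at, y.finished_at) AS duration_min,
         CASE
             WHEN y.status = 'closed' THEN 'done'
             WHEN y.status = 'open'   THEN 'in progress'
             ELSE 'other'
         END AS status_group
  FROM events_enriched x
  JOIN big_table_2 y ON y.customer_id = x.customer_id;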

Originally, I had the complete transformation written in MySQL, since I had used it for data exploration on a small sample of the data. However, when I ran it on the whole set, I had to kill the transformation after 36 minutes of running without it delivering a single output table...

Speed comparison  

Since MySQL was not an option, I tested and compared both the Redshift and the new Snowflake transformations. It took Redshift almost 20 minutes to cope with this transformation. In contrast, Snowflake needed only 5 minutes!

Furthermore, I explored (thanks, Marcus) the difference between CREATE TABLE and CREATE VIEW: there is a subtle difference between the two, and using CREATE VIEW actually shaves off some time (somewhere between 7-10%).
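In practice that just means changing the DDL of the intermediate steps. A minimal illustration (orders and customers are stand-in tables, not the project's real ones):

  -- Materialized intermediate step: the result set is written to storage,
  -- which costs extra time.
  CREATE TABLE stage_joined AS
  SELECT o.id, o.customer_id, c.segment
  FROM orders o
  JOIN customers c ON c.id = o.customer_id;

  -- The same step as a view: nothing is written, the definition is simply
  -- expanded into the downstream queries - this is where the 7-10% came from.
  CREATE VIEW stage_joined_v AS
  SELECT o.id, o.customer_id, c.segment
  FROM orders o
  JOIN customers c ON c.id = o.customer_id;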

Overview

MySQL: killed after 36 minutes, no output delivered
Redshift: almost 20 minutes
Snowflake: about 5 minutes
CREATE VIEW instead of CREATE TABLE: a further 7-10% saved

Screenshots  

Redshift

Snowflake

Notes  

The purpose of this exercise was to evaluate speed from the KBC user perspective. In other words, I compared the whole transformation completion time - both the table data load and the query time. This represents the full user experience on the respective storage and transformation backends.

Thanks for reading,

Fisa

New dose of steroids in the Keboola backend

More than two years after we announced support for Amazon Redshift in Keboola Connection, it’s about friggin’ time to bring something new to the table. Something that will propel us further along. Voila, welcome Snowflake.

About 10 months ago, we presented Snowflake for the first time at a meetup hosted at the GoodData office.

Today, we use Snowflake both behind the Storage API (it is now the standard backend for our data storage) and the Transformations Engine (you can utilize the power of Snowflake for your ETL-type processes). Snowflake’s SQL documentation can be found here.

What on Earth is Snowflake?

It’s a new database, built from scratch to run in the cloud. Something different from a legacy vendor taking an old DB and hosting it for you (MSSQL on Azure, Oracle in Rackspace or PostgreSQL in AWS).

Snowflake is different. Perfectly elastic, frighteningly fast, with no limits on data storage (you can’t physically fill up a disk array and run out of space - there’s no “storage provisioning”, you get whatever you need) and, most importantly, you can’t “kill it” or overload it with a dumb query. In Snowflake, you choose the power that is available for a particular query. You can even have two workers running two different queries concurrently over the same data with absolutely no impact on each other, and on top of that you can speed them up or slow them down AS THEY’RE RUNNING (check out this post and the video underneath for more info).
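To give a concrete flavour of that elasticity, resizing a warehouse in Snowflake is a single SQL statement and can be done while queries are running (the warehouse name below is just an example):

  -- Create a small warehouse for a query...
  CREATE WAREHOUSE IF NOT EXISTS etl_wh WAREHOUSE_SIZE = 'XSMALL';

  -- ...then scale it up, or back down, at any time - even mid-workload.
  ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';
  ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XSMALL';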

As long as any database can be killed by a stupid query, it will remain IT’s job to block your access to any production Teradata/Oracle/MSSQL… In Snowflake, you can only cause trouble for one worker.

What does “no limits” mean?

During our beta testing we loaded some data from Rockaway into a Snowflake transformation - three tables, a few million rows each (data volumes post-compression):

and ran this query

It took 40 minutes (on an x-small warehouse) to create nearly 0.5TB of data. It could’ve made 5TB. Or 50TB. Or... whatever. Simply unlimited. There are no limits in Snowflake, just in your wallet #cloud (and, if needed, we could’ve used a more powerful warehouse for that query and it would have needed a mere 5 minutes).
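(The query itself was a screenshot. As a purely hypothetical stand-in, think of a CREATE TABLE AS SELECT whose joins multiply rows - that is the kind of statement that happily produces hundreds of gigabytes:)

  -- Hypothetical stand-in, not the actual Rockaway query: a CTAS whose joins
  -- multiply rows and therefore the output volume.
  CREATE TABLE blown_up AS
  SELECT o.order_id,
         o.customer_id,
         e.event_type,
         e.event_ts,
         p.product_name
  FROM orders o
  JOIN events   e ON e.customer_id = o.customer_id
  JOIN products p ON p.product_id  = o.product_id;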

What does it mean for Keboola?

Four important points:

  1. Snowflake is now the default storage backend. Thanks to its elasticity we have no real boundaries - adding 50GB of data no longer means adding another Redshift node whose cost we’d have to account for in the customer pricing
  2. We can run various processes over the data without affecting the client’s performance. So while our users run their SQL/R/Python/Docker components over their data, our “black boxes” distilling the secret sauces (data profiling, data health, descriptive statistics, predictive models etc.) can be running at the same time with no negative effects. This allows us to make Keboola the smart partner it is meant to be, one that can help and recommend without breaking the bank
  3. Brutal, raw performance. Let’s say you have clickstream data from a small customer who generates 100GB/week, and you need to apply an attribution model written in Python. You can easily find out that the value of the model’s output is $Y, while the cost of the computing power to produce it was four times as much. Historically, for example, doing anything serious with the output of an RTB system made no sense economically. Until now.
  4. Great friends at Snowflake Computing Inc. :)

Why not Google BigQuery?

As I’m writing this, I ask myself: “Why not dump such ‘big data’ into Google BigQuery?”

Simple answer, actually:

  1. BigQuery practically can’t edit data once it’s written there
  2. It costs $5.00 per TB pulled through DB operations (not per TB written into BigQuery), so 10 simple queries every hour can cost you nearly $4k before you know it

How do I get my hands on it?

All KBC projects that are not currently powered by a dedicated Redshift (if you have one, you know about it) will be migrated to Snowflake automatically. Those with Redshift can “opt in”. Ping your Keboola people to help out and answer any questions.

What’s next?

Innovation doesn’t stop. In the summer of 2013 we started playing with Redshift; in February 2014 we rolled out Redshift-powered Transformations, only to start talking to Thomas just 8 months later:

It took about another year to get to a contract with Snowflake…

Just so we could, another year later, bring this technology (a completely new thing to many of you) to your fingertips. What will be the next leap?

Last fall, Larry Ellison said in his Oracle Open Day keynote that Oracle has stopped meeting the traditional players at its clients and described how cloud computing is changing the enterprise business:

“In this new world of cloud computing, everything has changed. And almost all of our competitors are new, CX (customer experience) application specialist Salesforce.com and ERP/HR application specialist Workday are the SaaS competitors Oracle sees most frequently”.

“We virtually never, ever see SAP. This is a stunning change. The largest [enterprise] application company in the world is still SAP, but we never see them in the cloud.”

“So this is how much our world has changed. Our two biggest competitors, the two companies we watched most closely over the last two decades, have been IBM and SAP, and we no longer pay any attention to either one of them. It is quite a shock.… I can make a case that IBM was the greatest company in the history of companies, but they’re just nowhere in the cloud. SAP was certainly the largest [enterprise] application company that has ever existed. They are nowhere in the cloud.”

That’s easy to agree with! Oracle itself, of course, is faking it a bit, as they missed the train just the same. If Larry is right about the absence of traditional players such as IBM and SAP, we can discount IBM and their “quantum cloud” and focus on players like D-Wave, TensorFlow from Google, H2O.ai and the direction taken by Apache Zeppelin or Apache Spark.

Check out the comparison of the new Snowflake backend against Redshift in Martin's blog here.


Guiding project requirements for analytics

In a recent post, we started scoping our executive level dashboards and reporting project by mapping out who the primary consumers of the data will be, what their top priorities / challenges are, which data we need and what we are trying to measure.  It might seem like we are ready to start evaluating vendors and building out the project, but we still have a few more requirements to gather.

What data can we exclude?

With our initial focus around sales analytics, the secondary data we would want to include (NetProspex, Marketo and ToutApp) all integrates fairly seamlessly with Salesforce, so it won't require as much effort on the data prep side.  If we pivot over to our marketing function, however, things get a bit murkier.  On the low end this could mean a dozen or so data sources, but what about our social channels, Google Ads, etc., as well as various spreadsheets?  In more and more instances, particularly for a team managing multiple brands or channels, the number of potential data sources can easily shoot into the dozens.

Although knowing what data we should include is important, what data can we exclude?  Unlike the data lake philosophy (Forbes: Why Data Lakes Are Evil), when we are creating operational level reporting, it's important to focus on creating value, not on overcomplicating our project with additional data sources that don't actually yield additional value.

Who's going to manage it?

Just as critical to the project as the what and the how: who’s going to be managing it?  What skills do we have at our disposal and how many hours can we allocate for the initial setup as well as ongoing maintenance and change requests?  Will this project be managed by IT, our marketing analytics team, or both?  Perhaps IT will manage data warehousing and data integration while the analyst focuses on capturing end user requirements and creating the dashboards and reports.  Depending on who's involved, the functionality of the tools and the languages used will vary.  As mentioned in a recent CMS Wire post, Buy and Build Your Way to a Modern Business Analytics Platform, it's important to take an analytical inventory of the skills we have as well as the tools and resources we already have that we may be able to take advantage of.

                                                    

What functionality will we require?

Although we know who will be running the project, we need to refine our focus to make the tool evaluation process more straightforward.  How often does the data need to be refreshed (daily, hourly…) and how will we integrate all of the data sources?  Certain types of data, like unstructured text, will require different functionality than something designed to capture sensor data.  Based on what we’re measuring, we may want snapshots of the data at certain intervals, as well as the capability to track data lineage.  How will we create the dashboards and visualize the data for end user consumption?  Will the users be able to run their own ad-hoc reports, or will this be managed through report requests to an analyst / IT?  Depending on how we’ve integrated and warehoused the data for the project, there are a lot of different routes to go for visualization.

Should we partner / outsource?

Another question to address is whether we are going to outsource some or most of this project to a vendor.  Do we have dedicated developers, or can we select an analytics platform that frees up our resources for another project?  Particularly with things like sales forecast analytics or embedded analytics, there are vendors with specific expertise and best practices that can add value to the project that we may not get if we go it alone.  In a nutshell, we want to make sure we have the right people with the right tools to maximize value and make the best use of our resources; these questions probably deserve (and will get) their own post.

Will it scale?

Up to this point, we’ve tried to break down the project into components and do some light discovery of things to keep in mind.  After playing in the weeds a while, it’s a good idea to take a step back and ask a question about the broader project.  How will this solution scale?  Considering the talent we have available and the project requirements, how will the tools we select allow us to scale to more users, additional data sets and larger data volumes as the project grows?  The data landscape and users' needs will change; if we aren’t planning for flexibility and growth, we might as well sink the ship right now and save the budget.

                                                                

Thanks for checking out my post.  If you enjoyed it, you might also like: 5 data science algorithms that can help you better understand your customers.

Cheers,

Colin McGrew

I write about data, analytics and developing client relationships.

When Salesforce Met Keboola: Why Is This So Great?


How can I get more out of my Salesforce data?

Along with being the world’s #1 CRM, Salesforce provides an end-to-end platform to connect with your customers, including Marketing Cloud to personalize experiences across email, mobile, social and the web, Service Cloud to support customer success, Community Cloud to connect customers, partners and employees, and Wave Analytics designed to unlock the data within.

After going through many Salesforce implementations, I’ve found that although companies store their primary customer data there, there is a big opportunity to enrich it further by bringing in related data stored in other systems, such as invoices in an ERP or contracts in a dedicated DMS.  For example, I’ve seen clients run into the issue of having inconsistent data in multiple source systems when a customer changes their billing address.  In a nutshell, Salesforce makes it easy to report on the data stored within it, but it can’t provide a complete picture of the customer unless we broaden our view.

Another challenge I’ve noticed is that we can only report on the data residing there, which means that time-over-time analysis or analysis of changes between snapshots is a problem.  Salesforce has a huge API and is able to connect to any other system’s API; however, that can also mean a lot of development and lengthy, expensive customizations.  At the same time, Salesforce prefers declarative development, where we don’t typically need to write any code, and developing these connections is contrary to that idea.

Enter Keboola Connection, which allows us to blend Salesforce data with other sources, clean it and run apps on top of it.  In minutes we can set up things like churn prediction, logistic flow, segmentation and much more.  Supporting the same idea of close-to-no development, the focus is on connecting the right data sources, creating transformations or using the data science apps, and then automating these data flows.  This enables us to do cross-object reporting and cross-data-source analysis in our favorite data visualization tools, or even blend data together and feed it back into Salesforce to further enrich our customer data.

Here Are a Couple of Examples:

NGO With Hundreds of Thousands of Donations

Consider an NGO that receives hundreds of thousands of donations, which they track through campaigns in Salesforce.  The obvious challenge?  How can they get better insight into these campaigns to increase the size and number of donations?  Although within Salesforce we can use different reports and groupings of data to examine different points of view, this requires a lot of manual effort as well as some imagination.

The Keboola difference?  Just use the Salesforce Extractor to get data out of Salesforce (click the button, authorize your SFDC credentials and select the appropriate fields), run segment analysis, which automatically analyzes the data and groups donors together based on similarities, and then upload the results (information about which segment each contact belongs to) back into the system with the Salesforce Writer.  This means days of saved time and a much more precise grouping of contacts, which directly addresses the donation insights they’re looking for.

Other benefits include predicting when the next payment will come based on existing data.  Just imagine an NGO without any guaranteed income being able to predict its cash flow - that would be awesome!

New CRM without additional information

The second example is a customer who just implemented the Salesforce CRM basics - companies, contacts, addresses, number of employees and so on.  This isn’t a bad start, but it is very simple, with no additional information.  It means that salespeople have to use the system to obtain and enter information about customers, but cannot use it for segmentation based on billings or existing contracts.  Connecting the CRM to their existing ERP is out of the scope of this project; it would be too expensive and take too long to deliver.  Because their existing ERP doesn’t support any web access, opening a page that would show billing data based on a provided customer id is out of the question as well.  That said, we need salespeople to be able to take advantage of data from both systems to evaluate who to call first.

In Keboola, this problem can be solved in minutes: use the connector to their existing ERP, use an identification field to match billing and contract records to customer records, and then create or update these records in Salesforce.  Piece of cake!  This saved weeks of development and several hours every day for each salesperson.


Conclusion

It is easy to think of many other examples where Keboola Connection can play an important role, not just as a middleman between Salesforce and other systems, but also in helping identify important information across all the records we have.  What use case can you see in your environment?

About the author

Martin Humpolec is a Salesforce Certified Consultant and the author of the Salesforce Writer for Keboola Connection. He blogs about Salesforce and other things on his personal blog, and you can also follow him on Twitter.

Recommendation Engine in 27 lines of (SQL) code

In the data preparation space, the focus very frequently lies on BI as the ultimate destination of data. But more and more often, we see how data enrichment can loop straight back into the primary systems and processes.

Take recommendation. Just the basic type (“customers who bought this also bought…”). That, in its simplest form, is an outcome of basket analysis.

We recently had a customer who asked for a basic recommendation as part of a proof of concept to see whether Keboola Connection is the right fit for them. The dataset we got to work with came from a CRM system and contained a few thousand anonymized rows (4600-ish, actually) of won opportunities, which effectively represented product-customer relations (which customers had purchased which products). So, pretty much ready-to-go data for the Basket Analysis app, which has been waiting for just this opportunity in the Keboola App Store. Sounded like a good challenge - how can we turn this into a basic recommendation engine?

Basket analysis, simply put, looks at groups of items (baskets) and assigns a few important values to any combination of items found in the primary dataset. The two most descriptive values are called "support" - the frequency with which a given combination of items appears in the dataset or its segment - and "lift" - how much more likely the items in the combination are to appear together than they would be by chance. If you want to go deeper, here's a good resource (it's a bit heavy reading, you've been warned). Now, this "lift" value is interesting - we can loosely translate the likelihood of the items appearing together as the likelihood that the customer may be interested in such a combination.

So, for a simple recommendation, we take what is in the "basket" right now, look at the additional items with the highest "lift" value for that combination, and display, or "recommend", them. That's how you get conditioner with a shampoo or fries with a hot dog. While well understood, the algorithms are neither simple nor "light"; they take a decent amount of computing power, especially when you get beyond a few thousand transactions (or "baskets").

To make things simpler, I built a quick transformation to keep only the won opportunities (the original set, being a flattened table from SFDC, also contained tons of opportunities in other stages, obviously irrelevant to the task at hand) and just the needed columns - an ID, the customer identifier and the product they got:
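A minimal sketch of that filtering step - the SFDC table and column names here are assumptions, not the actual export schema:

  -- Keep only won opportunities and just the columns the Basket Analysis app
  -- needs. Names are illustrative; the real SFDC export differs.
  CREATE TABLE baskets_input AS
  SELECT opp.id,
         opp.account_id   AS customer,
         opp.product_name AS product
  FROM opportunities opp
  WHERE opp.stage_name = 'Closed Won';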

And the resulting table:

That then got fed into KBC’s Basket Analysis app. The settings are simple: just point the app at the table and assign the columns that contain what the app needs:

The app outputs several tables (you will learn about them from the app description, of course), but the interesting one was ARL__1 (I need to talk to Marc, who built this app, about maybe refreshing the table names a bit) - this one contains the discovered “rules”. The key columns are “LHS”, “RHS” and “lift”. In layman’s (my) terms, this means: for a customer with a “basket” containing the items listed in “LHS” (Left Hand Side, in case you were wondering), the “lift” value tells us how likely the “RHS” (yes, you guessed it) items are to be present as well.

The “LHS” and “RHS” columns are actually arrays, as logically there may be more than one product involved. Quite fortunately, the content is in alphabetical order (that will be important later on). The data looks like this:

Now, in my simple example, I really care about only one item being recommended. So, I care only about the rows from the ARL__1 table that:

a) have only one item in RHS, and

b) out of those, have the highest “lift” (this becomes a temporary “recommendation” table, defining for each basket the next best product to recommend).

A few lines of SQL take care of that; they are the first three queries in this transformation:
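The actual queries were in the transformation screenshot; under the assumption that the “rhs” array is stored as a comma-separated string, the three steps might look roughly like this:

  -- 1) Keep only the rules that recommend exactly one item (single-item rhs),
  --    assuming the array is stored as a comma-separated string.
  CREATE TABLE single_rhs AS
  SELECT lhs, rhs, lift
  FROM arl__1
  WHERE rhs NOT LIKE '%,%';

  -- 2) For each basket (lhs), find the highest lift among those rules.
  CREATE TABLE best_lift AS
  SELECT lhs, MAX(lift) AS lift
  FROM single_rhs
  GROUP BY lhs;

  -- 3) Join back to get the single recommended item per basket.
  CREATE TABLE recommendation AS
  SELECT s.lhs, s.rhs AS recommended_product, s.lift
  FROM single_rhs s
  JOIN best_lift b ON b.lhs = s.lhs AND b.lift = s.lift;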

The rest of the code deals with the few very simple tasks left:

Query 4 takes the original input table and “collapses” it into baskets in the same format as the “lhs” field in the basket analysis output - think GROUP_CONCAT in MySQL; here it’s Redshift, so the listagg() aggregate function comes to our aid. It also has the nifty ability to force ordering within the group, which allows us to match the order of items with the alphabetical output of the Basket Analysis app (told you it would be important - and thanks to our friends at Periscope Data for the blog post where I came across this trick). And finally,

Query 5 uses this new field to join the temporary recommendation table, which gives us the recommended new product for each customer. A sketch of both queries follows below.
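Under the same assumptions as above (and assuming the “lhs” strings are plain comma-separated lists - the real output format may need a bit of extra string handling), queries 4 and 5 might look roughly like this:

  -- 4) Collapse the original input into baskets, ordering items alphabetically
  --    so the result matches the lhs format of the Basket Analysis app.
  CREATE TABLE customer_baskets AS
  SELECT customer,
         LISTAGG(product, ',') WITHIN GROUP (ORDER BY product) AS basket
  FROM baskets_input
  GROUP BY customer;

  -- 5) Join the baskets to the temporary recommendation table; customers whose
  --    basket never produced a strong enough rule end up with a null.
  CREATE TABLE customer_recommendation AS
  SELECT cb.customer,
         cb.basket,
         r.recommended_product,
         r.lift
  FROM customer_baskets cb
  LEFT JOIN recommendation r ON r.lhs = cb.basket;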

And we’re “done”. Here's the output data:

Why the quotation marks? This exercise represents just a very simple approach. Its success depends on nearly ideal initial conditions and an overall lack of edge cases. Depending on the input data, there will be a number of baskets that just don’t produce a rule with enough significance, and therefore no recommendation will be served (note the null values in the table above). While this can be addressed by lowering the support threshold of the Basket Analysis app (in my example I used a 1% cut-off, which in the end yielded a 59% success rate, i.e. the ratio of customers for whom we were able to provide a solid recommendation), whether that solution makes sense depends heavily on the circumstances. Judgment needs to be applied: how many transactions do we have? How many products? Etc.

In some situations, we could take the customers without recommendations and run them through a secondary process - take the part of their basket that has a high “lift” towards a product not yet in the basket - though obviously the SQL gets a bit more complicated. We can deal with most of the edge cases in a similar manner. At some point, however, moving to a dedicated recommendation app such as the one from Recombee would be a much better use of resources. This is no Netflix recommendation system :).

This was tons of fun to build, and totally good enough for the proof of concept project. We’ll write the recommended product back into the CRM system as a custom field, and the salespeople will know exactly what to bring up next time! Or, perhaps, we’ll use some of the integrations with mailing systems to send these customers just the right piece of content.

If interested, get in touch to learn more!


Thanks for reading,

Milan

Find the Right Music: Analyzing last.fm data sentiment with Keboola + Tableau


As we covered in our recent NLP blog, there are a lot of cool use cases for text / sentiment analysis.  One recent instance we found really interesting came out of our May presentation at SeaTUG (Seattle Tableau User Group).  As part of our presentation / demo, we decided to find out what some of the local Tableau users could do with trial access to Keboola; below we’ll highlight what Hong Zhu and a group of students from the University of Washington were able to accomplish with Keboola + Tableau for a class final project!

What class was this for and why did you want to do this for a final project?

We are a group of students in the University of Washington’s Department of Human Centered Design and Engineering.  For our class project for HCDE 511 – Information Visualization, we made an interactive tool to visualize music data from Last FM.  We chose the topic of music because all four of us are music lovers.

Initially, the project was driven by our interest in having an international perspective on the popularity vs. obscurity of artists and tracks.  However, after interviewing a number of target users, we learned that most of them were not interested in rankings in other countries.  In fact, most of them were not interested in the ranking of artists/tracks at all.  Instead, our target users were interested in having more individualized information and robust search functions, in order to quickly find the right music that is tailored to one’s taste, mood, and occasion.  Therefore, we re-focused our efforts on parsing out the implicit attributes, such as genre and sentiment, from the 50 most-used tags of each track.  That was when Keboola and its NLP plug-in came into play and became instrumental in the success of this project.

What specific data set(s) did you analyze and how did you collect it?

We extracted a large amount of data from Last FM’s API.  At a high level, there are two main data sets: the top 100 artists in each country and the top 100 tracks in each country.  For each of the two data sets, we extracted the ranking, country name, all-time total play count, URL linking to the artist/track, and the top 50 tags for each artist/track.  The total number of data points was over 2 million.  We eventually narrowed them down to 6 dimensions in our final visualization: Track Name, Artist, Play Count, URL, Top Tag, and Overall Sentiment Score.  We also decided to visualize only the top tracks data set due to the time constraints of the school quarter.

Latest cleaned data set with sentiment scores:


https://drive.google.com/file/d/0Bw1Dzj_jagXWMWJaa3Z3ZzE2emM/view?usp=sharing

So you’ve got the data, now what?

Moving over to our trial access to the Keboola platform, it took just a few simple clicks and an authorization to access the data through Dropbox and bring in the spreadsheet.




Once you have the data you want to analyze, it’s a matter of clicking into the Keboola app store, selecting the app you want to run (NLP, highlighted below) and choosing the table you want to analyze.

[Screenshot: the NLP app in the KBC app store]

Now what?

Because the outputs from Geneea were 50 separate tables (one for each tag), we needed to combine them into one table and calculate the overall sentiment score for each track.

Due to our limited experience in Python and lack of SQL knowledge, we were only able to join 2 tables at a time using a Transformation.  In the end, we downloaded all the tables and joined them locally.   (**Typically this would be done with a SQL, Python or R transformation within the platform.)
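(**For reference, a sketch of how that could be done in a single SQL transformation - the table and column names below are assumptions about the Geneea output, not its actual schema:)

  -- Stack the per-tag sentiment tables (50 in total; only three shown here)
  -- and average the sentiment per track. Names are assumed, not Geneea's schema.
  CREATE TABLE track_sentiment AS
  SELECT track_name,
         artist,
         AVG(sentiment_score) AS overall_sentiment_score
  FROM (
      SELECT track_name, artist, sentiment_score FROM tag_sentiment_01
      UNION ALL
      SELECT track_name, artist, sentiment_score FROM tag_sentiment_02
      UNION ALL
      SELECT track_name, artist, sentiment_score FROM tag_sentiment_03
      -- ...and so on for the remaining tag tables
  ) all_tags
  GROUP BY track_name, artist;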

Ready to visualize

Once the data is ready for analysis, it’s simple to download it as a .TDE file from Keboola or send it directly to Dropbox or Google Drive for consumption in Tableau Desktop.  You can also create a live data connection directly to Tableau Server.


In this case, Tableau Public was the right choice for data visualization.  I’ve provided a screenshot below, or you can check out the live viz here.


[Screenshot: the Find the Right Music dashboard on Tableau Public]

We’re glad we could lend a hand to the UW students (and thanks for letting us be part of a cool data project!)  As mentioned at the outset of the blog, please check out our previous post, The value of text (data) and Geneea NLP app, if you’d like to learn a bit more about the app, or feel free to reach out.

Cheers,

Colin


If you want to learn more about the Tableau + Keboola integration, check out our brief YouTube video!

The value of text (data) and Geneea NLP app

Just last week, a client let out a sigh: “We have all this text data (mostly customer reviews) and we know there is tremendous value in that set, but aside from reading it all and manually sorting through it, what can we do with it?”

With text becoming a bigger and bigger chunk of a company’s data intake, we hear those questions more and more often. A few years ago, the “number of followers” was about the only metric people would get from their Twitter accounts. Today, we want to (and can) know much more: What are people talking about? How do we escalate their complaints? What about the topics trending across data sources and platforms? Those are just some examples of the questions we’re asking of the NLP (Natural Language Processing) applications at our disposal.

Besides the more obvious social media stuff, there are many areas where text analytics can play an extremely valuable role. Areas like customer support (think of all the ticket descriptions and comments), surveys (most have open-ended questions and their answers often contain the most valuable insights), e-mail marketing (whether it is analyzing outbound campaigns and using text analytics to better understand what works and what doesn’t, or compiling inbound e-mails) and lead-gen (what do people mention when reaching out to you) to name a few. From time to time we even come across more obscure requests like text descriptions of deals made in the past that need critical information extracted (for example contract expiration dates) or comparisons of bodies of text to determine “likeness” (when comparing things like product or job descriptions).

The “common” way to deploy a text analytics service today is an API integration. There are quite a few services out there (Alchemy API and Rosette spring to mind) that allow anyone with an account to submit data to their API and receive back results. While perfectly doable, it means that the customer needs a developer’s/engineer’s capacity, and what company has these valued employees sitting around with nothing better to do?

Enter Geneea and their app in the Keboola Connection App Store:

While Geneea also offers an API service to exchange data with their Interpretor platform, crowned with the Frida dashboard, they recognized the potential of the Keboola Connection platform early on. It is Keboola’s job to remove the complexity (and the need for the aforementioned developer time) from accomplishing data tasks. This is why having a strong NLP partner has been one of our key priorities. Check out what they can do with your text here.

Today, we have multiple customers utilizing the app to process text data for the use cases described above. Once your data is managed by Keboola Connection, setting up the text data enrichment with the Geneea app takes less time (actually, about 20% of it) than you just spent reading this blog!

Ready to attack your own text analytics opportunity?

Check out what Geneea wrote about their app on their blog.


Keboola and Slalom Consulting Team up to host Seattle’s Tableau User Group

On Wednesday, May 18th, Keboola’s Portland and BC teams converged in Seattle to host the city’s monthly Tableau User Group with Slalom Consulting. We worked with SeaTUG’s regular hosts and organizers at Slalom to put together a full evening of discussion around how to solve complex Tableau data problems using KBC. With 70+ people in attendance, Seattle’s Alexis Hotel was buzzing with excitement!

The night began with Slalom’s very own Anthony Gould, consultant, data nerd and SeaTUG host extraordinaire, welcoming the group and getting everyone riled up for the night’s contest: awarding the attendee whose SeaTUG-related tweet got the most retweets! He showed everyone how we used Keboola Connection (KBC) to track that data and let them know the results would be updated at the end of the night and prizes distributed!

Anthony passed the mic to our very own Milan Veverka, who got to the heart of the evening’s presentation, explaining how users and attendees can use KBC to solve the complex data problems that would be presented throughout the evening. Over the rest of the evening, Milan presented topics such as “When SQL isn’t enough” (and you want R or Python to get the results you need), data cleanliness and Twitter text enrichment. He shared the stage with Slalom consultant Frank Blau, who presented on a variety of Internet of Things (IoT) topics, including weather data enrichment and working with magnetometer and EKG data.

Throughout the presentation and during the following breakout sessions, the audience was engaged, excited, asking lots of questions and doing a lot of laughing for such a technical presentation! Over the coming weeks, we’ll be releasing some video from the night and sharing more takeaways and results! We loved the experience and look forward to hosting more TUGs around North America!


Cleaning Dirty Address Data in KBC

There is an increasing number of use cases and data projects for which geolocation data can add a ton of value - e-commerce and retail, supply chain, sales and marketing, etc.  Unfortunately, one of the most challenging asks of any data project is relating geographical information to various components of the dataset.  On a more positive note, however, KBC’s easy integration with Google apps of all kinds allows users to leverage Google Maps to add geocoding functionality.  Since we have so many clients taking advantage of geocoding capabilities, one of our consultants, Pavel Boiko, outlined the process of adding this feature to your KBC environment.  Check it out!

Anatomy of an Award Winning Data Project Part 3: Ideal Problems not Ideal Customers

Hopefully you’ve had a chance to read about our excitement and pride upon learning that two of our customers had won big awards for the work we’d done together. To jog your memory, Computer Sciences Corporation (CSC)’s marketing team won the ITSMA Diamond Marketing Excellence Award as a result of the data project we built together. CSC used KBC to bridge together 50+ data sources and push those insights out to thousands of CSC employees. To catch up on what you missed or to read it again, revisit Part 1 of our Anatomy of an Award Winning Data Project.

Additionally, the BI team at Firehouse Subs won Hospitality Technology’s Enterprise Innovator Award for its Station Pulse dashboard, built on a KBC foundation. The dashboard measures each franchise’s performance based on 10 distinct metrics, pulling data from at least six sources. To catch up on what you missed or to read it again, revisit Part 2 of our Anatomy of an Award Winning Data Project.

We’re taught that most businesses have a “typical” or “ideal” customer. When crafting a marketing strategy or explaining your business to partners, customers and your community, this concept comes up repeatedly. And we don’t really have a ready-made answer. A data-driven business can be in any industry, and the flexibility and agility of the Keboola platform make it, by its very nature, data source and use case agnostic.

And so, when these two customers of ours both won prestigious awards highlighting their commitment to data innovation, it got us thinking. These two use cases are pretty different. We worked with completely different departments, different data sources, different end-users, different KPIs, etc. And yet both have been successful, award-winning projects.

We realized that perhaps the question of an ideal customer isn’t really relevant for us. Perhaps we’d been asking the wrong question all along. We can’t define our target customer, but we can define the target problem that our customers need help solving.
