The Official Keboola Blog

Webhooks and KBC - How to trigger orchestration by form submission (Typeform) (2017-02-01)

Triggering KBC orchestration with webhook

How to trigger orchestration by form submission

Use case

Keboola just launched a product assessment tool for OEM partners. The form's results show how submitters fare across the various dimensions of data product readiness, which areas to focus on, and what specific next steps to take.

We wanted to trigger the orchestration that extracts the responses (have you noticed the new Typeform extractor?), processes the data, and updates our GoodData dashboard with the answers. There was no option to use the "Magic Button" to do so, because there is no guarantee the respondent would click on it at the end of the form.
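For the curious, here is roughly what that wiring can look like. This is a minimal sketch rather than our exact setup: it assumes a small Flask endpoint registered as the Typeform webhook URL and the public Keboola Orchestrator API; the orchestration ID, token and route are placeholders.

```python
# Sketch: receive a Typeform webhook and kick off a KBC orchestration.
# Endpoint, orchestration ID and token are placeholders, not production values.
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

KBC_TOKEN = os.environ["KBC_STORAGE_API_TOKEN"]   # Storage API token with orchestrator access
ORCHESTRATION_ID = "123456"                        # hypothetical orchestration ID

@app.route("/typeform-hook", methods=["POST"])
def typeform_hook():
    payload = request.get_json(silent=True) or {}
    form_id = payload.get("form_response", {}).get("form_id")   # per Typeform's webhook payload

    # The submission itself is only a trigger; the Typeform extractor pulls
    # the actual responses inside the orchestration.
    resp = requests.post(
        f"https://syrup.keboola.com/orchestrator/orchestrations/{ORCHESTRATION_ID}/jobs",
        headers={"X-StorageApi-Token": KBC_TOKEN},
        json={},
        timeout=30,
    )
    resp.raise_for_status()
    return jsonify({"status": "orchestration queued", "form": form_id})

if __name__ == "__main__":
    app.run(port=8080)
```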

Keboola + InterWorks Partnership Offers End-to-End Solutions for Tableau (2017-01-31)



We’re always keeping an eye out for BI and analytics experts to add to our fast-growing network of partners, and we are thrilled to add a long-standing favorite in the Tableau ecosystem! InterWorks, holder of multiple Tableau Partner Awards, is a full-spectrum IT and data consulting firm that leverages its experienced talent and powerful partners to deliver maximum value for its clients. (Original announcement from InterWorks here.) This partnership is focused on enabling consolidated end-to-end data analysis in Tableau.

Whether we’re talking Tableau BI services, data management or infrastructure, InterWorks can deliver everything from quick-strikes (to help get a project going or keep it moving) to longer-term engagements with a focus on enablement and adoption. Their team has a ton of expertise and is also just generally great to work with.

InterWorks will provide professional services to Keboola customers, with a focus on projects using Tableau alongside Keboola Connection, both in North America and in Europe, in collaboration with our respective teams. “We actually first got into Keboola by using it ourselves,” said InterWorks Principal and Data Practice Lead Brian Bickell. “After seeing how easy it was to connect to multiple sources and then integrate that data into Tableau, we knew it had immediate value for our clients.”

What does this mean for Keboola customers?

InterWorks brings world-class Tableau expertise into the Keboola ecosystem. Our clients using Tableau now have a one-stop shop for professional services, leveraging both platforms to fully utilize their respective strengths. InterWorks will also use Keboola Connection as the backbone of their white-glove offering: a fully managed BI stack crowned by Tableau.

Shared philosophy

Whether working on projects with customers or partners, we both believe that aligning people and philosophy is even more critical than the technology behind it. To that end, we’ve found a kindred spirit in InterWorks: we believe in being ourselves and having fun while ensuring we deliver the best results for our shared clients. The notion of continuous learning and trying new things was one of the driving factors behind the partnership.

Have a project you want to discuss with InterWorks?

Contact InterWorks, or if you want to learn a bit more about the types of projects they work on, check out their blog!


Please contact us if you have questions or want to learn more about Keboola.

Colin McGrew

Keboola #YearInReview: Customer & Partner Highlights (2016-12-21)

It’s been quite an exciting year for us here at Keboola and the biggest reason for that is our fantastic network of partners and customers -- and of course a huge thanks to our team!  In the spirit of the season, we wanted to take a quick stroll down memory lane and give thanks for some of the big things we were able to be a part of and the people that helped us make them happen!



Probably the biggest news from a platform perspective this year came about two years after we first announced support for the “next” data warehouse called Amazon Redshift. At the time, it was a huge step in the right direction. We still use Redshift for some of our projects (typically due to data residency or tool choice), but this year we were thrilled to announce a partnership born in the cloud when we officially made the lightning-fast and flexible Snowflake the database of choice behind our Storage API and the primary option for our transformation engine. Not to get too far into the technical weeds (you can read the full post here), but it has helped us deliver a ton of value to our clients: better elasticity and scale, a huge performance improvement for concurrent data flows, better “raw” performance by our platform, more competitive pricing for our customers and, best of all, some great friends! Since our initial announcement, Snowflake joined us in better supporting our European customers by offering a cloud deployment hosted in the EU (Frankfurt!). We’re very excited to see how this relationship will continue to grow over the next year and beyond!



One of our favorite things to do as a team is participate in field events so we can get out in the data world, learn about the types of projects people work on and the challenges they run into, and find out what’s new and exciting. It’s also a great chance for our team to spend some time together as we span the globe - sometimes Slack and Goto Meeting aren’t enough!

SeaTug in May

We had the privilege of teaming up with Slalom Consulting to co-host the Seattle Tableau User Group back in May.  Anthony Gould was a gracious host, Frank Blau provided some great perspective on IoT data and of course Keboola’s own Milan Veverka dazzled the crowd with his demonstration focused on NLP and text analysis.  Afterwards, we had the chance to grab a few cocktails, chat with some very interesting people and make a lot of new friends.  This event spawned quite a few conversations around analytics projects; one of the coolest came from a group of University of Washington students who analyzed the sentiment of popular music using Keboola + Tableau Public (check it out.)


Colin McGrew

Why Avast Hasn’t Migrated into Full Cloud DWH (2016-12-06)

How breaking up with Snowflake.net is like breaking up with the girl you love


At the beginning of May, I got a WhatsApp message from Eda Kucera:

“Cheers, how much would it cost to have Peta in Snowflake? Eda”

There are companies that rely only on “slideware” presentations. Other companies are afraid to open the door to the unknown and not have their results guaranteed. Avast is not one of them. I am glad I can share with you this authentic description of Avast’s effort to balance a low-level benchmark, a fundamental shift in their employees’ thinking, and the no-nonsense financial aspect of it all.

Let’s get back to May. Just minutes after receiving Eda’s WhatsApp message, almost 6 months of deep testing began in our own Keboola instance of Snowflake. Avast tested Snowflake with their own data. (At this point I handed it over to them; the rest was entirely in their hands.)

They dumped approximately 1.5TB a day from Apache Kafka into Keboola’s Snowflake environment, and assessed the speed of the whole process, along with its other uses and its costs.


With a heavy heart, I deleted Avast’s environment within our Snowflake on October 13. Eda and Pavel Chocholous then prepared the following “post mortem”:

Pavel’s Feedback on Their Snowflake Testing

“It’s like breaking up with your girl…”

This sentence sums it all up. And our last phone call was not a happy one. It did not work out. Avast will not migrate into Snowflake. We would love it, but it can’t be done at this very moment. But I’m jumping ahead. Let’s go back to the beginning.

The first time we saw Snowflake was probably just before the DataHackathon in Hradec Kralove. It surely looked like a wet dream for anyone managing any BI infrastructure: completely unique features like cloning the whole DWH within minutes, linear scalability while running, “undrop table”, “select … at point in time”, etc.


How well did it work for us? The beginning was not so rosy, but after some time, it was obvious that the problem was on our side. The data was filled with things like “null” as a text value, and as the dataset was fairly “thin”, it had a crushing impact. See my mental notes after the first couple of days:

“Long story short — overpromised. 4 billion are too much for it. The query has been parsing that json for hours, saving the data as a copy. I’ve already increased the size twice. I’m wondering if it will ever finish. The goal was to measure how much space it will take while flattened; json is several times larger than avro/parquet….”


Let me add that it was not only our data that was bad. I also didn’t know that scaling while running a query affects only subsequently run queries and not the currently running one. So I was massively overcharging my credit card by having a mega-instance of Snowflake ready in the background while my query was still running on the smallest possible node. Well, you learn from your mistakes :). This might be the one thing I expected to “be there”, but I’m a spoiled brat. It really was too good to be true.
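To make that behavior concrete, here is a minimal sketch (using the snowflake-connector-python client; the warehouse name and credentials are placeholders): resizing a warehouse only helps queries that start after the change.

```python
# Sketch: resizing a Snowflake warehouse mid-flight only affects *subsequent* queries.
# Connection parameters and warehouse name are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", warehouse="LOAD_WH"
)
cur = conn.cursor()

# A long-running query started here keeps running on the current (small) size.
# cur.execute("CREATE TABLE flattened AS SELECT ... FROM raw_json ...")

# Scaling up now does NOT speed up the query above; it only applies to
# queries that start after the resize, while the larger size is billed.
cur.execute("ALTER WAREHOUSE LOAD_WH SET WAREHOUSE_SIZE = 'XLARGE'")

# New queries from this point on run on the bigger warehouse.
cur.execute("SELECT COUNT(*) FROM flattened")
print(cur.fetchone())
```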

Okay, after ironing out all the rookie errors and bugs, here is a comparison of one month of real JSON format data:

(Data size within SNFLK vs. Cloudera parquet files: 3.7 TB (Hadoop) vs. 4.2 TB (Snowflake))

Tabulated results… You know, it is hard to understand what our datasets look like and how complicated the queries they hold are. The main thing is that benchmarks really suck :). I personally find the overall takeaways much more interesting:

  • Data in Snowflake is roughly the same size as in our Hadoop, yet I would suggest expecting a 10%-20% difference.

  • Performance: we didn’t find any blockers.

  • Security/roles/privileges: SNFLK is much more mature than the Hadoop platform, yet it cannot be integrated with on-premise LDAP.

  • Stability: SNFLK is far more stable than Hadoop. We haven’t encountered a single error/warning/outage so far. Working with Snowflake is nearly the opposite of hive/impala, where cryptic and misleading error messages are part of the ecosystem culture ;).

  • The concept of caching in SNFLK could not be fully tested, but we did prove that it affects performance in a pleasant yet somewhat unpredictable way.

  • Resource governance in SNFLK is a mature feature: beast-type queries are queued behind the active ones while small ones sneak through, etc.

  • The architecture of separated 'computing nodes' can stop inter-team collisions easily. Sounds like marketing bullshit, but yes, not all teams love each other and are willing to share resources.

  • SNFLK can consume data from most cloud and on-premise services (Kafka, RabbitMQ, flat files, ODBC, JDBC; practically any source can be pushed there). Its DWH-as-a-service architecture is unique and compelling (Redshift/Google BigQuery/GreenPlum could possibly reach this state in the near future).

  • Migration of 500+ TB of data? Another story, and one of the points that undermine our willingness to adopt Snowflake.

  • SNFLK provides limited partitioning abilities; it can bring even more performance once enabled at full scale.

  • SNFLK would allow platform abuse with all of its 'create database as a copy', 'create warehouse as a copy', 'pay more, perform more'. And costs can grow through the roof. Hadoop is a bit harder to scale, which somehow guarantees only reasonable upgrades ;).

  • SNFLK can be easily integrated into any scheduler. Its command line client is the best one I’ve seen in the last couple of years.

Notes from Eda

“If we did not have Jumpshot in the house, I would throw everything into Snowflake…”

If I were to build a Hadoop cluster of 100-200 TB from scratch, I would definitely start with Snowflake… Today, however, we would have to pour everything into it, and that is really hard to do while you’re fully on-premise… It would be a huge step forward for us. We would become a full-scale cloud company. That would be amazing!

If I had to pay the people in charge of Hadoop US wages instead of Czech wages, I would get Snowflake right away. That’s a no brainer #ROI.

Unfortunately, we will not go for it right now. Migrating everything is just too expensive for us at the moment and using Snowflake only partially just doesn’t make sense.


Our decision was also affected by our strong integration with Spark; we’ve been using our Hadoop cluster as compute nodes for it. In SNFLK’s case, this setup would mean pushing our data out of SNFLK into the EC2 instances where the Spark jobs would be running. That would cost an additional 20-30% (the data would be moving inside AWS, but the EC2 instances cost something as well). I know Snowflake is currently working on a solution for this setup, but I haven’t found out what it is.
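To illustrate the data movement in question, here is a rough sketch of reading a Snowflake table into Spark using the public spark-snowflake connector (the options and table names are placeholders of mine, not Avast's actual pipeline); every job like this ships the table out of Snowflake and onto the EC2-hosted Spark cluster.

```python
# Sketch: pulling a Snowflake table into Spark running on EC2.
# Options follow the public spark-snowflake connector; all values are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("snowflake-to-spark")
         # The spark-snowflake connector and Snowflake JDBC driver must be on the classpath.
         .getOrCreate())

sf_options = {
    "sfURL": "my_account.snowflakecomputing.com",
    "sfUser": "spark_user",
    "sfPassword": "...",
    "sfDatabase": "EVENTS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "SPARK_WH",
}

# Each read transfers the data out of Snowflake and into the Spark executors,
# which is the extra transfer/compute cost mentioned above.
df = (spark.read
      .format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "RAW_EVENTS")
      .load())

df.groupBy("EVENT_TYPE").count().show()
```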

In our last phone call with SNFLK, we learned that storage prices were going down again. So, I assume that we will meet within a reasonable time frame and reopen our discussion. (In November, Snowflake started privately testing their EU datacenter and will open it publicly in January 2017.) In the meantime, we’ll have an on-demand account for practicing :).

Petr Šimeček

Keboola’s Solutions for Agencies (2016-10-24)

We would like to show you how some of our clients redefined their businesses by routinely using data in their daily activities. Despite the fact that each company’s situation is different, we hope to give you some ideas to explore in your own business.

If you work in a service agency, as a customer care manager or in a similar type of position, you are all about efficiency. Any idle time spent on non-revenue-generating activities means wasted time and manpower and, more importantly, a net loss for your organization.

To ensure optimal operation, you may be asking yourself questions like these:

  • Is your team correctly prioritizing clients with a higher profit margin?

  • How are individual team members performing compared to each other?

  • Are team members doing the work they are best suited for?

Earthworms

Sometimes the simplest graphs show the most relevant information. The graph that you see below (generally known as a "bullet chart") has been nicknamed the “earthworm” by our clients. Provided by one of our clients, this particular graph eloquently shows agent performance overall, as well as in comparison to the team average.

As a manager, imagine having one of these for each of your agents. In mere seconds you can distinguish your top performers from your poor ones and take the actions needed to improve their behavior.
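If you want to picture one of these, here is a tiny matplotlib sketch of a single “earthworm” (bullet chart) with made-up numbers: the thin bar is the agent’s result, the wide bar the target range, and the vertical line the team average.

```python
# Sketch of one "earthworm" (bullet chart): agent performance vs. team average.
# All numbers are made up for illustration.
import matplotlib.pyplot as plt

agent_score = 78      # e.g. tickets resolved this week
team_average = 65
target = 90

fig, ax = plt.subplots(figsize=(6, 1.2))
ax.barh(0, target, height=0.6, color="#e0e0e0")        # target / qualitative range
ax.barh(0, agent_score, height=0.25, color="#1f77b4")  # the agent's actual result
ax.axvline(team_average, color="black", linewidth=2)   # team average marker
ax.set_yticks([])
ax.set_xlim(0, 100)
ax.set_title("Agent A vs. team average", loc="left", fontsize=9)
plt.tight_layout()
plt.show()
```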

Customer Care

Diving deeper into individual performance, you can then examine why each agent is performing the way they are. Looking at the next client example, you will see that this series of earthworms tracks agent performance in different areas.

Keboola

Keboola’s Marketing Solutions (2016-10-24)

Even though we understand that every company and each department within it have very different BI needs, we also believe in sharing inspiration from our clients about how they make relevant business decisions using data in their daily routines. You might find this helpful in shaping your own solution.

When planning a new product launch and deciding where to spend your marketing budget, you probably have questions regarding the impact of your campaign:

  • How long will it take to turn marketing leads into faithful customers?

  • Did I target the correct customer group?

  • Do my potential customers respond to the advertisement as expected?

  • What is the return on investment for my campaign based on different target groups and products?

Check out the similar questions our clients have asked. Combine them with an analytical mindset, and create the reports your company needs to make better marketing decisions and generate a higher return on investment.

Roman Novacek from Gorilla Mobile says: “When looking at our marketing model, everything seemed to be going according to plan. But when we looked deeper into what we thought were well-performing campaigns, we found out that while some ads and channels were performing extraordinarily well, others were draining the overall average leading to mediocre results.”


[Sales funnel chart]


Keboola

McPen: Built and Run on Data (2016-10-24)

McPen is a European chain distributor of stationery goods. They are one of the first small to mid-sized retailers to use a data-driven approach to business and enable equal access to data for all of their employees.

Initial situation

Embarking on their data-driven business journey, McPen realized that to excel in the stationery goods space, they would need to create a competitive advantage with a unique operational management system. In order to identify retail solutions specific to their business, they wanted to combine many previously unconnected data sources, and upgrade and speed up their reporting process.

Where Keboola came in

Assisted by the Ascoria team, our partner, McPen’s CEO Milan Petr configured the new system from scratch and without the help of a single developer. McPen began to pull data from their POS, Frames and other retail systems, allowing everybody in the company to use this compiled and easily accessible data to find solutions to their real retail problems.


Focusing on lean operations and adding new features, Milan created a system that benefitted the entire organization. He knew that to effectively manage shifts in business, he had to involve every part of the organization in making decisions based on data. Leading by example, he developed and studied the system in detail to understand its impact on daily operations. He then provided access and support directly to the people on the floor to empower them to make necessary strategic decisions and improve their daily results.



Surprising benefits and results

The examined data showed that in order to maximize profitability, McPen needed to upsell customers. And while their biggest income comes from customers who spend between 200 and 500 CZK (around 8 to 20 USD), it is the 42% of all McPen customers spending up to 50 CZK (around 2 USD) who have the biggest upsell potential.

Keboola

Please hold, your call is important to us (2016-10-08)

We’ve recently experienced two fairly large system problems that have affected approximately 35% of our clients.

The first issue took 50 minutes to resolve and the other approximately 10 hours. The root cause in both cases was the way we handled the provisioning of ad hoc sandboxes on top of our SnowflakeDB (a few words about "how we started w/ them").

We managed to find a workaround for the first problem, but the second one was out of our hands. All we could do was file a support ticket with Snowflake and wait. Our communication channels were flooded with questions from our clients and there was nothing we could do. Pretty close to what you would call a worst-case scenario! Fire! Panic in Keboola!

My first thoughts were along the lines of: “Sh..t! If we ran the whole system on our own infrastructure, we could do something now. We could try to solve the issue and not have to just wait…”

But we were forced to just wait and rely on Snowflake. This is the account of what has happened since:

Petr Šimeček

Snowflake vs. Redshift backend speed comparison (2016-09-19)

Intro  

At the same time as the announcement that the default backend in KBC was shifting to Snowflake, I started working on a new project. The customer sent us the initial dump of two main tables (10M rows each) and some other small attribute tables.

New dose of steroids in the Keboola backend (2016-09-12)

More than two years after we announced support for Amazon Redshift in Keboola Connection, it’s about friggin’ time to bring something new to the table. Something that will propel us further along. Voila, welcome Snowflake.

About 10 months ago we presented Snowflake at a meetup hosted at the GoodData office for the first time.

Today, we use Snowflake both behind the Storage API (it is now the standard backend for our data storage) and the Transformations Engine (you can utilize the power of Snowflake for your ETL-type processes). Snowflake’s SQL documentation can be found here.

What on Earth is Snowflake?

It’s a new database, built from scratch to run in the cloud. Something different from a legacy vendor taking an old DB and hosting it for you (MSSQL on Azure, Oracle on Rackspace or PostgreSQL on AWS).

Petr Šimeček

Guiding project requirements for analytics (2016-09-09)

In a recent post, we started scoping our executive-level dashboards and reporting project by mapping out who the primary consumers of the data will be, what their top priorities / challenges are, which data we need and what we are trying to measure. It might seem like we are ready to start evaluating vendors and building out the project, but we still have a few more requirements to gather.

What data can we exclude?

With our initial focus around sales analytics, the secondary data we would want to include (NetProspex, Marketo and ToutApp) all integrates fairly seamlessly with Salesforce, so it won't require as much effort on the data prep side. If we pivot over to our marketing function, however, things get a bit murkier. On the low end this could mean a dozen or so data sources. But what about our social channels, Google Ads and so on, as well as various spreadsheets? In more and more instances, particularly for a team managing multiple brands or channels, the number of potential data sources can easily shoot into the dozens.

Although knowing what data we should include is important, what data can we exclude? Unlike the data lake philosophy (Forbes: Why Data Lakes Are Evil), when we are creating operational-level reporting, it's important to focus on creating value, not on overcomplicating our project with additional data sources that don't actually yield additional value.

Who's going to manage it?

Just as critical to the project as the what and the how: who’s going to manage it? What skills do we have at our disposal, and how many hours can we allocate for the initial setup as well as ongoing maintenance and change requests? Will this project be managed by IT, our marketing analytics team, or both? Perhaps IT will manage data warehousing and data integration while the analyst focuses on capturing end-user requirements and creating the dashboards and reports. Depending on who's involved, the functionality of the tools and the languages used will vary. As mentioned in a recent CMS Wire post, Buy and Build Your Way to a Modern Business Analytics Platform, it's important to take an analytical inventory of the skills we have, as well as the tools and resources we already have that we may be able to take advantage of.

                                                    

Colin McGrew

When Salesforce Met Keboola: Why Is This So Great? (2016-08-16)


How can I get more out of my Salesforce data?

Along with being the world’s #1 CRM, Salesforce provides an end-to-end platform to connect with your customers: Marketing Cloud to personalize experiences across email, mobile, social and the web; Service Cloud to support customer success; Community Cloud to connect customers, partners and employees; and Wave Analytics designed to unlock the data within.

After going through many Salesforce implementations, I’ve found that although companies store their primary customer data there, the opportunity to enrich it further by bringing in related data stored in other systems, such as invoices in an ERP or contracts in a dedicated DMS, is a big one. For example, I’ve seen clients run into the issue of having inconsistent data in multiple source systems when a customer changes their billing address. In a nutshell, Salesforce makes it easy to report on the data stored within, but it can’t provide a complete picture of the customer unless we broaden our view.

Martin Humpolec

Recommendation Engine in 27 lines of (SQL) code (2016-06-29)

In the data preparation space, the focus very frequently lies on BI as the ultimate destination of data. But we see, more and more often, how data enrichment can loop straight back into the primary systems and processes.

Take recommendation. Just the basic type (“customers who bought this also bought…”). That, in its simplest form, is an outcome of basket analysis.

We recently had a customer who asked for a basic recommendation as part of a proof of concept to see whether Keboola Connection is the right fit for them. The dataset we got to work with came from a CRM system and contained a few thousand anonymized rows (4,600-ish, actually) of won opportunities, which effectively represented product-customer relations (which customers had purchased which products). So, pretty much ready-to-go data for the Basket Analysis app, which has been waiting for just this opportunity in the Keboola App Store. Sounded like a good challenge - how can we turn this into a basic recommendation engine?
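The actual transformation was written in SQL inside Keboola Connection; purely as an illustration of the idea, here is a pandas sketch that derives “customers who bought this also bought” pairs from the same kind of product-customer table (column names and data are hypothetical).

```python
# Sketch: basic "also bought" co-occurrence counts from won opportunities.
# One row per (customer_id, product_id); column names are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c2", "c3", "c3", "c3"],
    "product_id":  ["A",  "B",  "A",  "B",  "A",  "C",  "B"],
})

# Self-join on customer to get every pair of products bought by the same customer.
pairs = orders.merge(orders, on="customer_id")
pairs = pairs[pairs["product_id_x"] != pairs["product_id_y"]]

# Count how often each pair co-occurs; the top rows per product are the recommendations.
recs = (pairs.groupby(["product_id_x", "product_id_y"])
             .size()
             .reset_index(name="times_bought_together")
             .sort_values(["product_id_x", "times_bought_together"],
                          ascending=[True, False]))

print(recs.groupby("product_id_x").head(3))   # top 3 "also bought" per product
```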

Milan Veverka

Find the Right Music: Analyzing last.fm data sentiment with Keboola + Tableau (2016-06-23)


As we covered in our recent NLP blog, there are a lot of cool use cases for text and sentiment analysis. One recent instance we found really interesting came out of our May presentation at SeaTUG (Seattle Tableau User Group). As part of our presentation / demo, we decided to find out what some of the local Tableau users could do with trial access to Keboola; below we’ll highlight what Hong Zhu and a group of students from the University of Washington were able to accomplish with Keboola + Tableau for a class final project!

What class was this for and why did you want to do this for a final project?

We are a group of students at the University of Washington’s department of Human Centered Design and Engineering.  For our class project for HCDE 511 – Information Visualization, we made an interactive tool to visualize music data from Last FM.  We chose the topic of music because all 4 of us are music lovers.

Initially, the project was driven by our interest in having an international perspective on the popularity vs. obscurity of artists and tracks.  However, after interviewing a number of target users, we learned that most of them were not interested in rankings in other countries.  In fact, most of them were not interested in the ranking of artists/tracks at all.  Instead, our target users were interested in having more individualized information and robust search functions, in order to quickly find the right music that is tailored to one’s taste, mood, and occasion.  Therefore, we re-focused our efforts on parsing out the implicit attributes, such as genre and sentiment, from the 50 most-used tags of each track.  That was when Keboola and its NLP plug-in came into play and became instrumental in the success of this project.

Colin McGrew

The value of text (data) and Geneea NLP app (2016-06-07)

Just last week, a client let out a sigh: “We have all this text data (mostly customer reviews) and we know there is tremendous value in that set but outside from reading it all and manually sorting through it, what can we do with it?”

With text becoming a bigger and bigger chunk of a company’s data intake, we hear those questions more and more often. A few years ago, the “number of followers” was about the only metric people would get from their Twitter accounts. Today, we want to (and can) know much more: What are people talking about? How do we escalate their complaints? What about the topics trending across data sources and platforms? Those are just some examples of the questions we’re asking of the NLP (Natural Language Processing) applications at our disposal.

Besides the more obvious social media stuff, there are many areas where text analytics can play an extremely valuable role. Areas like customer support (think of all the ticket descriptions and comments), surveys (most have open-ended questions and their answers often contain the most valuable insights), e-mail marketing (whether it is analyzing outbound campaigns and using text analytics to better understand what works and what doesn’t, or compiling inbound e-mails) and lead-gen (what do people mention when reaching out to you) to name a few. From time to time we even come across more obscure requests like text descriptions of deals made in the past that need critical information extracted (for example contract expiration dates) or comparisons of bodies of text to determine “likeness” (when comparing things like product or job descriptions).

Keboola

Keboola and Slalom Consulting Team up to host Seattle’s Tableau User Group (2016-05-20)

On Wednesday, May 18th, Keboola’s Portland and BC teams converged in Seattle to co-host the city’s monthly Tableau User Group with Slalom Consulting. We worked with SeaTUG’s regular hosts and organizers, Slalom Consulting, to put together a full evening of discussion around how to solve complex Tableau data problems using KBC. With 70+ people in attendance, Seattle’s Alexis Hotel was buzzing with excitement!

The night began with Slalom’s very own Anthony Gould, consultant, data nerd and SeaTUG host extraordinaire, welcoming the group and getting everyone riled up for the night’s contest - awarding the attendee whose SeaTUG-related tweet got the most retweets! He showed everyone how we used Keboola Connection (KBC) to track that data and let them know the results would be updated at the end of the night and prizes distributed!

Kasey Jones Tonsfeldt

Cleaning Dirty Address Data in KBC (2016-05-20)

There is an increasing number of use cases and data projects for which geolocation data can add a ton of value - e-commerce and retail, supply chain, sales and marketing, etc. Unfortunately, one of the most challenging asks of any data project is relating geographical information to the various components of the dataset. On a more positive note, however, KBC’s easy integration with Google apps of all kinds allows users to leverage Google Maps to add geo-coding functionality. Since we have so many clients taking advantage of geo-coding capabilities, one of our consultants, Pavel Boiko, outlined the process of adding this feature to your KBC environment. Check it out!
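For a flavor of what the geo-coding step actually does, here is a minimal sketch that calls the Google Maps Geocoding API directly; inside KBC this logic lives in a configured component, and the API key and address below are placeholders.

```python
# Sketch: turn a messy address string into lat/lng via the Google Maps Geocoding API.
# The API key is a placeholder; in KBC this runs as part of the geo-coding setup.
import requests

def geocode(address: str, api_key: str):
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None          # the address could not be resolved
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

print(geocode("1600 Amphitheatre Pkwy, Mountain View CA", api_key="YOUR_KEY"))
```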

Kasey Jones Tonsfeldt

Anatomy of an Award Winning Data Project Part 3: Ideal Problems not Ideal Customers (2016-04-21)

Hopefully you’ve had a chance to read about our excitement and pride upon learning that two of our customers had won big awards for the work we’d done together. To jog your memory, Computer Sciences Corporation (CSC)’s marketing team won the ITSMA Diamond Marketing Excellence Award as a result of the data project we built together. CSC used KBC to bridge together 50+ data sources and push those insights out to thousands of CSC employees. To catch up on what you missed or to read it again, revisit Part 1 of our Anatomy of an Award Winning Data Project.

Additionally, the BI team at Firehouse Subs won Hospitality Technology’s Enterprise Innovator Award for its Station Pulse dashboard built on a KBC foundation. The dashboard measures each franchise’s performance based on 10 distinct metrics, pulling data from at least six sources. To catch up on what you missed or to read it again, revisit Part 2 of our Anatomy of an Award Winning Data Project.

We’re taught that most businesses have a “typical” or “ideal” customer. When crafting a marketing strategy or explaining your business to partners, customers and your community, this concept comes up repeatedly. And we don’t really have a ready-made answer. A data-driven business can be in any industry, and the flexibility and agility of the Keboola platform is by its very nature data-source and use-case agnostic.

And so, when these two customers of ours both won prestigious awards highlighting their commitment to data innovation, it got us thinking. These two use cases are pretty different. We worked with completely different departments, different data sources, different end-users, different KPIs, etc. And yet both have been successful, award-winning projects.

We realized that perhaps the question of an ideal customer isn’t really relevant for us. Perhaps we’d been asking the wrong question all along. We can’t define our target customer, but we can define the target problem that our customers need help solving.

Kasey Jones Tonsfeldt

Anatomy of an Award Winning Data Project Part 2: Firehouse Subs Station Pulse BI Dashboard (2016-04-12)


As we reported last week, we are still beaming with pride, like proud parents at a little league game or a dance recital. Not one, but two of our customers won big fancy awards for the work we did together. The concept of a data-driven organization has been discussed and proposed as an ideal for a while now, but how we define and identify those organizations is certainly still up for debate. We’re pretty confident that the two customers in question - Computer Sciences Corporation (CSC) and Firehouse Subs - would be prime contenders. These awards highlight their commitment to go further than their industry counterparts to empower employees and franchisees to leverage data in new and exciting ways.

If you missed last week’s post with CSC’s Chris Marin, check it out here. Today, let’s learn more about Firehouse Subs award winning project. In case you don’t know much about Firehouse Subs, let me bring you up to speed. The sandwich chain started in 1994 and as of March 2016 has more than 960 locations in 44 states, Puerto Rico and Canada. Firehouse Subs is no stranger to winning awards, either. In 2006, KPMG named them “Company of the Year” and they’ve been recognized for their commitment to community service and public safety as well through Firehouse Subs Public Safety Foundation®, created in 2005.  


Now let’s hear from our project champion and our main ally at Firehouse Subs, Director of Reporting and Analytics, Danny Walsh.

Kasey Jones Tonsfeldt

Anatomy of an Award Winning Data Project Part 1: CSC and Marketing Analytics (2016-04-04)

Here at Keboola, we take pride in working closely with partners and customers ensuring that each project is a success. Typically we’re there from the beginning - to understand the problem the client needs to solve; to help them define the scope and timeline of the implementation; to provide the necessary resources to get buy in from the rest of their team; to offer alternative perspectives and options when mapping out the project; and to be their ally and guide throughout every step of the process. With all that work, all that dedication, it turns out we develop quite a soft spot for both our clients and their projects. 

We’ve got skin in the game, so when one of our clients receives an award because of the project we worked on together, we get pretty excited. And when two clients receive an award because of our work together, well, then we’re downright ecstatic and ready to celebrate!

At the end of 2015, two of our customers were honored for their commitment to data innovation: Firehouse Subs® was awarded the Hospitality Technology Innovation Award, and the digital marketing team at Computer Sciences Corporation (CSC) won the ITSMA Diamond Marketing Excellence Award.

Since new partners and clients often ask us to explain what components and environment cultivate a successful data project, we thought we’d take this exceptional opportunity to ask our customers themselves: Danny Walsh, Director of Reporting and Analytics at Firehouse Subs, and Chris Marin, Senior Principal, Digital Marketing Platform & Analytics at CSC.

Over the next couple of weeks, we’ll share each of their stories and explain how we feel these separate use cases in two distinctly different industries are reflective of what we at Keboola view as the ideal conditions for creating a wildly successful - award-winning even - data project.

Kasey Jones Tonsfeldt

Empowering the Business User in your BI and Analytics Environment (2016-03-22)


There’s one trend on Gartner’s radar that hasn’t changed much over the last few years and that’s the increasing move toward a self-service BI model. Gone are the days of your IT or analytics department being report factories. And if those days aren’t gone for you, then it’s time you make some substantive changes to your business intelligence environment. When end-users are forced to rely on another department to deliver the reports they need, the entire concept of being a “data-driven” organization goes right out the window. 

So other than giving your users access to ad hoc reporting capabilities, how do you empower the user?

Kasey Jones Tonsfeldt

Bi-Modal BI: Balancing Self-Service and Governance (2016-03-16)

                                   

The age-old conflict. IT needs centralization, governance, standards and control; on the other side of the coin, business units need the ability to move fast and try new things. How can we give lines of business access to the data they need for projects so they can spend their time focused on discovering new insights? Typically they get stuck in a bottleneck of IT requests or spend 80% of their time doing data integration and preparation. Neither group seems particularly excited to do it, and I don’t blame them. For the analyst, it increases the complexity of their tasks and seriously raises the technical knowledge requirements. For IT, it’s a major distraction from their main purpose in life, an extra thing to do. Self-serve BI is trying to destroy the backlogged “report factories,” only to replace them with “data stores,” which are sadly even less equipped for the job at hand. Either way, the result is a painfully inefficient process, straining both ends of the value chain in any company that embarks on the data-driven journey.

The Bi-Modal BI Answer?

An organization's ability to effectively extract value from data and analytics while maintaining a well-governed source of truth is the difference between competitive advantage and sunken costs and missed opportunities. How can we create an environment that provides the agile data access needed by business users while still maintaining sound data governance? Gartner has referred to this as a bi-modal IT strategy. A big challenge with bi-modal IT is that it pushes IT management to divide their efforts between IT's traditional focus and a more business-focused, agile methodology.

The DBA and Analyst Divide

Another major challenge in data access comes from the separation between DBAs and business users. Although the technical side may have the necessary expertise to implement ETL projects, they often lack the business domain expertise needed to make the correct assumptions around context and how the data is regarded. With so many projects competing for resources, we shouldn’t have to task a DBA on all of them. Back to the flip side of the coin: data analysts and scientists want the right data for their tools of choice, and they want it fast. Even though there is a growing set of data integration tools that allows individual business units to create and maintain their own data projects, this typically requires a lot of manual data modeling and can lead to siloed data or inconsistent metrics.

Instead of controlling all of BI, IT can enable the business to develop their analytics without sacrificing control and governance standards.  So how can we get the right data in the hands of people who understand and need it in a timely manner?

Colin McGrew

3 Critical Steps to Evangelize the New in Business Intelligence and Analytics (2016-03-15)

The rapid evolution in business intelligence and analytics capabilities is both exhilarating and overwhelming. 

How do you protect the stability of the work you’ve already done, while evangelizing experimentation, exploration and progress within your organization?

We’ve got a few tips for you.

Kasey Jones Tonsfeldt

Keboola: Data Monetization Series - How data science can help (2016-03-01)

                       

Having access to the right data in a clean and accessible format is the first step (or series of steps) leading up to actually extracting business value from your data.  As much as 80% of the time spent on data science projects involves data integration and preparation.  Once we get there, the real fun begins.  With the continued focus on big data and analytics to drive competitive advantage, data science has been spending a lot of time in the headlines.  (Can we fit a few more buzzwords into one sentence?)

Let’s take a look at a few data science apps available on our platform and how they can help us in our data monetization efforts.

Basket analysis

One of the most popular algorithms is market basket analysis. It provides the power behind things like Amazon’s product recommendation engine, identifying that if someone buys product A, they are likely to buy product B. More specifically, it’s not about identifying products placed next to each other on the site that get bought together, but rather products that aren’t placed next to each other. This can be useful in improving in-store and on-site customer experience, targeted marketing and even the placement of content items on media sites.

Anomaly detection

Anomaly detection refers to identifying specific events in the data that don’t conform to the expected pattern. This could take the form of fraud detection, identifying medical problems or even detecting subtle changes in consumer buying behavior. If we look at the last example, this could help us identify new buying trends early and take advantage of them. Using the example of an eCommerce company, you could identify anomalies in carts created per minute, a high number of cart abandons, an odd shift in orders per minute or a significant variance in any number of other metrics.
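As a simple mental model (the apps on our platform use more robust methods), flagging anomalies can be as basic as scoring how far each minute deviates from the recent norm. A rough pandas sketch with made-up numbers:

```python
# Sketch: flag anomalous "carts created per minute" with a rolling z-score.
# Data and threshold are made up for illustration.
import numpy as np
import pandas as pd

rng = pd.date_range("2016-03-01 12:00", periods=120, freq="min")
carts = pd.Series(np.random.poisson(20, size=120), index=rng)
carts.iloc[90:95] = 2        # inject a sudden drop (e.g. a broken checkout)

rolling_mean = carts.rolling(window=30, min_periods=10).mean()
rolling_std = carts.rolling(window=30, min_periods=10).std()
z_score = (carts - rolling_mean) / rolling_std

anomalies = carts[z_score.abs() > 3]
print(anomalies)             # minutes that deviate strongly from the recent norm
```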

Colin McGrew

Using a Data Prep Platform: The Key to Analytic Product Agility (2016-02-29)

                                                     

                                                                                      Guest post by Kevin Smith

For a product owner, one of the biggest fears is that the product you're about to launch won't get the adoption necessary to achieve success. This might happen for a variety of reasons; two of the most common are a lack of fit to the customers' needs and confusing design (it's just too hard to use!).

To combat the possibility of failure, many product owners have adopted the "agile" approach of building products that have enough functionality to meet minimum needs, but are still lean enough to facilitate easy change.

As a data product builder — someone building customer-facing analytics that will be part of a product — the needs are no different but achieving agility can be a real challenge. Sure, every analytics platform provider you might consider claims that they can connect to any data, anywhere, but this leaves a lot of wiggle room. Can you really connect to anything? How easy is it? How hard is it to change later? What about [insert new technology on the horizon here] that I just heard about? If you want to build an agile data product, you've got a tough road ahead... as I found out.

Recently I started working on the analytics strategy for a small start-up firm focused on providing services to large enterprises. As they delivered their services, they wanted to show the results in an analytics dashboard instead of the traditional PowerPoint presentation. It would be more timely, easier to deliver, and could be an on-going source of revenue after an engagement was completed. As I spoke with the team, a few goals surfaced:

  1. They wanted to buy an analytics platform rather than build from scratch. The team realized that they would be better off developing the methodology that would differentiate them from the competition instead of creating the deep functionality already provided by most analytics platforms.
  2. The system had to be cost-effective both to set-up and to operate. As a start-up, there simply wasn't the cashflow available for costly analytics platforms that required extensive professional services to get started. The product had to be flexible and "configurable" by non-Engineers. With little to no budget for an Engineering staff, the team wanted a BI platform that could be configured easily as customer needs changed.
  3. Up and running quickly. This company had customers ready to go and needed a solution quickly. It would be essential to get a solution in front of the customers NOW, rather than try to migrate them to a new way of operating once the dashboards were ready. Changes would certainly be needed post-launch, but this was accepted as part of the product strategy.

None of this seemed to be impossible. I've worked on many data products with similar goals and constraints. Product teams always want to have a platform that's cost-effective, doesn't strain the technical capabilities of the organization, is flexible, and is launched sooner rather than later. It was only after a few more conversations that the problem arose: uncertain data sources.

Most data-driven products work like this: you've got a workflow application such as a help desk application or an ordering system that generates data into a database that you control. You know what data is flowing out of the workflow application and therefore, you understand the data that is available for your analytics. You connect to your database, transform the data into an analytics-ready state, then display the information in analytics on a dashboard. The situation here was different. As a services company, this business had to operate in a technology environment dictated by the customer. Some customers might use Salesforce, some might use Sugar CRM. Still others might use Zoho or one of the myriad other CRM platforms available. Although the team would structure the dashboards and analytics based on their best practices and unique methodology, the data driving the analytics product would differ greatly from customer to customer.

Colin McGrew

Keboola: Data Monetization Series Pt. 2 (2016-01-27)

             

As we examined in part 1 of our Data Monetization blog series, the first step to increasing revenue with data is identifying who the analytics will be surfaced to, what their top priorities are, what questions we need to ask and which data sources we need to include.  For this blog, let’s take a look at what tools we will need to bring it all together.  

With our initial example of a VP of Sales dashboard, fortunately the secondary data sources (NetProspex, Marketo and HubSpot Signals) all integrate fairly seamlessly with the Salesforce CRM.  This should allow for some fairly straightforward analytics built on top of all the data we’ve aggregated.  If we pivot over to our CMO dashboard, things get a bit murkier.

Although our Marketo instance easily integrates with Salesforce, the sheer volume of data sources that can provide insight into our marketing activity makes this project a much more daunting ask. What about our social channels, Adobe Omniture, Google Ads, LinkedIn Ads, Facebook Ads and SEO, as well as various spreadsheets? In more and more instances, especially for a team managing multiple brands / channels, this number can easily shoot into the dozens.

Colin McGrew

Keboola: Data Monetization Series Pt. 1 (2016-01-18)


When a company thinks about monetizing data, the things that come to mind are increasing revenue, identifying operational inefficiencies or creating a new revenue stream. It’s important to keep in mind that these are the results of an effective strategy but can't be the only goal of the project. In this blog series, we will examine these avenues with a focus on the added value that ultimately leads to monetization. For this blog, let's look at it from the perspective of creating executive-level dashboards at a B2B software company.

Who will be consuming the data and what do they care about?

Before we jump into the data itself, take a step back and understand who the analytics will be surfaced to and what their challenges are. Make profiles with their top priorities, pain points and the questions they will be asking. One way to get started is to make a persona priority matrix listing the top three to five challenges for each (example below).

[Example: persona priority matrix]

Once the matrix is laid out, you can begin mapping specific questions to each priority.  What answers might help a VP of Sales increase the effectiveness of the sales team and ultimately revenue?

  • What do our highest velocity deals look like (vertical, company size, who’s involved)?

  • What do our largest deals look like?

  • Where do our deals typically get stuck in the sales process?

  • What activities and actions are our best reps performing?

Colin McGrew

Adding Context With Different Types of Data (2016-01-12)

                     

Data can be vast and overwhelming, so understanding the different types helps to simplify what kind of numbers we are looking for.  Even with the treasure trove of data most organizations have in-house, there are tons of additional data sets that can be included in a project to add valuable context and create even deeper insights.  It’s important to keep in mind what type of data it is, when and where it was created, what else was going on in the world when this data was created, and so forth.  Using the example of a restaurant, let’s look at some different types of data and how they could impact an analytics project.  

Numerical data is something that is measurable and always expressed in numerical form.   For example, the number of diners attending a particular restaurant over the course of a month or the number of appetizers sold during a dinner service.  This can be segmented into two sub-categories.  

Discrete data represent items that can be counted; they are listed as exact numbers and take on possible values that can be listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2, on to infinity (making it countably infinite). For example:

  • Number of diners that ate at the restaurant on a particular day (you can’t have half a diner).

  • Number of beverages sold each week.

  • How many employees were staffed at the restaurant on a given day.

Continuous data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line.  For example, the exact amount of vodka left in the bottle would be continuous data from 0 mL to 750 mL, represented by the interval [0, 750], inclusive.   Other examples:

  • Pounds of steak sold during dinner service

  • The high temperature in the city on a particular day

  • How many ounces of wine were poured in a given week

You should be able to perform most mathematical operations on numerical data, as well as sort it in ascending/descending order and display it as fractions.
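As a small illustration of the distinction, discrete counts and continuous measurements map naturally onto integer and floating-point columns when the restaurant’s data is loaded; the numbers below are made up.

```python
# Sketch: discrete counts vs. continuous measurements from the restaurant example.
import pandas as pd

days = pd.DataFrame({
    "diners": [112, 98, 143],              # discrete: counted, whole numbers only
    "beverages_sold": [310, 275, 402],     # discrete
    "steak_sold_lbs": [41.5, 36.25, 55.0], # continuous: measured on a scale
    "high_temp_f": [61.2, 58.9, 64.7],     # continuous
})

print(days.dtypes)       # int64 for counts, float64 for measurements
print(days.describe())   # most mathematical operations apply to both
print(days.sort_values("diners", ascending=False))   # ordering, as described above
```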

Colin McGrew

6 Gift Ideas for the Data Geek in Your Life (2015-12-18)

                                                      

It's that time of year again and there are so many gift options to choose from. Be it hover-boards (that may explode), drones or Star Wars’ own BB-8 remote control droid, there’s been quite a boom in tech gadgets this year. At Keboola we love all things data, so to get you in the holiday spirit, we wanted to share some cool gift ideas that use data to make your life easier (or at least a bit more interesting).

Automatic Adapter

Similar to the gadget seen in the Progressive commercials, the Automatic Adapter is basically a fitness app for your vehicle. Through an app or a web interface, it provides a full report on where you’ve been and your driving behavior, and it can even tag routes for business travel expenses.

                                                                           

Colin McGrew

Top 3 challenges of big data projects (2015-12-18)


The Economist Intelligence Unit report Big data evolution: forging new corporate capabilities for the long term, published earlier this year, provided insight into big data projects from 550 executives across the globe. When asked what their company’s most significant challenges related to big data initiatives are, maintaining data quality, collecting and managing vast amounts of data, and ensuring good data governance were three of the top four (data security and privacy was number three). Data availability and extracting value were actually near the bottom. This is a bit surprising, as ensuring good data quality and governance is critical to getting the most value from your data project.

Maintaining data quality

Having the right data and accurate data is instrumental to the success of a big data project. Depending on the focus, data doesn’t always have to be 100% accurate to provide business benefit; numbers you are 98% confident in are enough to give you insight into your business. That being said, with the sheer volume of data and sources available for a big data project, this is a big challenge. The first issue is ensuring that the original system of record is accurate (the sales rep updated Salesforce correctly, the person filled out the web form accurately, and so forth), as the data needs to be cleaned before integration. I’ve personally worked through CRM data projects; doing cleanup and de-duping can take a lot of resources. Once this is completed, procedures for regularly auditing the data should be put in place. With the ultimate goal of creating a single source of truth, understanding where the data came from and what happened to it is also a top priority. Tracking and understanding data lineage will help identify issues or anomalies within the project.

Collecting and managing vast amounts of data

Before the results of a big data project can be realized, processes and systems need to be put into place to bring these disparate sources together. With data living in databases, cloud sources, spreadsheets and the like, bringing all the disparate sources together into a database or trying to fuse incompatible sources can be complex. Typically, this process consists of using a data warehouse plus an ETL tool, or a custom solution to cobble everything together. Another option is to create a networked database that pulls in all the data directly, but this route also requires a lot of resources. One of the challenges with these methods is the amount of expertise, development and resources required, spanning from database administration to expertise in using an ETL tool. It doesn’t end there, unfortunately; this is an ongoing process that will require regular attention.

Ensuring good data governance

In a nutshell, data governance is the set of policies, procedures and standards an organization applies to its data assets. Ensuring good data governance requires an organization to have cross-functional agreement, documentation and execution. This needs to be a collaborative effort between executives, line-of-business managers and IT. These programs will vary based on their focus but will all involve creating rules, resolving conflicts and providing ongoing services. Verifications should be put into place to confirm the standards are being met across the organization.

Conclusion

Having a successful big data project requires a combination of planning, people, collaboration, technology and focus to realize maximum business value. At Keboola, we focus on optimizing data quality and integration in our goal to provide organizations with a platform to truly collaborate on their data assets. If you’re interested in learning more you can check out a few of our customer stories.

Colin McGrew