I can’t believe it’s already been a year since we covered some great gift ideas for data people! We’re back with some more last-minute ideas; some may look familiar, albeit bigger (or smaller) and better, while others are new arrivals. Whatever you’re looking for, we hope at least one of these ideas helps you find something that really excites the techie / data lover in your life this holiday season!
We’ve been assisting people with data and automation for a while now, helping them become “data-driven.” Several months ago, we had an exciting opportunity to automate within Keboola itself. After we lost our two main finance wizards to the joys of childcare, we decided to automate our internal business processes.
There was a lot of work ahead of us. For years, the two of them had been issuing invoices one by one. Manually. Hating unnecessary manual tasks, we were eager to put the power of our platform, Keboola Connection, to work and eliminate the manual invoicing.
We expected to cut down approximately 2-3 man-days of work per month. We also wanted to get much better data about what’s going on.
As our sales activities have been taking off around the globe, we would have needed to automate this process anyway; otherwise, we would soon have to hire a new employee just for invoicing, and that is a no-go for us. Plus, we didn’t want to overload Tereza, our new colleague, with this tedious work and take her weekends away from her.
When it comes to data, we often preach the agile methodology: start small, build quickly, fail fast and have the results in production from day one, with slow, Kaizen-style improvement. This is exactly what we did with our invoicing automation project. We didn’t want to have someone write a custom app for us. We wanted to hack our current systems, prototype, fail fast and see where it would lead us. We wanted to save Tereza’s time, but didn’t want to waste ten times that in developing the system. :-)
Our “budget” was 3-4 man-days max!
Step 1 — Looking for a tool to use for the invoicing
We were looking for a tool that could handle all the basic things we needed: different currencies (it’s Europe!), different bank accounts, with or without tax, paid or unpaid, and a handful of other features. Last but not least, the tool HAD to have a nice RESTful API. After some trials, we opted for a Czech system – Fakturoid. They have great support, by the way. That’s a big plus.
Step 2 — Getting data about customers from Fakturoid into Keboola Connection
First, Padak took all clients we already had in Flexibee, our accounting tool, and exported them to Fakturoid. Then we added all the necessary info to the contacts.
Great. Now we had all the customers’ records ready and needed to get the data into Keboola Connection. It was time to set up our Generic Extractor. It literally took me half an hour to do it! Check it out here:
Keboola Generic extractor config for getting clients’ info from Fakturoid into Keboola Connection
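To give a sense of scale, a minimal Generic Extractor configuration for pulling contacts out of Fakturoid could look roughly like the sketch below. The account slug, credentials and output bucket are placeholders, and the exact keys may differ between Generic Extractor versions, so treat this as an illustration rather than a drop-in config:

```json
{
  "api": {
    "baseUrl": "https://app.fakturoid.cz/api/v2/accounts/your-account-slug/",
    "authentication": { "type": "basic" }
  },
  "config": {
    "username": "invoicing@example.com",
    "#password": "your-fakturoid-api-key",
    "outputBucket": "in.c-fakturoid",
    "jobs": [
      { "endpoint": "subjects.json", "dataType": "clients" }
    ]
  }
}
```

With something this small, it’s easy to see how the setup fit into half an hour: point at the base URL, authenticate, and name the endpoint that returns the client records.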
Step 3 — Creating two tables with invoices and their items for uploading into Fakturoid
There was only one more thing to know. Who is supposed to pay for what and when? We store this info in our Google Spreadsheet. It contains basic info about our clients, the services they use, the price they pay for them, the invoicing period (yearly, quarterly, monthly), and the time period for which the info is valid (when their contract is valid; new contract/change = new row). To be able to pair the tables easily, we added a new column with the Fakturoid client ID.
Finally, we set up our Google Drive Extractor and loaded the data into Keboola Connection. Once we had all the data there, we used SQL to create a Transformation that took everything necessary from the tables (who we bill this month, how much, if out of country = don’t put VAT, add info about current exchange rate, etc.) and created clean output tables.
Part of the SQL transformation which creates an output table with items to pay for Fakturoid.
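The actual Transformation was written in SQL, but the billing logic it encodes can be sketched in a few lines of Python. The field names, VAT rate and period handling below are illustrative assumptions, not our real schema:

```python
from datetime import date

VAT_RATE = 0.21       # Czech VAT; an assumption for illustration
HOME_COUNTRY = "CZ"   # invoices abroad go out without VAT

def invoice_items(contracts, today, fx_rates):
    """Pick contracts due this month and build invoice line items.

    `contracts` mimics rows from the Google Sheet: fakturoid_id, service,
    price, currency, period ('monthly'/'quarterly'/'yearly'), valid_from,
    valid_to, country. All field names are hypothetical.
    """
    items = []
    for c in contracts:
        if not (c["valid_from"] <= today <= c["valid_to"]):
            continue  # contract not active this month
        months = {"monthly": 1, "quarterly": 3, "yearly": 12}[c["period"]]
        # bill only in the months where a new billing period starts
        if (today.month - c["valid_from"].month) % months != 0:
            continue
        price_czk = c["price"] * fx_rates[c["currency"]]  # current exchange rate
        vat = VAT_RATE if c["country"] == HOME_COUNTRY else 0.0
        items.append({
            "client_id": c["fakturoid_id"],
            "name": c["service"],
            "amount": round(price_czk * (1 + vat), 2),
        })
    return items
```

The SQL version answers the same three questions per row: is the contract valid today, is this a billing month for its period, and does VAT apply.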
Step 4 — Sending the tables into Fakturoid and letting it create the invoices
This step was not as easy as exporting data from Fakturoid. We couldn’t use any preprogrammed services. Thankfully, Keboola Connection is an open environment, and any developer can augment it and add new code to extend its functionality. Just wrap it up in a Docker container. We asked Vlado to write a new writer for Fakturoid that would take the output tables from our Transformation (see Step 3) and create invoices in Fakturoid from the data in those tables.
It took Vlado only 2 hours to have the writer up and running!
Now that the writer is completed, Keboola Connection has one more component available to all its users.
Step 5 — Automating the whole process
It was the easiest part. We used our Orchestration services inside Keboola Connection and created an orchestration which automatically starts on the first day of each month. Five minutes later, all the invoices are done and sent out. #easypeasy
It is not a complicated solution. No rocket science. We believe in splitting big problems into smaller pieces, solving the small parts and putting them back together just like Lego bricks. The process should be easy, fast, open and put together from self-contained components. So when you have a problem in one part, it doesn’t affect the whole solution and you can easily fix it.
Besides saving Tereza’s time, this is a springboard for automating other parts of her job. We want her to spend more time doing more interesting things. And the process scales as we grow.
It took us:
- 4 hours to analyse and understand the problem and how things are connected,
- 1 hour to export clients from the accounting system,
- 1/2 hour to write a Generic Extractor from Fakturoid,
- 2 hours to write a Transformation preparing clean output data,
- 2 hours to develop a new Writer for Fakturoid, and
- 1-2 hours to do everything else related to the whole process.
Total = circa 11 hours
Spoiler alert: I’m already working on further articles from the automation series. Look forward to reading how we implemented automatic distribution of invoices to clients and the accounting company, or how we let the systems handle our invoicing for implementation work.
Last week, Tableau hosted a session on the evolution of Business Intelligence in Portland that I had the chance to attend. Although I did review their Top 10 trends in BI when they released them earlier this year, the presentation and discussion ended up being pretty interesting. A few of the topics really resonated with me and I thought we could dig into them a bit more.
Modern BI becomes the new normal
The session (and report) kicks off by highlighting Gartner’s Business Intelligence Magic Quadrant and the shift away from IT-centric BI over the last 10 years. Regardless of who’s discussing the trends (Gartner, Tableau or otherwise) and if or when they come to fruition, it’s important to dig deeper. Reports like those by Gartner are good guideposts for trends and technologies to examine.
That said, I think we can agree that the overall landscape of technology and the way that organizations of all sizes are taking advantage of it in the domain of business intelligence has improved over the last decade.
So does that mean modern BI has truly arrived?
Although some ideas come to mind when I hear the phrase…
What is modern business intelligence?
And do we all think of the same things when we discuss it?
We’re always keeping an eye out for BI and analytics experts to add to our fast growing network of partners and we are thrilled to add a long-standing favorite in the Tableau ecosystem! InterWorks, who holds multiple Tableau Partner Awards, is a full spectrum IT and data consulting firm that leverages their experienced talent and powerful partners to deliver maximum value for their clients. (Original announcement from InterWorks here.) This partnership is focused on enabling consolidated end-to-end data analysis in Tableau.
Whether we’re talking Tableau BI services, data management or infrastructure, InterWorks can deliver everything from quick-strikes (to help get a project going or keep it moving) to longer-term engagements with a focus on enablement and adoption. Their team has a ton of expertise and is also just generally great to work with.
InterWorks will provide professional services to Keboola customers, with the focus on projects using Tableau alongside Keboola Connection, both in North America and in Europe, in collaboration with our respective teams. “We actually first got into Keboola by using it ourselves,” said InterWorks Principal and Data Practice Lead Brian Bickell. “After seeing how easy it was to connect to multiple sources and then integrate that data into Tableau, we knew it had immediate value for our clients.”
What does this mean for Keboola customers?
InterWorks brings world-class Tableau expertise into the Keboola ecosystem. Our clients using Tableau can have a one-stop shop for professional services, leveraging both platforms to fully utilize their respective strengths. InterWorks will also utilize Keboola Connection as the backbone of their white-glove offering: a fully managed, Tableau-crowned BI stack.
Whether working on projects with customers or partners, we both believe that aligning people and philosophy is even more critical than the technology behind it. To that end, we’ve found a kindred spirit in InterWorks: we believe in being ourselves and having fun while ensuring we deliver the best results for our shared clients. The notion of continuous learning and trying new things was one of the driving factors behind the partnership.
Have a project you want to discuss with InterWorks?
It’s been quite an exciting year for us here at Keboola and the biggest reason for that is our fantastic network of partners and customers -- and of course a huge thanks to our team! In the spirit of the season, we wanted to take a quick stroll down memory lane and give thanks for some of the big things we were able to be a part of and the people that helped us make them happen!
Probably the biggest news from a platform perspective this year came about two years after we first announced support for the “next” data warehouse called Amazon Redshift. At the time, it was a huge step in the right direction. We still use Redshift for some of our projects (typically due to data residency or tool choice), but this year we were thrilled to announce a partnership born in the cloud when we officially made the lightning-fast and flexible Snowflake the database of choice behind our storage API and the primary option for our transformation engine. Not to get too far into the technical weeds (you can read the full post here), but it has helped us deliver a ton of value to our clients: better elasticity and scale, huge performance improvements for concurrent data flows, better “raw” performance by our platform, more competitive pricing for our customers and, best of all, some great friends! Since our initial announcement, Snowflake joined us in better supporting our European customers by offering a cloud deployment hosted in the EU (Frankfurt!). We’re very excited to see how this relationship will continue to grow over the next year and beyond!
One of our favorite things to do as a team is participate in field events so we can get out in the data world, learn about the types of projects people work on and the challenges they run into, and find out what’s new and exciting. It’s also a great chance for our team to spend some time together as we span the globe - sometimes Slack and GoToMeeting aren’t enough!
SeaTug in May
We had the privilege of teaming up with Slalom Consulting to co-host the Seattle Tableau User Group back in May. Anthony Gould was a gracious host, Frank Blau provided some great perspective on IoT data and of course Keboola’s own Milan Veverka dazzled the crowd with his demonstration focused on NLP and text analysis. Afterwards, we had the chance to grab a few cocktails, chat with some very interesting people and make a lot of new friends. This event spawned quite a few conversations around analytics projects; one of the coolest came from a group of University of Washington students who analyzed the sentiment of popular music using Keboola + Tableau Public (check it out.)
In a recent post, we started scoping our executive-level dashboards and reporting project by mapping out who the primary consumers of the data will be, what their top priorities and challenges are, which data we need and what we are trying to measure. It might seem like we are ready to start evaluating vendors and building out the project, but we still have a few more requirements to gather.
What data can we exclude?
With our initial focus around sales analytics, the secondary data we would want to include (NetProspex, Marketo and ToutApp) all integrates fairly seamlessly with Salesforce, so it won’t require as much effort on the data prep side. If we pivot over to our marketing function, however, things get a bit murkier. On the low end, this could mean a dozen or so data sources, but what about our social channels, Google Ads and various spreadsheets? In more and more instances, particularly for a team managing multiple brands or channels, the number of potential data sources can easily shoot into the dozens.
Although knowing what data we should include is important, what data can we exclude? Unlike the data lake philosophy (Forbes: Why Data Lakes Are Evil), when we are creating operational-level reporting, it’s important to focus on creating value, not to overcomplicate our project with additional data sources that don’t actually yield additional value.
Who's going to manage it?
Just as critical to the project as what and how: who’s going to be managing it? What skills do we have at our disposal, and how many hours can we allocate for the initial setup as well as ongoing maintenance and change requests? Will this project be managed by IT, our marketing analytics team, or both? Perhaps IT will manage data warehousing and data integration, and the analyst will focus on capturing end-user requirements and creating the dashboards and reports. Depending on who’s involved, the functionality of the tools and the languages used will vary. As mentioned in a recent CMS Wire post, Buy and Build Your Way to a Modern Business Analytics Platform, it’s important to take an analytical inventory of the skills we have, as well as the tools and resources we already have that we may be able to take advantage of.
As we covered in our recent NLP blog, there are a lot of cool use cases for text and sentiment analysis. One recent instance we found really interesting came out of our May presentation at SeaTUG (Seattle Tableau User Group). As part of our presentation and demo, we decided to find out what some of the local Tableau users could do with trial access to Keboola; below we’ll highlight what Hong Zhu and a group of students from the University of Washington were able to accomplish with Keboola + Tableau for a class final project!
What class was this for and why did you want to do this for a final project?
We are a group of students at the University of Washington’s department of Human Centered Design and Engineering. For our class project for HCDE 511 – Information Visualization, we made an interactive tool to visualize music data from Last.fm. We chose the topic of music because all four of us are music lovers.
Initially, the project was driven by our interest in having an international perspective on the popularity vs. obscurity of artists and tracks. However, after interviewing a number of target users, we learned that most of them were not interested in rankings in other countries. In fact, most of them were not interested in the ranking of artists/tracks at all. Instead, our target users were interested in having more individualized information and robust search functions, in order to quickly find the right music that is tailored to one’s taste, mood, and occasion. Therefore, we re-focused our efforts on parsing out the implicit attributes, such as genre and sentiment, from the 50 most-used tags of each track. That was when Keboola and its NLP plug-in came into play and became instrumental in the success of this project.
Having access to the right data in a clean and accessible format is the first step (or series of steps) leading up to actually extracting business value from your data. As much as 80% of the time spent on data science projects involves data integration and preparation. Once we get there, the real fun begins. With the continued focus on big data and analytics to drive competitive advantage, data science has been spending a lot of time in the headlines. (Can we fit a few more buzzwords into one sentence?)
Let’s take a look at a few data science apps available on our platform and how they can help us in our data monetization efforts.
One of the most popular algorithms is market basket analysis. It provides the power behind things like Amazon’s product recommendation engine, identifying that if someone buys product A, they are likely to buy product B. More specifically, it isn’t identifying products placed next to each other on the site that get bought together, but rather products that aren’t placed next to each other. This can be useful in improving in-store and on-site customer experience, target marketing and even the placement of content items on media sites.
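As a toy illustration of the idea (not the actual app on our platform, which uses proper association-rule mining), pairwise lift can be computed straight from transaction baskets; a lift above 1 suggests two products are bought together more often than chance would predict:

```python
from itertools import combinations
from collections import Counter

def pair_lift(baskets):
    """Compute lift for every item pair across a list of baskets.

    lift(A, B) = P(A and B) / (P(A) * P(B)); values above 1 are the
    'customers who bought A also bought B' signal.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)          # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    lifts = {}
    for (a, b), count in pair_counts.items():
        support_ab = count / n
        lifts[(a, b)] = support_ab / ((item_counts[a] / n) * (item_counts[b] / n))
    return lifts
```

A real recommendation engine would also filter by minimum support and confidence, but lift alone already separates genuine affinities from pairs that co-occur simply because both items are popular.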
Anomaly detection refers to identifying specific events that don’t conform to the expected pattern in the data. This could take the form of fraud detection, identifying medical problems or even detecting subtle changes in consumer buying behavior. Looking at the last example, this could help us identify new buying trends early and take advantage of them. Using the example of an eCommerce company, you could identify anomalies in carts created per minute, a high number of cart abandonments, an odd shift in orders per minute or a significant variance in any number of other metrics.
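A deliberately simple sketch of the core idea: flag any metric value more than a few standard deviations from the mean. Production systems would add rolling windows, seasonality handling or model-based scoring, but the signal is the same:

```python
from statistics import mean, stdev

def anomalies(series, threshold=3.0):
    """Return indices of points more than `threshold` standard
    deviations from the series mean (a basic z-score test)."""
    mu = mean(series)
    sigma = stdev(series)
    return [i for i, x in enumerate(series)
            if sigma > 0 and abs(x - mu) / sigma > threshold]
```

Run against an orders-per-minute series, a sudden spike or crash stands out immediately, which is exactly the kind of shift you’d want to alert on.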
Guest post by Kevin Smith
For a product owner, one of the biggest fears is that the product you're about to launch won't get the necessary adoption to achieve success. This might happen for a variety of reasons— two of the most common are a lack of fit to the customers' needs and confusing design (it's just too hard to use!).
To combat the possibility of failure, many product owners have adopted the "agile" approach to building products that have enough functionality to meet minimum needs, but are still lean enough to facilitate easy change.
As a data product builder — someone building customer-facing analytics that will be part of a product — the needs are no different but achieving agility can be a real challenge. Sure, every analytics platform provider you might consider claims that they can connect to any data, anywhere, but this leaves a lot of wiggle room. Can you really connect to anything? How easy is it? How hard is it to change later? What about [insert new technology on the horizon here] that I just heard about? If you want to build an agile data product, you've got a tough road ahead... as I found out.
Recently I started working on the analytics strategy for a small start-up firm focused on providing services to large enterprises. As they delivered their services, they wanted to show the results in an analytics dashboard instead of the traditional PowerPoint presentation. It would be more timely, easier to deliver, and could be an on-going source of revenue after an engagement was completed. As I spoke with the team, a few goals surfaced:
- They wanted to buy an analytics platform rather than build from scratch. The team realized that they would be better off developing the methodology that would differentiate them from the competition instead of creating the deep functionality already provided by most analytics platforms.
- The system had to be cost-effective both to set up and to operate. As a start-up, there simply wasn't the cashflow available for costly analytics platforms that required extensive professional services to get started.
- The product had to be flexible and "configurable" by non-engineers. With little to no budget for an engineering staff, the team wanted a BI platform that could be configured easily as customer needs changed.
- Up and running quickly. This company had customers ready to go and needed a solution quickly. It would be essential to get a solution in front of the customers NOW, rather than try to migrate them to a new way of operating once the dashboards were ready. Changes would certainly be needed post-launch, but this was accepted as part of the product strategy.
None of this seemed to be impossible. I've worked on many data products with similar goals and constraints. Product teams always want to have a platform that's cost-effective, doesn't strain the technical capabilities of the organization, is flexible, and is launched sooner rather than later. It was only after a few more conversations that the problem arose: uncertain data sources.
Most data-driven products work like this: you've got a workflow application such as a help desk application or an ordering system that generates data into a database that you control. You know what data is flowing out of the workflow application and therefore, you understand the data that is available for your analytics. You connect to your database, transform the data into an analytics-ready state, then display the information in analytics on a dashboard. The situation here was different. As a services company, this business had to operate in a technology environment dictated by the customer. Some customers might use Salesforce, some might use Sugar CRM. Still others might use Zoho or one of the myriad other CRM platforms available. Although the team would structure the dashboards and analytics based on their best practices and unique methodology, the data driving the analytics product would differ greatly from customer to customer.
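One common way to cope with that uncertainty is to normalize every source into a single canonical schema before the analytics layer ever sees it, so the dashboards stay source-agnostic. The adapter sketch below is hypothetical; the field names only approximate what Salesforce or SugarCRM actually expose:

```python
# Hypothetical adapters mapping per-CRM rows onto one canonical schema.
# Field names are illustrative, not the real Salesforce/SugarCRM APIs.

def from_salesforce(row):
    return {"opportunity_id": row["Id"], "account": row["AccountName"],
            "amount": float(row["Amount"]), "stage": row["StageName"]}

def from_sugar(row):
    return {"opportunity_id": row["id"], "account": row["account_name"],
            "amount": float(row["amount"]), "stage": row["sales_stage"]}

ADAPTERS = {"salesforce": from_salesforce, "sugarcrm": from_sugar}

def normalize(source, rows):
    """Convert raw CRM rows into the canonical shape the dashboards use.
    Supporting a new CRM means writing one more adapter, nothing else."""
    adapter = ADAPTERS[source]
    return [adapter(r) for r in rows]
```

The design choice is that only the adapters know about any particular CRM; everything downstream (transforms, analytics, dashboards) is written once against the canonical shape.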
Data can be vast and overwhelming, so understanding the different types helps to simplify what kind of numbers we are looking for. Even with the treasure trove of data most organizations have in-house, there are tons of additional data sets that can be included in a project to add valuable context and create even deeper insights. It’s important to keep in mind what type of data it is, when and where it was created, what else was going on in the world when this data was created, and so forth. Using the example of a restaurant, let’s look at some different types of data and how they could impact an analytics project.
Numerical data is something that is measurable and always expressed in numerical form. For example, the number of diners attending a particular restaurant over the course of a month or the number of appetizers sold during a dinner service. This can be segmented into two sub-categories.
Discrete data represent items that can be counted; they are listed as exact numbers and take on possible values that can be listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2, on to infinity (making it countably infinite). For example:
- Number of diners that ate at the restaurant on a particular day (you can’t have half a diner).
- Amount of beverages sold each week.
- How many employees were staffed at the restaurant on a given day.
Continuous data represent measurements; their possible values cannot be counted and can only be described using intervals on the real number line. For example, the exact amount of vodka left in the bottle would be continuous data from 0 mL to 750 mL, represented by the interval [0, 750], inclusive. Other examples:
- Pounds of steak sold during dinner service.
- The high temperature in the city on a particular day.
- How many ounces of wine were poured in a given week.
You can perform most mathematical operations on numerical data, as well as list it in ascending or descending order and display it as fractions.
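To make the discrete/continuous distinction concrete, here is a toy snippet using the restaurant examples above (all numbers invented): counts live naturally as integers, while measurements are floats constrained only to an interval.

```python
# Discrete: diners per day (countable, exact whole numbers)
diners = [112, 98, 131, 120, 105]

# Continuous: vodka left in the bottle, in mL; any value in [0, 750]
vodka_ml = [512.4, 230.0, 749.9]

# The usual mathematical operations apply to both kinds of numerical data.
avg_diners = sum(diners) / len(diners)        # averages to 113.2
busiest_first = sorted(diners, reverse=True)  # descending order

# Continuous values can only be described by an interval, not a list.
assert all(0 <= v <= 750 for v in vodka_ml)
```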