September 19, 2016
At the same time as the announcement about default backend in KBC being shifted to Snowflake, I have started working on a new project. The customer pushed us the initial dump of two main tables (10M rows each) and some other small attribute tables.
The main transformation handles initial join of one of the big table and a handful of small ones…
… and then it is meshed with the other big table, on which we have to calculate duration…
… and apply multiple cases
Originally, I had the complete transformations done in MySQL since I’ve used that transformation for data exploration on a small sample of data. However, when trying on the whole set, I had to kill the transformation after 36 minutes of running and not delivering a single output table…
Since MySQL was not an option, I have tested and compared both Redshift and new Snowflake transformations. It took almost 20 minutes for Redshift to cope with this transformation. In contrast, it took only 5 minutes for Snowflake!
Furthermore, I have explored (thanks Marcus) the difference between CREATE TABLE and CREATE VIEW : There is a subtle difference between those, it will actually shave off some time to use CREATE VIEW (the difference is between 7-10%).
The purpose of this exercise was to evaluate the speed from the KBC user perspective. In other words, I have compared the whole transformation completion time - both table data load and the query time. This represents the full user experience on the respective storage and transformation back-ends.