January 12, 2018

Data can be vast and overwhelming, so understanding the different types helps to simplify what kind of numbers we are looking for. Even with the treasure trove of data most organizations have in-house, there are tons of additional data sets that can be included in a project to add valuable context and create even deeper insights. It’s important to keep in mind what type of data it is, when and where it was created, what else was going on in the world when it was created, and so forth. Using the example of a restaurant, let’s look at some different types of data and how they could impact an analytics project.

Numerical data is measurable and always expressed in numerical form: for example, the number of diners attending a particular restaurant over the course of a month, or the number of appetizers sold during a dinner service. It can be segmented into two sub-categories.

Discrete data represents items that can be counted. Each value is an exact number, and the possible values can be listed out. The list of possible values may be fixed (also called finite), or it may go from 0, 1, 2, on to infinity (making it countably infinite). For example:

- Number of diners that ate at the restaurant on a particular day (you can’t have half a diner).
- Number of beverages sold each week.
- How many employees were staffed at the restaurant on a day.

Continuous data represents measurements; their possible values cannot be counted and can only be described using intervals on the real number line. For example, the exact amount of vodka left in the bottle would be continuous data from 0 mL to 750 mL, represented by the interval [0, 750], inclusive. Other examples:

- Pounds of steak sold during dinner service
- The high temperature in the city on a particular day
- How many ounces of wine were poured in a given week

You should be able to perform most mathematical operations on numerical data, sort it in ascending or descending order, and express it as fractions.
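As a rough illustration of the operations mentioned above, here is a minimal Python sketch. The figures and variable names are made up for this example and do not come from a real restaurant.

```python
from fractions import Fraction
from statistics import mean

# Hypothetical figures for one week at the restaurant (illustrative only).
diners_per_day = [82, 95, 77, 110, 134, 160, 121]                    # discrete: whole counts
steak_lbs_per_day = [31.5, 36.25, 28.0, 40.75, 52.5, 61.0, 47.25]    # continuous: measurements

# Arithmetic is meaningful for both kinds of numerical data.
total_diners = sum(diners_per_day)
avg_steak_lbs = mean(steak_lbs_per_day)

# Numerical data can be sorted in ascending or descending order.
busiest_first = sorted(diners_per_day, reverse=True)

# A continuous measurement can be expressed as a fraction (36.25 lb is 145/4 lb).
steak_as_fraction = Fraction(steak_lbs_per_day[1])

print(total_diners, round(avg_steak_lbs, 2), busiest_first[0], steak_as_fraction)
```

Note that the diner counts only ever take whole values, while the steak weights can fall anywhere within a range.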

Categorical data represents characteristics such as sex, or the types of books someone likes. Categorical data can take on numerical values (such as “1” indicating male and “2” indicating female), but those numbers don’t have any mathematical meaning and can’t be added together. (Categorical data may also be referred to as qualitative data, or Yes/No data.) For example:

- Marital status
- Hometown
- Favorite type of restaurant
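To make the point about numeric codes concrete, here is a small Python sketch. The coding scheme and survey answers are hypothetical. It shows that counting how often each category occurs is meaningful, while arithmetic on the codes themselves is not.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are labels, not quantities.
MARITAL_STATUS_LABELS = {1: "single", 2: "married", 3: "divorced"}

# Hypothetical survey answers, stored as codes.
responses = [1, 2, 2, 3, 1, 2, 2, 1]

# Counting how often each category occurs is a meaningful summary.
counts = Counter(MARITAL_STATUS_LABELS[code] for code in responses)
most_common_status, most_common_count = counts.most_common(1)[0]

# By contrast, sum(responses) or an average of the codes describes nothing
# real: "single" + "married" does not equal "divorced".
print(dict(counts), most_common_status)
```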

Ordinal data mixes numerical and categorical data. The data falls into categories, but the numbers assigned to the categories have meaning. Take Yelp as an example: rating restaurants on a scale from 0 (lowest) to 5 (highest) stars yields ordinal data. Although ordinal data is often visualized in charts and graphics, its numbers, unlike those of categorical data, do have an associated mathematical meaning. In a survey of 1,000 people rating a restaurant on a scale from 0 to 5, the average of the 1,000 responses is meaningful, so the ratings would not be classified as categorical data. Other examples:

- Average customer satisfaction rating for a given month
- Google seller rating
- OpenTable rating
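The rating example can be sketched in a few lines of Python. The ratings below are invented for illustration. Because the categories are ordered, summaries such as the average, the median, and the share of ratings at or above a threshold all carry meaning.

```python
from statistics import mean, median

# Hypothetical star ratings (0 to 5) from a handful of reviews.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]

# The categories are ordered, so these summaries are meaningful.
average_rating = mean(ratings)
median_rating = median(ratings)

# Share of reviews with four stars or more.
share_four_plus = sum(r >= 4 for r in ratings) / len(ratings)

print(average_rating, median_rating, share_four_plus)
```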

Looking at our restaurant, there are a lot of different ways we can approach analysis using contextual data. Incorporating weather data and seasonality with sales data may help us better understand which items sell better during specific seasons and weather conditions. It could also be interesting to identify how the number of employees working and the number of diners on a given day affect the average sale per diner and the average Yelp review score. If we owned a chain of restaurants, we could create a benchmark scorecard for each location to enable performance comparison within our group of restaurants.
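One way to picture this kind of enrichment is a simple join of daily sales with daily weather. The sketch below is only an illustration: the records, field names, and the `sale_per_diner_by_weather` helper are hypothetical, and a real project would pull this data from a point-of-sale system and a weather provider.

```python
from collections import defaultdict

# Hypothetical in-house sales records.
daily_sales = [
    {"date": "2018-01-08", "revenue": 4200.0, "diners": 105},
    {"date": "2018-01-09", "revenue": 3000.0, "diners": 80},
    {"date": "2018-01-10", "revenue": 5400.0, "diners": 120},
    {"date": "2018-01-11", "revenue": 3600.0, "diners": 90},
]

# Hypothetical external weather data, keyed by date.
daily_weather = {
    "2018-01-08": "snow",
    "2018-01-09": "clear",
    "2018-01-10": "snow",
    "2018-01-11": "clear",
}


def sale_per_diner_by_weather(sales, weather):
    """Average sale per diner, grouped by each day's weather condition."""
    revenue = defaultdict(float)
    diners = defaultdict(int)
    for day in sales:
        condition = weather.get(day["date"])
        if condition is None:
            continue  # skip days with no matching weather record
        revenue[condition] += day["revenue"]
        diners[condition] += day["diners"]
    return {c: revenue[c] / diners[c] for c in revenue if diners[c] > 0}


result = sale_per_diner_by_weather(daily_sales, daily_weather)
for condition, value in sorted(result.items()):
    print(f"{condition}: {value:.2f} per diner")
```

Joining on the date is the simplest possible key. The same pattern extends to staffing levels or review scores, as long as each data set shares a field to match on.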

The specific data sets that will be most relevant to a particular analytics use case will vary based on industry and the focus of the project, but the main point to keep in mind is that no data point lives in a vacuum. Regardless of the type of data, what these examples highlight is that there is plenty of information you can gather from the information you already have; when you enrich that data your insights grow more profound.

If you’re interested in learning more about Keboola, check out how we helped CSC add context to their digital marketing analytics by bringing together over 50 data sources.