Chapter 5 Big Data
Data has become the basis of almost every strategic decision that firms make. Digital marketing generates a wide array of data that was not captured before and that can bring new insights into consumers’ opinions and behavior. Big data is therefore a powerful input to decision making and can also help improve our lives. Using big data involves the following steps:
- Data extraction and analysis
- Personalization and customization
- Continuous experiments
- Privacy and data monitoring
5.1 Data Extraction and Analysis
This wide array of data can be categorized into two groups:
- Structured Data: numerical data on consumer purchasing, participation in social media, exposure to online marketing, etc.
- Unstructured Data: text, audio, or even video content freely provided by consumers.
Because of their scale, these structured and unstructured data are often called ‘Big Data’. Big data is characterized by three features:
- Volume: A large number of data points is available, sometimes at a very granular level such as the individual level. High volume implies a need for models that are scalable.
- Variety: Data now comes in a wide variety of formats, some structured and others unstructured, so it is messy. High variety may require integration across disciplines to analyze the data and gain meaningful insights.
- Velocity: A large number of data points is created within a short span of time. High velocity opens opportunities for real-time, or virtually real-time, marketing decision making that may or may not be automated.
Big data helps us see and understand the relations within and among pieces of information that come from a variety of sources in a variety of formats. These relations are often measured in terms of correlation[3] across various constructs. Note that correlation does not imply causality, and with big data we may sometimes misinterpret these associations across constructs as causation. Correlations show what, not why. Furthermore, big data also helps us organize and analyze the data at various layers.
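To make the idea of correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient using only the Python standard library. The function name `pearson_r` and the weekly figures are hypothetical, chosen purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly data: ad impressions (in thousands) and units sold.
impressions = [10, 12, 15, 18, 20, 25]
units_sold = [100, 110, 128, 140, 155, 180]

r = pearson_r(impressions, units_sold)
print(f"correlation = {r:.3f}")

# The measure is symmetric: swapping the two series gives the same value,
# so the number alone cannot say which variable (if either) drives the other.
print(pearson_r(units_sold, impressions) == r)
```

The symmetry check is the point of the sketch: a correlation describes how two series move together (the "what") and carries no information about the direction of any effect (the "why").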
Data that arrives in an unstructured format often has to be quantified before it can be tabulated and analyzed. This process is called datafication. Note that datafication is different from digitization, where information is simply coded into a stream of bits. Some examples of datafication are:
- Presenting user-generated content as sentiments.
- Presenting location as points of longitude and latitude.
- Representing interactions among users (e.g., on social media) as a ‘social graph’.
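The three datafication examples above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the word lists, the review texts, the store record, and the user names are all hypothetical, and real sentiment analysis relies on far richer lexicons or trained models than a simple word count.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny word lists (an assumption for illustration).
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "bad", "terrible", "poor"}

def sentiment_score(text):
    """Count of positive words minus count of negative words in the text."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def build_social_graph(interactions):
    """Turn (user, other_user) pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

# 1. User-generated content -> sentiment scores.
reviews = ["I love this phone, great battery!", "Terrible screen and poor support."]
scores = [sentiment_score(r) for r in reviews]

# 2. Location -> a point of latitude and longitude (hypothetical store record).
store_location = {"name": "Downtown store", "lat": 40.7128, "lon": -74.0060}

# 3. Interactions among users -> a social graph, here summarized by each
#    user's number of distinct contacts (degree).
graph = build_social_graph([("ana", "ben"), ("ben", "cy"), ("ana", "cy"), ("cy", "dee")])
degree = {user: len(contacts) for user, contacts in graph.items()}

print(scores)
print(store_location)
print(degree)
```

Once text, places, and relationships are expressed as numbers and tables in this way, they can be tabulated and analyzed alongside structured data.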
5.2 Personalization and Customization
Each individual differs from others in his or her preferences and choices. In marketing, these individual differences are often characterized as consumer heterogeneity. Because of this heterogeneity, there is a need for marketing strategies that satisfy the needs of individual consumers. Hence, personalized searches, services, and ads have the potential to revolutionize marketing, and the digital marketplace makes them possible.
Furthermore, consumers’ preferences and choices are not static; they keep evolving. Therefore, these personalized services need to be customized at various points in time as user preferences and choices change.
5.3 Experiments
Although big data may give great insights based on correlation, correlation is not the same as causation. Let’s take two examples to highlight this point.
Example-1. Suppose we observe that there are more police in areas with higher crime rates. Can we conclude that police cause crime? More specifically, does the correlation mean that if you assign more police to an area, you will get more crime? One plausible explanation for the pattern is simply that more police are assigned to precincts where crime is already high. You may have a very good model that predicts the crime rate by precinct from the number of police assigned to that precinct, but that model could fail miserably at estimating how the crime rate changes when you add more police. It is quite possible to see a positive relationship in the observational data and a negative relationship in an actual experiment.
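The sign flip described in Example-1 can be reproduced with a small simulation. The sketch below is purely illustrative and rests on made-up assumptions: each precinct has an underlying risk level, crime rises with risk, each additional unit of police lowers crime by one unit (`TRUE_EFFECT`), and in the observational setting police are assigned in proportion to risk. None of these numbers come from real data.

```python
import random

TRUE_EFFECT = -1.0  # assumed causal effect of one more unit of police on crime

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov / var_x

def crime_rate(risk, police, rng):
    """Assumed data-generating process: risk raises crime, police lower it."""
    return 10.0 * risk + TRUE_EFFECT * police + rng.gauss(0, 2)

def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    risk = [rng.uniform(1, 10) for _ in range(n)]

    # Observational data: riskier precincts are given more police.
    police_obs = [2.0 * r + rng.gauss(0, 1) for r in risk]
    crime_obs = [crime_rate(r, p, rng) for r, p in zip(risk, police_obs)]

    # Experiment: police levels are assigned at random, independent of risk.
    police_exp = [rng.uniform(2, 20) for _ in range(n)]
    crime_exp = [crime_rate(r, p, rng) for r, p in zip(risk, police_exp)]

    return ols_slope(police_obs, crime_obs), ols_slope(police_exp, crime_exp)

obs_slope, exp_slope = simulate()
print(f"observational slope: {obs_slope:+.2f}")  # positive: more police, more crime
print(f"experimental slope:  {exp_slope:+.2f}")  # close to the assumed effect of -1
```

In the observational data the slope is positive because the police level acts as a proxy for the precinct's risk. Random assignment breaks that link, so the experimental slope recovers the assumed negative effect.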
Example-2. A marketing manager is asked, “How do you know increased advertising will generate more sales?” “Look at this chart,” he responded. “Every December I increase ad spend, and every December I get more sales.”
These types of correlation are often called spurious correlations[4] and may lead to bad judgement. For some questions, especially those relating to policy decisions, it is very important to know the direction of causality and the size of the effects precisely. For example, companies are interested in using big data to estimate their demand functions in order to know precisely, “What is the effect of price cuts on the amount sold?” To answer such critical questions, econometrics comes in handy, as its goal is to establish a causal relationship.
The gold standard for establishing causality is the experiment. The idea is to randomly assign subjects to treatment and control groups and compare their outcomes. Some examples of what can be tested experimentally are:
- User interface design
- Ranking of search results and ads
- Feature experiments
- Product design
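A treatment-control experiment of the kind listed above (for instance, testing a new user interface) can be sketched as follows. The helper names `assign_groups` and `two_proportion_z`, the user IDs, and the conversion counts are all hypothetical. The pooled two-proportion z statistic is one common way to compare conversion rates, not the only one.

```python
import math
import random

def assign_groups(user_ids, seed=0):
    """Randomly split users into treatment and control groups of equal size."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def two_proportion_z(conv_t, n_t, conv_c, n_c):
    """Difference in conversion rates and its z statistic (pooled standard error)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return p_t - p_c, (p_t - p_c) / se

# Random assignment: every user is equally likely to see the new design.
treatment, control = assign_groups(range(10_000))

# Hypothetical outcomes: 600 of 5,000 treated users convert versus
# 500 of 5,000 control users.
diff, z = two_proportion_z(conv_t=600, n_t=5000, conv_c=500, n_c=5000)
print(f"lift in conversion rate: {diff:.3f}, z = {z:.2f}")
```

Because assignment is random, the two groups are comparable on average, so a difference in conversion rates can be attributed to the treatment rather than to pre-existing differences between users.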
5.4 Privacy and Data Monitoring
In this increasingly digitized world, most of our transactions are computer mediated and hence recorded in digital format. Thus, one can observe a wide array of consumer behaviors more precisely than ever before. Furthermore, people are willing to share more personal data in this connected world if they think they are getting some value in return. Given the sensitivity of these data, it has become very important to monitor the data and respect individuals’ privacy.
[3] Correlation quantifies the statistical relationship between two data values. A strong correlation means that when one of the data values changes, the other is highly likely to change as well.
[4] Check this site for some hilarious spurious correlations: http://tylervigen.com/