Marketing Analytics for Data-Rich Environments
Declaration: This is a summary of the white paper by Michel Wedel & P.K. Kannan
The routine capture of digital information through online and mobile applications produces vast data streams on how consumers feel, behave, and interact around products and services as well as how they respond to marketing efforts. Data are assuming an increasingly central role in organizations, as marketers aim to harness data to build and maintain customer relationships; personalize products, services, and the marketing mix; and automate marketing processes in real time. The explosive growth of media, channels, digital devices, and software applications has provided firms with unprecedented opportunities to leverage data to offer more value to customers, enhance their experiences, increase their satisfaction and loyalty, and extract value. Although big data's potential may have been overhyped initially, and companies may have invested too much in data capture and storage and not enough in analytics, it is becoming clear that the availability of big data is spawning data-driven decision cultures in companies, providing them with competitive advantages, and having a significant impact on their financial performance. The increasingly widespread recognition that big data can be leveraged effectively to support marketing decisions is highlighted by the success of industry leaders. Entirely new forms of marketing have emerged, including recommendations, geo-fencing, search marketing, and retargeting. Marketing analytics has come to play a central role in these developments, and there is urgent demand for new, more powerful metrics and analytical methods that make data-driven marketing operations more efficient and effective. However, it is not yet sufficiently clear which types of analytics work for which types of problems and data, what new methods are needed for analyzing new types of data, or how companies and their management should evolve to develop and implement skills and procedures to compete in this new environment.
Key domains for analytics applications are
(1) customer relationship management (CRM), with methods that help acquisition, retention, and satisfaction of customers to improve their lifetime value to the firm;
(2) the marketing mix, with methods, models, and algorithms that support the allocation of resources to enhance the effectiveness of marketing effort;
(3) personalization of the marketing mix to individual consumers, in which signiﬁcant advances have been made as a result of the development of various approaches to capture customer heterogeneity; and
(4) privacy and security, an area that is of increasing concern to ﬁrms and regulators.
These domains lead to two pillars of the successful development and implementation of marketing analytics in ﬁrms:
(1) the adoption of organizational structures and cultures that foster data-driven decision making and
(2) the education and training of analytics professionals.
A Brief History of Marketing Data and Analytics
Marketing analytics involves collection, management, and analysis (descriptive, diagnostic, predictive, and prescriptive) of data to obtain insights into marketing performance, maximize the effectiveness of instruments of marketing control, and optimize firms' return on investment (ROI). It is interdisciplinary, being at the nexus of marketing and other areas of business, mathematics, statistics, economics, econometrics, psychology, psychometrics, and, more recently, computer science. Marketing analytics has a long history, and as a result of explosive growth in the availability of data in the digital economy in the last two decades, firms have increasingly recognized the key competitive advantages that analytics may afford, which has propelled its development and deployment (Davenport 2006).
The history of the systematic use of data in marketing starts around 1910 with the work of Charles Coolidge Parlin for the Curtis Publishing Company in Boston (Bartels 1988, p. 125). Parlin gathered information on markets to guide advertising and other business practices, prompting several major U.S. companies to establish commercial research departments. Duncan (1919) emphasized the use of external in addition to internal data by these departments. Questionnaire survey research, already conducted in the context of opinion polls by Gallup in the 1820s, became increasingly popular in the 1920s (Reilly 1929). Around that time, concepts from psychology were being brought into marketing to foster greater understanding of the consumer. Starch’s (1923) attention, interest, desire, action (AIDA) model is a prime example, and he is credited for the widespread adoption of copy research. This era also saw the ﬁrst use of eye-tracking data (Nixon 1924).
In 1923, A.C. Nielsen founded one of the first market research companies. Nielsen started by measuring product sales in stores, and in the 1930s and 1950s, he began assessing radio and television audiences, respectively. In 1931, the market research firm Burke was founded in the United States, and it initially did product testing research for Procter & Gamble. In 1934, the market research firm GfK was established in Germany. The next decade saw the rise of field experiments and the increased use of telephone surveys (White 1931).
Panel data became increasingly popular, at ﬁrst mostly for measuring media exposure, but in the 1940s ﬁrms began using panel data to record consumer purchases (Stonborough 1942). George Cullinan, who introduced the “recency, frequency, monetary” metrics that became central in CRM (Neslin 2014), stimulated the use of companies’ own customer data beginning in 1961. In 1966, the Selling Areas Marketing Institute was founded, which focused on warehouse withdrawal data. The importance of computers for marketing research was ﬁrst recognized around that time as well (Casher 1969).
Beginning in the late 1970s, geo-demographic data were amassed from government databases and credit agencies by the market research firm Claritas, founded on the work by the sociologist Charles Booth around 1890. The introduction of the Universal Product Code and IBM's computerized point-of-sale scanning devices in food retailing in 1972 marked the first automated capture of data by retailers. Companies such as Nielsen quickly recognized the promise of using point-of-sale scanner data for research purposes and replaced bimonthly store audits with more granular scanner data. Soon, individual customers could be traced through loyalty cards, which led to the emergence of scanner panel data (Guadagni and Little 1983). The market research firm IRI, which measured television advertising since the company's founding in 1979, rolled out its in-home barcode scanning service in 1995.
The use of internal customer data was greatly propelled by the introduction of the personal computer to the mass market by IBM in 1981. Personal computers enabled marketers to store data on current and prospective customers, which contributed to the emergence of database marketing, pioneered by Robert and Kate Kestnbaum and Robert Shaw (1987). In 1990, CRM software emerged, for which earlier work on sales force automation at Siebel Systems paved the way. Personal computers also facilitated survey research through personal and telephone interviewing.
In 1995, after more than two decades of development at the Defense Advanced Research Projects Agency and other organizations, the World Wide Web came into existence, and this led to the availability of large volumes of marketing data. Clickstream data extracted from server logs were used to track page views and clicks using cookies. Click-through data yielded measures of the effectiveness of online advertising. The Internet stimulated the development of CRM systems by firms such as Oracle, and in 1999 Salesforce was the first company to deliver CRM systems through cloud computing. Google was founded in 1998, and it championed keyword search and the capture of search data. Search engines had been around since the previous decade; the first file transfer protocol search engine Archie was developed at McGill University. The advent of user-generated content, including online product reviews, blogs, and video, resulted in increasing volume and variety of data. The launch of Facebook in 2004 opened an era of social network data. With the advent of YouTube in 2005, vast amounts of data in the form of user-uploaded text and video became the raw material for behavioral targeting. Twitter, with its much simpler 140-character messages, followed suit in 2006. Smartphones had existed since the early 1990s, but the introduction of the Apple iPhone in 2007, with its global positioning system (GPS) capabilities, marked the onset of the capture of consumer location data at an unprecedented scale.
The initiative of the Ford Foundation and the Harvard Institute of Basic Mathematics for Applications in Business (in 1959/1960) is widely credited for having provided the major impetus for the application of analytics to marketing (Winer and Neslin 2014). It led to the founding of the Marketing Science Institute in 1961, which has since had a continued role in bridging marketing academia and practice. Statistical methods (e.g., analysis of variance) had been applied in marketing research for more than a decade (Ferber 1949), but the development of statistical and econometric models tailored to speciﬁc marketing problems took off when marketing was recognized as a ﬁeld of decision making through the Ford/Harvard initiative (Bartels 1988, p. 125). The development of Bayesian decision theory at the Harvard Institute (Raiffa and Schlaifer 1961) also played a role, exempliﬁed by its successful application to, among other things, pricing decisions by Green (1963). Academic research in marketing began to focus more on the development of statistical models and predictive analytics. Although it is not possible to review all subsequent developments here (for an extensive review, see Winer and Neslin 2014), we note a few landmarks.
New product diffusion models (Bass 1969) involved applications of differential equations from epidemiology. Stochastic models of buyer behavior (Massy, Montgomery, and Morrison 1970) were rooted in statistics and involved distributional assumptions on measures of consumers' purchase behavior. The application of decision calculus (Little and Lodish 1969; Lodish 1971) to optimize spending on advertising and the sales force became popular after its introduction to marketing by Little (1970). Market share and demand models for store-level scanner data (Nakanishi and Cooper 1974) were derived from econometric models of demand. Multidimensional scaling and unfolding techniques, founded in psychometrics (Coombs 1950), became an active area of research, with key contributions by Green (1969) and DeSarbo (DeSarbo and Rao 1986). These techniques paved the way for market structure and product positioning research by deriving spatial maps from proximity and preference judgments and choice. Conjoint analysis (Green and Srinivasan 1978) and, later, conjoint choice analysis (Louviere and Woodworth 1983) are unique contributions that evolved from work in psychometrics by Luce on the quantification of psychological attributes (Luce and Tukey 1964). Scanner panel–based multinomial logit models (Guadagni and Little 1983) were built directly on research in econometrics by McFadden (1974). The nested logit model that captures hierarchical consumer decision making was introduced in marketing (Kannan and Wright 1991), and it was recognized that models of multiple aspects of consumer behavior (e.g., incidence, choice, timing, quantity) could be integrated (Gupta 1988). This proved to be a powerful insight for models of recency, frequency, and monetary metrics (Schmittlein and Peterson 1994).
Whereas previous methods to identify competitive market structures were based on estimated cross-price elasticities, models that derive competitive maps from panel choice data were developed on the basis of the notion that competitive market structures arise from consumer perceptions of substitutability, revealed through their choices of products (Elrod 1988). Time-series methods (DeKimpe and Hanssens 1995) enabled researchers to test whether marketing instruments resulted in permanent or transient changes in sales.
Heterogeneity in the behaviors of individual consumers became a core premise on which marketing strategy was based, and the mixture choice model was the ﬁrst to enable managers to identify response-based consumer segments from scanner data (Kamakura and Russell 1989). This model was generalized to accommodate a wide range of models of consumer behavior (Wedel and DeSarbo 1995). Consumer heterogeneity was represented in a continuous fashion in hierarchical Bayes models (Rossi, McCulloch, and Allenby 1996). Although scholars initially debated which of these two approaches best represented heterogeneity, research has shown that the approaches each match speciﬁc types of marketing problems, with few differences between them (Andrews, Ainslie, and Currim 2002). It can be safely said that the Bayesian approach is now one of the dominant modeling approaches in marketing, offering a powerful framework to develop integrated models of consumer behavior (Rossi and Allenby 2003). Such models have been successfully applied to advertisement eye tracking (Wedel and Pieters 2000), e-mail marketing (Ansari and Mela 2003), web browsing (Montgomery et al. 2004), social networks (Moe and Trusov 2011), and paid search advertising (Rutz, Trusov, and Bucklin 2011).
The derivation of proﬁt-maximizing decisions, inspired by the work of Dorfman and Steiner (1954) in economics, formed the basis of the operations research (OR) approach to optimal decision making in advertising (Parsons and Bass 1971), sales force allocation (Mantrala, Sinha, and Zoltners 1994), target selection in direct marketing (Bult and Wansbeek 1995), and customization of online price discounts (Zhang and Krishnamurthi 2004). Structural models founded in economics include approaches that supplement aggregate demand equations with supply-side equilibrium assumptions (Chintagunta 2002), based on the work of the economists Berry, Levinsohn, and Pakes (1995). A second class of structural models accommodates forward-looking behavior (Erdem and Keane 1996), based on work in economics by Rust (1987). Structural models allow for predictions of agent shifts in behavior when policy changes are implemented (Chintagunta et al. 2006).
From Theory to Practice
Roberts, Kayande, and Stremersch (2014) empirically demonstrate the impact of these academic developments on marketing practice. Through interviews among managers, they find a significant impact of several analytics tools on firm decision making. The relevance of these developments for the practice of marketing is further evidenced by examples of companies that were founded on academic work. Early cases of successful companies include Starch and Associates, a company that specialized in ad copy testing based on Starch's academic work, and John D.C. Little and Glen L. Urban's Management Decision Systems, which was later sold to IRI. Zoltners and Sinha's work on sales force allocation was implemented in practice through ZS Associates. Claes Fornell's work on the measurement of satisfaction led to the American Customer Satisfaction Index, produced by his company, CFI Group. MarketShare, the company cofounded by Dominique Hanssens, successfully implemented his models on the long-term effectiveness of the marketing mix. Jan-Benedict E.M. Steenkamp founded AiMark, a joint venture with GfK that applies academic methods and concepts particularly in international marketing. Virtually all of these companies became successful through the application of analytics.
Examples of companies with very close ties to academia include Richard M. Johnson’s Sawtooth Software, which specializes in the design and analysis of and software for conjoint studies, and Steven Cohen and Mark Garratt’s In4mation Insights, which applies comprehensive Bayesian statistical models to a wide range of applied problems including marketing-mix modeling. In some cases, marketing academia lags behind developments in practice and so focuses instead on the impact and validity of these developments in practice. In other cases, academics are coinvestigators who rely on data and problems provided by companies and work together with these companies to develop implementable analytics solutions. Yet, as we discuss next, in an increasing number of application areas in the digital economy, academics are leading the development of new concepts and methods.
The development of data-driven analytics in marketing from around 1900 until the introduction of the World Wide Web in 1995 has progressed through approximately three stages:
(1) the description of observable market conditions through simple statistical approaches,
(2) the development of models to provide insights and diagnostics using theories from economics and psychology, and
(3) the evaluation of marketing policies, in which their effects are predicted, and marketing decision making is supported using statistical, econometric, and OR approaches.
In many cases throughout the history of marketing analytics, soon after new sources of data became available, methods to analyze them were introduced or developed (for an outline of the history of data and analytical methods, see Figure 2; Table 1 summarizes state-of-the-art approaches). Many of the methods developed by marketing academics since the 1960s have now found their way into practice and support decision making in areas such as CRM, marketing mix, and personalization and have increased the ﬁnancial performance of the ﬁrms deploying them.
Since 2000, the automated capture of online clickstream, messaging, word-of-mouth (WOM), transaction, and location data has greatly reduced the variable cost of data collection and has resulted in unprecedented volumes of data that provide insights on consumer behavior at exceptional levels of depth and granularity. Although academics have taken up the challenge to develop diagnostic and predictive models for these data in the last decade, these developments are admittedly still in their infancy. On the one hand, descriptive metrics displayed on dashboards are popular in practice. This could be the result of constraints on computing power, a need for rapid real-time insights, a lack of trained analysts, and/or the presence of organizational barriers to implementing advanced analytics. In particular, unstructured data in the form of blogs, reviews, and tweets offer opportunities for deep insights into the economics and psychology of consumer behavior, which could usher in the second stage in digital marketing analytics once appropriate models are developed and applied. On the other hand, machine learning methods from computer science (including deep neural networks and cognitive systems, which we discuss subsequently; see Table 1) have become popular in practice but have been infrequently researched in marketing academia. Their popularity may stem from their excellent predictive performance and black-box nature, which allows for routine application with limited analyst intervention. The question is whether marketing academics should jump on the machine learning bandwagon, something they may have been reluctant to do because these techniques do not establish causal effects or produce generalizable theoretical insights. However, combining these approaches with more classical models for marketing analytics may address these shortcomings and hold promise for further research (Table 2).
It is reasonable to expect that the third step in the evolution of analytics in the digital economy—the development of models to generate diagnostic insights and support real-time decisions from big data—is imminent. However, marketing academia will need to develop analytical methods with a keen eye for data volume and variety as well as speed of computation, components that have thus far been largely ignored (see Table 2). In the remainder of this article, we review recent developments and identify potential barriers and opportunities toward successful implementation of analytics to support marketing decisions in data-rich environments.
Data and Analytics
Types of Data
Big data is often characterized by the four "Vs": volume (from terabytes to petabytes), velocity (from one-time snapshots to high-frequency and streaming data), variety (numeric, network, text, images, and video), and veracity (reliability and validity). The first two characteristics are important from a computing standpoint, and the last two are important from an analytics standpoint. Sometimes a fifth "V" is added: value. It transcends the first four and is important from a business standpoint. Big data is mostly observational, but surveys, field experiments, and lab experiments may yield data of large variety and high velocity. Much of the excitement surrounding big data is exemplified by the scale and scope of observational data generated by the "big three" of big data: Google, Amazon, and Facebook. Google receives more than 4 million search queries per minute from the 2.4 billion Internet users around the world and processes 20 petabytes of information per day. Facebook's 1.3 billion users share 2.5 million pieces of content each minute. Amazon has created a marketplace with 278 million active customers from which it records data on online browsing and purchasing behavior. These and other firms have changed the landscape of marketing in the last decade through the generation, provision, and utilization of big data.
Emerging solutions to link customer data across online and ofﬂine channels and across television, tablet, mobile, and other digital devices will further contribute to the availability of data. Moreover, in 2014, well over 15 billion devices were equipped with sensors that enable them to connect and transfer data over networks without human interaction. This “Internet of Things” may become a major source of new product and service development and generate massive data in the process.
Surveys have become much easier to administer with the advances in technology allowing for online and mobile data collection (e.g., Amazon Mechanical Turk). Firms continuously assess customer satisfaction; new digital interfaces require this to be done with short surveys to reduce fatigue and attrition. For example, loyalty is often evaluated with single-item Net Promoter Scores. Therefore, longitudinal and repeated cross-section data are becoming more common. Mittal, Kumar, and Tsiros (1999) use such data to track the drivers of customer loyalty over time. To address the issue of shorter questionnaires, analytic techniques have been developed to create personalized surveys that are adaptive on the basis of the responses to earlier questions (Kamakura and Wedel 1995) as well as the design of tailored split-questionnaires for massive surveys (Adigüzel and Wedel 2008).
Digital technologies facilitate large-scale field experiments that produce big data and have become powerful tools for eliciting answers to questions on the causal effects of marketing actions. For example, large-scale A/B testing enables firms to "test and learn" for optimizing website designs, (search, social, and mobile) advertising, behavioral targeting, and other aspects of the marketing mix. Hui et al. (2013) use field experiments to evaluate mobile promotions in retail stores. Alternatively, natural (or quasi-) experiments capitalize on exogenous shocks that occur naturally in the data to establish causal relations, but often more extensive analytical methods (including matching and instrumental variables methods) are required to establish causality. For example, Ailawadi et al. (2010) show how quasi-experimental designs can be used to evaluate the impact of the entry of Wal-Mart stores on retailers, using a before-and-after design with a control group of stores matched on a variety of measures. Another way to leverage big data to assess causality is to examine thin slices of data around policy changes that occur in the data, which can reveal the impact of those changes on dependent variables of interest through so-called regression discontinuity designs (Hartmann, Nair, and Narayanan 2011).
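To make the "test and learn" idea concrete, the following is a minimal sketch of how an A/B test on conversion rates might be evaluated with a pooled two-proportion z-test. It is not taken from the paper; the function name `two_proportion_ztest` and the visitor and conversion counts are hypothetical and chosen only for illustration.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two_sided_p) comparing conversion rates of variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical experiment: 10,000 visitors randomly assigned to each design.
lift, z, p_value = two_proportion_ztest(conv_a=500, n_a=10_000,
                                        conv_b=560, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

Because assignment to A or B is randomized, a simple comparison like this one can support a causal reading; with observational or quasi-experimental data, the matching and instrumental variables methods mentioned above would be needed instead.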
Finally, lab experiments typically generate smaller volumes of data, but technological advances have allowed for online administration and collection of audio, video, eye-tracking, face-tracking (Teixeira, Wedel, and Pieters 2010), and neuromarketing data obtained from electroencephalography and brain imaging (Telpaz, Webb, and Levy 2015). Such data are collected routinely by firms such as Nielsen, and they often yield p > n data with more variables than respondents. Meta-analysis techniques can be used to generalize findings across large numbers of these experiments (Bijmolt, Van Heerde, and Pieters 2005).
Software for Big Data Processing
Figure 3 provides an overview of the classes of marketing data discussed previously and methods to store and manipulate them. For small to medium-sized structured data, conventional methods such as Excel spreadsheets; ASCII files; or data sets of statistical packages such as SAS, S-Plus, STATA, and SPSS are adequate. SAS holds up particularly well as data size increases and is popular in many industry sectors (e.g., retailing, financial services, government) for that reason. As the number of records goes into the millions, relational databases such as MySQL (used by, e.g., Wikipedia) are increasingly effective for data manipulation and for querying. For big and real-time web applications in which volume, variety, and velocity are high, NoSQL databases are the preferred choice because they provide a mechanism for storage and retrieval of data that does not require tabular relations like those in relational databases, and they can be scaled out across commodity hardware. Apache Cassandra, an open-source system initially developed by Facebook, is a good example of such a distributed database management system. Hadoop, originally developed at Yahoo!, is a system to store and manipulate data across a multitude of computers, written in the Java programming language. At its core are the Hadoop Distributed File System for data storage and the MapReduce programming framework for data processing. Typically, applications are written in a language such as Pig, which maps queries across pieces of data that are stored across hundreds of computers in a parallel fashion and then combines the information from all to answer the query. SQL engines such as Dremel (Google), Hive (Hortonworks), and Spark (Databricks) allow very short response times. For postprocessing, however, such high-frequency data are often still stored in relational databases with greater functionality.
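The map-then-combine pattern described above can be sketched on a single machine. The example below is only an illustration of the programming model, not of Hadoop's or Pig's actual APIs; the transaction records, store names, and the helper functions `map_chunk` and `merge` are hypothetical.

```python
from functools import reduce

# Hypothetical transaction records, split into chunks as if each chunk
# were stored on a different machine in the cluster.
chunks = [
    [("store_1", 20.0), ("store_2", 5.0)],
    [("store_1", 7.5), ("store_3", 12.0)],
    [("store_2", 2.5), ("store_3", 3.0)],
]

def map_chunk(chunk):
    """Map step (with local combining): total revenue per store within one chunk."""
    partial = {}
    for store, amount in chunk:
        partial[store] = partial.get(store, 0.0) + amount
    return partial

def merge(left, right):
    """Reduce step: combine two partial results by summing matching keys."""
    merged = dict(left)
    for store, amount in right.items():
        merged[store] = merged.get(store, 0.0) + amount
    return merged

# On a real cluster the map calls would run in parallel, one per node.
partials = [map_chunk(chunk) for chunk in chunks]
revenue_by_store = reduce(merge, partials, {})
print(revenue_by_store)
```

The design point is that each chunk is summarized where it is stored, so only small partial results travel over the network before being merged into the final answer.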
C++, Fortran, and Java are powerful and fast low-level programming tools for analytics that come with large libraries of routines. Java programs are often embedded as applets within the code of web pages. R, used by Google, is a considerably slower but often-used open-source, higher-level programming language with functionality comparable to languages such as MATLAB. Perl is software that is suited for processing unstructured clickstream (HTML) data; it was initially used by Amazon but has been mostly supplanted by its rival Python (used by Dropbox), which is a more intuitive programming language that enables MapReduce implementation. Currently, academic research in marketing analytics already relies on many of these programming languages, and R seems to be the most popular. Much of this software for big data management and processing likely will become an integral part of the ecosystem of marketing academics and applied marketing analysts soon.
Volume, Variety, Velocity: Implications for Big Data Analytics
The question is whether better business decisions require more data or better models. Some of the debate surrounding that question originates in research at Microsoft, in which Banko and Brill (2001) showed that in the context of text mining, algorithms of different complexity performed similarly, but adding data greatly improved performance. Indeed, throughout the academic marketing literature, complex models barely outperform simpler ones on data sets of small to moderate size. The answer to the question is rooted in the bias–variance trade-off. On the one hand, bias results from an incomplete representation of the true data-generating mechanism (DGM) by a model because of simplifying assumptions. A less complex model (one that contains fewer parameters) often has a higher bias, but a model needs to simplify reality to provide generalizable insights. To quote statistician George Box, "All models are wrong, but some are useful." A simple model may produce tractable closed-form solutions, but numerical and sampling methods allow for examination of more complex models at higher computational cost. Model averaging and ensemble methods such as bagging or boosting address the bias in simpler models by averaging many of them (Hastie, Tibshirani, and Friedman 2008). In marketing, researchers routinely use model-free evidence to provide confidence that more complex models accurately capture the DGM (see, e.g., Bronnenberg, Dubé, and Gentzkow 2012). Field experiments are increasingly popular because data quality (veracity) can substitute for model complexity: when the DGM is under the researchers' control, simpler models can be used to make causal inferences (Hui et al. 2013). Variance, on the other hand, results from random variation in the data due to sampling and measurement error. A larger volume of data reduces the variance. Complex models calibrated on smaller data sets often over-fit the data (i.e., they capture random error rather than the DGM).
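The trade-off described above is commonly written as a decomposition of expected squared prediction error. As a sketch, assume the DGM is y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and let f̂ denote a model fitted on a random training sample; these symbols are introduced here for illustration and are not the paper's notation. The expectation is taken over training samples and noise at a fixed x:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simplifying assumptions tend to raise the first term, a larger volume of data lowers the second, and neither model choice nor sample size affects the third.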
The notion that more data reduces error is well known to beneﬁt machine learning methods such as neural networks, which are highly parameterized (Geman, Bienenstock, and Doursat 1992). However, not all data are created equal. A larger volume of data reduces variance, and even simpler models will ﬁt better. Yet as data variety increases and data become richer, the underlying DGM expands. Much of the appeal of big data in marketing is that it provides traces of consumer behaviors (e.g., activities, interests, opinions, interactions) that were previously costly to observe even in small samples. To fully capture the information value of these data, more complex models are needed. Those models will support deeper insights and better decisions, while, at the same time, large volumes of data will support such richer representations of the DGM. However, these models come at greater computational costs.
Many current statistical and econometric models and the estimation methods used in the marketing literature are not designed to handle large volumes of data efﬁciently. Solutions to this problem involve data reduction, faster algorithms, model simpliﬁcation, and/or computational solutions, which we discuss next. To fully support data-driven marketing decision making, the ﬁeld of marketing analytics needs to encompass four levels of analysis:
(1) descriptive data summarization and visualization for exploratory purposes,
(2) diagnostic explanatory models that estimate relationships between variables and allow for hypothesis testing,
(3) predictive models that enable forecasts of variables of interest and simulation of the effect of marketing control settings, and
(4) prescriptive optimization models that are used to determine optimal levels of marketing control variables.
Figure 4 shows that the feasibility of these higher levels of analysis decreases as a function of big data dimensions. It illustrates that the information value of the data grows as its volume, variety, and velocity increase but that the decision value derived from analytical methods increases at the expense of increased model complexity and computational cost.
In the realm of structured data, where many of the advances in marketing analytics have occurred so far, all four levels of analysis are encountered. Many of the developments in marketing engineering (Lilien and Rangaswamy 2006) have been in this space as well, spanning a very wide range of areas of marketing (including pricing, advertising, promotions, sales force, sales management, competition, distribution, marketing mix, branding, segmentation and positioning, new product development, product portfolio, loyalty, acquisition, and retention). Explanatory and predictive models, such as linear and logistic regression and time-series models, have traditionally used standard econometric estimation methods such as generalized least squares, method of moments, and maximum likelihood. These optimization-based estimation methods become unwieldy for complex models with a large number of parameters. For complex models, simulation-based likelihood and Bayesian Markov chain Monte Carlo (MCMC) methods are used extensively. MCMC is a class of Bayesian estimation methods, the primary objective of which is to characterize the posterior distribution of model parameters. Such methods involve recursively drawing samples of subsets of parameters from their conditional posterior distributions (Gelman et al. 2003). This makes it possible to fit models that generate deep insight into the underlying phenomenon with the aim of generating predictions that generalize across categories, contexts, and markets. Optimization models have been deployed for salesforce allocation, optimal pricing, conjoint analysis, optimal product/service design, optimal targeting, and marketing-mix applications.
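The idea of drawing blocks of parameters in turn from their conditional posterior distributions can be illustrated with a minimal Gibbs sampler. This is only a sketch under stated assumptions: the model (normal data with unknown mean and variance), the flat prior on the mean, the prior proportional to 1/variance, and the function name `gibbs_normal` are all illustrative choices, not taken from the paper.

```python
import random

def gibbs_normal(y, draws=2000, burn_in=500, seed=1):
    """Gibbs sampler for the mean and variance of normally distributed data.

    Assumes a flat prior on mu and a prior proportional to 1/sigma2
    (illustrative choices, not taken from the paper).
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, sigma2 = ybar, 1.0  # starting values
    mu_draws, sigma2_draws = [], []
    for t in range(draws + burn_in):
        # Block 1: mu | sigma2, y ~ Normal(ybar, sigma2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Block 2: 1/sigma2 | mu, y ~ Gamma(shape = n/2, rate = ss/2)
        ss = sum((v - mu) ** 2 for v in y)
        precision = rng.gammavariate(n / 2.0, 2.0 / ss)  # second argument is the scale
        sigma2 = 1.0 / precision
        if t >= burn_in:  # discard the warm-up draws
            mu_draws.append(mu)
            sigma2_draws.append(sigma2)
    return mu_draws, sigma2_draws

# Simulated data (hypothetical): 200 observations, true mean 5, true standard deviation 2.
data_rng = random.Random(0)
y = [data_rng.gauss(5.0, 2.0) for _ in range(200)]

mu_draws, sigma2_draws = gibbs_normal(y)
post_mean_mu = sum(mu_draws) / len(mu_draws)
post_mean_sigma2 = sum(sigma2_draws) / len(sigma2_draws)
print(post_mean_mu, post_mean_sigma2)
```

The retained draws approximate the joint posterior, so summaries such as posterior means or intervals are computed directly from them. Models used in marketing typically have many more parameter blocks, but the alternating structure is the same.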
An increasing number of marketing analytics applications have appeared in the realm of unstructured data. Technological developments in processing unstructured data and the development of metrics from data summaries (such as those provided by text-mining, eye-tracking, and pattern-recognition software) allow researchers to impose a data structure that facilitates the application of analytical methods. One example of the use of metrics as a gateway to predictive analytics is the application by Netzer et al. (2012), who use text mining on user-generated content to develop competitive market structures. Once a data structure is put in place using metrics, researchers can build explanatory, prediction, and optimization models. Although the application of predictive and prescriptive approaches for unstructured data still lags, especially in practice, analyzing unstructured data in marketing seems to boil down to transforming them into structured data using appropriate metrics.
Large-volume structured data comprises four main dimensions: variables, attributes, subjects, and time (Naik et al. 2008). The cost of modeling structured data for which one or more of these dimensions is large can be reduced in one of two ways. First, one or more of the dimensions of the data can be reduced through aggregation, sampling, or selection; alternatively, situation-appropriate simpliﬁcations in model speciﬁcations can be used. Second, the speed and capacity of computational resources can be increased with approximations, more efﬁcient algorithms, and high-performance computing. Techniques for reducing the dimensionality of data and speeding up computations are often deployed simultaneously, and we discuss these subsequently.
Aggregation and compression.
Data volume can be reduced through aggregation of one or more of its dimensions, most frequently subjects, variables, or time. This can be done by simple averaging or summing—which, in several cases, yields sufﬁcient statistics of model parameters that make processing of the complete data unnecessary—as well as through variable-reduction methods such as principal component analysis and related methods, which are common in data mining, speech recognition, and image processing. For example, Naik and Tsai (2004) propose a semiparametric single-factor model that combines sliced inverse regression and isotonic regression. It reduces dimensionality in the analysis of high-dimensional customer transaction databases and is scalable because it avoids iterative solutions of an objective function. Naik, Wedel, and Kamakura (2010) extend this to models with multiple factors and apply it to the analysis of large data on customer churn.
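The point that simple sums can act as sufficient statistics can be made concrete with a small sketch. For a simple linear regression of y on x, five running sums are enough to recover the least-squares estimates, so the raw records never need to be held in memory together. The function names and the toy data below are hypothetical.

```python
def accumulate(chunks):
    """Reduce chunks of (x, y) pairs to five running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    return n, sx, sy, sxx, sxy

def ols_from_sums(n, sx, sy, sxx, sxy):
    """Least-squares intercept and slope computed from the sums alone."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Toy data generated from y = 1 + 2x, split into two chunks (e.g., two data files).
chunks = [[(1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
intercept, slope = ols_from_sums(*accumulate(chunks))
print(intercept, slope)  # 1.0 2.0
```

Because the sums are additive, each chunk can be processed separately (or on a separate machine) and the partial sums combined afterward.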
Aggregation of data on different samples of customers (e.g., mobile, social, streaming, geo-demographic) can be accomplished by merging aggregated data along spatial (e.g., designated market area, zip code) or time (e.g., week, month) dimensions or through data-fusion methods (Gilula, McCulloch, and Rossi 2006; Kamakura and Wedel 1997). Data requirements for specific applications can be reduced by fusing data at different levels of aggregation. For example, if store-level sales data are available from a retailer, these could be fused with in-home scanner panel data. This creates new variables that can increase data veracity because the store data have better market coverage but no competitor information, while the reverse is true for the home scanning data. Fusion may also be useful when applying structural models of demand that recover individual-level heterogeneity from aggregate data (store-level demand), in which case the fusion with individual-level data (scanner panel data) can help identify the heterogeneity distribution. Feit et al. (2013) use Bayesian fusion techniques to merge such aggregate data (on customer usage of media over time) with disaggregate data (customers’ individual-level usage at each touch point) to make inferences about customer-level behavior patterns.
Selection can be used to reduce the dimensionality of big data in terms of variables, attributes, or subjects. Selection of subjects/customers can be used when interest focuses on specific well-defined subpopulations or segments. Even though big data may have a large number of variables (p > n data), they may not all contribute to prediction. Bayesian additive regression tree approaches produce tree structures that may be used to select relevant variables. In the computationally intense Bayesian variable selection approach, the key idea is to use a mixture prior, which enables the researcher to obtain a posterior distribution over all possible subset models. Alternatively, lasso-type methods can be used, which place a Laplace prior on coefficients (Genkin, Lewis, and Madigan 2007). Routines have been developed for the estimation of these approaches using parallel computing (Allenby et al. 2014).
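To illustrate how a lasso-type method selects variables, the sketch below uses plain coordinate descent with soft thresholding. The lasso estimate corresponds to the posterior mode under a Laplace prior on the coefficients. This is a generic textbook routine, not the estimation procedure of the cited authors, and the data and penalty value are made up for the example.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero and set it to exactly zero inside the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # sum of squares of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data: y depends strongly on the first column and only weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]  # equals 3 * x1 + 0.1 * x2
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # first coefficient shrunk to about 2.5, second set to exactly 0.0
```

The weak predictor is dropped entirely, while the strong predictor is kept but shrunk; larger values of `lam` remove more variables.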
Computation. Many of the statistical and econometric models used in marketing are currently not scalable to big data. MapReduce algorithms (which are at the core of Hadoop) provide a solution and allow for the processing of very large data in a massively parallel way by bringing computation locally to pieces of the data distributed across multiple cores rather than copying the data in its entirety for input into analysis software. For example, MapReduce-based clustering, naive Bayes classification, singular value decomposition, collaborative filtering, logistic regression, and neural networks have been developed. This framework was initially used by Google and has been implemented for multicore desktop grids and mobile computing environments. Likelihood maximization is well suited for MapReduce because the log-likelihood consists of a sum across individual log-likelihood terms that can easily be distributed and allow for Map() and Reduce() operations. In this context, stochastic gradient descent (SGD) methods are often used to optimize the log-likelihood. Rather than evaluating the gradient of all terms in the sum, SGD samples a subset of these terms at every step and evaluates their gradient, which greatly economizes computations.
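Both ideas in this paragraph can be sketched for a logistic regression: the log-likelihood is a sum over observations, so partial sums computed on separate shards can simply be added, and SGD updates the weights using the gradient of a small random subset of terms. The simulated data, step size, batch size, and function names are assumptions made for the example, and the "shards" here are ordinary Python lists rather than a real MapReduce cluster.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(w, rows):
    """Logistic log-likelihood: a plain sum of per-observation terms."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def sgd(rows, dim, steps=2000, batch=16, lr=0.1, seed=0):
    """Ascend the log-likelihood using the gradient of a random mini-batch."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in rng.sample(rows, batch):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j in range(dim):
                grad[j] += err * x[j]
        w = [wi + lr * g / batch for wi, g in zip(w, grad)]
    return w

# Simulated data (hypothetical): intercept plus one covariate, true weights (-0.5, 1.5).
rng = random.Random(42)
rows = []
for _ in range(1000):
    x = [1.0, rng.gauss(0.0, 1.0)]
    p = sigmoid(-0.5 * x[0] + 1.5 * x[1])
    rows.append((x, 1 if rng.random() < p else 0))

w_hat = sgd(rows, dim=2)

# "Map": each shard computes its own partial sum. "Reduce": add the partial sums.
shards = [rows[i::4] for i in range(4)]
ll_reduced = sum(log_likelihood(w_hat, shard) for shard in shards)
print(w_hat, ll_reduced)
```

Each SGD step touches only 16 of the 1,000 observations, which is the source of the computational savings; the price is a noisier path toward the maximum.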
Analytics and Models
Rich internal and/or external data enable marketing analytics to create value for companies and help them achieve their short-term and long-term objectives. We deﬁne marketing analytics as the methods for measuring, analyzing, predicting, and managing marketing performance with the purpose of maximizing effectiveness and return on investment (ROI). Figure 5 shows how big data marketing analytics creates increasing diagnostic breadth, which is often particularly beneﬁcial for supporting ﬁrms’ long-term objectives.
After five decades of development, most marketing strategies and tactics now have their own well-specified data and analytical requirements. Academic marketing research has developed methods that specifically tackle issues in areas such as pricing, advertising, promotions, sales force, sales management, competition, distribution, branding, segmentation, positioning, new product development, product portfolio, loyalty, and acquisition and retention. Several marketing subfields have had extensive development of analytical methods, so that a cohesive set of models and decision-making tools is available, including CRM analytics, web analytics, and advertising analytics. Next, we discuss analytics for three closely connected core domains in more detail: marketing-/media-mix optimization, personalization, and privacy and security.
Marketing Mix/Media Mix
Models to measure the performance of the firm’s marketing mix, forecast its effects, and optimize its elements date back to the 1960s. We noted some of these landmark developments in the “Analytics” subsection (for reviews, see Gatignon 1993; Hanssens 2014; Hanssens, Parsons, and Schultz 2001; Leeflang et al. 2000; Rao 2014). As new sources of data become available, there are increased opportunities for better and more detailed causal explanations as well as recommendations for optimal actions at higher levels of specificity and granularity. This was the case when scanner data became available (see Wittink et al. 2011), and new sources of digital data will lead to similar developments. For example, digital data on competitive intelligence and external trends can be used to understand the drivers of performance under the direct control of the firm and disentangle them from external factors such as competitive, environmental, economic, and demographic factors and overall market trends.
Incorporating new data sources. Research in marketing-mix allocation benefits significantly from two specific developments in data availability. The first is the increased availability of extensive customer-level data from within firm environments, obtained by conducting direct surveys of customers, measuring attitudes or satisfaction, or recording customer behavior in physical stores and on websites and mobile apps. Hanssens et al. (2014) take advantage of one source of such data, consumer mindset metrics, to better model marketing actions’ impact on sales performance. They find that combining marketing-mix and attitudinal metrics in vector autoregressive (VAR) models improves both the prediction of sales and recommendations for marketing-mix allocation. The second development involves using data collected on customers and prospects outside the firm environment in addition to data that are available within the firm. This may alleviate the problem that activities of (potential) customers with competitors are unobservable in internal data and may help determine their path to purchase more fully. For example, measures of online word of mouth (WOM; Godes and Mayzlin 2004), online reviews (Chevalier and Mayzlin 2006), or clickstreams (Moe 2003) can be included in marketing-mix models to provide better explanations and predictions of consumer choice and sales.
Attribution and allocation to new touch points. Data from new channels and devices are contributing to the development of new ways in which better marketing-mix decisions can be made. For example, while Prins and Verhoef (2007) examine the synergies between direct marketing and mass communications, Risselada, Verhoef, and Bijmolt (2014) take advantage of data from customers’ social networks to understand the dynamic effects of direct marketing and social influence on the adoption of a high-technology product.
Fong, Fang, and Luo (2015) examine the effectiveness of locational targeting of mobile promotions using a randomized ﬁeld experiment and investigate targeting at the ﬁrm’s own location (geo-fencing) versus a competitor’s location (geo-conquesting). They ﬁnd that competitive locational targeting produces increasing returns to the depth of promotional discounts.
The aforementioned research highlights the convergence of different media (television, Internet, and mobile) and the resultant spillovers of marketing-mix actions delivered through those media. The availability of individual-level path-to-purchase data, whether across multiple online channels (e.g., display ads, affiliates, referrals, search), across devices (e.g., desktop, tablet, smartphones), or across online and offline touch points, will create significant opportunities to understand and predict the impact of marketing actions at a very granular level.
In addition, increased options for marketers to influence consumers (such as through firm-generated content in social media and content marketing, in which firms become content creators and publishers) have placed importance on the issue of understanding the individual effects of these options as part of the marketing mix. Newer methods and techniques are needed to accurately measure their impact. For example, Johnson, Lewis, and Nubbemeyer (2015) measure the effect of display ads using a new methodology that facilitates identification of the treatment effects of ads in a randomized experiment. They show it to be better than public service announcements and intent-to-treat A/B tests in minimizing the costs of tests. After such individual effects are measured, optimally allocating budgets across marketing/media-mix elements becomes possible.
Albers (2012) provides guidelines on how practical decision aids for optimal marketing mix allocation can be developed. He points to the need to study managers’ behavior to better determine the speciﬁcation of supply-side models. One of the important payoffs of working in a data-rich environment lies in the creation of decision aids to better budget and better allocate investments across the marketing mix, different products, market segments, and customers.
Assessing causality of marketing-mix effects. Assessing causality in marketing-mix models has received widespread attention in academia but unfortunately has not yet received as much attention in industry. If a marketing control variable is endogenously determined but this is not accounted for in the model (because of, e.g., missing variables or management actions that depend on sales outcomes), the DGM is not accurately captured. In that case, predictions of the effects of this marketing-mix element will be biased (Rossi 2014). This problem may be alleviated if exogenous instrumental variables (IVs) that are related to the endogenous control variable can be found. The variety in big data might help in finding better IVs, which is necessary because IVs are often problematic. In the case of television advertising, Shapiro (2014) exploits discontinuities in advertising spending in local designated market areas. Regression discontinuity designs that exploit variations in a possibly endogenous treatment variable on either side of a threshold are not economical in their data usage and may, therefore, benefit from large data (Hartmann, Nair, and Narayanan 2011).
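The endogeneity bias and its correction with an instrument can be sketched on simulated data. In the hypothetical setup below, managers spend more on advertising when an unobserved demand shock is high, so a naive regression of sales on spending overstates the advertising effect. A variable that shifts spending but is unrelated to the shock (here called `cost`, an invented stand-in for an exogenous driver of ad spending) recovers the true effect. With one instrument and one regressor, two-stage least squares reduces to a ratio of covariances.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Naive regression slope of y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """IV estimate with a single instrument z: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

rng = random.Random(7)
TRUE_EFFECT = 2.0  # assumed true effect of ad spending on sales
z, x, y = [], [], []
for _ in range(5000):
    shock = rng.gauss(0, 1)                 # unobserved demand shock (omitted variable)
    cost = rng.gauss(0, 1)                  # hypothetical instrument, independent of the shock
    spend = cost + shock + rng.gauss(0, 1)  # spending responds to the demand shock
    sales = TRUE_EFFECT * spend + 3.0 * shock + rng.gauss(0, 1)
    z.append(cost)
    x.append(spend)
    y.append(sales)

print("OLS:", ols_slope(x, y))    # biased upward, near 3 in this setup
print("IV: ", iv_slope(z, x, y))  # close to the assumed true effect of 2
```

The example also hints at why instruments are "often problematic": the correction works only if the instrument truly is unrelated to the omitted shock and has a strong enough effect on spending.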
Personalization
Personalization takes marketing-mix allocation one step further in that it adapts the product or service offering and other elements of the marketing mix to individual users’ needs (Khan, Lewis, and Singh 2009). There are three main methods of personalization.
(1) Pull personalization provides a personalized service when a customer explicitly requests it. An example is Dell, which enables customers to customize the computer they buy in terms of prespeciﬁed product features.
(2) Passive personalization displays personalized information about products or services in response to related customer activities, but the consumer must act on that information. For example, Catalina Marketing Services, an industry leader of personalized coupons delivered at the checkout counter of brick-and-mortar retail stores, personalizes coupons based on shoppers’ purchase history recorded on their loyalty cards. Recommendation systems represent another example of this approach.
(3) Push personalization takes passive personalization one step further by sending a personalized product or service directly to customers without their explicit request. An example of this is Pandora, which creates online or mobile personalized radio stations. The radio stations are individually tailored on the basis of users’ initial music selections and similarities between song attributes extracted from the Music Genome database.
For each of these types of personalization, there are three possible levels of granularity:
(1) mass personalization, in which all consumers receive the same offering and/or marketing mix, personalized to their average taste;
(2) segment-level personalization, in which groups of consumers with homogeneous preferences are identified and the marketing mix is personalized in the same way for all consumers in one segment; and
(3) individual-level personalization, in which each consumer receives offerings and/or elements of the marketing mix customized to his or her individual tastes and behaviors.
However, the availability of big data with extensive individual-level information does not necessarily make it desirable for companies to personalize at the most granular level. Big data offers ﬁrms the opportunity to choose an optimal level of granularity for different elements of the marketing mix, depending on the existence of economies of scale and ROI. For example, a ﬁrm such as Ford Motor Company develops a global (mass) brand image; personalizes product and brand advertising to segments of customers; customizes sales effort, prices, and promotions at the individual level; and personalizes in-car experiences using imaging technology.
Recommendation systems are powerful personalization tools, with best-in-class applications by Amazon and Netflix. There are two basic types of recommendation engines, based on content filtering or collaborative filtering, but there are also hybrid recommendation systems that combine features of both types. Content filtering involves digital agents that make recommendations based on the similarity between candidate products and services and those the customer has preferred in the past. Collaborative filtering predicts a customer’s preferences using those of similar customers. Model-based systems use statistical methods to predict these preferences; the marketing literature has predominantly focused on these (Ansari, Essegaier, and Kohli 2000).
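A minimal sketch of user-based collaborative filtering follows. It predicts a customer's rating of an unseen item as a similarity-weighted average of other customers' ratings, with similarity measured by the cosine over items both have rated. The customers, items, and ratings are invented, and production systems use considerably more refined models than this.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings of `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale; ann has not rated item D.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "C": 1, "D": 5},
    "cat": {"A": 1, "B": 1, "C": 5, "D": 1},
}
prediction = predict(ratings, "ann", "D")
print(round(prediction, 2))
```

Because ann's past ratings resemble bob's far more than cat's, bob's high rating of item D dominates the prediction.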
Conceptually, personalization consists of
(1) learning consumer preferences,
(2) adapting offerings to consumers, and
(3) evaluating the effectiveness of the personalization.
These three stages have long been used in closed-loop marketing (CLM) strategies. In the learning stage, some of the problems with ratings-based recommendation systems have prompted companies (e.g., Amazon) to use data obtained unobtrusively from customers as input for online and mobile personalization of services. In digital environments, CLM can be fully automated in a continuous cycle, which gives rise to adaptive personalization systems.
Adaptive personalization systems take personalization a step further by providing dynamically personalized services in real time (Steckel et al. 2005). For example, Groupon personalizes daily deals for products and services from local or national retailers and delivers them by e-mail or on mobile devices; as it collects more data on the individual subscriber, the deals are more accurately personalized. Another example is the buying and selling of online display ad impressions in real-time bidding auctions on ad-exchange platforms. These auctions are fully automated and run in the time (less than one-tenth of a second) it takes for a website to load. The winning ad is instantly displayed on the publisher’s site. To construct autonomous bidding rules, advertisers (1) track consumers’ browsing behavior across websites, (2) selectively expose segments defined based on those behaviors to their online display ads, and (3) record consumers’ click-through behavior in response to their ads. This enables ad placement to be targeted across consumers, time, ad networks, and websites at a very high level of granularity.
Privacy and Security
As more customer data are collected and personalization advances, privacy and security have become critical issues for big data analytics in marketing. According to a recent survey (Dupre 2015), more than three-quarters of consumers think that online advertisers have more information about them than they are comfortable with, and approximately half of them believe that websites ignore privacy laws.
The implication of the aforementioned factors for marketing analytics is that there will be increased emphasis on data minimization and anonymization (see also Verhoef, Kooge, and Walk 2016). Data minimization requires marketers to limit the type and amount of data they collect and retain, and to dispose of the data they no longer need. Data can be rendered anonymous using procedures such as k-anonymization (each record is indistinguishable from at least k – 1 others), removing personally identifiable information, recoding, swapping or randomizing data, or irreversibly encrypting data fields to convert data into a form that is not human readable. However, although these methods protect privacy, they may not act as a deterrent to data breaches (Miller and Tucker 2011).
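The k-anonymity criterion can be checked mechanically: group the records by their quasi-identifiers (attributes that could be combined to re-identify someone) and verify that every group contains at least k records. The sketch below also shows recoding, here by truncating zip codes, as one way to reach the criterion. The field names, records, and helper functions are hypothetical.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record is indistinguishable from at least k - 1 others."""
    return smallest_group(records, quasi_identifiers) >= k

def coarsen_zip(records, digits=3):
    """Recode zip codes by keeping only the leading digits."""
    return [dict(r, zip=r["zip"][:digits] + "*" * (len(r["zip"]) - digits)) for r in records]

# Hypothetical customer records; "spend" is the sensitive attribute.
records = [
    {"zip": "20740", "age_band": "30-39", "spend": 120},
    {"zip": "20742", "age_band": "30-39", "spend": 80},
    {"zip": "20740", "age_band": "30-39", "spend": 45},
    {"zip": "20783", "age_band": "40-49", "spend": 200},
    {"zip": "20781", "age_band": "40-49", "spend": 60},
]
QUASI_IDENTIFIERS = ["zip", "age_band"]

print(is_k_anonymous(records, QUASI_IDENTIFIERS, k=2))               # False
print(is_k_anonymous(coarsen_zip(records), QUASI_IDENTIFIERS, k=2))  # True
```

The recoded data satisfy 2-anonymity at the cost of geographic detail, which is the trade-off between privacy protection and analytical precision that the following paragraph describes.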
As a result of data minimization, less individual-level data may become available for analytics development in academic and applied research, and increasingly more data will become available in aggregated form only. Research in marketing analytics should develop procedures to accommodate minimized and anonymized data without degrading diagnostic and predictive power, and analytical methods that preserve anonymity. For example, the Federal Trade Commission requires data providers such as Experian or Claritas to protect the privacy of individual consumers by aggregating individual-level data at the zip code level. Direct marketers rely on these data but traditionally ignore the anonymized nature of zip code–level information when developing their targeted marketing campaigns.
Future studies should build on these research directions and focus on the following topics and questions (Table 2).
Big Marketing Data
1. How can the fusion of data generated within the ﬁrm with data generated outside the ﬁrm take advantage of metadata on the context of customer interactions? How can this be done in a way that enables real-time analytics and real-time decisions?
2. What new methodologies and technologies will facilitate the integration of “small stats on big data” with “big stats on small data” approaches? What are key trade-offs that need to be made to estimate realistic models that are sufﬁcient approximations?
3. How can ﬁeld experiments be used to generate big (observational) data to obtain valid estimates of marketing effects quickly enough to enable operational efﬁciency without delaying marketing processes?
4. How can machine learning methods be combined with econometric methods to facilitate estimation of causal effects from big data at high speeds? What specific conditions determine where these new methods should be designed in terms of the continuum of machine learning to theory-based models?
5. What are viable data-analysis strategies and approaches for diagnostic, predictive, and prescriptive modeling of large-scale unstructured data?
6. How can deep learning and cognitive computing techniques be extended for analyzing and interpreting unstructured marketing data? How can creative elements of the marketing mix be incorporated in predictive and prescriptive techniques?
Marketing Mix
1. How can granular online, mobile data be aligned with more aggregate ofﬂine data to shed light on the path to purchase and facilitate behavioral targeting? How can metadata of contexts and unstructured data on creatives be incorporated in the analysis of path-to-purchase data?
2. How can ROI modeling more accurately identify and quantify the simultaneous ﬁnancial impact of online and ofﬂine marketing activities?
3. What new techniques and methods can accurately measure the synergy, carryover, and spillover across media and devices using integrated path-to-purchase data?
4. How can attribution across media, channels, and devices account for strategic behavior of consumers and endogeneity in targeting?
5. How can planning cycles for different marketing instruments be incorporated in marketing-mix optimization models?
Personalization
1. What content should be personalized, at which level of granularity, and at what frequency? How can content be tailored to individual consumers using individual-level insights and automated campaign management?
2. How can ﬁrms derive individual-level insights from big data, using faster and less computationally expensive techniques to give readings of customers’ intentions in real time?
3. How can ﬁrms personalize the mix of touch points (across channels, devices, and points in the purchase funnel) for customers in closed-loop cycles so that their experience is consistently excellent?
4. What role can cognitive systems, general artiﬁcial intelligence, and automated attention analysis systems play in delivering personalized customer experiences?
Security and Privacy
1. What techniques can be used to reduce the backlash to intrusion, as more personalization increases the chances that it may backﬁre?
2. What new methodologies need to be developed to give customers more control in personalizing their own experiences and to enhance the efﬁcacy of data-minimization techniques?
3. How can data, software, and modeling solutions be developed to enhance data security and privacy while maximizing personalized marketing opportunities?
Implementing Big Data and Analytics in Organizations
Organizations use analytics in their decision making in all functional areas—not only marketing and sales, but also supply chain management, ﬁnance, and human resources. This is exempliﬁed by Wal-Mart, a pioneer in the use of big data analytics for operations, which relies heavily on analytics in human resources and marketing. Many companies aspire to integrate data-driven decisions across different functional areas. While managing big data analytics involves technology, organizational structure, and skilled and trained analysts, the primary precondition for its successful implementation in companies is a culture of evidence-based decision making. This culture is best summed up with a quote widely attributed to W. Edwards Deming: “In God we trust; all others must bring data.” In such a culture, company executives acknowledge the need to organize big data analytics and give data/analytics managers responsibility and authority to utilize resources to store and maintain databases; develop and/or acquire software; and build and deploy descriptive, predictive, and normative models (Grossman and Siegel 2014). In those successful companies, big data analytics champions are typically found in the boardroom (e.g., chief ﬁnancial ofﬁcer, chief marketing ofﬁcer), and analytics are used to drive business innovation rather than merely improve operations (Hagen and Khan 2014).
To summarize, organizations that aim to extract value from big data analytics should have
(1) a culture and leaders that recognize the importance of data, analytics, and data-driven decision making;
(2) a governance structure that prevents silos and facilitates integrating data and analytics into the organization’s overall strategy and processes in such a way that value is generated for the company; and
(3) a critical mass of marketing analysts who collectively have sufficiently deep expertise in analytics as well as substantive marketing knowledge.
Almost every company currently faces the challenge of hiring the right talent to accomplish this. An ample supply of marketing analysts with a cross-functional skill set; proficiency in technology, data science, and analytics; and up-to-date domain expertise is urgently needed, as are people with management skills and knowledge of business strategy to put together and lead those teams.