The call for accountability in marketing has grown in recent years, as companies are asked to justify their total investment in the area. Furthermore, to allocate budgets across marketing channels, managers have been trying to better understand each channel’s contribution to sales. Time-series approaches have proven successful for modeling the impact of marketing on sales. However, Allergan’s experience and academic research show that marketing attribution models are challenging to build, because the complex relationships between variables need to be identified and model structures need to be defined. […] This is a very cumbersome, manual process: it can take weeks or even months to accurately define just one marketing attribution model for one brand, and companies usually have a broad product/brand portfolio or several customer segments to analyze. But there is good news! New AI algorithms and techniques, such as Gradient Boosted Machines (GBMs) and Shapley values, address those weaknesses and make marketing mix modelling approaches more accurate and scalable to a higher number of stores, brands, products, or customer segments.

Allergan Case — the promising journey of AI-driven marketing analytics:

The Allergan team traditionally used GLMs to estimate the impact of promotion campaigns and to optimize the marketing mix of dozens of products across a variety of marketing channels. We started our AI-marketing journey two years ago together with H2O.ai to improve our marketing analytics. One of our recent challenges was to apply new machine learning algorithms to estimate the impact of a radio campaign for one of our products. For this project, both GLMs and H2O.ai’s Driverless AI were used and compared. […] We were very excited to see how new machine learning algorithms perform, as we all know how much effort it takes to define traditional linear marketing mix models and how many problems they have with complex real-world cases.

Read more about all challenges that can be overcome and the Allergan Case in the free white paper

Traditional Linear Marketing Mix Models vs. new Algorithms

Generalized Linear Models (GLMs) have been around since 1972 and have been used extensively in marketing mix modelling because they allow for an additive approach (sales = baseline + TV spend + Radio spend + other channels). This, in combination with clear model specifications, gives analysts the opportunity to identify the incremental contributions of marketing channels to sales. However, new algorithms outperform linear models not just in accuracy but also in explainability, because they account for non-linearity between a predictor and the dependent variable and for interactions between predictor variables. Tree-based models in particular score high on all of these characteristics, and Gradient Boosted Machines (GBMs) stand out among them. Gradient boosting is a machine learning technique, typically based on decision trees, that uses an ensemble of weak prediction models. These models usually show higher accuracy in out-of-sample predictions than GLMs and are more robust than single decision trees. This difference can lead to a significant business impact compared to linear marketing mix models. In GBMs, sales can be attributed to marketing channels through a methodology called Shapley values.
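To make the attribution idea concrete, here is a minimal sketch of an exact Shapley value calculation over three channels. Everything in it is an assumption made for illustration: the spend figures, the toy `predicted_sales` function (a stand-in for a fitted model, with a TV x radio interaction) and the function names are hypothetical and do not come from the white paper or from Driverless AI. Tools that compute Shapley values for real GBMs work per prediction and rely on faster, tree-specific algorithms rather than the brute-force enumeration shown here.

```python
from itertools import combinations
from math import factorial

# Hypothetical spend for one week (made-up numbers, not Allergan data).
SPEND = {"tv": 100.0, "radio": 40.0, "digital": 60.0}


def predicted_sales(active):
    """Toy stand-in for a fitted model: sales when only `active` channels run.

    The TV x radio interaction term is something a purely additive
    model (sales = baseline + TV + radio + ...) cannot express.
    """
    tv = SPEND["tv"] if "tv" in active else 0.0
    radio = SPEND["radio"] if "radio" in active else 0.0
    digital = SPEND["digital"] if "digital" in active else 0.0
    baseline = 500.0
    return baseline + 2.0 * tv + 1.5 * radio + 1.0 * digital + 0.01 * tv * radio


def shapley_values(channels, value):
    """Exact Shapley values: weighted average marginal contribution of each
    channel over all subsets of the other channels."""
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {ch}) - value(s))
        result[ch] = total
    return result


channels = list(SPEND)
contrib = shapley_values(channels, predicted_sales)
baseline = predicted_sales(frozenset())
full = predicted_sales(frozenset(channels))

for name, value in contrib.items():
    print(f"{name}: {value:.1f}")
print(f"baseline + contributions = {baseline + sum(contrib.values()):.1f}, model = {full:.1f}")
```

The contributions add up to the difference between the full prediction and the baseline, so the result can be read like the additive decomposition of a linear model, while the interaction effect is split evenly between the two channels that create it.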

Allergan Case — the holy grail of marketing attribution analytics:

The concept of Shapley values combined with new machine learning models is intriguing: it promises the holy grail of marketing analytics. It combines highly accurate out-of-the-box machine learning models with high explainability, leading to accurate marketing mix attribution models. Historically, we used linear models and knew their weaknesses from an accuracy perspective, but they are easily explainable and therefore provided great value in marketing attribution analytics. Hence, we usually accepted their weaknesses in accuracy and adjusted the models over multiple iterations to ensure that model outputs aligned with our business understanding.

Read more about the benefits of Shapley Values and GBMs compared to the traditional GLM approach in the free white paper

Overcoming the weaknesses of Linear Marketing Mix Models

Traditional linear time-series models have known weaknesses that we are attempting to overcome with GBMs and Shapley values. First, the short-term impact of media on sales over several weeks can be low, and it is therefore very hard to estimate with linear models if the error of the model is larger than that impact.

Further challenges include the problem of group aggregations and the time to calibrate a model correctly […], non-linearity in market response functions (see Figure 1) […], variable interactions […], and asymmetries of market responses over time […].

Figure 1: Exemplary Media Response Curve Scenarios
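As a rough illustration of what non-linear market response means, the sketch below compares a linear response with two saturating shapes. The Hill-type function and all parameter values are assumptions chosen for illustration. They are a common way to express diminishing returns and S-shaped curves, and are not necessarily the scenarios shown in Figure 1.

```python
def linear_response(spend, slope):
    """Linear model: every extra unit of spend adds the same sales lift."""
    return slope * spend


def hill_response(spend, max_effect, half_saturation, shape):
    """Saturating response (hypothetical functional form).

    shape <= 1 gives a concave curve (diminishing returns from the start);
    shape > 1 gives an S-shaped curve (slow start, then saturation).
    The response equals max_effect / 2 at spend == half_saturation.
    """
    if spend <= 0:
        return 0.0
    return max_effect * spend**shape / (half_saturation**shape + spend**shape)


spend_levels = [0, 25, 50, 100, 200]
linear = [linear_response(x, 1.0) for x in spend_levels]
concave = [hill_response(x, 100.0, 50.0, 1.0) for x in spend_levels]
s_shaped = [hill_response(x, 100.0, 50.0, 3.0) for x in spend_levels]

for x, lin, con, s in zip(spend_levels, linear, concave, s_shaped):
    print(f"spend={x:>3}  linear={lin:6.1f}  concave={con:6.1f}  s-shaped={s:6.1f}")
```

A linear model has to approximate the two saturating curves with a single slope, which overstates the effect of additional spend at high spend levels and, for the S-shaped case, at very low spend levels as well.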

Allergan Case — the comparison between both approaches:

We wanted to compare the traditional linear marketing mix approach with the new machine-learning-based approach to experience the differences firsthand. We started off with our traditional linear marketing mix approach. Over several weeks we developed a fixed-effects linear model in SAS and ran multiple iterations to ensure that the model output aligned with our business understanding, e.g. coefficients of promotions are expected to be positive and specific consumer control variables are expected to be significant. […] We applied H2O.ai’s Driverless AI to run a series of tree-based models (including GBMs) and extracted Shapley values via its Python client to calculate the contribution of each channel. We were able to do that within hours, not weeks.

Read more about how to overcome the weaknesses of GLMs and Allergan’s outcome of the comparison of the approaches in the free white paper

Once the impact of a media channel has been identified via a linear model, it is usually assumed to hold for the entire time sample, unless there are clearly identifiable structural breaks. However, even when there are no clear structural breaks, we know that market responsiveness may not be the same over time. For example, advertising effectiveness might decline over the life cycle of a product, a media channel’s impact might differ by season, an established brand might suffer a sales shock after the introduction of a competing product, or a product’s own price might affect media effectiveness (see Figure 2 below). […]

Figure 2: Media Channel ROI over time and Average Media Channel ROI

Optimal Budget Allocation

The final goal of marketing mix modelling is to allocate budgets across channels, products, customer segments and brands. A common approach among marketing practitioners has been to combine insights from more or less static linear marketing mix models with market statistics on competitors’ media spending, with the aim of increasing the share of voice of the company’s media campaigns. Unfortunately, this will not lead to the expected long-term effects and will only show a short-term impact on sales. […]
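One simple way to think about model-based allocation is to spend each additional budget increment on the channel with the highest marginal return. The sketch below does this greedily for three channels. The concave response curves, their parameters and the function names are all hypothetical, and this is not the optimization procedure described in the white paper. In practice the response curves would come from the fitted marketing mix model.

```python
# Hypothetical concave response curves: name -> (max_effect, half_saturation).
CHANNELS = {
    "tv": (300.0, 80.0),
    "radio": (120.0, 30.0),
    "digital": (200.0, 50.0),
}


def response(spend, max_effect, half_saturation):
    """Incremental sales from `spend`, with diminishing returns."""
    return max_effect * spend / (half_saturation + spend)


def total_response(allocation):
    """Total incremental sales for a spend-per-channel dictionary."""
    return sum(response(allocation[name], *CHANNELS[name]) for name in CHANNELS)


def allocate(budget, step):
    """Greedy allocation: give each budget increment to the channel
    whose next `step` of spend yields the largest sales gain."""
    allocation = {name: 0.0 for name in CHANNELS}
    for _ in range(int(budget // step)):
        def gain(name):
            current = allocation[name]
            params = CHANNELS[name]
            return response(current + step, *params) - response(current, *params)

        best = max(CHANNELS, key=gain)
        allocation[best] += step
    return allocation


plan = allocate(budget=100.0, step=10.0)
print(plan, round(total_response(plan), 1))
```

Because every curve here is concave, the greedy rule finds the best allocation on the chosen spend grid. With S-shaped curves or interactions between channels that guarantee no longer holds, and a more general optimizer would be needed.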

Read more about how to apply the GBM marketing mix model for optimal budget allocation in the free white paper

Summary

Budget allocation has always been a complicated topic in the field of marketing, as it is a complex optimization problem that must take into account not just a variety of marketing effects (product, price, place, promotion), but also context effects (such as trend and seasonality), market actions (competitor actions, new product launches, competitive price changes) and groups that might show different market response patterns (different brands, regions, customer segments). We hope we have made a good case for moving to AI/ML: implementing GBMs and Shapley values in your marketing analytics provides a unique opportunity to build more holistic marketing mix models. The final models can provide a production-ready basis for a marketing application that helps marketers better understand budget allocation and improve the impact of marketing spend.

Allergan Case — the comparison between both approaches:

Marketing mix/attribution analytics has always been a highly complicated topic and delivered value only with a high effort in data preparation, testing of single-variable impacts and variable interactions, choosing functional forms for important response curves, defining ad-stock effects and applying business knowledge to get to logical results. With all that effort, we could get to results comparable to those the new machine learning approach provided in a shorter amount of time. […]

This article was written with Akhil Sood, Associate Director @ Marketing Sciences, Allergan, and Vijay Raghavan, Associate Vice President @ Marketing Sciences, Allergan, and was published as a white paper on the H2O.ai website.

Read more about the outcome of Allergan’s pilot project in the free white paper