Data-Driven Attribution 101

Karl Villanueva May 09 2018 7 PM | 5 min read

Marketing attribution has proven to be just as elusive in 2018. CMOs and performance marketers continue to struggle with the question “Where should I invest my marketing budget?”, and attribution remains a confounding topic even for industry practitioners. With Google planning to deprecate last-click conversions, the question now becomes “What attribution model is right for me?”

For those not familiar with all the models, they are discussed below:

  • Multi-Touch: A broad way of saying that the model covers all touch points in a funnel
  • Data-Driven: The term used within Google Analytics for its model, which uses Shapley values (see more below)
  • Last-Click: The most common model, assigns full value to the last converting channel
  • First-Click: The opposite of Last-Click, assigns full value to the first channel
  • Linear / Even Credit: Splits the attribution equally among all touch points
  • Position-Based / “bathtub” / “U-shaped” / “smile”: Assigns a 40%-20%-40% split among the first-middle-last channels
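
The rule-based models in the list above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and the handling of paths with one or two touch points in the position-based model is an assumption, since the 40%-20%-40% rule does not say what to do when there is no middle.

```python
def _credit(shares):
    """Sum (channel, share) pairs so repeated channels accumulate credit."""
    credit = {}
    for channel, share in shares:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit

def last_click(path):
    # Full value to the last channel before conversion.
    return _credit([(path[-1], 1.0)])

def first_click(path):
    # Full value to the first channel in the path.
    return _credit([(path[0], 1.0)])

def linear(path):
    # Equal split among all touch points.
    return _credit([(channel, 1.0 / len(path)) for channel in path])

def position_based(path, first=0.4, last=0.4):
    # Assumption: with no middle touch points, split evenly.
    if len(path) <= 2:
        return linear(path)
    middle = (1.0 - first - last) / (len(path) - 2)
    shares = [(path[0], first)]
    shares += [(channel, middle) for channel in path[1:-1]]
    shares += [(path[-1], last)]
    return _credit(shares)

path = ["Social", "Email", "Search"]
for model in (last_click, first_click, linear, position_based):
    print(model.__name__, model(path))
```

Each function returns the share of one conversion credited to each channel, so the values in every result add up to 1.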

I took a look at the Google Keyword Planner and created a keyword index group for each model to see how search volume for the different attribution types has evolved since 2014.

Attribution Time

  • Multi-Touch has always been the most popular model, and it is also the fastest-growing attribution concern
  • The Data-Driven model has spiked since the start of 2017 as it was added as an option in the Google Analytics conversion tab
  • Last-Click has been declining since May 2017, nicely coinciding with the release of Google Attribution 360 at their Google Marketing Next 2017 event
  • All other attribution types remain fairly narrowly searched, with Position-Based being the most popular among the others

By using the keyword explorer tool within AdWords, we can also identify what issues marketers are facing. We can see that the highest volume ones are broad informative searches like “what is attribution” or “marketing mix modeling”.

Attribution Search Volume

What Is Data-Driven Attribution

There are many attribution models that incorporate all the various touch points on the path to conversion. For our purposes, we will take the one used by Google for the Analytics 360 and Attribution 360 platforms, given that it uses Shapley values as its methodology.
(Disclaimer: I worked on the Attribution 360 alpha in my previous role)

Shapley Values

Shapley values are a cooperative game theory concept that attributes an incremental value to each participant. That alone does not tell us much, so let us use a typical digital marketing example.

Let’s say we have 3 online marketing channels.

Three Channels: Social, Paid Search, Email

Next, let’s say each of them has the following conversion rate by itself.


Then we have the following combination (the cooperative part) conversion rates.


First you may think, “Wow, I want such high conversion rates too!”. We are just using round numbers for the example, but the logic is the same.

After that, we need to calculate the incremental value of each channel. So let us assume that the click path permutation is Social -> Email -> Search. The total should still be 20%, but the order matters.


Since Social + Email = 15%, we got the 12% for Email by subtracting the value of Social alone (3%) from 15% (Social and Email together). We then got 5% for Search by subtracting 15% (Social and Email together) from 20% (the value of all 3 channels).
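
The subtraction for this one ordering can be written out as a small sketch. It uses only the three rates stated in the text (3%, 15% and 20%); the function and variable names are illustrative.

```python
# Cumulative conversion rate (%) as each channel joins the path
# Social -> Email -> Search, taken from the figures in the text.
cumulative_rates = [
    ("Social", 3.0),   # Social alone
    ("Email", 15.0),   # Social + Email
    ("Search", 20.0),  # Social + Email + Search
]

def incremental_values(steps):
    """Credit each channel with the rate it adds on top of the channels before it."""
    previous = 0.0
    increments = {}
    for channel, rate in steps:
        increments[channel] = rate - previous
        previous = rate
    return increments

increments = incremental_values(cumulative_rates)
print(increments)  # {'Social': 3.0, 'Email': 12.0, 'Search': 5.0}
```

The increments always add back up to the rate of the full path, here 20%, whichever order the channels appear in.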

The six possible permutations will result in:


The only thing left is to get the average for each channel, and we would have arrived at our data-driven attribution values.


Total Conversion Rate: 19.98%

*Sum slightly off due to rounding

So in a click path including all channels, which in our example resulted in a 20% conversion rate, we can say that Search accounted for 5.16 percentage points of that conversion rate and so on. This is a much fairer and more accurate representation of each channel’s value than just splitting it evenly or, worse, assigning everything to the last click.
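
The averaging over all six orderings can be sketched as follows. Only three of the rates below come from the text (Social alone at 3%, Social + Email at 15%, all three channels at 20%); the other four are placeholder assumptions chosen for illustration, so the printed averages are illustrative and will not match the figures above.

```python
from itertools import permutations

CHANNELS = ("Social", "Email", "Search")

# Conversion rate (%) for each combination of channels.
RATES = {
    frozenset(): 0.0,
    frozenset({"Social"}): 3.0,             # from the text
    frozenset({"Email"}): 4.0,              # hypothetical
    frozenset({"Search"}): 5.0,             # hypothetical
    frozenset({"Social", "Email"}): 15.0,   # from the text
    frozenset({"Social", "Search"}): 10.0,  # hypothetical
    frozenset({"Email", "Search"}): 12.0,   # hypothetical
    frozenset(CHANNELS): 20.0,              # from the text
}

def shapley_values(channels, rates):
    """Average each channel's incremental value over every ordering."""
    totals = {channel: 0.0 for channel in channels}
    orderings = list(permutations(channels))
    for ordering in orderings:
        seen = frozenset()
        for channel in ordering:
            with_channel = seen | {channel}
            totals[channel] += rates[with_channel] - rates[seen]
            seen = with_channel
    return {channel: totals[channel] / len(orderings) for channel in channels}

values = shapley_values(CHANNELS, RATES)
for channel, value in values.items():
    print(f"{channel}: {value:.2f}%")
print(f"Total: {sum(values.values()):.2f}%")
```

Whatever rates are plugged in, the per-channel averages add up to the rate of the full combination. Enumerating orderings like this only works for a handful of channels: n channels have n! orderings, so 3 channels give 6 but 10 channels already give 3,628,800.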

Do note that this is a vastly simplified example. A quick visit to the “Top Conversion Paths” tab in Google Analytics will show why this approach is not feasible to calculate manually. An average advertiser could easily have thousands of such touch points.

Why It Helps to Make Lookback Windows as Long As Possible

Lookback windows are a commonly used limit within marketing attribution. They are usually derived from a generalization of the path to conversion.

“As a furniture retailer, normally our visitors convert after 30 days from first hearing about us.”

This approach essentially ignores any marketing efforts made more than 1 month before the conversion, and thus considers them “worthless”. However, a Google study found that a car purchaser could have 900+ digital touch points over 3 months. If our attribution were limited to 30 days, our team would be investing purely in the later-stage channels, whereas the first ad the customer saw may have been the most memorable. Think of Super Bowl ads as an extreme example, and think of a company you first heard of via such a notable channel.
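
To make the effect of the window concrete, here is a minimal sketch of how a lookback window filters a customer's touch points before any attribution model sees them. The touch points, dates and function name are hypothetical.

```python
from datetime import datetime, timedelta

def touches_in_window(touches, conversion_time, lookback_days):
    """Keep only the touch points that fall inside the lookback window."""
    window_start = conversion_time - timedelta(days=lookback_days)
    return [
        (channel, seen_at)
        for channel, seen_at in touches
        if window_start <= seen_at <= conversion_time
    ]

# Hypothetical path: a TV ad 86 days before the purchase, then two later touches.
conversion = datetime(2018, 5, 1)
touches = [
    ("TV", datetime(2018, 2, 4)),
    ("Social", datetime(2018, 4, 10)),
    ("Search", datetime(2018, 4, 29)),
]

for days in (30, 90):
    kept = [channel for channel, _ in touches_in_window(touches, conversion, days)]
    print(f"{days}-day window: {kept}")
```

With a 30-day window the TV ad never enters the path, so no model can assign it credit. With a 90-day window it is kept and can be valued alongside the other channels.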


Why It May Help with Incrementality Testing

There are many incrementality tests that aim to measure accurately what the marketing spend and efforts actually achieved. They usually try to answer the question:

“How many people have bought your product due to seeing an ad, which ad, and by how much?”

The typical method involves splitting your target audience into two groups and suppressing ad delivery for one of them. So a target audience of 100 people might be split into a control (60) and an experiment (40). The control side has its ads suppressed so that its conversions can be compared with those on the experiment side, and the difference between the two is the incremental lift.
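
The lift calculation itself is simple arithmetic, sketched below. The group sizes (60 and 40) come from the example above, while the conversion counts are made-up numbers for illustration. The function name is also illustrative.

```python
def incremental_lift(exposed_conversions, exposed_size,
                     control_conversions, control_size):
    """Return (absolute lift, relative lift) of the exposed group over the control."""
    exposed_rate = exposed_conversions / exposed_size
    control_rate = control_conversions / control_size
    absolute = exposed_rate - control_rate
    # Relative lift is undefined when the control group has no conversions.
    relative = absolute / control_rate if control_rate else float("nan")
    return absolute, relative

# Hypothetical counts: 6 of the 40 people who saw ads converted,
# and 6 of the 60 people whose ads were suppressed converted.
absolute, relative = incremental_lift(6, 40, 6, 60)
print(f"Absolute lift: {absolute:.1%}")
print(f"Relative lift: {relative:.1%}")
```

Comparing rates rather than raw counts matters here because the two groups are different sizes.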

The main downside of lift testing is that ad delivery has to be suppressed in order to get results. Shapley values, by contrast, are calculated continuously from the observed conversion paths, as shown in our calculations earlier. As it is a cooperative game solution, it distributes credit among multiple channels.

Limitations and Next Steps

While Shapley Values via data-driven attribution provide a much better attribution model than the rigid rule-based models from the past, there are still a lot of issues to consider, for example:

  • Offline-Online Tracking (O2O) & General Data Availability
  • View-Tracking and Viewability Rates
  • Implementation of Click Tags and View Tags

In the next post, we’ll discuss how attribution affects your budget, teamwork and even compensation.

