Sunday, October 13, 2019

Did Russia Use Manafort's Polling Data in 2016 Election?


Introduction:

On August 2, 2016, then-Trump campaign manager Paul Manafort gave polling data to Konstantin Kilimnik, a Russian widely assumed to be a spy. Before then, Manafort had ordered his protégé, Rick Gates, to share polling data with Kilimnik, and Gates did so periodically starting in April or May. The Mueller Report stated that it could not determine why Manafort insisted on sharing this information or whether the Russians used it to further Trump's cause (p. 130; see here for my summary of Mueller Report V1).

One theory is that Manafort wanted to show off the good work he was doing to Kilimnik's boss, a Russian oligarch named Deripaska, to whom Manafort owed money. A more sinister hypothesis is that Manafort knew the information would be valuable in the hands of Russians trying to interfere with the election.

This post analyzes whether the Russians used the polling data, irrespective of Manafort's intent. I looked at the Russian Facebook ads released by the House Intelligence Committee and tried to identify any changes in messaging after August 2nd. I conclude with a guess about what polling data was shared.

Russian Facebook Data:

The House Intelligence Committee released thousands of Russian advertisements placed by the Internet Research Agency (IRA). There have been several analyses of these advertisements that discuss their effectiveness; a good one is by Spangher et al. However, I couldn't find any that showed the topics of the advertisements over time.

I restricted the analysis to 2016, which covers both Manafort's arrival as campaign manager and the election itself in November. Overall there are 1858 Facebook ads in this dataset. Below is a time series plot of the number of advertisements per day for 2016.

There are periods of high activity in May / June and in October right before the election.
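
The per-day counts behind a plot like this can be sketched in a few lines. This is a minimal illustration, not the code used for the post: the `ads` records and the `ads_per_day` helper are hypothetical, and the real dataset's field names and date format are not given here.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (creation date, ad text). These are made-up
# placeholders, not rows from the House Intelligence Committee release.
ads = [
    (date(2016, 5, 10), "ad one"),
    (date(2016, 5, 10), "ad two"),
    (date(2016, 10, 30), "ad three"),
    (date(2015, 12, 31), "ad outside 2016"),
]

def ads_per_day(records, year=2016):
    """Count ads per calendar day, keeping only ads from the given year."""
    counts = Counter(d for d, _ in records if d.year == year)
    return dict(sorted(counts.items()))

daily = ads_per_day(ads)
for day, n in daily.items():
    print(day.isoformat(), n)
```

Days with no ads are simply absent from the result, so a plotting step would need to fill them in with zeros.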

Change After August 2nd?

Each advertisement comes with text and metadata, including date, ad text, target population, etc. To see whether there were any changes over time, and around August 2nd in particular, I tried topic modeling and text clustering. I couldn't find any changes or trends using an unsupervised approach.

Instead I built a predictive model. The response is a binary variable (before / after August 2nd) and the explanatory variables are text features from each ad (over 1200 words). I then computed variable importance for these words to see which were most predictive. Below I plotted, for each day, the number of adverts containing the important words divided by the total number of advertisements that day, which gives a normalized percentage.
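
The post does not say which model or importance measure was used. As a rough, dependency-free stand-in for the word-ranking step, the sketch below scores each word with a chi-square statistic on a 2x2 table (word present or absent, ad before or after the cutoff). The `ads` records, the `rank_words` helper, and the choice to count ads dated exactly 8/2 as "before" are all assumptions made for illustration.

```python
from datetime import date

CUTOFF = date(2016, 8, 2)  # ads dated after this count as "after" (assumption)

# Hypothetical (date, text) records standing in for the real ads.
ads = [
    (date(2016, 6, 1), "police brutality protest"),
    (date(2016, 7, 4), "patriot flag protest"),
    (date(2016, 9, 1), "civil rights history"),
    (date(2016, 10, 1), "civil rights march"),
]

def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rank_words(records, cutoff=CUTOFF):
    """Rank words by how strongly their presence tracks the before/after label."""
    labelled = [(set(text.lower().split()), d > cutoff) for d, text in records]
    n_after = sum(1 for _, after in labelled if after)
    n_before = len(labelled) - n_after
    vocab = set().union(*(words for words, _ in labelled))
    scores = {}
    for w in vocab:
        with_after = sum(1 for words, after in labelled if after and w in words)
        with_before = sum(1 for words, after in labelled if not after and w in words)
        scores[w] = chi_square(with_after, n_after - with_after,
                               with_before, n_before - with_before)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_words(ads)
for word, score in ranking[:5]:
    print(f"{word:10s} {score:.2f}")
```

The score is symmetric, so it flags words that became rarer after the cutoff as well as words that became more common. A fitted model's variable importance would behave similarly in that respect.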




The blue line marks when Manafort initially made contact with Kilimnik, and the red line marks the August 2nd meeting. There do appear to be large increases in words associated with African American civil rights topics after 8/2. Specifically, these words were not in the advertisement texts themselves but in the ‘people who liked’ description. That is, if you liked ‘Martin Luther King’ on your profile, then a particular ad would target you.
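
The normalization described for this plot (ads containing an important word divided by all ads that day) can be sketched as follows. The records, the `KEYWORDS` set, and the whitespace tokenization are hypothetical choices for illustration, not details taken from the original analysis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date, targeting description) records.
ads = [
    (date(2016, 7, 1), "interests: cooking"),
    (date(2016, 7, 1), "people who like Martin Luther King"),
    (date(2016, 9, 1), "people who like Martin Luther King"),
    (date(2016, 9, 1), "interests: civil rights 1954"),
]
KEYWORDS = {"martin", "1954"}  # stand-ins for the "important" words

def daily_share(records, keywords):
    """Per day: fraction of ads whose text contains at least one keyword."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in records:
        totals[day] += 1
        if keywords & set(text.lower().split()):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}

shares = daily_share(ads, KEYWORDS)
for day, share in shares.items():
    print(day.isoformat(), f"{share:.0%}")
```

Dividing by the daily total keeps busy days from dominating the picture just because more ads ran on them.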

Another way to look at this information is to see the proportion of these words used before and after 8/2.




The above plot shows the number of times each word appeared before and after 8/2, along with P(date > 8/2 | word). For instance the word 1954, signifying the beginning of the civil rights movement, occurred 4 times before and 376 times after 8/2, which means just under 99% of its appearances came after that date. This suggests a change in the IRA advertisements, which began focusing more on targeting people interested in African American civil rights issues.
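
The arithmetic behind that percentage is just the share of a word's occurrences that fall after the cutoff. A minimal sketch, with a hypothetical helper name, using the counts quoted above for the word 1954:

```python
def p_after_given_word(n_before, n_after):
    """Empirical P(date > cutoff | word) from occurrence counts."""
    total = n_before + n_after
    if total == 0:
        raise ValueError("word never appears, so the probability is undefined")
    return n_after / total

# Counts for the word "1954" as reported in the post: 4 before, 376 after.
p_1954 = p_after_given_word(4, 376)
print(f"P(date > 8/2 | '1954') = {p_1954:.4f}")  # 376 / 380, about 0.9895
```

Note that this ratio is not adjusted for how many ads ran in each period, so it is best read alongside the overall before/after ad counts.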

Conclusions / Discussions

I’m guessing that the polling data contained something related to African Americans, and in particular how people with an interest in the civil rights movement are more susceptible to negative ads.

Do I think the evidence presented here is strong enough to believe the Russians used the polling data? Meh, not really, for a few reasons:

  • All of the words found here were already used a few times before 8/2.
  • Gates gave information on a continuous basis. If the Russians used this data, I assume they would have incorporated it as it arrived, and there would not be a discrete change at 8/2.
  • I only did this for one date. Perhaps if I repeated the analysis for arbitrary dates, I would find other words associated with those dates.

I’m not saying that they didn’t use the polling data, but I don’t think the evidence here is strong enough to say that they did. At a minimum, I think the IRA and the Russians adapted their ads to target different populations at different points in time. This shows they are sophisticated and probably learn from previous results.
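
The caveat about only testing one date suggests a simple check: repeat the before/after comparison at several candidate cutoffs and see whether 8/2 stands out. The sketch below was not run for this post. The records, the candidate dates, and the scoring rule (difference in the share of ads containing a word) are all assumptions.

```python
from datetime import date

# Hypothetical (date, text) records.
ads = [
    (date(2016, 3, 1), "news"),
    (date(2016, 5, 1), "news"),
    (date(2016, 7, 1), "rights"),
    (date(2016, 9, 1), "rights"),
    (date(2016, 11, 1), "rights"),
]

def split_score(records, word, cutoff):
    """Share of ads containing `word` after the cutoff minus the share on or before it."""
    before = [word in text.lower().split() for d, text in records if d <= cutoff]
    after = [word in text.lower().split() for d, text in records if d > cutoff]
    if not before or not after:
        return 0.0  # one side is empty, so no comparison is possible
    return sum(after) / len(after) - sum(before) / len(before)

def scan_cutoffs(records, word, candidates):
    """Score the same word at each candidate cutoff date."""
    return {c: split_score(records, word, c) for c in candidates}

candidates = [date(2016, 4, 1), date(2016, 6, 1), date(2016, 8, 2), date(2016, 10, 1)]
scores = scan_cutoffs(ads, "rights", candidates)
best = max(scores, key=scores.get)
for c, s in scores.items():
    print(c.isoformat(), f"{s:+.2f}")
print("sharpest split at", best.isoformat())
```

In this toy data the sharpest split falls in June rather than on 8/2. If the real ads behaved similarly, that would weaken the case that the August 2nd meeting drove the change.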



Code

10 comments:

  1. Using statistics methods gives only an external result about relationships. These methods do not reveal causal relationships. Therefore, you got a negative result. The absence of correlations and the reality of cause-and-effect relationships in The Datasaurus Dozen (https://www.r-bloggers.com/the-datasaurus-dozen/) are perfectly shown.
    To solve this problem, a meaningful model is needed and only then statistics.

    Yours sincerely
    Vladimir

    Replies
    1. Dear Mr. Vladimir,

      Not really sure what to make of your question..

      Where did I say I had a 'positive result'?

      Also, the Datasaurus Dozen is about how data can have structure, including non-linear relationships, that correlation and other summary statistics miss. It has nothing to do with causation.

    2. The real nonlinearity of the content model is that Deripaska owns only part of the shares of companies that are only Russian by name. Legally, these are exclusively American campaigns. They define a massive campaign against Trump in favor of the democrats in the federal media of Russia.

      P.S.
      Surprised that my comment was missed. Crimea is under sanctions. A significant part of the US information resources is simply blocked. It is not clear why.

      Yours sincerely

      Vladimir

    3. haha

      go home russia. you're drunk

    4. Very happy for your answer
      You are primitive peace duke. In Russian, it sounds like a pussy.
      Went to pussy!

      Yours sincerely
      Vladimir

    5. Americans are not able to lead real discussions!

  2. The whole point of the polling data was to target voters on a geographic basis. Your data appears to be nationwide.

    Replies
    1. Hey Znmeb. Thanks for the comment. Perhaps I should have elaborated more.

      Gates did say that the data he passed on to Kilimnik was about the battleground states of Michigan, Wisconsin, etc. (p. 140 in Mueller Report V1).

      However, the Russian Facebook ads presented here did not really focus on these states. For example, Russian ads targeted Michigan, Wisconsin, and Minnesota 40, 18, and 30 times, respectively, out of a total of 1858 ads. This compares with about 352 occurrences of the phrase 'civil rights' and 362 of 'Martin Luther King'. So geography clearly wasn't a large factor in how the Russians targeted their ads, especially compared with other factors such as an interest in civil rights.

      Which brings us to your point that the 'whole point' of polling data was about geographic data. Was it? I couldn't find the actual data that was passed so I'm not sure (if you know where it is please send along!). I assume 'a point' of the polling data given to Kilimnik was geographic. But Manafort gave some 75 pages of data. Maybe some of it was about other things. And since the biggest change in Russian adverts after 8/2 was about civil rights maybe some of it was about this topic.


      Thanks,
      Sam


  3. This is very sloppy work here. It feels more like you were trying to showcase your prowess and less like teaching.
