Economist Year in Review: Part 2
I like how The Economist focuses on geopolitical issues, and I want to capture that in my analysis. I therefore labelled each article by the countries/demonyms included in its text. (If either “United States” or “American” appeared in the text, the article was labelled “United States”. Of course, an article can be labelled with many countries.)
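The labelling step can be sketched roughly as follows. This is only an illustration on made-up data: the `articles` vector, the `aliases` list, and the name `country.mat.toy` are hypothetical and not taken from the original analysis. Plain substring matching like this would also tag "South American" as the United States, so a real alias table needs more care.

```r
# Hypothetical toy data: three short "articles" and a small alias table.
articles <- c(
  "American officials met Chinese diplomats in Beijing.",
  "The United States raised interest rates.",
  "Rain fell on the village all week."
)

# Each country maps to the names/demonyms that should trigger its label.
aliases <- list(
  "United States" = c("United States", "American"),
  "China"         = c("China", "Chinese")
)

# One row per article, one column per country; 1 if any alias appears.
country.mat.toy <- sapply(aliases, function(patterns) {
  regex <- paste(patterns, collapse = "|")
  as.integer(grepl(regex, articles))
})
rownames(country.mat.toy) <- paste0("article", seq_along(articles))

print(country.mat.toy)
```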
I then removed all instances of country names and demonyms from the text. I did so because I wanted to use LDA again, and leaving country names in the text would create a dependency between the topics and the country labels that I don't want.
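A minimal sketch of the removal step, assuming a simple list of names and demonyms (the `texts` and `patterns` vectors below are hypothetical stand-ins, not the lists actually used):

```r
# Hypothetical inputs: two short texts and the names/demonyms to strip.
texts <- c(
  "American officials met Chinese diplomats.",
  "The United States raised rates."
)
patterns <- c("United States", "American", "China", "Chinese")

# Match any listed name as a whole word and delete it.
regex <- paste0("\\b(", paste(patterns, collapse = "|"), ")\\b")
cleaned <- gsub(regex, "", texts, perl = TRUE)

# Collapse the leftover runs of whitespace and trim the ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned, perl = TRUE))

print(cleaned)
```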
So, for this part of the analysis I have two matrices. The first is the LDA matrix of 3440 articles by 100 topics, which describes each article as a mixture of the 100 topics (fitted with the country names and demonyms removed from the text). The second is a 3440 by 193 matrix that records, for each article, which countries are mentioned in its text.
The question I want to address is: what are the main groups of international affairs? For example, we expect Syria, Iran, and the US to form one cluster, and China, Japan, and the US to form another.
To do this, I used K-means clustering on the articles-by-countries matrix to group articles based on the countries mentioned in the text. Then, to understand what these groups discuss, I averaged the topic proportions of the articles in each cluster.
Let's first see how many articles mention each country at least once.
load("/Users/sweiss/Google Drive/countrymatrix.rdata")
country.sums = colSums(country.mat)
barplot(sort(country.sums, decreasing = TRUE)[1:50], las = 3, cex.names = 0.7,
ylab = "Number of times a country name or demonym occurred at least once in an article",
main = "Number of times a Country was Identified in an article (Economist 2013)")
The USA is number 1, with the UK a distant 2nd. Obviously there's some UK bias because this is a British newspaper.
Below are the clusters of articles based on the countries named, along with the topics most associated with each cluster. I chose 10 clusters using the elbow method (graph not shown).
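Since the elbow graph is not shown, here is a rough sketch of how one could be produced. It runs on synthetic binary data with three planted groups, not on the real country matrix; `toy.mat`, the group sizes, and the probabilities are all assumptions made for the illustration.

```r
set.seed(42)

# Synthetic stand-in for an articles-by-countries matrix:
# 150 "articles", 9 "countries", three planted groups of 50 articles each.
n <- 150
group <- rep(1:3, each = n / 3)
toy.mat <- t(sapply(group, function(g) {
  probs <- rep(0.05, 9)                # countries outside the group are rare
  probs[(3 * g - 2):(3 * g)] <- 0.8    # the group's own three countries are common
  rbinom(9, 1, probs)
}))

# Total within-cluster sum of squares for k = 1..8.
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy.mat, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On the real data the same loop would run over `country.mat`, with a wider range of `k`.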
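library(topicmodels)
# Fitted 100-topic LDA model (country names removed from the text)
load("/users/sweiss/google drive/economistnocountryld100.Rda")
# Articles-by-topics matrix of posterior topic proportions
theta = posterior(ld.100)$topics
theta.average = colMeans(theta)
top.5.factors = names(sort(theta.average, decreasing = TRUE))
load("/Users/sweiss/Google Drive/countrymatrix.rdata")
# kmeans() starts from random centers; no seed is set, so cluster numbers vary between runs
kmeans.10 <- kmeans(x = country.mat, centers = 10)
cluster.10 = kmeans.10$cluster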
for (i in 1:10) {
theta.average.cluster.1 = colMeans(theta[which(cluster.10 == i), ])
top.5.factors.cluster.1 = names(sort(theta.average.cluster.1, decreasing = TRUE))
print(paste("Cluster", i))
print("Top 10 Countries in Cluster")
print(sort(colSums(country.mat[which(cluster.10 == i), ]), decreasing = TRUE)[1:10])
print("Average Number of Countries in Article")
print(mean(rowSums(country.mat[which(cluster.10 == i), ])))
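print("Number of Articles in Cluster")
# Note: this sums country mentions over the cluster's articles, not the article count;
# sum(cluster.10 == i) would give the number of articles.
print(sum(rowSums(country.mat[which(cluster.10 == i), ])))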
print(terms(ld.100, 10)[, as.numeric(top.5.factors.cluster.1)[1:10]])
}
## [1] "Cluster 1"
## [1] "Top 10 Countries in Cluster"
## United Kingdom France Japan Russia Italy
## 254 73 60 46 44
## Spain Israel Turkey Netherlands Canada
## 34 33 33 32 31
## [1] "Average Number of Countries in Article"
## [1] 1.224
## [1] "Number of Articles in Cluster"
## [1] 1567
## Topic 33 Topic 19 Topic 97 Topic 79 Topic 66
## [1,] "rate" "elect" "cell" "labour" "minist"
## [2,] "price" "parti" "research" "cameron" "govern"
## [3,] "economist" "vote" "scienc" "tori" "prime"
## [4,] "interest" "voter" "work" "parti" "polit"
## [5,] "index" "poll" "human" "britain" "leader"
## [6,] "market" "polit" "brain" "conserv" "parliament"
## [7,] "trade" "campaign" "one" "polit" "parti"
## [8,] "commod" "seat" "cancer" "miliband" "opposit"
## [9,] "job" "win" "might" "mps" "elect"
## [10,] "balanc" "candid" "use" "david" "berlusconi"
## Topic 40 Topic 16 Topic 26 Topic 58 Topic 53
## [1,] "presid" "polic" "protest" "test" "local"
## [2,] "polit" "crime" "govern" "time" "town"
## [3,] "elect" "prison" "street" "ask" "park"
## [4,] "power" "crimin" "erdogan" "peopl" "build"
## [5,] "leader" "say" "polic" "experi" "council"
## [6,] "presidenti" "sentenc" "support" "think" "centr"
## [7,] "constitut" "drug" "call" "word" "place"
## [8,] "countri" "murder" "demonstr" "suggest" "peopl"
## [9,] "year" "peopl" "polit" "person" "plan"
## [10,] "run" "jail" "opposit" "relat" "new"
## [1] "Cluster 2"
## [1] "Top 10 Countries in Cluster"
## France Germany Italy Spain United Kingdom
## 128 118 116 103 58
## United States Greece Netherlands Ireland Russia
## 55 53 52 38 32
## [1] "Average Number of Countries in Article"
## [1] 8.138
## [1] "Number of Articles in Cluster"
## [1] 1237
## Topic 9 Topic 28 Topic 88 Topic 62 Topic 83 Topic 14
## [1,] "euro" "european" "bank" "bond" "price" "economi"
## [2,] "zone" "europ" "financi" "debt" "market" "growth"
## [3,] "european" "union" "loan" "rate" "cost" "econom"
## [4,] "countri" "commiss" "lend" "yield" "rise" "gdp"
## [5,] "bank" "countri" "capit" "investor" "year" "invest"
## [6,] "crisi" "nation" "deposit" "govern" "demand" "export"
## [7,] "bail" "want" "credit" "interest" "low" "account"
## [8,] "imf" "brussel" "crisi" "market" "increas" "product"
## [9,] "market" "treati" "borrow" "borrow" "like" "year"
## [10,] "economi" "like" "asset" "default" "high" "busi"
## Topic 29 Topic 17 Topic 91 Topic 52
## [1,] "worker" "firm" "mrs" "left"
## [2,] "job" "market" "merkel" "holland"
## [3,] "work" "industri" "left" "right"
## [4,] "labour" "product" "parti" "presid"
## [5,] "employ" "big" "coalit" "polit"
## [6,] "wage" "new" "green" "socialist"
## [7,] "unemploy" "profit" "centr" "now"
## [8,] "skill" "busi" "govern" "put"
## [9,] "pay" "sale" "democrat" "yet"
## [10,] "factori" "compani" "social" "fran"
## [1] "Cluster 3"
## [1] "Top 10 Countries in Cluster"
## United States United Kingdom Canada France Russia
## 686 115 49 46 37
## Japan Afghanistan Australia Spain Georgia
## 34 27 24 23 21
## [1] "Average Number of Countries in Article"
## [1] 2.257
## [1] "Number of Articles in Cluster"
## [1] 1548
## Topic 87 Topic 72 Topic 19 Topic 90 Topic 63 Topic 47
## [1,] "republican" "court" "elect" "insur" "fund" "state"
## [2,] "obama" "law" "parti" "health" "investor" "feder"
## [3,] "democrat" "case" "vote" "will" "return" "california"
## [4,] "senat" "legal" "voter" "plan" "invest" "governor"
## [5,] "polit" "judg" "poll" "care" "asset" "year"
## [6,] "congress" "rule" "polit" "feder" "equiti" "say"
## [7,] "hous" "right" "campaign" "obamacar" "share" "one"
## [8,] "bill" "lawyer" "seat" "mani" "manag" "texa"
## [9,] "america" "suprem" "win" "exchang" "profit" "back"
## [10,] "barack" "justic" "candid" "state" "money" "also"
## Topic 97 Topic 64 Topic 62 Topic 88
## [1,] "cell" "compani" "bond" "bank"
## [2,] "research" "firm" "debt" "financi"
## [3,] "scienc" "busi" "rate" "loan"
## [4,] "work" "deal" "yield" "lend"
## [5,] "human" "share" "investor" "capit"
## [6,] "brain" "billion" "govern" "deposit"
## [7,] "one" "sharehold" "interest" "credit"
## [8,] "cancer" "buy" "market" "crisi"
## [9,] "might" "stake" "borrow" "borrow"
## [10,] "use" "privat" "default" "asset"
## [1] "Cluster 4"
## [1] "Top 10 Countries in Cluster"
## Germany United States United Kingdom France China
## 278 112 94 70 44
## Netherlands Switzerland Russia Japan Greece
## 32 29 27 26 21
## [1] "Average Number of Countries in Article"
## [1] 4.324
## [1] "Number of Articles in Cluster"
## [1] 1202
## Topic 91 Topic 28 Topic 9 Topic 88 Topic 17 Topic 64
## [1,] "mrs" "european" "euro" "bank" "firm" "compani"
## [2,] "merkel" "europ" "zone" "financi" "market" "firm"
## [3,] "left" "union" "european" "loan" "industri" "busi"
## [4,] "parti" "commiss" "countri" "lend" "product" "deal"
## [5,] "coalit" "countri" "bank" "capit" "big" "share"
## [6,] "green" "nation" "crisi" "deposit" "new" "billion"
## [7,] "centr" "want" "bail" "credit" "profit" "sharehold"
## [8,] "govern" "brussel" "imf" "crisi" "busi" "buy"
## [9,] "democrat" "treati" "market" "borrow" "sale" "stake"
## [10,] "social" "like" "economi" "asset" "compani" "privat"
## Topic 76 Topic 66 Topic 29 Topic 15
## [1,] "billion" "minist" "worker" "investig"
## [2,] "year" "govern" "job" "claim"
## [3,] "will" "prime" "work" "case"
## [4,] "cost" "polit" "labour" "alleg"
## [5,] "also" "leader" "employ" "charg"
## [6,] "worth" "parliament" "wage" "report"
## [7,] "total" "parti" "unemploy" "trial"
## [8,] "last" "opposit" "skill" "said"
## [9,] "make" "elect" "pay" "former"
## [10,] "estim" "berlusconi" "factori" "accus"
## [1] "Cluster 5"
## [1] "Top 10 Countries in Cluster"
## Syria United States Iraq Iran Israel
## 121 88 80 73 66
## Turkey Lebanon Russia Egypt Saudi Arabia
## 57 42 40 38 37
## [1] "Average Number of Countries in Article"
## [1] 7.39
## [1] "Number of Articles in Cluster"
## [1] 1005
## Topic 46 Topic 48 Topic 81 Topic 92 Topic 21
## [1,] "rebel" "america" "muslim" "forc" "attack"
## [2,] "assad" "obama" "ian" "armi" "kill"
## [3,] "regim" "presid" "islam" "defenc" "war"
## [4,] "war" "nuclear" "islamist" "militari" "bomb"
## [5,] "govern" "polici" "brother" "arm" "group"
## [6,] "forc" "relat" "now" "secur" "terrorist"
## [7,] "north" "intern" "morsi" "general" "dead"
## [8,] "arm" "weapon" "brotherhood" "war" "violenc"
## [9,] "group" "washington" "back" "soldier" "terror"
## [10,] "western" "foreign" "includ" "troop" "drone"
## Topic 89 Topic 26 Topic 66 Topic 37 Topic 7
## [1,] "white" "protest" "minist" "deal" "polit"
## [2,] "black" "govern" "govern" "trade" "putin"
## [3,] "palestinian" "street" "prime" "talk" "anti"
## [4,] "king" "erdogan" "polit" "negoti" "now"
## [5,] "relat" "polic" "leader" "agreement" "also"
## [6,] "arab" "support" "parliament" "agre" "may"
## [7,] "say" "call" "parti" "side" "kremlin"
## [8,] "west" "demonstr" "opposit" "free" "soviet"
## [9,] "state" "polit" "elect" "sign" "western"
## [10,] "two" "opposit" "berlusconi" "two" "power"
## [1] "Cluster 6"
## [1] "Top 10 Countries in Cluster"
## Niger Nigeria United States France United Kingdom
## 55 52 20 14 13
## Mali South Africa China Ghana Cameroon
## 12 12 10 9 7
## [1] "Average Number of Countries in Article"
## [1] 7.182
## [1] "Number of Articles in Cluster"
## [1] 395
## Topic 60 Topic 46 Topic 21 Topic 22 Topic 40 Topic 5
## [1,] "africa" "rebel" "attack" "money" "presid" "food"
## [2,] "ship" "assad" "kill" "pay" "polit" "farm"
## [3,] "african" "regim" "war" "servic" "elect" "farmer"
## [4,] "port" "war" "bomb" "save" "power" "product"
## [5,] "region" "govern" "group" "charg" "leader" "say"
## [6,] "contain" "forc" "terrorist" "cost" "presidenti" "produc"
## [7,] "intern" "north" "dead" "card" "constitut" "agricultur"
## [8,] "world" "arm" "violenc" "fee" "countri" "meat"
## [9,] "dubai" "group" "terror" "payment" "year" "rice"
## [10,] "countri" "western" "drone" "account" "run" "year"
## Topic 92 Topic 88 Topic 49 Topic 81
## [1,] "forc" "bank" "immigr" "muslim"
## [2,] "armi" "financi" "border" "ian"
## [3,] "defenc" "loan" "migrant" "islam"
## [4,] "militari" "lend" "mani" "islamist"
## [5,] "arm" "capit" "peopl" "brother"
## [6,] "secur" "deposit" "say" "now"
## [7,] "general" "credit" "illeg" "morsi"
## [8,] "war" "crisi" "year" "brotherhood"
## [9,] "soldier" "borrow" "countri" "back"
## [10,] "troop" "asset" "issu" "includ"
## [1] "Cluster 7"
## [1] "Top 10 Countries in Cluster"
## China United States Japan United Kingdom Russia
## 409 187 91 60 40
## France Australia Canada Vietnam Taiwan
## 39 29 24 24 23
## [1] "Average Number of Countries in Article"
## [1] 3.399
## [1] "Number of Articles in Cluster"
## [1] 1390
## Topic 85 Topic 24 Topic 17 Topic 95 Topic 4 Topic 14
## [1,] "offici" "south" "firm" "project" "parti" "economi"
## [2,] "beij" "north" "market" "mine" "polit" "growth"
## [3,] "said" "korea" "industri" "water" "power" "econom"
## [4,] "recent" "asia" "product" "govern" "leader" "gdp"
## [5,] "report" "island" "big" "build" "nation" "invest"
## [6,] "one" "region" "new" "river" "politician" "export"
## [7,] "govern" "relat" "profit" "say" "member" "account"
## [8,] "ministri" "east" "busi" "construct" "congress" "product"
## [9,] "public" "sea" "sale" "one" "support" "year"
## [10,] "communist" "also" "compani" "plan" "constitut" "busi"
## Topic 86 Topic 88 Topic 48 Topic 74
## [1,] "open" "bank" "america" "foreign"
## [2,] "also" "financi" "obama" "govern"
## [3,] "will" "loan" "presid" "countri"
## [4,] "mani" "lend" "nuclear" "local"
## [5,] "hong" "capit" "polici" "make"
## [6,] "can" "deposit" "relat" "control"
## [7,] "anoth" "credit" "intern" "abroad"
## [8,] "kong" "crisi" "weapon" "intern"
## [9,] "argu" "borrow" "washington" "last"
## [10,] "one" "asset" "foreign" "may"
## [1] "Cluster 8"
## [1] "Top 10 Countries in Cluster"
## United States Brazil Mexico Spain Chile
## 124 80 78 23 21
## China Venezuela Argentina United Kingdom Colombia
## 18 16 15 15 14
## [1] "Average Number of Countries in Article"
## [1] 5.233
## [1] "Number of Articles in Cluster"
## [1] 675
## Topic 11 Topic 40 Topic 17 Topic 16 Topic 87 Topic 82
## [1,] "countri" "presid" "firm" "polic" "republican" "reform"
## [2,] "world" "polit" "market" "crime" "obama" "govern"
## [3,] "global" "elect" "industri" "prison" "democrat" "will"
## [4,] "america" "power" "product" "crimin" "senat" "polici"
## [5,] "develop" "leader" "big" "say" "polit" "chang"
## [6,] "rich" "presidenti" "new" "sentenc" "congress" "system"
## [7,] "emerg" "constitut" "profit" "drug" "hous" "plan"
## [8,] "intern" "countri" "busi" "murder" "bill" "polit"
## [9,] "latin" "year" "sale" "peopl" "america" "public"
## [10,] "accord" "run" "compani" "jail" "barack" "need"
## Topic 14 Topic 37 Topic 75 Topic 47
## [1,] "economi" "deal" "number" "state"
## [2,] "growth" "trade" "america" "feder"
## [3,] "econom" "talk" "sinc" "california"
## [4,] "gdp" "negoti" "time" "governor"
## [5,] "invest" "agreement" "less" "year"
## [6,] "export" "agre" "rate" "say"
## [7,] "account" "side" "year" "one"
## [8,] "product" "free" "declin" "texa"
## [9,] "year" "sign" "rise" "back"
## [10,] "busi" "two" "increas" "also"
## [1] "Cluster 9"
## [1] "Top 10 Countries in Cluster"
## India China United States United Kingdom Japan
## 292 123 114 78 51
## Pakistan Russia Brazil Australia Indonesia
## 43 39 33 32 31
## [1] "Average Number of Countries in Article"
## [1] 5.014
## [1] "Number of Articles in Cluster"
## [1] 1464
## Topic 4 Topic 19 Topic 2 Topic 17 Topic 24 Topic 66
## [1,] "parti" "elect" "govern" "firm" "south" "minist"
## [2,] "polit" "parti" "nation" "market" "north" "govern"
## [3,] "power" "vote" "peopl" "industri" "korea" "prime"
## [4,] "leader" "voter" "ethnic" "product" "asia" "polit"
## [5,] "nation" "poll" "local" "big" "island" "leader"
## [6,] "politician" "polit" "villag" "new" "region" "parliament"
## [7,] "member" "campaign" "one" "profit" "relat" "parti"
## [8,] "congress" "seat" "just" "busi" "east" "opposit"
## [9,] "support" "win" "group" "sale" "sea" "elect"
## [10,] "constitut" "candid" "countri" "compani" "also" "berlusconi"
## Topic 21 Topic 95 Topic 11 Topic 39
## [1,] "attack" "project" "countri" "one"
## [2,] "kill" "mine" "world" "world"
## [3,] "war" "water" "global" "argu"
## [4,] "bomb" "govern" "america" "blog"
## [5,] "group" "build" "develop" "histori"
## [6,] "terrorist" "river" "rich" "long"
## [7,] "dead" "say" "emerg" "great"
## [8,] "violenc" "construct" "intern" "view"
## [9,] "terror" "one" "latin" "point"
## [10,] "drone" "plan" "accord" "much"
## [1] "Cluster 10"
## [1] "Top 10 Countries in Cluster"
## Tanzania Kenya Rwanda Uganda South Africa
## 17 16 15 15 14
## China Niger India Nigeria Angola
## 13 12 11 11 10
## [1] "Average Number of Countries in Article"
## [1] 11.87
## [1] "Number of Articles in Cluster"
## [1] 273
## Topic 60 Topic 49 Topic 46 Topic 35 Topic 13 Topic 2 Topic 92
## [1,] "africa" "immigr" "rebel" "peopl" "store" "govern" "forc"
## [2,] "ship" "border" "assad" "mani" "retail" "nation" "armi"
## [3,] "african" "migrant" "regim" "fire" "shop" "peopl" "defenc"
## [4,] "port" "mani" "war" "now" "sale" "ethnic" "militari"
## [5,] "region" "peopl" "govern" "caus" "chain" "local" "arm"
## [6,] "contain" "say" "forc" "need" "sell" "villag" "secur"
## [7,] "intern" "illeg" "north" "damag" "buy" "one" "general"
## [8,] "world" "year" "arm" "also" "custom" "just" "war"
## [9,] "dubai" "countri" "group" "help" "can" "group" "soldier"
## [10,] "countri" "issu" "western" "miss" "good" "countri" "troop"
## Topic 14 Topic 95 Topic 63
## [1,] "economi" "project" "fund"
## [2,] "growth" "mine" "investor"
## [3,] "econom" "water" "return"
## [4,] "gdp" "govern" "invest"
## [5,] "invest" "build" "asset"
## [6,] "export" "river" "equiti"
## [7,] "account" "say" "share"
## [8,] "product" "construct" "manag"
## [9,] "year" "one" "profit"
## [10,] "busi" "plan" "money"
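The easy clusters to understand are numbers 1 (Super Powers), 4 (Euro Zone), 5 (Mideast Conflict), 7 (Asia), and 10 (South America). Cluster 3 seems to be about regional politics around India. Clusters 2 and 6 have a low average number of countries per article, so they contain mostly intranational articles. Note that K-means starts from random centers and no seed was set, so cluster numbers change between runs and the numbers in this paragraph may not match the output printed above. In that output, for instance, the India-centred cluster is number 9 and the two clusters with the lowest averages are 1 and 3.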
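To keep cluster numbers stable when a post is re-knit, one option is to fix the random seed before calling `kmeans()`. The sketch below uses a small random binary matrix (`toy.mat`, a hypothetical stand-in for the country matrix) and also shows how to separate the number of articles in a cluster from the total number of country mentions in it.

```r
# Hypothetical stand-in for the articles-by-countries matrix.
set.seed(2013)
toy.mat <- matrix(rbinom(60 * 5, 1, 0.4), nrow = 60)

# Same seed, same data, same call: the cluster labels are reproducible.
set.seed(1)
fit.a <- kmeans(toy.mat, centers = 3, nstart = 20)
set.seed(1)
fit.b <- kmeans(toy.mat, centers = 3, nstart = 20)
same.labels <- identical(fit.a$cluster, fit.b$cluster)

# Number of articles assigned to each cluster.
articles.per.cluster <- table(fit.a$cluster)

# Total country mentions in each cluster (a different quantity).
mentions.per.cluster <- tapply(rowSums(toy.mat), fit.a$cluster, sum)

print(same.labels)
print(articles.per.cluster)
print(mentions.per.cluster)
```

The `nstart` argument is optional; it reruns the algorithm from several random starts and keeps the best solution, which tends to make the grouping itself less sensitive to the seed.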