Saturday, January 18, 2014

Economist Year in Review: Part 1

Intro: A new year means a new Year in Review to digest and synthesize the events. Here's my attempt at creating a data-oriented Year in Review of The Economist newspaper, using all the articles from the past year.

Getting Data: I used Python to 'scrape' the articles from The Economist website. I've included the code, but not the actual articles. If you have access to The Economist, you can use the code to download the past year's articles yourself.
LDA: First I wanted to know what kinds of topics were written about, so I fit a topic model (LDA) with 100 topics. Below are the top ten words of each topic, in decreasing order of importance within the topic. The topics themselves are ordered by their average weight across all articles, highest first, and the words are stemmed (so "parti" covers "party" and "parties"). After the table, I plot a few topics over time that I thought were interesting.
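library(topicmodels)
# Load the fitted 100-topic LDA model (object ld.100)
load("/users/sweiss/google drive/ld100.Rda")
# Document-topic matrix: one row per article, one column per topic
theta = posterior(ld.100)$topics
# Average weight of each topic across all articles
theta.average = colMeans(theta)
# Topic ids ordered from highest to lowest average weight
topics.by.weight = names(sort(theta.average, decreasing = TRUE))
# Top ten terms of every topic, columns ordered by average weight
terms(ld.100, 10)[, as.numeric(topics.by.weight)]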
##       Topic 50    Topic 16   Topic 62    Topic 90  Topic 83    
##  [1,] "price"     "elect"    "scienc"    "bank"    "minist"    
##  [2,] "rate"      "parti"    "cell"      "financi" "govern"    
##  [3,] "economist" "vote"     "research"  "loan"    "polit"     
##  [4,] "index"     "polit"    "found"     "capit"   "prime"     
##  [5,] "market"    "voter"    "technolog" "lend"    "parti"     
##  [6,] "econom"    "poll"     "one"       "credit"  "parliament"
##  [7,] "exchang"   "seat"     "human"     "deposit" "opposit"   
##  [8,] "commod"    "candid"   "brain"     "financ"  "leader"    
##  [9,] "interest"  "win"      "look"      "asset"   "elect"     
## [10,] "job"       "campaign" "univers"   "regul"   "coalit"    
##       Topic 46     Topic 64  Topic 52    Topic 66     Topic 30    
##  [1,] "market"     "number"  "billion"   "presid"     "republican"
##  [2,] "firm"       "year"    "compani"   "polit"      "obama"     
##  [3,] "product"    "sinc"    "firm"      "elect"      "senat"     
##  [4,] "industri"   "increas" "share"     "power"      "democrat"  
##  [5,] "sale"       "rise"    "busi"      "presidenti" "congress"  
##  [6,] "manufactur" "rate"    "profit"    "year"       "hous"      
##  [7,] "make"       "fall"    "deal"      "govern"     "state"     
##  [8,] "profit"     "accord"  "buy"       "constitut"  "polit"     
##  [9,] "busi"       "quarter" "sharehold" "last"       "barack"    
## [10,] "cost"       "averag"  "stake"     "countri"    "presid"    
##       Topic 20  Topic 24  Topic 54   Topic 67 Topic 27  Topic 74 
##  [1,] "economi" "rate"    "group"    "syria"  "will"    "use"    
##  [2,] "growth"  "inflat"  "govern"   "rebel"  "may"     "system" 
##  [3,] "econom"  "bank"    "armi"     "war"    "year"    "can"    
##  [4,] "gdp"     "polici"  "peac"     "assad"  "hope"    "work"   
##  [5,] "invest"  "central" "pakistan" "regim"  "alreadi" "one"    
##  [6,] "year"    "currenc" "kill"     "weapon" "get"     "make"   
##  [7,] "account" "market"  "polit"    "iraq"   "next"    "way"    
##  [8,] "quarter" "fed"     "violenc"  "syrian" "expect"  "machin" 
##  [9,] "export"  "economi" "war"      "libya"  "think"   "design" 
## [10,] "spend"   "reserv"  "attack"   "forc"   "far"     "problem"
##       Topic 49  Topic 22     Topic 96   Topic 48   Topic 78   Topic 25 
##  [1,] "citi"    "parti"      "fund"     "firm"     "one"      "polic"  
##  [2,] "local"   "polit"      "investor" "compani"  "may"      "crime"  
##  [3,] "mayor"   "leader"     "return"   "busi"     "might"    "prison" 
##  [4,] "town"    "power"      "invest"   "consult"  "like"     "drug"   
##  [5,] "area"    "member"     "asset"    "industri" "way"      "say"    
##  [6,] "council" "politician" "equiti"   "big"      "can"      "crimin" 
##  [7,] "resid"   "call"       "manag"    "servic"   "whether"  "jail"   
##  [8,] "centr"   "nation"     "share"    "client"   "even"     "sentenc"
##  [9,] "street"  "offici"     "stock"    "work"     "question" "murder" 
## [10,] "build"   "chief"      "money"    "say"      "littl"    "gun"    
##       Topic 31   Topic 21   Topic 26   Topic 77    Topic 61 Topic 40
##  [1,] "euro"     "investig" "labour"   "mobil"     "year"   "year"  
##  [2,] "zone"     "charg"    "britain"  "technolog" "day"    "last"  
##  [3,] "spain"    "alleg"    "cameron"  "phone"     "week"   "two"   
##  [4,] "european" "claim"    "tori"     "appl"      "april"  "ago"   
##  [5,] "bail"     "case"     "parti"    "comput"    "month"  "five"  
##  [6,] "europ"    "trial"    "conserv"  "servic"    "last"   "past"  
##  [7,] "greec"    "report"   "miliband" "oper"      "may"    "month" 
##  [8,] "countri"  "scandal"  "mps"      "googl"     "time"   "four"  
##  [9,] "ireland"  "accus"    "david"    "network"   "said"   "three" 
## [10,] "cyprus"   "former"   "polit"    "softwar"   "june"   "end"   
##       Topic 33 Topic 5   Topic 11 Topic 23  Topic 15   Topic 80   Topic 29
##  [1,] "can"    "cut"     "court"  "money"   "britain"  "china"    "long"  
##  [2,] "make"   "spend"   "law"    "cost"    "british"  "chines"   "time"  
##  [3,] "world"  "budget"  "case"   "pay"     "london"   "offici"   "term"  
##  [4,] "peopl"  "billion" "legal"  "billion" "england"  "beij"     "like"  
##  [5,] "argu"   "year"    "right"  "fee"     "kingdom"  "hong"     "big"   
##  [6,] "think"  "deficit" "rule"   "year"    "unit"     "kong"     "run"   
##  [7,] "one"    "govern"  "judg"   "use"     "briton"   "shanghai" "also"  
##  [8,] "like"   "fiscal"  "lawyer" "paid"    "english"  "even"     "less"  
##  [9,] "good"   "will"    "suprem" "charg"   "scotland" "also"     "even"  
## [10,] "idea"   "tax"     "justic" "payment" "servic"   "yuan"     "well"  
##       Topic 65  Topic 98   Topic 34  Topic 56  Topic 1     Topic 35   
##  [1,] "measur"  "european" "law"     "histori" "space"     "road"     
##  [2,] "data"    "europ"    "rule"    "mani"    "one"       "line"     
##  [3,] "suggest" "union"    "regul"   "yet"     "mar"       "car"      
##  [4,] "also"    "countri"  "bill"    "war"     "scienc"    "train"    
##  [5,] "may"     "commiss"  "pass"    "day"     "technolog" "transport"
##  [6,] "can"     "nation"   "requir"  "never"   "field"     "rail"     
##  [7,] "chang"   "dutch"    "ban"     "father"  "light"     "railway"  
##  [8,] "base"    "brussel"  "new"     "centuri" "though"    "speed"    
##  [9,] "differ"  "germani"  "propos"  "old"     "earth"     "drive"    
## [10,] "point"   "franc"    "control" "much"    "orbit"     "say"      
##       Topic 73  Topic 12    Topic 28   Topic 2    Topic 82   Topic 58  
##  [1,] "will"    "project"   "protest"  "school"   "bond"     "women"   
##  [2,] "octob"   "build"     "street"   "educ"     "debt"     "children"
##  [3,] "month"   "plan"      "govern"   "student"  "rate"     "age"     
##  [4,] "next"    "will"      "call"     "univers"  "yield"    "famili"  
##  [5,] "novemb"  "water"     "polic"    "teacher"  "govern"   "young"   
##  [6,] "one"     "say"       "day"      "year"     "interest" "men"     
##  [7,] "year"    "river"     "demonstr" "colleg"   "market"   "old"     
##  [8,] "first"   "construct" "support"  "teach"    "investor" "parent"  
##  [9,] "said"    "built"     "peopl"    "children" "financ"   "sex"     
## [10,] "septemb" "billion"   "thousand" "pupil"    "borrow"   "child"   
##       Topic 86  Topic 70 Topic 97   Topic 19 Topic 95   Topic 4   
##  [1,] "say"     "reform" "polici"   "first"  "use"      "work"    
##  [2,] "mani"    "will"   "visit"    "second" "onlin"    "worker"  
##  [3,] "one"     "chang"  "also"     "back"   "data"     "job"     
##  [4,] "can"     "polici" "diplomat" "two"    "internet" "labour"  
##  [5,] "peopl"   "system" "leader"   "canada" "can"      "employ"  
##  [6,] "see"     "plan"   "presid"   "time"   "social"   "wage"    
##  [7,] "languag" "new"    "foreign"  "made"   "user"     "unemploy"
##  [8,] "still"   "need"   "two"      "place"  "applic"   "skill"   
##  [9,] "want"    "propos" "want"     "chang"  "search"   "young"   
## [10,] "main"    "polit"  "might"    "now"    "peopl"    "low"     
##       Topic 44  Topic 92  Topic 69     Topic 63 Topic 81   Topic 43
##  [1,] "countri" "time"    "state"      "chief"  "govern"   "peopl" 
##  [2,] "foreign" "test"    "unit"       "manag"  "state"    "kill"  
##  [3,] "world"   "ask"     "feder"      "boss"   "privat"   "fire"  
##  [4,] "global"  "experi"  "governor"   "execut" "public"   "mani"  
##  [5,] "intern"  "suggest" "california" "offic"  "sector"   "miss"  
##  [6,] "develop" "person"  "year"       "also"   "offici"   "now"   
##  [7,] "rich"    "control" "texa"       "job"    "own"      "bad"   
##  [8,] "emerg"   "word"    "san"        "board"  "say"      "least" 
##  [9,] "import"  "show"    "one"        "head"   "privatis" "disast"
## [10,] "abroad"  "think"   "counti"     "need"   "local"    "die"   
##       Topic 99 Topic 84       Topic 13   Topic 94     Topic 47   
##  [1,] "like"   "start"        "media"    "america"    "deal"     
##  [2,] "class"  "busi"         "televis"  "american"   "talk"     
##  [3,] "middl"  "firm"         "news"     "unit"       "negoti"   
##  [4,] "still"  "new"          "advertis" "state"      "agreement"
##  [5,] "well"   "entrepreneur" "newspap"  "washington" "two"      
##  [6,] "just"   "ventur"       "video"    "nation"     "agre"     
##  [7,] "much"   "small"        "show"     "long"       "sign"     
##  [8,] "better" "big"          "year"     "see"        "side"     
##  [9,] "half"   "founder"      "time"     "obama"      "want"     
## [10,] "part"   "capit"        "watch"    "action"     "will"     
##       Topic 88   Topic 89      Topic 75  Topic 18 Topic 59  Topic 45     
##  [1,] "secur"    "forc"        "africa"  "shop"   "poor"    "muslim"     
##  [2,] "agenc"    "militari"    "african" "retail" "peopl"   "egypt"      
##  [3,] "govern"   "defenc"      "south"   "store"  "help"    "islamist"   
##  [4,] "secret"   "armi"        "countri" "busi"   "poverti" "islam"      
##  [5,] "attack"   "arm"         "east"    "custom" "social"  "saudi"      
##  [6,] "say"      "war"         "kenya"   "sell"   "work"    "brother"    
##  [7,] "intellig" "afghanistan" "nigeria" "open"   "give"    "morsi"      
##  [8,] "offici"   "secur"       "middl"   "sale"   "incom"   "arab"       
##  [9,] "inform"   "soldier"     "region"  "onlin"  "benefit" "brotherhood"
## [10,] "spi"      "drone"       "say"     "say"    "money"   "arabia"     
##       Topic 32   Topic 10   Topic 36    Topic 9    Topic 71 Topic 7      
##  [1,] "hous"     "right"    "new"       "univers"  "music"  "climat"     
##  [2,] "price"    "group"    "york"      "paper"    "art"    "chang"      
##  [3,] "properti" "gay"      "one"       "research" "film"   "carbon"     
##  [4,] "home"     "campaign" "will"      "publish"  "one"    "warm"       
##  [5,] "land"     "marriag"  "year"      "public"   "show"   "environment"
##  [6,] "mortgag"  "support"  "old"       "book"     "man"    "tree"       
##  [7,] "rent"     "member"   "two"       "author"   "can"    "emiss"      
##  [8,] "new"      "among"    "bloomberg" "work"     "now"    "model"      
##  [9,] "valu"     "liber"    "boston"    "studi"    "cultur" "water"      
## [10,] "rise"     "debat"    "post"      "journal"  "first"  "global"     
##       Topic 8     Topic 14  Topic 79  Topic 87    Topic 51  Topic 42 
##  [1,] "asia"      "germani" "game"    "health"    "trade"   "immigr" 
##  [2,] "ship"      "german"  "sport"   "drug"      "market"  "mexico" 
##  [3,] "australia" "mrs"     "footbal" "hospit"    "exchang" "say"    
##  [4,] "island"    "merkel"  "club"    "care"      "import"  "migrant"
##  [5,] "south"     "green"   "play"    "treatment" "financi" "border" 
##  [6,] "sea"       "berlin"  "team"    "cancer"    "goldman" "year"   
##  [7,] "region"    "europ"   "world"   "patient"   "deriv"   "illeg"  
##  [8,] "port"      "social"  "player"  "doctor"    "econom"  "countri"
##  [9,] "countri"   "coalit"  "leagu"   "medic"     "world"   "home"   
## [10,] "indonesia" "left"    "can"     "also"      "financ"  "refuge" 
##       Topic 85 Topic 53   Topic 91  Topic 55 Topic 68     Topic 57
##  [1,] "food"   "power"    "oil"     "black"  "itali"      "tax"   
##  [2,] "say"    "energi"   "gas"     "mani"   "church"     "rais"  
##  [3,] "hotel"  "electr"   "billion" "white"  "italian"    "revenu"
##  [4,] "one"    "plant"    "energi"  "may"    "berlusconi" "pay"   
##  [5,] "meat"   "nuclear"  "price"   "like"   "left"       "incom" 
##  [6,] "drink"  "wind"     "reserv"  "race"   "christian"  "financ"
##  [7,] "eat"    "industri" "shale"   "make"   "one"        "rate"  
##  [8,] "good"   "generat"  "year"    "stop"   "cathol"     "govern"
##  [9,] "may"    "batteri"  "invest"  "less"   "movement"   "rich"  
## [10,] "consum" "renew"    "new"     "becom"  "hous"       "money" 
##       Topic 72  Topic 60   Topic 76     Topic 41  Topic 93    Topic 39 
##  [1,] "air"     "insur"    "brazil"     "pension" "french"    "russia" 
##  [2,] "new"     "health"   "year"       "pay"     "franc"     "russian"
##  [3,] "airport" "will"     "farmer"     "scheme"  "holland"   "putin"  
##  [4,] "fli"     "care"     "farm"       "public"  "pari"      "ukrain" 
##  [5,] "airlin"  "plan"     "latin"      "benefit" "fran"      "polit"  
##  [6,] "will"    "obamacar" "say"        "fund"    "mali"      "kremlin"
##  [7,] "flight"  "exchang"  "govern"     "will"    "left"      "moscow" 
##  [8,] "take"    "like"     "agricultur" "retir"   "socialist" "soviet" 
##  [9,] "dubai"   "cost"     "land"       "year"    "now"       "also"   
## [10,] "plane"   "mani"     "brazilian"  "detroit" "presid"    "now"    
##       Topic 100 Topic 37  Topic 38   Topic 3       Topic 6   Topic 17 
##  [1,] "mine"    "turkey"  "india"    "israel"      "japan"   "north"  
##  [2,] "make"    "erdogan" "indian"   "iran"        "abe"     "south"  
##  [3,] "gold"    "yet"     "say"      "isra"        "japanes" "korea"  
##  [4,] "world"   "say"     "state"    "palestinian" "new"     "park"   
##  [5,] "high"    "includ"  "year"     "arab"        "say"     "korean" 
##  [6,] "year"    "turkish" "delhi"    "west"        "tokyo"   "nuclear"
##  [7,] "wast"    "smoke"   "run"      "middl"       "minist"  "kim"    
##  [8,] "miner"   "may"     "congress" "east"        "countri" "regim"  
##  [9,] "steel"   "world"   "now"      "iranian"     "prime"   "state"  
## [10,] "littl"   "troubl"  "yet"      "state"       "now"     "test"
As one can see, economics, finance, business, and politics top the list of most common topics.
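The ordering of the topics comes from averaging each topic's weight over all articles and sorting the averages. Here is a minimal sketch of that step on a made-up matrix. The values and the `.toy` names are invented for illustration and are not taken from the model.

```r
# Toy document-topic matrix: rows are articles, columns are topics.
# Values are made up; each row sums to 1, like the rows of posterior(...)$topics.
theta.toy <- matrix(c(0.7, 0.1, 0.2,
                      0.5, 0.1, 0.4,
                      0.6, 0.2, 0.2),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(NULL, c("1", "2", "3")))

# Average weight of each topic over all articles.
avg.toy <- colMeans(theta.toy)

# Topic ids ordered from most to least prevalent.
ranked.toy <- names(sort(avg.toy, decreasing = TRUE))
print(ranked.toy)
```

A topic ranks high here either because many articles touch on it a little or because a few articles are almost entirely about it; the average alone does not distinguish the two.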
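# Article metadata; the publication Date is among the first three columns
econ.3 = read.csv("/users/sweiss/google drive/economistdata.csv")
# Topic columns in theta are named "1" to "100"
respons.vars = paste(1:100)
theta.econ = cbind(econ.3[, 1:3], theta)
# Mean topic weight per publication date
date.theta.agg = aggregate(x = theta.econ[respons.vars], by = theta.econ["Date"], 
    FUN = mean)
date.theta.agg$Month <- factor(date.theta.agg$Date, levels = date.theta.agg$Date[!duplicated(date.theta.agg$Date)])
# Column 1 is Date, so topic k sits in column k + 1
date.theta.agg[, 1] = as.Date(date.theta.agg[, 1])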
library(scales)

library(ggplot2)
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 68])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Syrian Civil War' Topic include words: syria,rebel,war,assad,regim") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Plot: 'Syrian Civil War' topic, average share of articles by date]
Coverage of the Syrian civil war peaked in May/June. Since then the share of articles attributed to this topic has declined. After it became clear that the USA, France, and the UK were not going to intervene militarily, fewer articles were written on the topic.
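Each point on the line in these plots is the mean weight of one topic over the articles published on one date. Here is a small sketch of that aggregation step on toy data. The dates, the weights, and the column name `topic67` are invented for illustration.

```r
# Toy data: three articles on two issue dates, with weights for one topic.
toy <- data.frame(Date = c("2013-05-04", "2013-05-04", "2013-05-11"),
                  topic67 = c(0.10, 0.30, 0.05))

# Mean topic weight per date, using the same aggregate() pattern as the post.
toy.agg <- aggregate(x = toy["topic67"], by = toy["Date"], FUN = mean)

# Convert to Date so a date axis can be used when plotting.
toy.agg$Date <- as.Date(toy.agg$Date)
print(toy.agg)
```

With only a handful of articles per date, a single long article on a topic can move the mean noticeably, which is one reason to read the smoothed line rather than individual spikes.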
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 32])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Euro Crisis' Topic include words: euro,zone,spain,european,bail") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Plot: 'Euro Crisis' topic, average share of articles by date]
The euro crisis was with us the entire year, and it looks like it will continue into next year too!
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 29])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Protests' Topic include words: protest,street,govern,call,police") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Plot: 'Protests' topic, average share of articles by date]
The protest topic was very high in June/July because of the Brazilian and Turkish protests. It rose sharply again in the final weeks of the year, driven by protests in Thailand and even (gasp!) Singapore.
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 61])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Obama Care' Topic include words: insur,health,care,plan,obamacare") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Plot: 'Obama Care' topic, average share of articles by date]
The Obamacare topic rose at the end of the year because of the botched launch of the healthcare website.
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 96])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Internet' Topic include words: use,online,data,internet,social") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
[Plot: 'Internet' topic, average share of articles by date]
Finally, the Internet topic showed a strong decline over the past year, despite events like the Twitter IPO. Perhaps this is an indication that social networks and the internet are becoming so ubiquitous that they are simply not news anymore.
