Introduction
A typical article that discusses economics is awash with words alluding to economics' connection with water or liquids. For example:
- Markets get saturated when a competing country dumps its products on them.
- Stock prices can get frothy and might develop into a bubble.
- A credit squeeze will dampen expectations and sink prices.
- People often talk of soaking the rich's slush funds.
- Floating more shares will dilute the value of a stock (curiously, in my opinion).
- During a credit drought, a mortgage can become underwater and a business might need a bailout.
- And, of course, prices can inflate or deflate.
It is tempting to picture money flowing through an economy, with prices inflating or deflating as the money supply increases or decreases. And there has been serious effort to model money and markets as a flow.
The analogy of money or markets as a flowing liquid has a long history in economics. François Quesnay explored it in 1758 with his Tableau économique. Irving Fisher saw markets as "the flowing water". Machines that model the economy with water, known as Phillips machines, were built in the 1940s. Perhaps even monetarists could be included in this line of thought, since the equation MV = PQ (money * velocity = prices * quantity) describes the flow of money through an economy. However, from a theoretical point of view it is hard to reconcile this concept with recent developments, but that's another topic.
This post explores how lay financial and economics articles that discuss the 'liquidity' of markets have evolved over time, using The Economist articles since 1998.
Since articles can discuss multiple topics (politics, regions, science, etc.), paragraphs are the basic unit of data (bag of words), and each paragraph is assigned a probability of discussing economics.
If a paragraph includes a 'liquidity' word (discussed below), then that paragraph's probability of being about economics is used as its contribution to the index (to account for the possibility that the paragraph is not about economics).
An aggregated score by week is then calculated. Below is the resulting scaled time series of all 'liquidity' words along with 'inflat' and 'deflat' (since these are actual technical terms, I felt they should be treated separately from the other liquidity words such as "bubble", a term some economists find vacuous).
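In sketch form, the weekly index is the sum, over that week's paragraphs, of prob(economics) times an indicator for whether the paragraph contains a liquidity word. A minimal sketch (hypothetical vector names; the real pipeline is in the code at the end of the post):
# prob_econ, has_liquidity_word and week are hypothetical per-paragraph vectors
weekly_index = function(prob_econ, has_liquidity_word, week){
  tapply(prob_econ * has_liquidity_word, week, sum)
}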
Of course this is a very messy index, but we do see spikes in the 'liquidity' index around the financial crisis at the end of 2007, a rise in the 'inflat' index before that date, and at least two spikes in the 'deflat' index, around 2003 and 2015.
To smooth out the signal, I fit a simple local linear trend model to each index and extracted the states. Below are the time series of the states.
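The state-space model fit in the Stan code at the end of the post is a random walk observed with noise. A minimal simulation sketch, with made-up parameter values, of what that model encodes:
# y[t]     = theta[t] + v[t],      v[t] ~ N(0, sigma_v)   (observation)
# theta[t] = theta[t-1] + w[t],    w[t] ~ N(0, sigma_w)   (state)
set.seed(1)
n = 200; sigma_v = 1; sigma_w = 0.2
theta = cumsum(rnorm(n, 0, sigma_w))   # latent state
y = theta + rnorm(n, 0, sigma_v)       # observed index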
From around 2005 to mid-2007, 'inflat' occurred at an increasing rate in The Economist's articles. You can see this index sink very fast just as the liquidity index reached its height during the financial crisis. Finally, it is noteworthy that the most recent blip of discussion is deflation in 2015.
I thought the liquidity index I created looked like the VIX index, so I plotted both scaled series below.
They do appear to rise and fall at roughly the same rate, though not perfectly. But maybe, if I'm too busy to read The Economist in a particular week, I can just check the VIX and infer how much it talks about liquidity.
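A rough way to quantify this claim (hypothetical check; assumes states_2, the merged weekly-states / VIX table built at the end of the code below):
# correlation between the smoothed liquidity state and the VIX close, ignoring missing weeks
cor(states_2$liquidity_words, states_2[["VIX Close"]], use = "pairwise.complete.obs")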
Outline of Code:
- Scraped The Economist articles and classified which paragraphs discussed business / economics / financial topics.
- Applied the model to all paragraphs, ending with prob(economics) for each one - more information on the methodology here.
- Created a corpus of words affiliated with the concept of 'liquidity' or 'water' in relation to economics. I did this with word2vec, manually checking which words were associated with liquidity: for instance, I found 'illiquid' was similar to 'liquid', so I included it in the corpus, and continued until I couldn't find any more words. I had hoped word2vec could be more precise here, so that I could just find a "liquidity vector" similar to the words I was looking for, but this wasn't the case (a hypothetical sketch of this nearest-word search follows this outline). The words I settled on were: "soak", "mop", "bubble", "burst", "pop", "frothy", "saturate", "liquid", "deflat", "choppy", "topsy", "squeeze", "damp", "headwinds", "dilute", "sink", "flow", "inflat", "solvent", "ripple", "pressure", "expansion", "untapped", "reservoir", "tap", "funnel", "fusion", "pump", "bail", "absorb", "intake", "draining", "meltdown", "bursts", "implos", "explos", "flotations", "float", "dump", "circul", "spew", "swell", "pour", "splashing", "flood", "draught", "dry", "reservoirs", "fluid", "droplets", "dissolve".
- Created an 'index' by multiplying prob(economics) * Indicator(liquidity word included) for each paragraph and summing this value by week.
- Built state-space models for each index to smooth the values.
- Found that the liquidity state roughly follows the VIX index.
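The nearest-word search in the corpus step above was done manually; a hypothetical sketch of that step using the CRAN word2vec package (an assumption - the original tooling isn't shown), applied to the cleaned paragraph text (paragraphs_3) built in the code below:
# hypothetical sketch: train embeddings on the paragraphs and inspect neighbours of seed words
library(word2vec)
emb_model <- word2vec(x = tolower(paragraphs_3), type = "skip-gram", dim = 100, iter = 5)
# look at the nearest neighbours of a seed word, keep the plausible ones, repeat
predict(emb_model, c("liquid", "bubble"), type = "nearest", top_n = 15)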
# scrape the links to each weekly print edition for a given year
library(XML)
year_links = function(year){
  url = paste0("http://www.economist.com/printedition/covers?print_region=76980&date_filter%5Bvalue%5D%5Byear%5D=", year)
  doc = htmlTreeParse(url, useInternalNodes = TRUE)
  src = xpathApply(doc, "//a[@href]", xmlGetAttr, "href")
  links = unlist(src)
  links = links[grep(paste0("printedition/", year), links)]
  links_editions = paste0("http://www.economist.com", links)
  links_editions = unique(links_editions)
  return(links_editions)
}
weekly_links=lapply(1998:2015, year_links)
weekly_links=unlist(weekly_links)
# Example (commented out): inspect the article links for a single edition
# doc = htmlTreeParse(weekly_links[1], useInternalNodes = TRUE)
# src = xpathApply(doc, "//a[@href]", xmlGetAttr, "href")
# links_week = unlist(src)
# links_week = links_week[grep("news", links_week)]
# scrape the article text from every article linked on a weekly edition page
scrape_articles = function(links_editions){
  doc = htmlTreeParse(links_editions, useInternalNodes = TRUE)
  src = xpathApply(doc, "//a[@href]", xmlGetAttr, "href")
  links_week = unlist(src)
  links_week = links_week[unique(c(grep("node", links_week), grep("news", links_week)))]
  links_week = paste0("http://www.economist.com", links_week)
  text1 = character(length(links_week))
  for(i in 1:length(links_week)){
    text = character(0)  # reset so a failed page does not reuse the previous article's text
    doc = tryCatch(htmlTreeParse(links_week[i], useInternalNodes = TRUE), error = function(e) NULL)
    if(!is.null(doc)) {text = xpathSApply(doc, '//*[contains(concat( " ", @class, " " ), concat( " ", "main-content", " " ))]', xmlValue)}
    if(length(text) != 0) {text1[i] = text}
  }
  return(cbind(edition = rep(links_editions, length(links_week)), link = links_week, text1))
}
articles=lapply(weekly_links,scrape_articles)
articles1=do.call(rbind,articles)
#extract dates
articles1=cbind(articles1,date=gsub("http://www.economist.com/printedition/","", articles1[,1]))
#remove small articles
articles1=articles1[-which(nchar(articles1[,3])<550),]
articles1=cbind(articles1,numeric_date=as.numeric(as.Date(articles1[,"date"])))
library(stringr)
## clean up text / split articles into paragraphs
paragraphs = strsplit(articles1[,3], "\n")
date_articles = rep(articles1[,"date"], unlist(lapply(paragraphs, length)))
link_articles = rep(articles1[,"link"], unlist(lapply(paragraphs, length)))
## remove small paragraphs
paragraphs_1 = unlist(paragraphs)
keep = nchar(paragraphs_1) >= 90
paragraphs_2 = paragraphs_1[keep]
link_articles_2 = link_articles[keep]
date_articles_2 = date_articles[keep]
paragraph_date = cbind(date_articles_2, paragraphs_2)
# the section (e.g. "finance-and-economics") is the fifth element of the article URL
economist_paragraph_section = unlist(lapply(strsplit(link_articles_2, "/"), function(x) x[5]))
#create article class
economist_sections=c("united-states","middle-east-and-africa","europe","finance-and-economics","business","britain","books-and-arts","asia","science-and-technology","americas","china","business-and-finance","international")
economist_paragraph_section_target=economist_paragraph_section
economist_paragraph_section_target[-which(economist_paragraph_section_target %in% economist_sections)]=""
paragraphs_3=str_replace_all(paragraphs_2, "[^[:alnum:]]", " ")
paragraphs_3=iconv(paragraphs_3, to='ASCII//TRANSLIT')
## create document-term matrix for classification (text2vec)
library(text2vec)
library(magrittr)
sw = character(0)  # no stop words were removed
# each element of the token list represents one paragraph
tokens <- paragraphs_3 %>%
  tolower() %>%
  word_tokenizer()
it <- itoken(tokens, ids = 1:length(paragraphs_3))
vocab <- create_vocabulary(it, stopwords = sw)
# iterators are consumed, so create a fresh one before building the dtm
it <- itoken(tokens, ids = 1:length(paragraphs_3))
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)
dtm_1=dtm[-which(economist_paragraph_section_target==""),]
economist_paragraph_section_target_1=economist_paragraph_section_target[-which(economist_paragraph_section_target=="")]
economist_paragraph_section_target_2=rep(0,length(economist_paragraph_section_target_1))
economist_paragraph_section_target_2[which(economist_paragraph_section_target_1 %in% c("business","business-and-finance","finance-and-economics"))]=1
word_sums=colSums(dtm_1)
dtm_1=dtm_1[,-which(word_sums<10)]
# build classification model on labelled paragraphs and predict on all paragraphs
library(glmnet)
require(doMC)
registerDoMC(cores = 4)
glmnet_fit = cv.glmnet(dtm_1, economist_paragraph_section_target_2, parallel = TRUE, family = "binomial", type.measure = "auc")
# linear predictor for every paragraph (same column filter as the training dtm)
pred_glmnet = predict(glmnet_fit, newx = dtm[, -which(word_sums < 10)], s = "lambda.min")
# sanity check: the paragraphs scored most likely to be about business / economics
paragraphs_3[order(pred_glmnet, decreasing = TRUE)[1:10]]
# convert the linear predictor to a probability
prob_glmnet = 1 / (1 + exp(-pred_glmnet[, 1]))
##words to use
words=unique(c("soak","mop", "bubble","burst","pop","frothy","saturate", "liquid","deflat","choppy","topsy","mop","squeeze","damp","headwinds","dilute","sink",
"flow","inflat","solvent","ripple","pressure","expansion","untapped","reservoir","tap","funnel","fusion","pump","bail",
"absorb","intake","draining","meltdown","bursts","implos","explos","ripple","flotations","float" ,"sink","dump","circul",
"spew","swell","pressure","pour","splashing","flood","draught","dry","reservoirs",
"reservoirs","fluid","droplets","dissolve"))
business_economics_predictions=prob_glmnet
library(parallel)
#see if word is in a paragraph
is_included=mclapply(words, function(x) grepl(x,paragraphs_3,ignore.case = FALSE), mc.cores=4)
is_included_1=do.call(cbind,is_included)
liquidity_words=rowSums(is_included_1[,-which(words %in% c("inflat", "deflat"))])*business_economics_predictions
inflat_words=(is_included_1[,which(words %in% c("inflat"))])*business_economics_predictions
deflat_words=(is_included_1[,which(words %in% c("deflat"))])*business_economics_predictions
all_words=(is_included_1)*business_economics_predictions
date_articles_3=date_articles_2
## aggregate indices by week
library(dplyr)
library(reshape)
library(ggplot2)
count_dates = as.data.frame(cbind(liquidity_words, inflat_words, deflat_words, date = date_articles_2))
# cbind coerces everything to character, so convert the index columns back to numeric
count_dates[,1] = as.numeric(as.character(count_dates[,1]))
count_dates[,2] = as.numeric(as.character(count_dates[,2]))
count_dates[,3] = as.numeric(as.character(count_dates[,3]))
count_dates$prob = business_economics_predictions
data_date <- group_by(count_dates, date)
index_by_date = summarise(data_date, liquidity_words = sum(liquidity_words), inflat_words = sum(inflat_words), deflat_words = sum(deflat_words), prob = sum(prob))
index_by_date_1 = as.data.frame(index_by_date)
# scale each index so they can be plotted on a common axis
index_by_date_scaled = index_by_date_1
index_by_date_scaled[,-1] = apply(index_by_date_scaled[,-1], 2, scale)
index_by_date_melt = melt(index_by_date_scaled[,-5])
index_plot = ggplot(index_by_date_melt, aes(x = as.Date(date), y = value, group = variable, colour = variable)) + geom_line() + scale_x_date() + theme_classic() + labs(title = "Time Series of Indices", x = "Date (by Week)", y = "Scaled Value of Index")
code="
data {
int n;
vector[n] y;
real<lower=0> theta1_mean;
real<lower=0> theta1_sd;
}
parameters {
real<lower=0> sigma_v;
real<lower=0> sigma_w;
vector[n] theta_innov;
}
transformed parameters {
vector[n] theta;
theta[1] <- theta1_mean + theta1_sd * theta_innov[1];
for (t in 2:n) {
theta[t] <- theta[t - 1] + sigma_w * theta_innov[t];
}
}
model {
theta_innov ~ normal(0, 1);
y ~ normal(theta, sigma_v);
}
"
# fit the state-space model to a weekly index and return the posterior mean of the states
library(rstan)
local_linear_trend_state = function(y){
  fit = stan(model_code = code, data = list(y = y, n = length(y), theta1_mean = y[1], theta1_sd = 10), iter = 2000, chains = 3)
  mat = as.matrix(fit)
  # keep the columns for theta itself (not the theta_innov parameters)
  theta = mat[, grep("^theta\\[", colnames(mat))]
  return(colMeans(theta))
}
index_by_date_1$all_liquidity=rowSums(index_by_date_1[,c("liquidity_words" ,"inflat_words" ,"deflat_words")])
states=lapply(index_by_date_1[,-1],local_linear_trend_state)
states=do.call(cbind,states)
states_1=states
states_1=data.frame(date=index_by_date_1[,"date"], states_1)
states_1[,-1]=apply(states_1[,-1],2,scale)
states_melt = melt(states_1[,-5])
states_plot = ggplot(states_melt, aes(x = as.Date(date), y = value, group = variable, colour = variable)) + geom_line() + scale_x_date() + theme_classic() + labs(title = "Time Series of Index States", x = "Date (by Week)", y = "Scaled Value of Index")
# compare the liquidity state with the VIX
library(Quandl)
vix = Quandl("CBOE/VIX")
# shift edition dates back one day so they line up better with the VIX trading dates
states_1[,1] = as.Date(states_1[,1]) - 1
states_2 = merge(states_1, vix, by.x = "date", by.y = "Date", all = TRUE)
states_2[,1] = as.factor(states_2[,1])
states_2[,-1] = apply(states_2[,-1], 2, scale)
states_3_melt = melt(states_2[-which(is.na(states_2[,2])), c("date", "liquidity_words", "VIX Close")])
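A sketch of the comparison plot of the liquidity state against the VIX, following the pattern of the plots above (the plot object name is hypothetical):
liquidity_vix_plot = ggplot(states_3_melt, aes(x = as.Date(date), y = value, group = variable, colour = variable)) + geom_line() + scale_x_date() + theme_classic() + labs(title = "Liquidity Index State vs VIX", x = "Date (by Week)", y = "Scaled Value")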