Thursday, May 22, 2014

didYouMean() Function: Using Google to correct errors in Strings

A function that takes a String as input and returns Google's "Did you mean.." or "Showing results for.." suggestion. Good for misspelled names or locations.

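##if on windows might need: options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
library(RCurl)  ##provides getURL()
didYouMean=function(input){
  input=gsub(" ", "+", input)
  ##the search URL prefix goes in the empty string below
  doc=getURL(paste("",input,"/", sep=""))
  dym=gregexpr(pattern ='Did you mean',doc)
  srf=gregexpr(pattern ='Showing results for',doc)
  ##new.text is the corrected phrase parsed out of doc (parsing step not shown)
  if(dym[[1]][1]!=-1){
    return(gsub("[+]"," ",new.text))
  }
  else if(srf[[1]][1]!=-1){
    return(gsub("[+]"," ",new.text))
  }
  else(return(gsub("[+]"," ",input)))
}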
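
The step that pulls the corrected phrase out of the page can be tried offline. What follows is a minimal sketch rather than the code from this post: it assumes the marker phrase ("Did you mean" or "Showing results for") is followed by a link whose address contains q= and the corrected words joined by plus signs. The helper name extractSuggestion and the sample HTML are hypothetical, and Google's real markup may differ and changes over time.

```r
# Hypothetical helper: pull the corrected query out of a results page.
# Assumes the marker phrase is followed by a link containing "q=<words+joined+by+plus>".
extractSuggestion <- function(doc, marker) {
  pos <- as.integer(regexpr(marker, doc, fixed = TRUE))
  if (pos == -1) return(NA_character_)        # marker is not on the page
  rest <- substring(doc, pos)                 # text from the marker onward
  m <- regmatches(rest, regexpr("q=[^&\"]+", rest))
  if (length(m) == 0) return(NA_character_)   # no query parameter after the marker
  gsub("[+]", " ", sub("^q=", "", m))         # drop "q=" and turn "+" into spaces
}

# Made-up HTML fragment, for illustration only.
sample_doc <- paste0(
  'junk <a href="/search?q=gorecge+washington&amp;x=1">x</a> ',
  'Showing results for <a href="/search?q=george+washington&amp;spell=1">george washington</a>'
)

suggestion <- extractSuggestion(sample_doc, "Showing results for")  # "george washington"
none <- extractSuggestion(sample_doc, "Did you mean")               # NA: marker absent
```

Returning NA when the marker or the query is absent lets the caller fall back to the original input, which is what the last branch of didYouMean() does.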

So didYouMean("gorecge washington") returns "george washington"

Works well with misspelled companies, nouns, or phrases. For example, say you're doing text analysis on Twitter and a customer raves about Carlsberg beer. The only problem is that he's enjoying their product while tweeting (something that happens only rarely, I'm sure) and wrote "clarsburg gprou". Not to worry!

> didYouMean("clarsburg gprou")
[1] "carlsberg group"

Or suppose you have a three-phase plan for profits. This can help you get there!

> didYouMean("clletc nuderpants")
[1] "collect underpants"

Saturday, May 17, 2014

Modelling This Time is Different: Corrected

New Code

I made two errors in my previous post.

The first is that I put the probability inside the utility function. Generally, this is a no-no, since expected utility already weights each outcome by its probability: E[utility] = sum over i of P(outcome i)*U(outcome i). I therefore changed it to a simpler maximization problem (no Lagrange multipliers necessary) in which the individual maximizes E[Profits].

The second problem has to do with maximizing under uncertainty. Since I sampled from the joint posterior distribution of the unknown parameters, I had a number of draws from that distribution. In the previous analysis I maximized over each pair of simulated draws individually, and then averaged these maximized results to get what I thought was the optimal result. In general, this method does not give the optimal value. I should have maximized over all pairs simultaneously. Basically, I computed E[max over s of f(s,p)] instead of max over s of E[f(s,p)].
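
The gap between the two approaches is easy to see numerically. The snippet below is a minimal sketch with a made-up objective, not the model from the post: f(s,p) = s - p*s^2/2 for a choice s and an unknown parameter p, with three invented values of p standing in for posterior draws.

```r
# Made-up objective: payoff from choosing s when the unknown parameter is p.
f <- function(s, p) s - p * s^2 / 2

# Invented draws standing in for samples from a posterior.
p_draws <- c(0.5, 1, 2)

# Approach 1: maximize draw by draw, then average the maximizers.
s_each <- sapply(p_draws, function(p) {
  optimize(f, interval = c(0, 10), p = p, maximum = TRUE)$maximum
})
s_avg <- mean(s_each)

# Approach 2: maximize the average payoff across all draws at once.
expected_f <- function(s) mean(f(s, p_draws))
s_joint <- optimize(expected_f, interval = c(0, 10), maximum = TRUE)$maximum

# Expected payoff actually delivered by each choice of s.
c(avg_of_maximizers = expected_f(s_avg), joint_maximizer = expected_f(s_joint))
```

Here the draw-by-draw maximizers are 1/p, so their average is about 1.17, while the choice that maximizes the expected payoff is 1/mean(p), about 0.86. Evaluated on the same draws, the joint choice earns an expected payoff of roughly 0.43 against roughly 0.37 for the averaged one.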

In general, the results are superficially similar to my original analysis. Even if the results are largely the same, it's best to describe my mistakes up front and avoid awkward questions later.