Sunday, July 19, 2015

Corrections

Oops.

Sorry, y'all, I made a couple of mistakes in interpreting my own results when extrapolating. This is the curse of the data-driven: things end up being a number that's sometimes hard to check.

The main problem stemmed from a slight error in the assumptions behind the models used in the last two posts (see below if interested). The other was a simple misinterpretation of my own results. These models reported the number of days from the first data point until an event, but I read it as the number of days from the present (a difference of about 16 days). The new versions of the models report results in terms of "days since the most recent data point," so this confusion won't happen again.

With this error corrected, I can make some better predictions. First of all, the projected wait until Bernie Sanders (I) surpasses Hillary Clinton (D) is shorter. My last post indicated that, based on my full data set, they should have the same number of Likes in 88 days. Using only the last 12 days (when Sanders has been on a bit of a hot streak), I predicted the crossover in 70 days.

After applying the correction and using all the data, Sanders should pass Clinton in 70 days (c. September 27). Using just the last 12 days (starting on day 6 of my data collection), Sanders should pass Clinton in 56 days (c. September 13). Two weeks can mean a lot of momentum in an election year, which is why I wanted to correct this. In either case, the crossover would occur at around 1.3M to 1.4M Likes each.
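Turning a "days from the most recent data point" forecast into a calendar date is plain date arithmetic. Below is a minimal Python sketch (the analysis itself is in Mathematica). It assumes the most recent data point is July 19, 2015, the date of this post. `predicted_date` and `LATEST_DATA_POINT` are hypothetical names used only for illustration.

```python
from datetime import date, timedelta

# Assumption: forecasts are counted in days from the most recent data point,
# taken here to be July 19, 2015 (the date of this post).
LATEST_DATA_POINT = date(2015, 7, 19)


def predicted_date(days_from_latest: float, latest: date = LATEST_DATA_POINT) -> date:
    """Convert a 'days since the most recent data point' forecast to a calendar date."""
    # Round to whole days; the linear fits are far too rough for hour-level precision.
    return latest + timedelta(days=round(days_from_latest))


if __name__ == "__main__":
    print(predicted_date(70))  # 2015-09-27 (full data set)
    print(predicted_date(56))  # 2015-09-13 (last 12 days only)
```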

(The animation in the previous post is actually correct; I just didn't report the correct number of days.)

Based on my entire data set, here are some (potentially) interesting future dates:
July 29: Sanders passes Marco Rubio (R) at around 918K Likes.
August 10: Donald Trump (R) hits 3M Likes.
August 14: Ben Carson (R) passes Mike Huckabee (R) at around 1.8M Likes.
August 22: Clinton passes Rick Perry (R) at 1.2M Likes.
September 7: Sanders passes Perry at 1.2M Likes.
September 17: Trump hits 4M Likes.
September 20: Jeb Bush (R) passes Rick Santorum (R) at 266K Likes.
September 22: Clinton passes Ted Cruz (R) at 1.4M Likes.
September 24: Sanders passes Cruz at 1.4M Likes.
September 27: Sanders passes Clinton at 1.4M Likes.
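Each date in the list comes from one of two calculations on straight-line fits. Either a single fit reaches a round number, or two fits intersect. Below is a minimal Python sketch of both. It assumes each fit is stored as an (intercept, slope) pair, with time measured in days from the most recent data point. The helper names and the numbers in the usage lines are made up for illustration and are not taken from my data.

```python
# Hypothetical helpers, not the original Mathematica code. A fit is an
# (intercept, slope) pair describing likes(t) = intercept + slope * t,
# where t is in days relative to the most recent data point (t = 0 is "now").


def days_until_threshold(fit: tuple[float, float], threshold: float) -> float:
    """Day t at which a growing linear fit reaches `threshold` Likes."""
    intercept, slope = fit
    if slope <= 0:
        raise ValueError("the fit is not growing, so it never reaches the threshold")
    return (threshold - intercept) / slope


def crossing(fit_a: tuple[float, float], fit_b: tuple[float, float]) -> tuple[float, float]:
    """Day t and Like count at which two linear fits intersect."""
    (a0, a1), (b0, b1) = fit_a, fit_b
    if a1 == b1:
        raise ValueError("parallel fits never cross")
    t = (b0 - a0) / (a1 - b1)
    return t, a0 + a1 * t


if __name__ == "__main__":
    # Made-up fits, for illustration only.
    print(days_until_threshold((2_400_000, 30_000), 3_000_000))  # 20.0
    print(crossing((800_000, 5_000), (1_100_000, 2_000)))         # (100.0, 1300000.0)
```

Pairing the crossing day with the date conversion gives entries of the form "date: A passes B at N Likes."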

Anyway, these are just the simplest possible predictions and don't account for an offhand sound bite here or a riveting debate there. I'll dig into that over the next couple of days while things seem quiet on the data front. I'll try to work up some more visualizations too; this has been a lot of text. Until next time.



Details of the error:
The error was that I assumed (incorrectly) that the Like data was collected at 24-hour intervals. This is largely the case, but not always. For example, the analysis saw a 1.5-day interval and a 0.5-day interval and treated both as full days. Why is this bad? Because the longer interval shows an uncharacteristically large jump in Likes and the shorter one shows an uncharacteristically small jump. That throws off the fitted growth rates.
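A toy example makes the problem concrete. The Python sketch below uses made-up numbers: a candidate gaining a steady 3,000 Likes per day, measured at irregular times that include a 1.5-day gap and a 0.5-day gap. Treating each gap as one day makes the growth look erratic. Dividing by the actual elapsed time recovers the steady rate.

```python
# Made-up data: measurement times in days (note the 1.5-day and 0.5-day gaps)
# for a candidate gaining exactly 3,000 Likes per day.
times = [0.0, 1.0, 2.5, 3.0, 4.0]
likes = [800_000 + 3_000 * t for t in times]

# Wrong: treat every gap between measurements as one full day.
naive_per_day = [likes[i + 1] - likes[i] for i in range(len(likes) - 1)]

# Right: divide each change by the time that actually elapsed.
true_per_day = [
    (likes[i + 1] - likes[i]) / (times[i + 1] - times[i])
    for i in range(len(likes) - 1)
]

print(naive_per_day)  # [3000.0, 4500.0, 1500.0, 3000.0]
print(true_per_day)   # [3000.0, 3000.0, 3000.0, 3000.0]
```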

To correct this, I wrote a simple function that computes exactly how much time passed between measurements, and my models now use that. In particular, it creates a list giving each measurement's time in days relative to the most recent measurement. For example, if the first data point was 16 days before the most recent one, it gets a value of -16. That way, all predictions are phrased in terms of "days from now" rather than "days since the start of my data set," which makes the same mistake harder to repeat.

Given a list called times that contains the timestamps as strings, I created this new list with:

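lastTime = AbsoluteTime[Last[times]]; (* absolute time of the most recent measurement *)
dayTimes = N[(AbsoluteTime[#] - lastTime)/(3600*24)] & /@ times (* 0 for the latest point, negative days before it *)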
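For readers who don't use Mathematica, here is a rough Python equivalent of `dayTimes`. It assumes the timestamps look like "2015-07-19 00:00" (the real format in my data may differ). Like the Mathematica version, it also assumes the list is in chronological order, so the last entry is the most recent. `days_relative_to_latest` is a hypothetical name.

```python
from datetime import datetime


def days_relative_to_latest(times: list[str], fmt: str = "%Y-%m-%d %H:%M") -> list[float]:
    """Days between each timestamp and the last one in the list.

    The last (most recent) entry maps to 0.0; earlier entries are negative.
    Assumes `times` is sorted chronologically and matches the format `fmt`.
    """
    parsed = [datetime.strptime(t, fmt) for t in times]
    latest = parsed[-1]
    return [(t - latest).total_seconds() / (3600 * 24) for t in parsed]


if __name__ == "__main__":
    print(days_relative_to_latest(["2015-07-17 00:00", "2015-07-18 12:00", "2015-07-19 00:00"]))
    # [-2.0, -0.5, 0.0]
```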

Saturday, July 18, 2015

Bernie Beats Biden

#CalledIt

Last night, Bernie Sanders (I) surpassed Joe Biden (D) [not yet announced] in terms of total Likes. Sanders now has 837K to Biden's 836K as of midnight central time.

Sanders is still behind Hillary Clinton (D), the frontrunner among the Democrats (1,066K). However, over the last 11 days, Sanders has added at least 1.5 times as many Likes as Clinton each night. Since he is starting from a smaller total, it should come as no surprise that his percentage increase in Likes has been at least double hers over the same period. In fact, a linear best-fit line of Clinton's lead predicts Sanders surpassing her in 88 days if we use the data from my entire collection period. Using only these past 11 "magical" days for Sanders, the same fit shows him catching Clinton in 71 days. Assuming linear growth in Likes for Sanders, this would put both of them at 1.25M Likes.
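A "linear best-fit line of Clinton's lead" boils down to two steps. First, fit Clinton's margin over Sanders with ordinary least squares. Then solve for the day that fitted margin reaches zero. Below is a minimal Python sketch, with time in days relative to the most recent data point. The function names and the sample margins are made up for illustration. Passing in only the last 11 days of data gives the "magical days" version.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def days_until_lead_vanishes(days: list[float], lead: list[float]) -> Optional[float]:
    """Day on which the fitted lead reaches zero, or None if the fit isn't shrinking."""
    intercept, slope = fit_line(days, lead)
    if slope >= 0:
        return None
    return -intercept / slope


if __name__ == "__main__":
    # Made-up margins (leader minus challenger) over five nights.
    days = [-4, -3, -2, -1, 0]
    lead = [245_000, 242_500, 240_000, 237_500, 235_000]
    print(fit_line(days, lead))                  # (235000.0, -2500.0)
    print(days_until_lead_vanishes(days, lead))  # 94.0
```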

To take this analysis further, I performed a simple linear regression (best-fit line) for each candidate based on all the data I have so far and extrapolated it. (Admittedly, this isn't fair to Scott Walker [R]. He announced after I started collecting data, so most of his data points show no change.) The following animation shows the estimated total number of Likes for each candidate up until the election.

Interestingly, it shows that by the time of the election, we could expect Donald Trump (R) to be up against Sanders. However, because primaries happen earlier, we should look at where candidates are estimated to be around Super Tuesday. In that case, this model predicts Trump to have 8.2M Likes compared to Ben Carson's (R) 2.64M and Sanders' 2.56M. This is perhaps the simplest possible model for analyzing this data, but an interesting one nevertheless.
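Reading a fit off at Super Tuesday (March 1, 2016) or Election Day (November 8, 2016) is just a matter of counting days from the latest data point and plugging that into the line. Below is a minimal Python sketch. It assumes the latest data point is July 18, 2015 (this post's date) and that a fit is stored as an intercept plus a slope in Likes per day. The numbers in the usage lines are made up and are not any candidate's actual fit.

```python
from datetime import date

# Assumption: fits use t = days since the latest data point, taken here as July 18, 2015.
LATEST = date(2015, 7, 18)
SUPER_TUESDAY = date(2016, 3, 1)
ELECTION_DAY = date(2016, 11, 8)


def extrapolate(intercept: float, slope: float, target: date, latest: date = LATEST) -> float:
    """Evaluate the linear fit likes(t) = intercept + slope * t at a calendar date."""
    t = (target - latest).days
    return intercept + slope * t


if __name__ == "__main__":
    print((SUPER_TUESDAY - LATEST).days)                  # 227
    print(extrapolate(1_000_000, 5_000, SUPER_TUESDAY))  # 2135000
    print(extrapolate(1_000_000, 5_000, ELECTION_DAY))
```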

Friday, July 17, 2015

What's Happened So Far

So far, there haven't been too many major developments (see the first post on this blog for a summary of who is leading), but here are a few interesting things.

Donald Trump (R) has averaged a 1.12% (+/- 0.56%) increase each night for the past 2 weeks. This has sent him skyrocketing to 2.37M Likes, up from 2.04M on the Fourth of July. The second most popular candidate, Rand Paul (R), is still the only other candidate with more than 2M Likes, but he has shown very little momentum (500+/-230 Likes per day).
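Figures like "1.12% (+/- 0.56%)" can be computed from the nightly totals as the mean and spread of the night-to-night percentage changes. Below is a minimal Python sketch. It assumes the "+/-" is the sample standard deviation, which is an assumption about how the numbers were summarized. The Like counts in the usage lines are made up.

```python
from statistics import mean, stdev


def nightly_percent_increase(likes: list[int]) -> list[float]:
    """Percentage change between consecutive nightly Like totals."""
    return [100 * (b - a) / a for a, b in zip(likes, likes[1:])]


def summarize(likes: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of the nightly percentage changes."""
    changes = nightly_percent_increase(likes)
    return mean(changes), stdev(changes)


if __name__ == "__main__":
    likes = [1_000_000, 1_010_000, 1_030_200, 1_040_502]  # made-up totals
    print(nightly_percent_increase(likes))  # [1.0, 2.0, 1.0]
    avg, spread = summarize(likes)
    print(f"{avg:.2f}% +/- {spread:.2f}%")  # 1.33% +/- 0.58%
```

Applying the same mean and standard deviation to the plain nightly differences, rather than the percentages, gives Likes-per-day figures in the style of "500+/-230".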

From Independence Day until now, the only changes in rank based on total Likes were Carly Fiorina (R) passing Chris Christie (R) on July 11 and Bobby Jindal (R) passing Rick Santorum (R) on July 17. Fiorina has seen rapid growth (a 1.4% +/- 0.48% daily increase, approximately 1500 Likes per day), while Christie has seen little change. Jindal's overtaking of Santorum was essentially inevitable. Santorum is the only candidate to have shown a net decrease at any point. (Joe Biden [D] has also shown a net decrease, but he has not announced whether he will run; he's included in the data set for comparison.) As such, Jindal's paltry 330+/-180 Likes per day were bound to overtake Santorum's 36+/-38 Likes per day.
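Spotting a rank change like Fiorina passing Christie means finding the first night on which one candidate's total goes from at or below another's to strictly above it. Below is a minimal Python sketch. `first_pass` is a hypothetical helper, and the numbers in the usage lines are made up.

```python
from typing import Optional, Sequence


def first_pass(days: Sequence[str], challenger: Sequence[int], incumbent: Sequence[int]) -> Optional[str]:
    """First day on which `challenger` moves from at-or-below `incumbent` to strictly above it.

    Returns None if that never happens within the data.
    """
    for i in range(1, len(days)):
        if challenger[i - 1] <= incumbent[i - 1] and challenger[i] > incumbent[i]:
            return days[i]
    return None


if __name__ == "__main__":
    days = ["2015-07-09", "2015-07-10", "2015-07-11"]
    challenger = [100, 105, 112]  # made-up totals
    incumbent = [108, 109, 110]
    print(first_pass(days, challenger, incumbent))  # 2015-07-11
```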

As it happens, the Republican candidates are performing better than the Democratic candidates in terms of Likes alone, which is why the major changes seem to be happening within the GOP. For the Democrats, as of July 17, Bernie Sanders (I) is within spitting distance of Biden (828K Likes to Biden's 836K). If he does not surpass Biden tonight, he will tomorrow night for sure (unless Biden happens to announce a run today). Hillary Clinton (D) is still considerably ahead with 1.06M Likes. However, on most days over the past 2 weeks, Sanders has been slowly closing the gap, which has shrunk from 276K Likes to 234K. Should things continue at present rates (approximately 3000 Likes per night), Sanders will catch Clinton in approximately 90 days. Mark your calendars, folks: there's your first (highly inaccurate, simple, almost certainly wrong, grassroots-movement-ignorant) prediction! Guess we'll see about that in mid-October.



As for me, producing this page has helped me see some of the utilities I need to be building. I'll keep the code minimal here, for the most part, but a fun little thing I needed was a lookup function to give the index of a candidate in the data set based on their name. In Mathematica, I used this function:

(* index of the first entry in names that contains candidate as a substring *)
findIndex[candidate_String] := Position[names, _String?(StringContainsQ[#, candidate] &)][[1, 1]]

This takes a candidate's name (or a partial name, such as "Sand"), compares it against all available names (stored in the variable names), and returns the index of the first match. Anyway, fun little function. Carry on.
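The same lookup in Python might look like the sketch below. It is a translation for illustration, not the code in the repository. One difference to note: Mathematica's `Position` returns 1-based indices, while Python's are 0-based. Here, a missing name raises a `ValueError` rather than a part-extraction error.

```python
def find_index(names: list[str], candidate: str) -> int:
    """Index of the first name containing `candidate` as a substring (e.g. "Sand")."""
    for i, name in enumerate(names):
        if candidate in name:
            return i
    raise ValueError(f"no candidate matching {candidate!r}")


if __name__ == "__main__":
    names = ["Hillary Clinton", "Bernie Sanders", "Rick Santorum"]
    print(find_index(names, "Sand"))  # 1
    print(find_index(names, "San"))   # 1 (first of two matches)
```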

First Visualizations

As promised, here's some of the data I've collected so far, visualized in a couple different ways.

Total Likes for each major candidate through time. High-res version here: http://i.imgur.com/RAEhT0Y.gif. Note: Likes for Scott Walker (R) were only added upon his announcement, so his Like count was back-filled to avoid a sudden jump in Likes when he was added to the data set.
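One simple way to back-fill a late-added series is to copy the first observed count backward over the missing nights. The Python sketch below assumes missing nights are stored as `None` and that back-filling means repeating the first real value. It is an illustration, not necessarily exactly how the chart's data was prepared.

```python
from typing import Optional


def back_fill(series: list[Optional[int]]) -> list[Optional[int]]:
    """Replace leading None values with the first observed count.

    Gaps after the first observation are left untouched.
    """
    first_idx = next((i for i, v in enumerate(series) if v is not None), None)
    if first_idx is None:
        return list(series)  # nothing observed yet, so nothing to copy
    return [series[first_idx]] * first_idx + list(series[first_idx:])


if __name__ == "__main__":
    print(back_fill([None, None, 123_000, 124_500]))  # [123000, 123000, 123000, 124500]
```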

Total Likes for Republican candidates through time. High-res version here: http://i.imgur.com/ndNDC4Q.png.
Total Likes for Democratic candidates through time. High-res version here: http://i.imgur.com/qsKVXB2.png.

Thursday, July 16, 2015

Introduction -- Likes and Votes

The 2016 U.S. Presidential Election is likely to be a war fought largely on the web. Obama's precedent points to candidates relying more and more on social media. This raises the question:

How powerful is a Facebook Like?

In particular, can Facebook Likes indicate how a candidate is doing in the polls?

I'll be investigating this over the coming months by monitoring the candidates' total number of Likes daily. I'm using a little Java script on an old laptop to scrape these values nightly through Facebook's API. When it has technical issues, I collect them manually roughly once per day. I'll be analyzing the data in Mathematica and posting various visualizations. All the code (and data) is available at https://www.github.com/eaott/election. Feel free to check it out for yourself (or just wait for a post to see how things are going).

I have two weeks of data now, which have shown several things. Donald Trump (R) has the most Likes by quite a margin (2.4M Likes, compared to Rand Paul's [R] 2.0M and Mike Huckabee's [R] 1.8M). He also has the greatest gains each night (~28,000 Likes as of this morning, compared to Bernie Sanders' [I] ~7300 and Ben Carson's [R] ~6600). Trump, Carly Fiorina (R), and Jim Webb (D) seem to take turns leading in overnight percentage growth (anywhere from 1% to 2%). For the current state of things, see below. More to come in the next couple of days.