Tuesday, October 6, 2015

Carson Leads; Second Debate; Interactive Graph

Howdy! It's been a while. Sorry about that. Turns out grad school is kinda hard. Who knew?

There's a lot of data to talk about (if you want graphs, just scroll down a bit to the Graphs and Stuff section), but I wanted to say a bit about some of the code changes I've made and why I think they're worth noting, plus just tell more of a story about the project.

Code

A lot of my time for this blog is spent trying to make new and better visualization tools for this data set. Part of it comes from learning new tools in Mathematica, part of it comes from more software engineering -- how can I build this so that this time in January I can still effectively use the same code? That takes some forethought. As my high school computer science teacher's wall said, "weeks of programming can save hours of planning" (Yes, Mr. Martin, I still think about this regularly when I code and quote it to the students I tutor).

Being my own project manager (+ y'all)

As the sole programmer on this project, I know all the flaws of my system. I know that before this post, I hadn't included John Kasich (R) in my results even though I was monitoring his page. I know that it was impossible to understand what was going on in some of my plots. I know I wasn't handling some special cases well. I know I couldn't highlight a particular candidate.

Figuring out these problems is one thing. Addressing them is another. Although I usually end up having long coding sessions to fix multiple things, I have to do so systematically. That means determining what's critical, what will cause changes "downstream" and so forth. For example, I ran into this one for this post: what happens if the data in a Databin in the Wolfram Language isn't in chronological order? I include a timestamp in each submission I make to the Wolfram Data Drop, but the fact is, the rows in a Databin are ordered by submission time, not manual timestamp (although they are treated the same in many places).

Why is that a problem? A lot of the code in the project relies on neighboring rows being sequential (especially for finding the differences between days and things like that). So when I had to re-create some submissions to properly handle Kasich and account for his missing data (I assume -1 or 0 everywhere, including percent change, until real daily values can be computed), those rows were no longer in chronological order, and I had to figure out how to compensate. Fortunately, I was able to deal with this (and with sorting names by last name in some places) upfront. Without planning how to do this effectively, I would probably still be editing code rather than making a structured fix. Planning > hacking, at least if you care about the quality of the code next week.
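
To make the ordering issue concrete, here's a minimal sketch. It uses a plain list of associations rather than a real Databin, and the rows, key names ("Timestamp", "Likes"), and numbers are all made up for illustration. The idea is just to sort by the manual timestamp before taking day-to-day differences.

```wolfram
(* Hypothetical rows in submission order; the row for Oct 2 was submitted last *)
entries = {
   <|"Timestamp" -> {2015, 10, 1}, "Likes" -> 100|>,
   <|"Timestamp" -> {2015, 10, 3}, "Likes" -> 130|>,
   <|"Timestamp" -> {2015, 10, 2}, "Likes" -> 110|>
   };

(* Sort by the manual timestamp instead of trusting submission order *)
sorted = SortBy[entries, AbsoluteTime[#["Timestamp"]] &];

(* Daily differences on the sorted rows *)
Differences[Lookup[sorted, "Likes"]]    (* {10, 20} *)

(* The same calculation on the unsorted rows gives misleading changes *)
Differences[Lookup[entries, "Likes"]]   (* {30, -20} *)
```

Sorting once, right after the data is read, means everything downstream can keep assuming that neighboring rows are consecutive days.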

I also want to respond to what you think about this blog. Until now, I hadn't been able to display daily results, and that's kinda annoying if you want to check in on things between my posts. I also hadn't included Kasich, fixed the data for Scott Walker (R), who also entered the race later, or added the ability to highlight a candidate, and there were some other back-end updates to make. That's a lot. Carry on to see results.

Optional Parameters

Okay, super tech-ing out now. The Wolfram Language supports optional parameters. That sentence may make no sense to you. Let's talk about fast food instead. You pull up to your local Whataburger (I'm all about that honey butter chicken biscuit) and see all the choices you can order. You decide on a double cheeseburger. That could be the end of the story. But maybe you want to customize it. Maybe you want grilled onions. Oh, and you should put double lettuce. And substitute ketchup for mustard. You took the main idea then added a lot of options (admittedly fairly structured ones -- you're not going to be able to add a side of chicken enchiladas with green sauce).

The Wolfram Language (WL, which I'm using for this project) supports this sort of flexibility, including providing defaults (or guesses). For example, a plot can be made with the defaults alone or customized with many options:
[Graph: Simple plot]
[Graph: Highly customized plot]
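
As a rough illustration of that contrast (this is not the code behind the two graphs above, just a generic sine curve), the first call leaves every option at its default and the second overrides a handful of built-in Plot options:

```wolfram
(* Simple plot: every option is left at its default *)
simplePlot = Plot[Sin[x], {x, 0, 2 Pi}]

(* The same curve with several options set explicitly *)
customPlot = Plot[Sin[x], {x, 0, 2 Pi},
  PlotStyle -> Directive[Red, Thick],
  PlotRange -> {-1.5, 1.5},
  AxesLabel -> {"x", "sin(x)"},
  PlotLabel -> "A customized sine curve",
  GridLines -> Automatic]
```
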
As part of my project, I'm building a lot of functions specifically for my data. Particularly for the graphs of Likes versus time, there's a lot to be customized. There's color, there's whether to highlight a candidate, there's which candidates to plot, there's the plot range... and I'm sure I'll have more next time.

There's a cool pattern in WL to do this (there are others; this is the one I used successfully for some statistics-related functions I created as an exercise a few weeks ago). You can create the full function:
myNewFunction[x_, y_, z_] := ....
Then you can determine which variables should have options and default values:
myNewFunction[x_, OptionsPattern[{y -> 5, z -> 7}]] := myNewFunction[x, OptionValue[y], OptionValue[z]]
This creates a second version that requires a value for x and assumes y=5 and z=7. However, if the user supplies new values (myNewFunction[2,y->10]), the calculation is updated accordingly. Essentially, it's nice to have the first version, where every value is specified, so that you (as the person writing the function) don't get confused, and it's nice to have the second as a user because it's more flexible and "smart." This pattern provides the best of both worlds: any change in functionality is made only once, in the full version, and any new options are easy to incorporate into both versions.
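
Here's a small runnable sketch of the pattern. The function name scaledSum and its body are made up for illustration. Two details are my own additions rather than part of the pattern described above: the option names are strings (so they can't clash with variables that already have values), and the full version tests its arguments with NumericQ, since a call with two options also has three arguments and could otherwise match the full version.

```wolfram
ClearAll[scaledSum];

(* Full version: every value is given explicitly *)
scaledSum[x_?NumericQ, y_?NumericQ, z_?NumericQ] := x*y + z

(* Convenience version: "y" and "z" are options with defaults 5 and 7 *)
scaledSum[x_?NumericQ, OptionsPattern[{"y" -> 5, "z" -> 7}]] :=
 scaledSum[x, OptionValue["y"], OptionValue["z"]]

scaledSum[2]                        (* 2*5 + 7 = 17 *)
scaledSum[2, "y" -> 10]             (* 2*10 + 7 = 27 *)
scaledSum[2, "y" -> 10, "z" -> 1]   (* 2*10 + 1 = 21 *)
scaledSum[2, 10, 1]                 (* full version directly: 21 *)
```

Because the convenience version only forwards to the full one, changing the formula means editing a single line.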

In doing so with my time series graphing function, I made it so I can selectively customize options for the plot without having to change the function or write down every single parameter I have at my disposal. TL;DR: I can change stuff easier to create prettier plots faster.

Graphs and Stuff

Here are the actual results. (I apologize that I don't have data for a few days due to some technical issues.)

Ben Carson (R) Leads All Candidates

Last time, Donald Trump (R) had full control over the political field. A challenger has arisen. Ben Carson (R) has rapidly risen in popularity on Facebook:
[Graph: Oh hey, I can highlight candidates now.]
Each vertical line indicates one of the televised GOP debates. Carson has skyrocketed since the first debate and even more so since the second one. He now has approximately 19% of the total number of Likes to Trump's 18% and third-place Rand Paul's (R) 9.5%.

In related news, sometime between September 23 and October 2, Carson became the first candidate to hit 4M Facebook Likes.

Past Predictions

This brings me to some past predictions. I'll copy the predictions here and write the updates in parentheses. (Many of the events occurred on September 17, just a few hours after the second GOP debate.)
  • August 26: Clinton passes Perry at 1.2M Likes (Occurred on August 24 at 1.2M)
  • August 29: Rubio hits 1M Likes (Occurred on September 17 – a delay was predicted last time due to his waning FB support at the time)
  • September 1: Bush passes Santorum at 265K (Occurred on September 17 at 266K)
  • September 4: Sanders passes Cruz at 1.4M (Occurred September 17)
  • September 18: Bush passes Jindal at 286K (Not yet occurred)
  • September 29: Sanders passes Huckabee (R) at 1.8M (Not yet occurred)
  • September 29: Bush passes Walker at 300K (Not yet occurred)
  • October 13: Sanders passes Paul (R) at 2.1M (Not yet occurred)
So, my predictions were kinda bad. #LinearRegressionOnNonlinearData
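
For the curious, here's a minimal sketch of what a linear-regression prediction of this kind might look like. The numbers are made-up toy data (not real Like counts), the variable names are hypothetical, and this is not the project's actual prediction code. It fits a straight line to each page's Likes and solves for the day the two lines cross.

```wolfram
(* Toy data only: {day, Likes in thousands} for two hypothetical pages *)
pageA = {{0, 100}, {1, 110}, {2, 120}, {3, 130}};
pageB = {{0, 150}, {1, 152}, {2, 154}, {3, 156}};

(* Least-squares straight-line fits in the variable t (days) *)
fitA = Fit[pageA, {1, t}, t];   (* roughly 100 + 10 t *)
fitB = Fit[pageB, {1, t}, t];   (* roughly 150 + 2 t *)

(* Predicted crossing day: where the two fitted lines meet *)
crossing = Solve[fitA == fitB, t]   (* t is about 6.25 for this toy data *)
```

The weakness is visible right in the setup: if growth speeds up or stalls after a debate, a straight line fitted to earlier days will put the crossing on the wrong date.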

I don't have new predictions yet, but here's the current state of affairs.
One of these days, I'll have this sort of thing compared with real polling numbers... or FEC filings. 

Interactive Graph!!!

Okay, so this took a lot of work, but I think I finally got it, folks! You can now make some of your own graphs with the power of the cloud (the Wolfram Cloud, specifically)!

Here's what it looks like: 
You select which type of data you want to look at, potentially filtering by political party and optionally highlighting a particular candidate, then hit submit! Give it a little while to process the data for you and then voilà! You get your very own graph (such as the one earlier in this post), or these:

Absolute overnight change, Republicans only, Donald Trump highlighted
Percentage overnight change, Democrats only, Bernie Sanders highlighted

To create your own graphs, GO HERE https://wolfr.am/7mn~J_kq Enjoy!
