Twittering with Mathematica
The popularity of Twitter has exploded in the past few months. The service poses a simple question: “What are you doing?” Users respond in 140 characters or fewer. The 140-character limit comes from the 160-character limit of SMS messages, minus a few characters for things like the user’s screen name. Twitter is probably best described as “micro-blogging”: a cross between blogging and instant messaging.
People use Twitter for all kinds of reasons, everything from staying in touch with friends to receiving announcements and support from companies with a presence on Twitter. Here are a few examples:
"Can't wait to see Nigel, Laura and Ben later."
"Who's up for cutting class today and hitting 18 holes? and the 19th too."
"Research Project Manager,Seattle, WA, United States: Penn, Schoen and Berland Associates is currently seeking qu.. http://tinyurl.com/cyj7pd"
That last one trails off a bit due to the 140-character limit.
Status messages (“tweets”) can be posted from the web, a mobile phone, or any number of desktop or mobile applications built specifically to interact with Twitter via its API. Twitter’s REST API is particularly interesting because it allows many operations to be performed using simple HTTP queries that return XML documents. Recall that Mathematica can do HTTP (via Import or J/Link) and can also import XML. You can probably guess where I’m going with this: Twittering with Mathematica.
We’ll begin with a simple example that doesn’t require any authentication: the public timeline. Its URL returns an XML document containing 20 of the most recent posts to Twitter by all users.
Here are the users who made the posts in that document:
We can also get a list of status messages from (and replies to) a specific user—in this case WolframResearch.
The textual contents of the tweets are found in the “text” XML element.
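As a rough sketch of the extraction steps above, here is how the user names and tweet texts might be pulled out of an imported document. To keep it self-contained, it parses a small hand-written XML string with ImportString instead of fetching a live URL. The sample data and the helper name elementText are made up for illustration, and the element names "status", "user", and "screen_name" are assumptions about the document layout; only "text" is named above.

```mathematica
(* Hand-written sample standing in for a live response. The element names
   "statuses", "status", "user", and "screen_name" are assumptions. *)
sampleXML = "<statuses>" <>
   "<status><text>Hello from Mathematica</text>" <>
   "<user><screen_name>alice</screen_name></user></status>" <>
   "<status><text>Plotting tweets</text>" <>
   "<user><screen_name>bob</screen_name></user></status>" <>
   "</statuses>";

(* Parse the string into symbolic XML (XMLObject and XMLElement expressions). *)
doc = ImportString[sampleXML, "XML"];

(* Hypothetical helper: collect the string content of every element
   with the given tag name, at any depth in the document. *)
elementText[xml_, tag_String] :=
  Cases[xml, XMLElement[tag, _, {s_String}] :> s, Infinity]

users = elementText[doc, "screen_name"]   (* {"alice", "bob"} *)
tweets = elementText[doc, "text"]         (* {"Hello from Mathematica", "Plotting tweets"} *)
```

The same Cases pattern should work on the result of Import for a real timeline document, provided the elements of interest contain a single string.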
Not all of the data on Twitter is publicly available. Certain things require authentication for a particular user account. Import handles these cases as well. It will prompt the user for login credentials if necessary. Here we will log in as WolframResearch.
Let’s say we want to retrieve a list of users who are following us.
Most Twitter APIs return a fixed number of results. For status messages it’s usually 20 results at a time. For user lists like this it’s usually 100 results at a time. This returned a full 100 results, so let’s get more results from page 2.
Keep going…
Page 5 returned 0 results, so we’re done.
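The page-by-page retrieval can be wrapped in a loop that stops at the first empty page. The sketch below assumes a function that returns the results for a given page number; fetchPage here is a hypothetical stand-in that serves 350 fake IDs, 100 per page, so that the example runs without a network connection.

```mathematica
(* Fake data: 350 follower IDs split into pages of up to 100 each. *)
pages = Partition[Range[350], 100, 100, 1, {}];

(* Hypothetical stand-in for one authenticated request for page n.
   Pages past the end come back empty, as in the text above. *)
fetchPage[n_Integer] := If[n <= Length[pages], pages[[n]], {}]

(* Keep requesting pages until one returns no results. *)
fetchAll[fetch_] := Module[{page = 1, results = {}, batch},
  While[(batch = fetch[page]) =!= {},
   results = Join[results, batch];
   page++];
  results]

followers = fetchAll[fetchPage];
Length[followers]   (* 350 *)
```

With this fake data the loop requests pages 1 through 5 and stops when page 5 comes back empty.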
Now we get to the fun part: setting your status programmatically from Mathematica. The tricky thing about this is that it requires the HTTP POST method. So far we’ve been using the HTTP GET method, and Import always uses the GET method.
No matter. J/Link provides access to all of Java from within Mathematica, and we can make use of Java to perform the HTTP POST necessary to set our Twitter status.
The first step is to initialize J/Link and create an HttpClient object.
Next we create a credentials object with a user name and password.
Set the authorization scope to twitter.com port 443. We could just as easily use port 80 with HTTP, but it’s better to send the password over the network encrypted, so we’ll use port 443 with HTTPS instead.
Next we encode the tweet into percent-escaped UTF-8 bytes.
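One way to do this encoding in plain Mathematica is sketched below. It is an illustration, not necessarily the code used in the original notebook: the function names percentEncode and unreservedQ are made up, and the choice to leave only letters, digits, and the characters - . _ ~ unescaped follows the usual URI convention.

```mathematica
(* True for byte values that may appear unescaped: 0-9, A-Z, a-z, and - . _ ~ *)
unreservedQ[b_Integer] :=
  48 <= b <= 57 || 65 <= b <= 90 || 97 <= b <= 122 ||
   MemberQ[{45, 46, 95, 126}, b]

(* Convert the string to UTF-8 bytes, then write every byte that is not
   unreserved as % followed by two uppercase hexadecimal digits. *)
percentEncode[s_String] := StringJoin[
  If[unreservedQ[#],
     FromCharacterCode[#],
     "%" <> ToUpperCase[IntegerString[#, 16, 2]]] & /@
   ToCharacterCode[s, "UTF-8"]]

percentEncode["Hello, world!"]     (* "Hello%2C%20world%21" *)
percentEncode["caf\[EAcute]"]      (* "caf%C3%A9" *)
```

The second example shows why the UTF-8 step matters: the single character é becomes two bytes, each escaped separately.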
Create the HTTP POST method.
Finally, execute the HTTP POST method.
A result of 200 means the post was successful. Let’s get the resulting data and import it as XML.
The other Twitter API function that requires the HTTP POST method is the one that deletes a status message. Let’s create a new HTTP POST method with the URL to destroy a tweet. Then we’ll execute it using the same client as before, as this client has already been authenticated.
Success.
Now that we’ve seen the guts of the code necessary to interact with Twitter, it would be helpful to simplify things with some reusable functions that encapsulate this code. I have written a Mathematica package (Twitter.m—click to download at the end of this post) which does exactly that. It incorporates all of the functionality we’ve seen so far and more. Let’s try a few things with it.
The first thing we’ll do is create a session to hold our login credentials. The user name and password can be passed explicitly to TwitterSessionOpen; if they are not, the function will prompt for them with a password dialog.
We store the result in a variable called session. Most of the Twitter package functions require a session value to be passed as the first argument. It is possible to open multiple sessions simultaneously.
Pull out the user associated with this session.
Display a user interface element depicting the user, with image, text, hyperlink, and tooltip.
Find out information about the user.
Next, get a list of the user’s tweets.
Find out information about the tweets.
Display a user interface element for the status message.
Let’s do a quick analysis of our friends and followers. We’ll take a look at the Union, Complement, and so on of our lists of friends and followers. We’ll have to use the unique ID values instead of the actual TwitterUser object wrappers, because the wrappers are not necessarily unique.
So we can tell that 48% of our friends also follow us, while 50% of our followers are also our friends.
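The set operations behind percentages like these can be sketched with plain lists of integer IDs. The lists below are made up for illustration, so the resulting fractions differ from the figures above.

```mathematica
(* Made-up ID lists; real ones would come from the friends and followers queries. *)
friendIDs = {1, 2, 3, 4, 5, 6};
followerIDs = {4, 5, 6, 7, 8, 9, 10, 11};

mutual = Intersection[friendIDs, followerIDs];        (* friends who also follow us *)
friendsOnly = Complement[friendIDs, followerIDs];     (* friends who do not follow us *)
followersOnly = Complement[followerIDs, friendIDs];   (* followers we do not follow *)
everyone = Union[friendIDs, followerIDs];             (* all associated users *)

(* Fraction of friends who follow back, and fraction of followers we follow. *)
friendShare = N[Length[mutual]/Length[friendIDs]]       (* 0.5 *)
followerShare = N[Length[mutual]/Length[followerIDs]]   (* 0.375 *)
```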
Next, let’s find out which group of users associated with us tweets the most. In this case we’ll need to map the unique ID values back to the wrapper objects.
Comparing all of our friends and followers shows that the chattiest group (highest average number of tweets per user) consists of those who are our friends but not our followers. The next-chattiest group consists of those who are both friends and followers. The least chatty group consists of those who follow us but are not our friends.
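A comparison like this boils down to averaging a per-user tweet count over each group of IDs. The sketch below uses made-up counts and group memberships; in practice the counts would come from the user objects that the IDs map back to.

```mathematica
(* Made-up lookup from user ID to number of tweets. *)
tweetCounts = {1 -> 1200, 2 -> 300, 3 -> 90, 4 -> 400, 5 -> 50,
   6 -> 150, 7 -> 10, 8 -> 30, 9 -> 20};

(* Made-up group memberships, given as lists of user IDs. *)
groups = {"friends only" -> {1, 2, 3},
   "both" -> {4, 5, 6},
   "followers only" -> {7, 8, 9}};

(* Replace each ID by its tweet count and average within each group. *)
averages = {First[#], N[Mean[Last[#] /. tweetCounts]]} & /@ groups
(* {{"friends only", 530.}, {"both", 200.}, {"followers only", 20.}} *)

(* Chattiest group first. *)
ranked = SortBy[averages, -Last[#] &]
```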
Let’s combine the friends and followers into a single list of all the users associated with us.
So it looks like just a few of our friends and followers are responsible for the vast majority of the whole group’s tweets.
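One way to put a number on this kind of concentration is to sort the per-user tweet counts in descending order and look at the cumulative share of the total. The counts below are invented for illustration.

```mathematica
(* Made-up tweet counts, one per user. *)
counts = {40, 5200, 250, 10, 3100, 80, 400, 20, 120, 30};

(* Largest first, then the running fraction of all tweets covered. *)
sorted = Sort[counts, Greater];
cumulativeShare = N[Accumulate[sorted]/Total[sorted]];

(* Number of top users needed to account for at least 90% of all tweets. *)
topUsers = LengthWhile[cumulativeShare, # < 0.9 &] + 1   (* 3 *)
```

With these numbers, 3 of the 10 users account for more than 90% of the tweets. A ListPlot of cumulativeShare gives a quick visual check of the same thing.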
Now let’s get the dates all these users joined Twitter and group them together by month.
Most of our friends and followers appear to have joined Twitter recently, but there are a few old-timers in that list.
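Grouping by month can be done by truncating each date to its year and month and tallying the results. The sketch assumes the join dates are already available as {year, month, day} lists; the dates themselves are made up.

```mathematica
(* Made-up join dates in {year, month, day} form. *)
joinDates = {{2009, 2, 9}, {2007, 3, 14}, {2009, 1, 20}, {2008, 11, 2},
   {2009, 2, 27}, {2009, 1, 5}, {2009, 2, 11}};

(* Keep only {year, month}, count how often each occurs, and sort by date. *)
byMonth = Sort[Tally[Take[#, 2] & /@ joinDates]]
(* {{{2007, 3}, 1}, {{2008, 11}, 1}, {{2009, 1}, 2}, {{2009, 2}, 3}} *)
```

Each entry pairs a {year, month} with the number of users who joined in that month, ready to be passed to a bar chart or date plot.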
Many of the features available in the Twitter API are accessible through this Twitter package. Feel free to explore them all.
Now that we’re done we can close the session. This step is optional.
The ability to analyze Twitter data and to tweet programmatically opens the door to many interesting possibilities. Imagine running a long computation that notifies you via Twitter when it finishes. Imagine finding new friends by programmatically determining friends of friends or friends of friends of friends. Want more followers? Perhaps you could analyze the Twittering habits of popular users to see what might make them so popular. The possibilities are endless, as they usually are with data explorations in Mathematica.