A Year Ago Today
On September 5 of last year, The New York Times took the unusual step of publishing an op-ed anonymously. It was titled “I Am Part of the Resistance Inside the Trump Administration,” and quickly became known as the “Resistance” op-ed. From the start, there was wide‐ranging speculation as to who the author(s) might be; to this day, that has not been settled. (Spoiler alert: it will not be settled in this blog post, either. But that’s getting ahead of things.)
When I learned of this op-ed, the first thing that came to mind, of course, was, “I wonder if authorship attribution software could....” This was followed by, “Well, of course it could. If given the right training data.” When time permitted, I searched the internet for where one might find training data and, for that matter, which people to consider for the pool of candidate authors. I found at least a couple of blog posts that mentioned the possibility of using tweets from administration officials. One gave a preliminary analysis (with President Trump himself receiving the highest score, though by a narrow margin; go figure). It even provided a means of downloading a dataset that the poster had gone to some trouble to cull from the Twitter site.
The code from that blog was in a language/script in which I am not fluent, but my coauthor on two authorship attribution papers (and other work), Catalin Stoean, was able to download the data successfully. I first did some quick validation (shown later) and got solid results. Upon setting the software loose on the op-ed in question, a clear winner emerged. So for a short time I “knew” who wrote that piece. Except. I decided more serious testing was required.