100,000 Posts to MathGroup
June 11, 2009 — Guest Contributor: Steven Christensen, MathGroup
The Mathematica mailing list and internet newsgroup comp.soft-sys.math.mathematica (called MathGroup for short) have been in existence for more than twenty years now. In January 2005, we passed the 50,000-message mark. Now, in only a little more than four years, we have added another 50,000.
I want to take this opportunity to talk about the history of this effort—how it was started and what is involved in its operation. While it may sound like trials and tribulations, it is actually fun, and I have learned a lot about Mathematica and its uses and users, and about servers, the internet, and general social interactions over the years.
In the last part of the 80s, Stephen Wolfram and I both ended up at the University of Illinois in Urbana in buildings next to each other. I was a Senior Research Scientist at NCSA and he was Director of his Complex Systems center. At NCSA, one of my responsibilities was to locate and test scientific software for Sun workstations, which were an important element of our workstation network and also one of the machines Stephen was using to develop what became Mathematica. In supporting workstation software, I created an email list for NCSA’s users where they could ask about what software they might use and get information on what was new.
When Mathematica was released in 1988, the mailing list I started morphed into a small group of people talking about how they might use Mathematica to do their work. As Wolfram Research grew and there were more Mathematica users, the mailing list also grew. I left Illinois in 1990 and brought the mailing list to Chapel Hill, North Carolina. I still had access to the mail server at NCSA, but eventually dial-up got too expensive and I had to find a new place. At first I used local dial-up to UNC, and later Duke, which was a lot cheaper.
In 1994, it was suggested to me that creating a Usenet newsgroup would be the next step. There were enough messages per day that some readers started to complain that they wanted another way to read the posts. At the time—pre-browser, pre-Google—newsgroups were a central aspect of internet communication. I was made aware that some users in other countries had to pay for each email they got, so providing a new place to post was very important to them.
Starting a newsgroup was a tedious process that involved proposals, voting, central authority approvals, and so on. It took about six months of effort to get the newsgroup started. Perhaps the most aggravating part of it was the group name. I wanted something simple like comp.mathematica or sci.mathematica, but the powers that be said that I had to use comp.soft-sys.math.mathematica. This seemed kind of redundant, but rules were rules. The proposal was put out to the mailing list on December 1, 1994 to encourage enough yes votes. It did pass, and the newsgroup started on April 7, 1995 with message number mg660. From then on, the number of posts grew quickly due to the visibility on Usenet and the growing user base.
At this same time, I also started the sunfreeware.com project for Sun Microsystems. I had been very interested in open source software from my NCSA days and during my time serving on the Sun Users Group board of directors. Running both projects very quickly overwhelmed my single server and made me less than popular at the organizations that were loaning me Internet connections. My one-person consulting company was forced to buy a fractional T1 line, which at that time cost about $1000 a month. It was a struggle. In the next two years, both projects grew quickly. The fractional T1 line became a full T1, then two T1s a year after that, and finally four T1s less than a year later. The phone company technical guy who installed the two final T1s watched the traffic when he turned on the circuits and saw them max out instantly. I am sure he thought I must be running a less-than-savory site by the look he gave me, but I assured him it was just software downloads and mail. Eventually, the ISP that I was using called me and said that they had to drop me from their customer list. My traffic was burdening their local system in Raleigh so much that their other current customers were complaining and they could not add new customers. I was hurting their business and they gave me two months to get off. The local phone company that owned the circuits to my office said they also could not put more bandwidth there. I was forced to move my servers to a colocation site. The rack space I rented had a 10-megabit and later a 20-megabit dedicated circuit at $4000 a month along with one T1 line back to my office and a backup cable-modem circuit. Fortunately, by this time my business had grown and I could (sort of) afford the cost.
I continued to update my Sun servers and development machines. The number of mailing list readers passed 2,000 and the number of posts per day passed 20, and that meant that roughly 40,000 emails had to go out daily. I urged readers to move to the newsgroup as much as possible to avoid delays due to mass mailing, and this did help. But outgoing emails were no longer the only concern. As we all know, spam has grown into a monumental problem. If your email address was on the web anywhere, you got spam very quickly. The amount of spam that came to the newsgroup and mailing list addresses grew from maybe 10 per day to hundreds to now about 2,000 a day, or roughly 20-50 spam messages to each authentic post. Thus, it is not just outgoing mail that my servers have to deal with, but incoming as well.
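The arithmetic behind those figures can be written out in a few lines. This is only a back-of-the-envelope sketch in Python; the function names are made up for illustration, and the numbers are the rounded ones quoted above, not measurements from the server logs.

```python
def daily_outgoing_emails(subscribers: int, posts_per_day: int) -> int:
    """Every accepted post is mailed once to every mailing list subscriber."""
    return subscribers * posts_per_day


def implied_posts_per_day(spam_per_day: int, spam_per_post: int) -> float:
    """Authentic posts per day implied by a spam volume and a spam-to-post ratio."""
    return spam_per_day / spam_per_post


# Roughly 2,000 readers and 20 posts per day, as quoted above.
outgoing = daily_outgoing_emails(2000, 20)

# About 2,000 spam messages a day at 20 to 50 spam per authentic post.
busy_day = implied_posts_per_day(2000, 20)
quiet_day = implied_posts_per_day(2000, 50)

print(outgoing, quiet_day, busy_day)
```

Read this way, the quoted spam ratio corresponds to somewhere between 40 and 100 authentic posts on a given day.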
To confront this problem as it got worse, I wrote my own little spam filters, then when those failed, I used open source systems like spamassassin. But with thousands of messages to examine each day, my mail server slowed down to the point of uselessness. Finally, when Google mail came along, I forwarded all my incoming mail there to use their spam filters, which can be trained to some extent. This had a remarkable effect. The 1% or so of spam that got through the Google filters would come to my Thunderbird mail client, where its filters would clean up almost all of the rest. Periodically, I would run scripts on the spam folders to look for false positives, which, as time went on, were fewer and fewer.
One other communication issue I have to deal with is with ISPs and internet mail hosts. Comcast, AOL, and Yahoo periodically decide that posts from MathGroup are spam and my mail server gets blacklisted. Readers on the mailing list complain when all of a sudden they are not getting group mail. The hard part comes when I get a complaint email and then try to respond to that mail. The response gets blocked as well. I have to go to another of my email accounts and send the message from there to say I know about the issue and will try to fix it. Eventually, after dealing with the ISPs and cable companies directly and making no progress, I have given up. Users are now told to complain to their email providers themselves or move to the newsgroup. As I write this, my logs tell me that Yahoo has decided to reject messages. There are lots of other problems, but you get the idea.
Newsgroups also present problems. When I send out the posts to the newsgroup, they have to go to news servers at my ISP and elsewhere. On occasion, these servers fail or get confused and posts don’t get out. It can, and recently did, take several days for my ISP to figure out why my posts are not going out. And, since there is no way for me to post to the newsgroup to tell people the group is not working, I start getting emails wondering about that. I think my ISP may be getting tired of middle-of-the-night tech calls about this. I am probably going to buy access to one of the commercial news server sites to act as my primary or backup to avoid this.
The bandwidth and server parts of the project turned out to be straightforward to solve compared to the actual work of running the group and other projects. As with all newsgroups and mailing lists, a set of moderation rules had to be devised. The controversy about this and the cries of “censorship” in my email box and in other newsgroups got really strident sometimes. At first, I tried to respond in a nice way, but finally decided to just ignore it all. It seemed to me the main issues revolved around two things. First, there were those folks who just loved, or had their own vested interest in, some other computer algebra system. Basically, lots of “my system is better than your system” emails that I refuse to post. Second were those people who just hated me, Mathematica, Wolfram Research, and so forth. The four-letter words directed at me, the flames hot enough to melt steel, and threats to start newsgroups to counter mine were fascinating, but eventually they mostly died off or I just ignored them.
Anyway, getting away from all of this, how does a message that comes to me get processed?
A post comes to me as email no matter how it is posted. Each post goes into a folder I call “new”. (In the last minute, three have shown up.) Once I get maybe twenty in that folder, I sort them by user to look for duplicates. You would be surprised how many times messages get mailed twice or more. I then run a script on the new folder to split them up into numbered messages—mg100000, for example. Next, I run other scripts that clean up mail headers that are not needed. I then run tests on each message looking for common problems that I need to fix. The most important of these are HTML and other attachments. One of the rules of the group is that there are no attachments and no HTML. This is for security reasons and to accommodate older email clients. It is common these days for email clients to think that emails with attachments are spam or contain viruses. I decided to reject all attachments. This requires that I edit out HTML and also contact authors to tell them that they must put their attachments somewhere else to be downloaded. I tried once to reject all emails with HTML attachments, but with thousands of users using every imaginable email client, it was fruitless and time-consuming to reject each post, so I just edit them myself.
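For readers who like to see such a workflow as code, here is a minimal sketch of those steps in Python using the standard `email` package. It is not the set of scripts actually used for MathGroup; every function name, the list of headers worth keeping, and the rule for what counts as a duplicate are assumptions made for illustration.

```python
from email import message_from_string
from email.message import Message

# Assumption: the headers worth keeping; the real cleanup rules are not published.
KEEP_HEADERS = {"from", "to", "subject", "date", "message-id"}


def plain_body(msg: Message) -> str:
    """Collect the text/plain parts of a message, ignoring attached files."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and part.get_filename() is None:
            payload = part.get_payload()
            if isinstance(payload, str):
                parts.append(payload.strip())
    return "\n".join(parts)


def dedupe(messages: list) -> list:
    """Sort by sender, then drop repeats with the same sender, subject, and body."""
    seen = set()
    unique = []
    for msg in sorted(messages, key=lambda m: m.get("From", "")):
        key = (msg.get("From", ""), msg.get("Subject", ""), plain_body(msg))
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


def number_messages(messages: list, start: int) -> list:
    """Pair each message with a sequential name such as mg100000."""
    return [(f"mg{start + i}", msg) for i, msg in enumerate(messages)]


def strip_headers(msg: Message) -> None:
    """Delete every header that is not in KEEP_HEADERS (modifies msg in place)."""
    for name in list(msg.keys()):
        if name.lower() not in KEEP_HEADERS:
            del msg[name]


def find_problems(msg: Message) -> list:
    """Flag HTML parts and attachments, which need editing by hand."""
    problems = []
    for part in msg.walk():
        if part.get_content_type() == "text/html" and "html" not in problems:
            problems.append("html")
        is_attachment = (
            part.get_filename() is not None
            or part.get_content_disposition() == "attachment"
        )
        if is_attachment and "attachment" not in problems:
            problems.append("attachment")
    return problems
```

In this sketch, each raw post would be parsed with `message_from_string`, passed through `dedupe`, numbered with `number_messages`, and checked with `find_problems` before `strip_headers` runs, so that the check still sees the original `Content-Type` header. Flagged messages are only reported, not rejected, which mirrors the choice described above to edit problem posts by hand.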
Finally, I read the messages several times for content. Sometimes, I get emails asking for the price or availability of Mathematica, or some other non-technical post. I answer these with the appropriate pointers to the web. Other emails are just so simple I answer them myself or insist on more information. Basically, if the question might be remotely interesting to a Mathematica user, I let it through and let the many readers do the “scolding”, redirection, or “hand-holding”. Of course, there are still flames and sometimes personal comments or attacks, and I try to filter these also.
Once posts are acceptable, I run more scripts that prepare the messages for either the mailing list or the newsgroup and send them out. I send every message back to myself several ways just to make sure they are going out and, if not, to figure out where the problem might be. The result is what everyone sees. I make mistakes of course, which I hope are not too horrible.
Needless to say, all of this is not nearly as important ultimately as the actual people who generously give their time and expertise to answer and discuss the posts. There are perhaps thirty or so readers inside and outside of Wolfram Research who, for years now, have been doing a remarkable job of helping both new and expert Mathematica, and now Wolfram|Alpha, users day in and day out. They, along with everyone who contributes, do the real work.
I was honored to be a student and friend of Bryce DeWitt, who some of you know was one of the finest theoretical physicists of the last 100 years. One of the things he told me back in the 70s was, “Make sure all your projects are useful, work that will still be valuable to others twenty years from now.”
It is with this in mind that I have tried to keep the mailing list and newsgroup running and hope to continue to do so. The goal has always been to provide a place where Mathematica users and developers can get help or discuss any technical Mathematica-related topic. I am told that many new ideas on how to improve Mathematica have been found by the software geniuses inside Wolfram Research in the posts. This is great—just what I want to see.
Thanks very much to Stephen Wolfram, everyone at Wolfram Research, and all the readers and contributors for all the support and encouragement to me and to all Mathematica users. I have sincerely appreciated the many email thank-yous I have received since message 100,000.
The current list of rules, information on how to join and use the group, and other information can be found here. Everyone who joins the mailing list gets a copy of the rules. Archives of posts back to 1989 are online too.