The dynamics of spam on a semi-dormant blog

It's time for a serious post. E.g., a careful analysis of patterns of spam attempts on a widely-read but essentially dormant blog.

The blog in question is now entitled "Walt, Even Randomer" and combines four years of Walt at Random archives with the occasional new post that isn't right for the new home of Walt at Random--e.g., reviews of old movies, ALA schedules, pure copies of posts from other blogs (except for the announcements of new Cites & Insights issues, which do appear on both blogs).

The semi-dormant blog was averaging 3,000 page views per day when the active portion moved and has a Google Page Rank of 5 (sometimes 6, if the wind's blowing in the right direction), so it's a target for spammers, particularly link spammers. It also has Spam Karma 2, so very few spamments get through. (Comments are also automatically disabled six months after a post appears, since many link spam attempts are on very old posts--and I disabled linkbacks long ago, since the spam-to-signal ratio was just too high.)

The settings for Spam Karma 2 are severe enough that, once in a while, a legitimate comment gets moderated, so I try to check the spams before deleting them or letting them get deleted. (So far, that doesn't seem to be either an issue or a possibility at ScienceBlogs--and one spam comment, a very clever one, did make it through to one post already.)

Anecdata

That's what this is, to be sure--at best, anecdotal data or anecdata*. It has all the scientific rigor of talk radio.

That said, and (as regards the lead sentence) noting that I don't do emoticons, here are a few notes on the varieties of spam encountered over a brief study period.

Very informative!

Complimenting the blogger seems like one common way of ingratiating spam. For example:

I found walt.lishost.org very informative. The article is professionally written and I feel like the author knows the subject well. walt.lishost.org keep it that way.

This might work better if the domain name was the name of the blog, to be sure: "I found Walt at Random very informative" is a tad more convincing. Six payday loan companies offered this sentiment.

I Love the way you write

You can't be too effusive. This comment continues "...thanks for posting." You're certainly welcome, even if you're commenting on entirely bland announcements with no writing style at all.

Twenty-five people love the way I write--and, oddly enough, although each person has a different name and gmail account, there's the same URL for all 25. (In the case of all 25, that URL is on a blacklist. On the other hand, the posts to which these comments were attached would tend to make me wonder just why my style was so admired--and why posts arrived in pairs, but with identical text.)
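
For the curious, here's roughly how one might spot that "many names, one URL" pattern in a pile of moderated comments. This is only a sketch, not how Spam Karma 2 does it: the comment fields ("email", "url") and the sample addresses are made up for illustration.

```python
from collections import defaultdict


def group_by_url(comments):
    """Map each submitted URL to the set of distinct author emails that used it."""
    authors_by_url = defaultdict(set)
    for comment in comments:
        # Normalize lightly so a trailing slash or stray capitals don't hide a shared URL.
        url = comment["url"].strip().lower().rstrip("/")
        authors_by_url[url].add(comment["email"].strip().lower())
    return authors_by_url


def shared_urls(comments, min_authors=2):
    """Return URLs submitted by at least min_authors different email addresses."""
    return {
        url: sorted(emails)
        for url, emails in group_by_url(comments).items()
        if len(emails) >= min_authors
    }


# Hypothetical sample data, not real comments from the blog.
sample = [
    {"email": "alice@example.com", "url": "http://loans.example/"},
    {"email": "bob@example.com", "url": "http://loans.example"},
    {"email": "carol@example.com", "url": "http://boots.example"},
]

print(shared_urls(sample))
```

Any URL that turns up under several different "admirers" is a decent candidate for the blacklist.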

Other compliments and apparently specific questions

"What is captcha code? pls provide me captcha codes or plugin, thanks in advance." Sorry, yet another payday loans company, but I don't provide that service.

"hey .. way to go with this post .. i'll need more tips tho so [remainder omitted]" Much as I'd love to help out a low-cost loans provider...

"good work, hope you make more related posts! will keep an eye on this blog ;)" Given the nature of the URL provided, you're too busy eying sexcams, I'd think. (Two of these, different gmail accounts.)

"walt.lishost.org - da best. Keep it going! have a nice day" Some Russian company that can post within five seconds of reaching the form.

"I always enjoy finding a 'good' blog. Thanx and I'm going to add you to my RSS feed." Another mystery "Flash Gordon" poster--and it may be worth noting that most of these were on LLN Highlights reposts.

"great stuff thx things make since now hehe good concept" - This one, linking to a supposed boot seller, starts to move over into the dada area...

"This is a fast loading page, do you know who the webhost is and if they are cheap?" Nah, yet another sex seller, the blog's just there--that URL with "lishost.org" in it is meaningless.

"Hi, I love your work." Concise, if from another sex seller (and on an oft-spammed post that should have no comments at all).

The dada element

I think most of the spamments fall into this category--text that's hard to take seriously if you actually read it. Just a few examples, including only the first few words of what are sometimes lengthy spamments (typically around 2,400 characters):

  • "Stone happy rich source chemical formula..."
  • "Within the blew and terbinafine..."
  • "Parry con had agreed free circus..."
  • "Bill heard through this denavir cream..."
  • "Unless they destroying their altace photo..."
  • "They gave horrific implicatio chemi..."

Most of these also seem to link to a single URL or one of several related URLs. I lost count of how many there were--let's just say dozens (scores, probably--more than half of all the spamments).

If someone were willing to accept all these comments and then filter out all the obvious spam words (drug names, etc.), they could make some interesting found poetry from the remnants. I can just see someone with a goatee and a beret, sitting in a smoky Berkeley cellar reciting the results...a few decades ago.
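
If you'd like to try the found-poetry experiment at home, a minimal sketch might look like the following. The word list is an assumption on my part (just the three drug names from the examples above), and the four-words-per-line layout is arbitrary; a real attempt would need a much longer list.

```python
import re

# Hypothetical blocklist: only the drug names quoted in the examples above.
SPAM_WORDS = {"terbinafine", "denavir", "altace"}


def found_poetry(text, spam_words=SPAM_WORDS, words_per_line=4):
    """Drop blocklisted words and break what remains into short lines."""
    words = re.findall(r"[A-Za-z']+", text)
    kept = [word for word in words if word.lower() not in spam_words]
    lines = [
        " ".join(kept[i:i + words_per_line])
        for i in range(0, len(kept), words_per_line)
    ]
    return "\n".join(lines)


print(found_poetry(
    "Within the blew and terbinafine Bill heard through this denavir cream"
))
```

Beret and goatee not included.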

Flat-out spam. Deal with it.

These are the comments that start with a link and are, in essence, nothing but links. Some include long lists of links (43 seems to be typical), some only a few. In a way, they're the most pathetic form--easiest to block and obviously spam. Only about half a dozen of these, once all the dada-found poetry entries are eliminated.

Ah, but there are three variations:

  • "x nude" followed by URLs (where "x" can be some surprising names): Half a dozen.
  • Some nonsense word (eldbberyj, tdofnnkw, kxxlxhud...) followed by URLs: Another half dozen.
  • "x" Sex Tape, or just "x" followed by URLs (where "x" can again be a little odd): Only five of those.

The rest...

What else? There's a long, long story about a kid and his computer; I saw that one three or four times. There's a string of nonsense characters followed by "Comment 1" or "Comment 3" or "Comment 5" or whatever--apparently testing to see whether anything makes it through. (If you're doing blog searches for the result, well, sorry, Charlie, it didn't and won't.)

Serious conclusions

  1. Spam is a damn nuisance. In four short years, a blog with modest readership in a narrow area had more than 31,000 spam attempts...and counting.
  2. Spammers are remarkably amateurish. Even the social-engineering spams were so badly done as to be laughable. If you're going to flatter me about my writing, at least choose a post with some vague evidence that I actually wrote something!
  3. It must work somewhere! If spamments weren't improving link scores and Google page ranks, they would disappear.

Now, back to skimming each day's set and sending them off to perdition...


*Updated 7/2/09: While I don't remember ever hearing "anecdata" before, I had no reason to believe it was original. It isn't, as a belated search shows. Some usages are similar to mine; some, unfortunately, seem to suggest the legitimacy of treating several anecdotes as being data. Sad, that.
