Spam
Tuesday 23rd October, 2007 15:27 Comments: 2
Following on from a previous post, why is spam email still so prevalent?
Mohamed D. Burks 2:56 pm + [SPAM: 35.7/5.0] Prepare yourself for your new s'e_xual ...
Mohamed V. Burks 2:56 pm + [SPAM: 35.4/5.0] This offer will make your s'e_xual dr...
Mohamed F. Burks 2:56 pm + [SPAM: 30.9/5.0] Bigger penis won't be on TV but in yo...
Mohamed T. Burks 2:56 pm + [SPAM: 34.6/5.0] Your s'e_xual life will be more than ...
Not only did it all get flagged as spam, but the almost identical sender names make it easy to spot. I also got two emails a few minutes apart, allegedly from Lloyds TSB, about my online account. I can't imagine them sending out two so close together, with different subject lines but almost identical bodies (the headers suggest that a Polish mail server [based on SurgeMail] was being abused, possibly acting as an open relay). On the off chance that email gets past the server's spam filtering, Outlook's filtering tends to catch the rest. Occasionally one or two will get through, typically short ones that are meaningless. I guess spammers think that the odd person will receive just one spam and then go on to buy a product. I don't know how many people still fall for these things, but presumably it's enough to explain the hundreds of emails that try to clog up my inbox. The good news is that they appear to be getting desperate: they have started to ditch textual spam (and images, PDF and Excel files) and moved on to poor-quality audio files. Hopefully it's only a matter of time until people give up on spam email.
Unfortunately, trying to fight comment spam and contact form spam is a lot more difficult, and is something that I can see increasing over time (to make up for the decline in email spam). Susan Bradley made a post about spam that she was getting, allegedly from people with Gmail accounts (that's Google Mail, if you're in the UK like me). From my point of view, and Susan's, it's easy to spot the pattern. But it's a bit more complicated for a computer, and even if it were to start recognising patterns (such as now###@gmail.com), what can you do if the bots/spammers generate random prefixes for the gmail.com/hotmail.co.uk accounts? You can't necessarily block an IP address just because too many submissions to a contact form come from it in a short space of time, because several users might be going through the same proxy (adding this sort of detection combined with a "captcha" won't work too well either: aside from the obvious accessibility issues, bots are starting to beat the systems). And even if you come up with something clever, how do you easily add that to existing applications? Or even to future ones? Even if you could add something clever for contact forms, you can't exactly add it to forum posts (some places do try to stop people from posting within a minute or so of their last post, but those sorts of restrictions often annoy me) or similar places that frequently accept input from the same user. And even if you tried to slow someone down, what happens if they start using something else to distribute their actions (e.g. the application uses a session ID to track the user, but the flood protection mechanism is based on source IP address)? Or do you end up storing the time of the last action with the session information and giving a weighting to each web page that accepts POST data to try to combat spam?
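A minimal sketch of that last idea, assuming a dict-like per-session store: each page that accepts POST data gets a weight, and a session can only "spend" a fixed budget of weight within a time window. The names (PAGE_WEIGHTS, BUDGET, WINDOW_SECONDS, allow_post), the paths and the numbers are all made up for illustration, and keying on the session does nothing against a bot that simply requests a fresh session for every post.

```python
import time

# Hypothetical weights: how much of the budget one POST to each page costs.
PAGE_WEIGHTS = {"/contact": 5.0, "/forum/post": 1.0}
BUDGET = 10.0          # total weight a session may spend...
WINDOW_SECONDS = 60.0  # ...within this many seconds


def allow_post(session, path, now=None):
    """Return True if this session may POST to `path` right now.

    `session` is any dict-like per-user store (an assumption); the
    timestamps and weights of recent actions are kept inside it.
    """
    now = time.time() if now is None else now
    weight = PAGE_WEIGHTS.get(path, 1.0)

    # Keep only the actions that are still inside the time window.
    recent = [(t, w) for (t, w) in session.get("recent_posts", [])
              if now - t < WINDOW_SECONDS]
    spent = sum(w for _, w in recent)

    allowed = spent + weight <= BUDGET
    if allowed:
        recent.append((now, weight))
    session["recent_posts"] = recent
    return allowed
```

With these numbers a session can send two contact form submissions a minute, or ten forum posts, or some mixture of the two. Whether that is the right trade-off depends entirely on the site.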
If people thought email spam was tricky to detect and stop, wait until comment spam really picks up.
Sadie - Wednesday 24th October, 2007 14:28
Presumably, sites that care about the quality of their visible comments are increasingly going to lock down their community to people who are logged in and verified. What's interesting is whether this will mean an increase in cross-site identity systems like OpenID or Passport (or whatever it's called now).
For a long time now, animemusicvideos.org has required users to wait for two weeks before they're allowed to download anything, in an effort to discourage bots.
Our forums at basingstokeanimesociety.com have been saved from spam by accident: I screwed with the login form to make WordPress and phpBB share a login system, with the fortunate side-effect that spambots no longer know how to interact with it.
Cross-site identity schemes probably won't affect it that much. The ability for a bot to create one account that it can use across multiple sites will probably make it easier to post spam, and the only way to combat that is for sites that notice bad behaviour from a user to somehow blacklist that account.
Adding a delay will stop the stupid bots, but there's nothing to stop a botnet creating accounts and then using them two weeks later to post spam. The best way would probably be to force some sort of moderation for new users while letting regular users post immediately. But that requires human interaction, and many large sites may not be able to keep on top of the content; plus there's the whole issue of moderation slowing everything down.
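A rough sketch of that moderation rule, assuming a site tracks how many of a user's posts a moderator has already approved. The User class, the threshold of three approved posts and the function names are all hypothetical, not taken from any particular forum package.

```python
from dataclasses import dataclass

# Assumed threshold: posts a moderator must approve before a user
# counts as a "regular" and can post without review.
MIN_APPROVED_POSTS = 3


@dataclass
class User:
    name: str
    approved_posts: int = 0
    blacklisted: bool = False


def route_comment(user):
    """Decide what happens to a new comment from `user`.

    Returns "reject" for blacklisted accounts, "hold" (queue for a
    human moderator) for new users, and "publish" for regulars.
    """
    if user.blacklisted:
        return "reject"
    if user.approved_posts < MIN_APPROVED_POSTS:
        return "hold"
    return "publish"


def approve(user):
    """Record that a moderator approved one of the user's held posts."""
    user.approved_posts += 1
```

The human cost is still there: every "hold" needs someone to look at it, which is exactly the part that may not scale for a large site.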
It's mostly a matter of staying one step ahead. As long as the majority are unable to spam you, that should leave enough time to weed out the blatant stuff that makes it through. Or you can use my approach of only allowing friends to log in and leave comments, and manually creating the accounts myself.