Microsoft's ACE Team (what a terrible name!) posted First Line of Defense for Web Applications - Part 3, an article with suggestions on how to filter malicious content and unexpected input. This is a good thing to do, as you shouldn't rely on IIS6's built-in techniques (e.g. ASP.NET's viewstate with a MAC to avoid tampering) - Microsoft frequently state things like: "Do not rely on ASP.NET request validation. Treat it as an extra precautionary measure in addition to your own input validation". But the recommendations in this article aren't particularly good ones.
They talk about two techniques: checking the input against a list of known good values (white list), or checking it against a list of values that you wouldn't expect from the user (black list). For example, if you're expecting a name or a line in an address, it's unlikely that someone has the name <script or onmouseover, so you could consider putting them in a blacklist. But this approach isn't perfect, because it only catches the attacks you thought of in advance. A better approach, where practical, is to use a whitelist. This could be a list of expected values, such as the numbers 1-31 that are used in a PHP script as part of a date handling application, or it could be a fairly predictable pattern.
These are just a few of the inputs she will need to look out for:
User Input Expected: First Name
Regular Expression: (<|&lt;|%3C)(%20|\\s)*(script|applet|embed|))
The black list strategy is a weak protection mechanism because you cannot brain storm all the bad characters attackers will use for a particular attack. We all know security is an ever changing landscape. Black list comes heavily dependent on attacker’s next moves and therefore has to be continuously updated and changed. As new attack techniques come out, this list becomes outdated and requires constant monitoring.
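To make the weakness concrete, here is a minimal Python sketch. The pattern is a simplified, hypothetical stand-in written in the spirit of the expression quoted above (it is not the ACE Team's exact expression), and the function name looks_malicious is made up for illustration.

```python
import re

# Hypothetical, simplified blacklist in the spirit of the quoted pattern.
# This is an illustration only, not the expression from the article.
BLACKLIST = re.compile(
    r"(<|&lt;|%3C)(%20|\s)*(script|applet|embed)",
    re.IGNORECASE,
)


def looks_malicious(value: str) -> bool:
    """Return True if the value matches one of the blacklisted patterns."""
    return BLACKLIST.search(value) is not None


if __name__ == "__main__":
    samples = [
        "<script>alert(1)</script>",      # caught: on the list
        "< ScRiPt>",                      # caught: whitespace and case handled
        "<img src=x onerror=alert(1)>",   # missed: tag is not on the list
        '" onmouseover="alert(1)',        # missed: no tag at all
    ]
    for sample in samples:
        print(f"{sample!r}: flagged={looks_malicious(sample)}")
```

The last two samples pass straight through, simply because nobody thought to add them to the list.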
It probably doesn't help that their example only lists a few of the inputs; this is pretty much saying "here's a really bad implementation of a poor technique". They move on to the whitelist:
The white list strategy compares foreign user input to specific input that will be treated as acceptable. For example:
User Input Expected: First Name
Regular Expression: [a-z A-Z-]
The above is a White list of all known good inputs, e.g Only Caps A to Z and small a- z will be allowed. All other input is discarded as evil.
This is fine, unless your name contains an accented character (e.g. Siân or Chloë, though I guess most of them are used to writing their names without the accents). This regular expression is pretty useless if you have a foreign name, and even worse if you're using a completely different alphabet. To make matters worse, this regular expression can't be used for Surname, at least not if your name is something like O'Neill. It's amazing how many websites use JavaScript to stop users from entering the apostrophe into fields, in a misguided attempt to avoid things like SQL injection. Lastly, I'm pretty sure the regular expression should be [a-zA-Z] if you're only allowing 52 characters, because as written it also lets through the space and the hyphen. If you'd like to allow a few more (English language) characters in your ASP.NET application, such as the apostrophe, try reading this far more useful article on MSDN - How To: Use Regular Expressions to Constrain Input in ASP.NET, which even gives examples.
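As a rough illustration of a less restrictive whitelist, here is a minimal Python sketch. Everything in it is my own assumption rather than something taken from either Microsoft article: the function name is_plausible_name, the 100 character limit, and the decision to allow letters from any alphabet separated by single spaces, apostrophes or hyphens.

```python
import re
import unicodedata

# [^\W\d_] is "a word character that is not a digit or underscore",
# which in Python 3 is roughly "a letter in any alphabet".
_LETTER = r"[^\W\d_]"

# Runs of letters, optionally joined by a single space, apostrophe or hyphen.
NAME_PATTERN = re.compile(rf"{_LETTER}+(?:[ '\-]{_LETTER}+)*")

# The narrow pattern the post argues against, for comparison.
ASCII_ONLY = re.compile(r"[a-zA-Z]+")

MAX_NAME_LENGTH = 100  # assumed limit, pick one that suits your data


def is_plausible_name(value: str) -> bool:
    """Whitelist check for a name field (a sketch, not a complete validator)."""
    # NFC turns "e" + combining diaeresis into the single character "ë".
    normalised = unicodedata.normalize("NFC", value.strip())
    if not 0 < len(normalised) <= MAX_NAME_LENGTH:
        return False
    return NAME_PATTERN.fullmatch(normalised) is not None


if __name__ == "__main__":
    for name in ["Siân", "Chloë", "O'Neill", "Anne-Marie", "<script>"]:
        narrow = ASCII_ONLY.fullmatch(name) is not None
        print(f"{name!r}: ascii_only={narrow}, whitelist={is_plausible_name(name)}")
```

Allowing the apostrophe means the value still has to be encoded on output and passed to the database as a parameter; the whitelist narrows the input, it doesn't make it safe to paste anywhere.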
What's the best approach? There isn't a hard and fast rule, but if you know every possible option, it's not a bad idea to present the user with a drop-down list (or series of radio buttons) of those options and pass back a reference (e.g. a positive integer value) that's easy to validate (if there are 12 options and someone submits "16" or "-1" or "abc" then reject the input). If you're not expecting HTML to be entered, make sure you HTML encode characters that could be abused (such as < > " '). Be aware of how the input will be returned to the user: if it's always part of the content then you have different things to worry about compared to returning it as a value in a text input field. Be very, very careful if you're returning the information as part of some JavaScript or XML, as they encode things slightly differently.
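The ideas above can be sketched in a few lines of Python. The function names (parse_option, for_html, for_javascript) and the default of 12 options are hypothetical, chosen only to mirror the example in the text; html.escape and json.dumps are standard library calls.

```python
import html
import json
from typing import Optional


def parse_option(raw: str, option_count: int = 12) -> Optional[int]:
    """Return the chosen option as an int, or None if it isn't 1..option_count."""
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    if not (raw.isascii() and raw.isdigit()):
        return None
    value = int(raw)
    return value if 1 <= value <= option_count else None


def for_html(value: str) -> str:
    """Encode for HTML content or a quoted attribute value (< > & " ')."""
    return html.escape(value, quote=True)


def for_javascript(value: str) -> str:
    """Encode as a JavaScript string literal for use inside a script block."""
    literal = json.dumps(value)
    # json.dumps leaves < > & alone, so "</script>" could still end the block.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


if __name__ == "__main__":
    for raw in ["7", "16", "-1", "abc"]:
        print(f"{raw!r} -> {parse_option(raw)}")
    print(for_html("<b>\"O'Neill\"</b>"))
    print(for_javascript("</script><script>alert(1)</script>"))
```

The same string comes out differently from for_html and for_javascript, which is the point: the right encoding depends on where the value ends up.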
It's nice to see the ACE Team making an effort and raising awareness, but sometimes you need more than a high-level overview, especially one that doesn't link to articles that cover specific areas in more detail.