Input Validation: What is an alpha-num character?
A few days ago, I wrote that I didn't agree with Eric Lippert about using a regex to filter alphanumerical input. Take a registration system using such validation: it would rule out my daughter Iséa because of the accented é. (It would also rule out all text written in non-Latin scripts such as Greek, Russian, Japanese...)
I said I would post some code to do such alphanum validation.
Unicode Categories
The idea behind such code is to loop over all chars in the string and examine their Unicode categories: each char must match Char.IsLetter() and/or Char.IsNumber(). This means Greek letters, accented French letters, Japanese ideograms et al. are accepted (yes, the docs for IsLetter() say alphabetical, but it includes ideograms). Even Myanmar digits, actually.
That's much better than a simple regex, but it is still not enough. The input might include composite characters, where the letter and its diacritic mark(s) are coded using two (or more) separate characters. For example, é is either U+00E9 or the pair U+0065 U+0301. Hence our check should also accept the NonSpacingMark Unicode category.
Writing such a little routine in an ASP.NET-compatible language is left as an exercise to the reader, given the links in the paragraph above. I admit that I don't know what it would look like in PHP, even though I have some non-negligible experience in that language.
Want to make similar checks in your C++ Win32 apps? GetStringTypeW is your friend.
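For readers who want a starting point for that exercise, here is a minimal sketch in Python rather than in a .NET language. It assumes that Python's unicodedata.category() is an acceptable stand-in for the .NET checks: general categories starting with L roughly correspond to Char.IsLetter(), those starting with N to Char.IsNumber(), and Mn is the NonSpacingMark category. The function name is_alphanum and the choice to reject the empty string are my own assumptions.

```python
import unicodedata


def is_alphanum(text):
    """Return True if every char is a letter, a number, or a nonspacing mark."""
    if not text:
        # Assumption: an empty input is not a valid alphanumerical value.
        return False
    for ch in text:
        category = unicodedata.category(ch)  # two-letter code such as "Ll" or "Nd"
        is_letter = category.startswith("L")   # Lu, Ll, Lt, Lm, Lo
        is_number = category.startswith("N")   # Nd, Nl, No
        is_nonspacing_mark = category == "Mn"  # combining accents and the like
        if not (is_letter or is_number or is_nonspacing_mark):
            return False
    return True
```

With this sketch, both spellings of Iséa (precomposed U+00E9, or U+0065 followed by U+0301) are accepted, while a name containing an apostrophe or a space is rejected. Note that it does not check that a combining mark actually follows a base letter.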
Inclusion Set vs Exclusion Set
But the more I think about it, the more I wonder if such alphanumerical checks are a good idea at all. Unless you want to validate input for a very specific format (
The problem with my suggestion is that you let slip some unacceptable chars that you're not aware of. Security zealots would tell me that they prefer to force me to write Isea instead of Iséa rather than take the risk of leaving a door open.
I agree with them. BTW, is it true that there was once a vulnerability in Windows based on the use of the Turkish dotless i? That would prove that even though a regex is far too restrictive, using an exclusion set is asking for security holes.
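Whatever the answer for Windows, the case-mapping quirk behind the dotless i is easy to reproduce. The following is a minimal sketch in Python, not a description of any real vulnerability: the blocked name, the function names, and the idea of a downstream step that uppercases names are all hypothetical. It only shows how an exclusion check can be sidestepped by a char the author of the check did not think of.

```python
BLOCKED = {"admin"}  # hypothetical exclusion set


def naive_is_allowed(name):
    # Exclusion check: lowercase the input and compare with the blocked names.
    return name.lower() not in BLOCKED


def later_lookup_key(name):
    # Hypothetical downstream step that uppercases names before comparing them.
    return name.upper()


# "admin" written with U+0131 LATIN SMALL LETTER DOTLESS I instead of "i".
sneaky = "adm\u0131n"

# The exclusion check lets the name through, because it is not literally "admin"...
passes_check = naive_is_allowed(sneaky)

# ...but U+0131 uppercases to a plain "I", so both names collide later on.
collides = later_lookup_key(sneaky) == later_lookup_key("admin")
```

Here passes_check and collides are both True: the name gets past the exclusion set and still ends up being treated as the blocked one.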
Why exclude characters at all?
What to do then? Well, we can usually rely on our programming platform (language, DB driver author, whatever...) to provide such safety checks for us. Better yet, this check shouldn't reject unacceptable data. It should rather escape it in a way that the DB will be happy with. This allows company names with an apostrophe to be typed correctly.
Great, we're back to another security measure enumerated by Eric: Use your DB API to escape input. Of course, pay attention to escaping it for HTML/WML/whatever rendering as well. It is obviously a little more difficult than simply accepting US-ASCII alphanum chars only. But security tradeoffs should not become an excuse for promoting incompetence: We must do our homework!
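As an illustration of both steps, here is a minimal sketch in Python using the standard sqlite3 and html modules. The table name, the function names and the sample company names are assumptions made up for the example. Strictly speaking, the DB driver binds the value as a parameter rather than escaping it into the SQL text, but the effect is the one wanted here: any input is stored as data, untouched.

```python
import html
import sqlite3


def store_company(conn, name):
    # The "?" placeholder lets the driver pass the value separately from the SQL.
    conn.execute("INSERT INTO company (name) VALUES (?)", (name,))


def render_company(name):
    # Escape for HTML at rendering time, not at storage time.
    return "<li>" + html.escape(name) + "</li>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (name TEXT)")

store_company(conn, "O'Reilly & Sons")              # legitimate apostrophe
store_company(conn, "x'); DROP TABLE company; --")  # hostile input, stored as text

names = [row[0] for row in conn.execute("SELECT name FROM company ORDER BY rowid")]
items = [render_company(name) for name in names]
```

Both names come back from the table exactly as typed, and the apostrophe, the ampersand and any angle brackets are only transformed when the value is rendered as HTML.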
The Short Version
All this to say that input validation is sometimes not a good idea: You'd rather be able to make sure you safely accept all input.
And Unicode categories are cool ;-)


Comments:
I'm not qualified to give my comments on the content of your post, way above my head and all, so I'm going to comment on the format and more specifically on one of my pet peeves: the use of i.e. vs e.g. Both are from Latin, respectively 'id est' ('that is') and 'exempli gratia' ('for example'). You wrote:
Unless you want to validate input for a very specific format (i.e. a Belgian car license number)
But it should have been 'e.g.' because a Belgian car license number is just one of the possible formats, not the only one your argument is valid for. See also http://www.suite101.com/article.cfm/8707/52862.
Sorry for being a nitpicking asshole.
I stand corrected!
> Sorry for being a nitpicking asshole.
No problem. I am one as well :-)