Email Validation - Explained

There is an inconsistency across the web when it comes to email addresses. The big problem is that there is no be-all, end-all algorithm for email address validation. This means a million programmers, myself included, have stepped up to the challenge of filling that gap by creating email validators. The problem is, each one is different. Some are too tight, some are too loose, and some just don't work at all. I'm going to solve this issue once and for all by coming up with a list of minimum requirements for email validation that I think works in almost every situation. It's taken years to compile this list. I'll also make a JScript (classic ASP) class and a compiled C# class for the .NET Framework that enforce all the rules in the list.

Background

The definitive list of validation criteria for emails came from many areas:
The reason I'm making the list is to cut down on the amount of feedback I send to people about email validation and to put the most pertinent information in one place (kind of like what I did for VBScript's GetObject() method).

Email Validation - Minimum Required Functionality

All email validators exist to validate an email address's syntax only. Ideally, talking to the SMTP server responsible for a given email address (found by doing an MX record check with a DNS server), establishing an SMTP session and then executing the VRFY command would be enough to validate any email address on Earth. However, not all servers support VRFY. In that case you'd have to issue RCPT TO with the email address and accept the email as valid if you got a 250 or 251 reply code. But, once again, not all SMTP servers work the same way, so a server might accept the address in RCPT TO even if it is an obviously bad email. Let's not forget that SMTP wasn't meant to be a speedy protocol. It takes time and cycles to communicate with an SMTP server. Therefore, communicating with an SMTP server is probably not the best first step in validating an email address. It's usually a great second step after you've weeded out the crappiest emails. This means that a list of required email parts and syntax, as commonly used, is necessary to come up with a good email validator.

Email Syntax: [<]account@domain[>]
The syntax of an email is well known. A leading account is separated from a trailing domain by the @ character. This is not negotiable. That means the first rule of a valid email address is that there must be an account and a domain separated by the @ sign. The details are in how we validate the individual account and domain pieces of the email.
According to the SMTP and POP3 protocols, the account can contain anything. That means it may contain any character in the ASCII encoding and may be case sensitive. Even the @ character can appear one or more times in the account part of the email. So when separating the account from the domain, the separating @ is always the last @ in the string (or the first @ from the right, depending on how you look at it). This also means that no email validator should validate the account part of the email, except to ensure that it is not zero-length: the account must contain at least one character. Here are the rules for the account part of an email:
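The splitting rule described above (separate at the last @, then require a non-empty account) can be sketched in a few lines of Python. This is only an illustration, not the JScript or C# class; the function name split_email is made up for this example.

```python
def split_email(address):
    """Split an address at its LAST '@' (hypothetical helper).

    Returns (account, domain), or None when there is no '@'
    or when the account part is empty. The domain is returned
    unchecked; validating it is a separate step.
    """
    # rpartition searches from the right, so any earlier '@'
    # characters stay inside the account part.
    account, at, domain = address.rpartition("@")
    if not at or not account:
        return None
    return account, domain


print(split_email("my@email@account@domain.com"))
print(split_email("@0"))
```

The first call keeps the extra @ characters in the account and returns the pair ("my@email@account", "domain.com"); the second returns None because the account is zero-length.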
Next is the domain. The domain is where we will do all the validation work for an email address. The domain is made up of one or more subdomains, separated by the . character. Every subdomain must have at least one character. The last subdomain may or may not be a TLD, which I refer to as the domain extension. The allowable characters for the domain are a-z, 0-9 and -. The domain is not case sensitive. It looks like this:

Domain Syntax: subdomain[.subdomainN[...]][.extension]
The rules for the domain part of an email are as follows:
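As a rough sketch of the domain description above, the check below splits the domain on the . character and requires every piece to be one or more of a-z, 0-9 and -, ignoring case. The name is_valid_domain is invented for this example. Because the extension is optional and its characters (a-z) are a subset of the subdomain characters, this sketch does not check the extension separately; that is an assumption of the example, and a stricter validator could treat the last piece differently.

```python
import re

# One subdomain: one or more of a-z, 0-9 or '-'.
# IGNORECASE because the domain is not case sensitive;
# ASCII so that only plain ASCII letters are accepted.
_SUBDOMAIN = re.compile(r"[a-z0-9-]+", re.IGNORECASE | re.ASCII)


def is_valid_domain(domain):
    """Return True if every dot-separated piece is a valid subdomain."""
    # An empty piece (leading dot, trailing dot, or "..") fails
    # because the pattern needs at least one character.
    return all(_SUBDOMAIN.fullmatch(piece) for piece in domain.split("."))


print(is_valid_domain("the.domain.has.many.subs"))
print(is_valid_domain("domain..com"))
```

The first call passes; the second fails because the doubled dot produces an empty subdomain.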
Final Email Syntax To Be Enforced By Validators

Square brackets indicate optional email components.

[<]email[>]
email = account@domain
account = 1 or more chars containing: ASCII 0-255
domain = subdomain[.subdomainN[...]][.extension]
subdomain = 1 or more chars containing: a-z0-9-
extension = 2 to 6 chars containing: a-z

Valid Email Examples

The following email addresses should be considered valid:

me.@localhost
my@email@account@domain.com
*#@oers
my.name.is.mud@durtie.com
1time@0
tester@the.domain.has.many.subs

Invalid Email Examples

The following email addresses should be considered invalid:

me@.localhost
my@email@account@domain..com
*#@oers.
durtie.com
@0
tester@the.%.domain.has.many.subs

Email Validation Software and Source Code
C#/.NET Framework Class
JScript/ASP 2.0/3.0/Windows Scripting Host Class
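
For readers who just want to see the final syntax in action, here is a minimal Python sketch of the whole rule set. It is not a port of either class listed above, and the name is_valid_email is made up for the example. Two assumptions are baked in: a leading < and a trailing > are each stripped if present, and the optional extension is not checked separately because anything that qualifies as an extension already qualifies as a subdomain.

```python
import re

# Subdomain rule: one or more of a-z, 0-9 or '-', case-insensitive.
_SUBDOMAIN = re.compile(r"[a-z0-9-]+", re.IGNORECASE | re.ASCII)


def is_valid_email(address):
    """Syntax-only check following the [<]account@domain[>] layout."""
    # Assumption: each optional angle bracket is stripped on its own.
    if address.startswith("<"):
        address = address[1:]
    if address.endswith(">"):
        address = address[:-1]
    # The separating '@' is the last one in the string.
    account, at, domain = address.rpartition("@")
    if not at or not account:
        return False  # no '@' at all, or a zero-length account
    # Every dot-separated piece of the domain must be a valid subdomain.
    return all(_SUBDOMAIN.fullmatch(piece) for piece in domain.split("."))


for sample in ("me.@localhost", "<1time@0>", "me@.localhost", "durtie.com"):
    print(sample, is_valid_email(sample))
```

Under these assumptions the sketch accepts every address in the valid examples list and rejects every address in the invalid examples list. It only checks syntax; an SMTP check would still be a separate second step.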