The ASP Emporium
Free Active Server Applications and Examples by Bill Gearhart
Online since Friday January 7, 2000

Email Validation - Explained

There exists an inconsistency within the web when it comes to email addresses. The big problem is that there is no be-all, end-all algorithm for email address validation. A million programmers, myself included, have stepped up to fill that gap by creating email validators. The problem is that each one is different: some are too tight, some are too loose, and some just don't work at all.

I'm going to solve this issue once and for all by coming up with a list of minimum requirements for email validation that I think works in almost every situation. It's taken years to compile this list. I'll also make a JScript (classic ASP) class and a compiled C# class for the .NET Framework that enforce all the rules in the list.

Background

The definitive list of validation criteria for emails came from many areas:

  • the RFCs for SMTP and POP3
  • various SMTP and POP3 servers I've worked with
  • feedback from users of previous versions of email verification code that I wrote
  • other places I can't remember and stuff I've seen since 1995 or so

The reason I'm making the list is to cut down on the amount of feedback I send to people about email validation and to put the most pertinent information in one place (kind of like what I did for VBScript's GetObject() method).

Email Validation - Minimum Required Functionality

All email validators exist to validate an email address's syntax only.

Ideally, it would be enough to find the SMTP server for a given email address (by doing an MX record lookup with a DNS server), establish an SMTP session and execute the VRFY command. That should validate any email address on Earth.

However, not all servers support VRFY. In that case you'd have to issue RCPT TO with the email address and accept the email as valid if you got a 250 or 251 reply code. But, once again, not all SMTP servers work the same, so a server might accept the address in RCPT TO even if it is obviously bad.
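
As a rough illustration of the reply-code check, here is a minimal sketch in plain JavaScript. It only interprets a reply line that some other code has already read from the server; it does not open an SMTP session. The function name rcptAccepted is made up for this example and is not part of the classes offered on this page.

```javascript
// Hypothetical helper: decide whether an SMTP reply to RCPT TO counts as
// "accepted". Only 250 and 251 are treated as acceptance.
function rcptAccepted(replyLine) {
  // An SMTP reply line starts with a three-digit code, followed by a
  // space (final line), a hyphen (continuation line) or nothing.
  var match = /^(\d{3})(?:[ -]|$)/.exec(String(replyLine));
  if (!match) {
    return false; // not a well-formed reply line
  }
  var code = parseInt(match[1], 10);
  return code === 250 || code === 251;
}

console.log(rcptAccepted("250 OK"));                // true
console.log(rcptAccepted("550 No such user here")); // false
```

Keep in mind that a true result only means the server accepted the recipient. As noted above, some servers accept anything, so this is not proof that the mailbox exists.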

Let's not forget that SMTP wasn't meant to be a speedy protocol. It's going to take time and cycles to communicate with an SMTP server. Therefore, talking to an SMTP server is probably not the best first step in validating an email address. It's usually a great second step after you've weeded out the crappiest emails. This means that a good email validator needs a list of required email parts and syntax as commonly used.

Email Syntax:
[<]account@domain[>]

The syntax of an email is well known. A leading account is separated from a trailing domain by the @ character. This is not negotiable. That means the first rule of a valid email address is that there must be an account and a domain separated by the @ sign. The details are in how we validate the individual account and domain pieces of the email.

  • the email may come into the validator surrounded by < and >. Those characters should be removed before processing the email
  • the remaining email should be trimmed of leading and trailing white space
  • the email address is made up of 2 parts: the account and the domain
  • account and domain are always separated by @. For some mail servers the account itself may contain @, so treat only the text after the last @ as the domain.
  • email validators should offer properties that return the account separately from the domain. This helps work with other code that might do an SMTP validation of the email - in that case, having the account and domain separated properly is quite nice.
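
The steps in the list above can be sketched as follows. This is plain JavaScript written in an old-fashioned style to stay close to JScript; the name splitEmail and the shape of its return value are assumptions made for this example, not the interface of the classes offered on this page. Trimming both before and after removing the brackets is also my own choice, so that input with white space outside the brackets is handled.

```javascript
// Hypothetical helper: normalize an incoming email string and split it
// into account and domain at the LAST @ character.
function splitEmail(raw) {
  var trimPattern = /^\s+|\s+$/g;
  var email = String(raw).replace(trimPattern, "");
  // Remove one leading "<" and one trailing ">" if present.
  email = email.replace(/^</, "").replace(/>$/, "");
  email = email.replace(trimPattern, "");

  var at = email.lastIndexOf("@");
  if (at === -1) {
    return null; // no separator, so there is no account/domain pair
  }
  return {
    account: email.substring(0, at),
    domain: email.substring(at + 1)
  };
}

var parts = splitEmail(" <my@email@account@domain.com> ");
console.log(parts.account); // my@email@account
console.log(parts.domain);  // domain.com
```

Note that this only separates the two pieces. It does not check that either piece is acceptable; that is the job of the account and domain rules.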

Account

According to the SMTP and POP3 protocols, the account can contain anything: any character in the ASCII encoding, and it may be case sensitive. Even the @ char can appear one or more times in the account part of the email. So when separating the account@domain parts of an email, the separating @ is always the last @ in the string (or the first @ from the right, depending on how you look at it).

This also means that no email validator should validate the account part of the email, except to ensure that it is not zero-length. The account must contain at least one character. Here are the rules for the account part of an email:

  • The account should not be validated. It can contain any character and may or may not be case sensitive.
  • The account can contain the @ character one or more times. The account part of an email effectively ends at the last @ character in the email string (the last @ is NOT part of the account).
  • The account part of the email must be at least one character in length.
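
A minimal sketch of the account rules in plain JavaScript follows. The function name accountOf is hypothetical. It assumes the string has already been stripped of < and > and trimmed, and it deliberately performs no character checks on the account.

```javascript
// Hypothetical helper: return the account part of an email, or null when
// there is no @ separator or the account is zero-length.
function accountOf(email) {
  var at = email.lastIndexOf("@");
  // at === -1 means no separator; at === 0 means a zero-length account.
  if (at < 1) {
    return null;
  }
  // Everything before the last @ is the account, returned untouched
  // (case preserved, any characters allowed, including other @ chars).
  return email.substring(0, at);
}

console.log(accountOf("*#@oers"));                     // *#
console.log(accountOf("my@email@account@domain.com")); // my@email@account
console.log(accountOf("@0"));                          // null
```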

Domain

Next is the domain. The domain is where we will do all the validation work for an email address. The domain is made up of 1 or more sub domains. The sub domains are separated by the . character. Every sub domain must have at least one character. The last sub domain may or may not be a TLD, which I refer to as the domain extension. The allowable characters for the domain are a-z, 0-9 and -. The domain is not case sensitive. It looks like this:

Domain Syntax:
subdomain[.subdomainN[...]][.extension]

The rules for the domain part of an email are as follows:

  • The domain is not case sensitive
  • The domain is split into one or more sub domains
  • Sub domains are always separated by a . character
  • A sub domain cannot contain the . character. The valid characters for a sub domain are a-z, 0-9, -
  • Each sub domain must contain at least one character
  • The last sub domain may be a TLD
  • If the last sub domain is a TLD, it can be validated against a list of all known TLDs
  • All email validators must provide for validation with a TLD required and with a TLD optional. For example: me@localhost is a valid email if the TLD is not required. Having both options provides better flexibility and allows your email code to be used in an internal network scenario as well as on the internet.
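
Here is one way the domain rules might look in plain JavaScript. Treat it as a sketch: the function name isValidDomain, the requireTld flag and the tiny KNOWN_TLDS table are all assumptions made for this example. A real validator would check against the full list of known TLDs.

```javascript
// Hypothetical, deliberately tiny TLD table for illustration only.
var KNOWN_TLDS = { com: true, net: true, org: true, edu: true, gov: true };

// Hypothetical helper: validate the domain part of an email.
// requireTld = true  -> last sub domain must be a known TLD
// requireTld = false -> any well-formed domain passes (e.g. localhost)
function isValidDomain(domain, requireTld) {
  // The domain is not case sensitive, so compare in lower case.
  var labels = String(domain).toLowerCase().split(".");
  for (var i = 0; i < labels.length; i++) {
    // Each sub domain needs at least one character from a-z, 0-9 and -.
    if (!/^[a-z0-9-]+$/.test(labels[i])) {
      return false;
    }
  }
  if (!requireTld) {
    return true;
  }
  // With a TLD required there must be at least subdomain.extension.
  if (labels.length < 2) {
    return false;
  }
  var last = labels[labels.length - 1];
  return Object.prototype.hasOwnProperty.call(KNOWN_TLDS, last);
}

console.log(isValidDomain("localhost", false));  // true
console.log(isValidDomain("localhost", true));   // false
console.log(isValidDomain("Durtie.COM", true));  // true
console.log(isValidDomain("domain..com", false)); // false
```

Splitting on the . character makes the empty sub domain cases (a leading dot, a trailing dot or two dots in a row) fall out naturally, because each of them produces a zero-length piece that fails the character check.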

Final Email Syntax To Be Enforced By Validators

Square brackets indicate optional email components.

[<]email[>]

	email     = account@domain
	account   = 1 or more chars containing: any character (char codes 0-255)
	domain    = subdomain[.subdomainN[...]][.extension] 
	subdomain = 1 or more chars containing: a-z0-9-
	extension = 2 to 6 chars containing: a-z

Valid Email Examples

The following email addresses should be considered valid:

me.@localhost
my@email@account@domain.com
*#@oers
my.name.is.mud@durtie.com
1time@0
tester@the.domain.has.many.subs

Invalid Email Examples

The following email addresses should be considered invalid:

me@.localhost
my@email@account@domain..com
*#@oers.
durtie.com
@0
tester@the.%.domain.has.many.subs
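
To check the rules against both example lists, here is a compact sketch in plain JavaScript with the TLD treated as optional. The name isValidEmail is hypothetical, and this is not the code of the classes offered below; it is just a quick way to see the rules agree with the examples.

```javascript
// Hypothetical validator, TLD optional: strip < >, trim, split at the
// last @, require a non-empty account and a well-formed domain.
function isValidEmail(raw) {
  var trimPattern = /^\s+|\s+$/g;
  var email = String(raw).replace(trimPattern, "")
    .replace(/^</, "").replace(/>$/, "")
    .replace(trimPattern, "");

  var at = email.lastIndexOf("@");
  if (at < 1) {
    return false; // no @ at all, or a zero-length account
  }
  var domain = email.substring(at + 1).toLowerCase();
  // One or more sub domains of a-z, 0-9 and -, separated by single dots.
  return /^[a-z0-9-]+(\.[a-z0-9-]+)*$/.test(domain);
}

var validExamples = [
  "me.@localhost", "my@email@account@domain.com", "*#@oers",
  "my.name.is.mud@durtie.com", "1time@0", "tester@the.domain.has.many.subs"
];
var invalidExamples = [
  "me@.localhost", "my@email@account@domain..com", "*#@oers.",
  "durtie.com", "@0", "tester@the.%.domain.has.many.subs"
];

// Count how many addresses in a list the validator accepts.
function countAccepted(list) {
  var accepted = 0;
  for (var i = 0; i < list.length; i++) {
    if (isValidEmail(list[i])) {
      accepted++;
    }
  }
  return accepted;
}

console.log(countAccepted(validExamples) + " of 6 valid examples accepted");     // 6 of 6
console.log(countAccepted(invalidExamples) + " of 6 invalid examples accepted"); // 0 of 6
```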

Email Validation Software and Source Code

C#/.NET Framework Class
This is a C# implementation of the rules above and is meant for use with the .NET Framework. It can be used from any Framework-compatible language.

JScript/ASP 2.0/3.0/Windows Scripting Host Class
This is a JScript implementation of the rules above, meant for classic ASP (2.0/3.0) or Windows Scripting Host. It can be used from VBScript, PerlScript or JScript.