Validating Domain Names

Cal has validating email addresses down pat, but I can never seem to find a respectable regular expression for validating the format of domain names (useful when you license software by domain). So, using the list of generic and country-specific top-level domains (combined list) provided by the Internet Assigned Numbers Authority, I put this together:

/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i

It checks against every TLD on that list and ensures that each label of the domain name begins and ends with an alphanumeric character, allowing for dashes and sub-domains. Single-character domains are allowed (did you know PayPal owns x.com?) but maximum length restrictions are not enforced. Additional logic is required to prevent something like co.uk passing as a valid domain name, or to confirm that the domain is actually hosted somewhere, but I’ll leave that as an exercise for the reader.
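As a starting point for that exercise, here is a minimal sketch of how the missing length checks and the co.uk case might be layered on top of the pattern. Everything in it is an assumption made for illustration: the function names are hypothetical, the TLD alternation is abbreviated to five entries, and the suffix list is a three-item sample rather than a real registry list. The 253 and 63 character limits are the usual DNS limits for a full name and a single label.

```php
<?php
// Sketch only: abbreviated TLD list, hypothetical helper names.

/**
 * Check the format of a domain name, adding the length limits that the
 * regular expression on its own does not enforce.
 */
function is_valid_domain_format(string $domain): bool
{
    // Abbreviated stand-in for the full TLD alternation shown above.
    $tlds = 'com|net|org|uk|de';

    // i = case-insensitive; D = "$" matches only at the very end of the
    // string, so a trailing newline is rejected.
    $pattern = '/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+(' . $tlds . ')$/iD';

    // DNS limits: 253 characters overall, 63 per label.
    if (strlen($domain) > 253) {
        return false;
    }
    foreach (explode('.', $domain) as $label) {
        if (strlen($label) > 63) {
            return false;
        }
    }

    return preg_match($pattern, $domain) === 1;
}

/**
 * Reject names that are themselves a registry suffix, such as co.uk.
 * The suffix list is a tiny illustrative sample.
 */
function is_registrable_domain(string $domain): bool
{
    $suffixes = ['co.uk', 'org.uk', 'com.au'];

    return is_valid_domain_format($domain)
        && !in_array(strtolower($domain), $suffixes, true);
}

var_dump(is_registrable_domain('x.com'));         // bool(true)
var_dump(is_registrable_domain('co.uk'));         // bool(false)
var_dump(is_registrable_domain('example.co.uk')); // bool(true)
```

A hard-coded suffix list goes stale quickly, so a real implementation would probably load the suffixes from a maintained source instead.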

Author
Shaun Inman
Posted
May 8th, 2006 at 7:42 am
Categories
PHP
Comments
011 (Now closed)

011 Comments

001

Wow, that’s incredible and pretty robust. The toughest part is the second half, matching all possible top-level domains, and you seem to have covered it very well!

Nice job!

Author
Nate K
Posted
May 8th, 2006 5:20 am
002

I’ve had to resort to a very simple regex which just checks for an @ sign, something before it, at least one dot after it and at least 2 letters in the last suffix.

Following that, I actually do a DNS lookup behind the scenes to check that the domain is registered, and following that, I use a funky little networking hack to check for an MX record in the domain’s DNS records - if it has an MX record, I allow it to pass.

There are ways of going even further and making sure that, when you talk to the recipient’s SMTP server, that it responds accordingly, but that’s one step too far for me right now.
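For illustration, the approach described in this comment might look roughly like the sketch below. It is not the commenter’s actual code: the function names are hypothetical, the pattern is one possible reading of the rules listed above, and only the MX step is shown (the earlier "is the domain registered" lookup is left out). The lookup is passed in as a parameter, defaulting to PHP’s `checkdnsrr()`, so the logic can be tried without touching the network.

```php
<?php
// Sketch only: hypothetical names, not the commenter's code.

/**
 * Very loose format check: something before the @, at least one dot
 * after it, and at least two letters in the final suffix.
 */
function looks_like_email(string $email): bool
{
    return preg_match('/^[^@\s]+@[^@\s]+\.[a-z]{2,}$/iD', $email) === 1;
}

/**
 * Pass the address only if its domain has an MX record.
 * $lookup takes (hostname, record type) and returns a bool;
 * it defaults to PHP's built-in checkdnsrr().
 */
function domain_accepts_mail(string $email, ?callable $lookup = null): bool
{
    if (!looks_like_email($email)) {
        return false;
    }

    $lookup = $lookup ?? 'checkdnsrr';

    // Everything after the last @ is the domain part.
    $domain = substr(strrchr($email, '@'), 1);

    return (bool) $lookup($domain, 'MX');
}
```

Calling `domain_accepts_mail('someone@example.com')` with no second argument performs a live DNS query, so the result depends on the network and on the platform support noted elsewhere in this thread.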

Author
a Bit Gone
Posted
May 8th, 2006 5:20 am
003

What about internationalized domain names that use Unicode? You can still write these domains as Punycode using the xn-- prefix, but that’s not very elegant.

Author
Michel Fortin
Posted
May 8th, 2006 5:50 am
004

if(checkdnsrr('domain.com', 'A') ) return true; ;)

Can easily be extended to only check for nameservers and such … see the PHP Online Documentation for more info.

Note that this does not work on Windows/BSD systems, but the PEAR Net_DNS package can provide this for those platforms.

wbr, B!

Author
Bramus!
Posted
May 8th, 2006 6:18 am
005

Stepping around the validation issue for a minute, if a customer doesn’t want to give you a valid e-mail/domain is that someone you really want to be doing business with? All the dns lookups or regular expressions aren’t going to save the customer from himself if he’s just stubborn from the start and puts in a “valid” but fake e-mail.

Author
Dave
Posted
May 8th, 2006 6:40 am
006

That’s a good question Michel, I’d have to look into how to define allowable Unicode characters with PCRE.

Thanks for the tip Bramus. I prefer to stick with functions that are native to PHP 4 for the sake of backwards compatibility.

I think a few people are missing that this RegExp validates the format of a domain name, not an email address.

Author
Shaun Inman
Posted
May 8th, 2006 6:47 am
007

Cor blimey Shaun, that’s one helluva RegEx, definitely bookmarked.

Author
Sam Kellett
Posted
May 8th, 2006 8:00 am
008

Haha. I needed this like, now! So funny. Thanks Shaun!

Author
Michael Simmons
Posted
May 8th, 2006 8:06 am
009

So I guess with that regex you wouldn’t allow someone to run Mint on a website by IP address only (i.e. without a domain name)?

Author
Jehiah
Posted
May 8th, 2006 8:27 am
010

Correct, you can’t purchase a Mint license for an IP address and as a result you can’t install Mint using just an address. But once installed, Mint will run on a site served from an IP address.

But that’s really moot since this regular expression validates the format of a domain name. An IP address is not a domain name. It’s an IP address.
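If a form needs to tell the two apart before validating, one possible sketch uses PHP’s built-in `filter_var()` with `FILTER_VALIDATE_IP`. The helper name is hypothetical, and this relies on a newer PHP than the PHP 4 baseline mentioned earlier in the thread.

```php
<?php
// Sketch only: hypothetical helper name.

/**
 * True when $host is a literal IPv4 or IPv6 address rather than a name.
 */
function is_ip_address(string $host): bool
{
    // filter_var() returns the address on success and false on failure.
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}

var_dump(is_ip_address('192.168.0.1')); // bool(true)
var_dump(is_ip_address('example.com')); // bool(false)
```

The domain pattern already rejects dotted-quad addresses on its own, because the final label has to be one of the alphabetic TLDs.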

Author
Shaun Inman
Posted
May 8th, 2006 10:30 am
011

Fair enough; IP addresses are not the same as domain names.

I just like to think of them as valid substitutes (though it’s really the other way around).

Author
Jehiah
Posted
May 8th, 2006 1:33 pm