Validating Domain Names
Cal has validating email addresses down pat, but I can never seem to find a respectable regular expression for validating the format of domain names (useful when you license software by domain). So, using the list of generic and country-specific top-level domains (combined list) provided by the Internet Assigned Numbers Authority, I put this together:
/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i
It checks against all known TLDs and ensures that each label of the domain name begins and ends with an alphanumeric character, allowing for dashes and sub-domains. Single character domains are allowed (did you know PayPal owns x.com?) but maximum length restrictions are not enforced. Additional logic is required to prevent something like co.uk passing as a valid domain name or to confirm that the domain is actually hosted somewhere, but I’ll leave that as an exercise for the reader.
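The two gaps mentioned above (length limits and suffixes like co.uk) could be closed along these lines. This is only a rough sketch, written in Python rather than PHP so it runs on its own. The names `is_plausible_domain` and `PUBLIC_SUFFIXES` are made up for illustration, the suffix set is a tiny hypothetical sample, and the pattern is a simplified stand-in that does not check the TLD against the IANA list. The limits used are the usual DNS ones: 63 characters per label and 253 for the whole name.

```python
import re

# Simplified stand-in for the expression above: one or more labels,
# then an alphabetic TLD. It does NOT check the TLD against the IANA list.
LABEL = r"[a-z0-9](?:[-a-z0-9]*[a-z0-9])?"
DOMAIN_RE = re.compile(r"(?:" + LABEL + r"\.)+[a-z]{2,}", re.IGNORECASE)

# Hypothetical sample of suffixes that are not registrable domains themselves.
PUBLIC_SUFFIXES = {"co.uk", "org.uk", "com.au"}

MAX_NAME_LENGTH = 253   # usual limit on the full name
MAX_LABEL_LENGTH = 63   # usual limit on each dot-separated label


def is_plausible_domain(name: str) -> bool:
    """Format check plus the length and bare-suffix checks the regex skips."""
    name = name.strip().lower()
    if len(name) > MAX_NAME_LENGTH:
        return False
    if any(len(label) > MAX_LABEL_LENGTH for label in name.split(".")):
        return False
    if name in PUBLIC_SUFFIXES:
        return False
    return DOMAIN_RE.fullmatch(name) is not None
```

A real suffix list is much longer than three entries, so anything serious would need to load one from a maintained source instead of hard-coding it.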
11 Comments
Wow - that’s incredible and pretty robust. The toughest part of that is the second part, matching all possible domain names - you seem to have covered this very well!
Nice job!
I’ve had to resort to a very simple regex which just checks for an @ sign, something before it, at least one dot after it and at least 2 letters in the last suffix.
Following that, I actually do a DNS lookup behind the scenes to check that the domain is registered, and following that, I use a funky little networking hack to check for an MX record in the domain’s DNS records - if it has an MX record, I allow it to pass.
There are ways of going even further and making sure that the recipient’s SMTP server responds accordingly when you talk to it, but that’s one step too far for me right now.
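For reference, the "very simple regex" described in this comment might look something like the following. It is a guess at the idea, sketched in Python rather than PHP, not the commenter’s actual expression; `SIMPLE_EMAIL_RE` and `looks_like_email` are hypothetical names. The DNS and MX lookups are left out because they need network access.

```python
import re

# Something before the @, at least one dot after it,
# and at least two letters in the final suffix.
SIMPLE_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)


def looks_like_email(address: str) -> bool:
    """Loose shape check only; says nothing about whether the address exists."""
    return SIMPLE_EMAIL_RE.fullmatch(address) is not None
```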
What about internationalized domain names that use Unicode? You can still write these domains as Punycode using the xn-- prefix, but that’s not very elegant.
if(checkdnsrr('domain.com', 'A') ) return true;
;) Can easily be extended to only check for nameservers and such … see the PHP Online Documentation for more info.
Note that this does not work on Windows/BSD systems, but the PEAR Net_DNS package can provide this for those platforms.
wbr, B!
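The same "does it resolve?" idea can be sketched outside PHP. The version below is Python and is only an approximation of the `checkdnsrr` call: it goes through the system resolver via `socket.getaddrinfo`, so it also sees hosts-file entries and is not limited to A records. The function name `has_address_record` and the `resolver` parameter are assumptions added for illustration (the parameter makes it possible to try the function without a network).

```python
import socket


def has_address_record(domain, resolver=socket.getaddrinfo):
    """Return True if `domain` resolves to at least one address.

    Uses the system resolver, so this is looser than a strict A-record query.
    """
    try:
        return len(resolver(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: the name could
        # not be encoded for lookup (for example, an over-long label).
        return False
```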
Stepping around the validation issue for a minute, if a customer doesn’t want to give you a valid e-mail/domain, is that someone you really want to be doing business with? All the DNS lookups or regular expressions aren’t going to save the customer from himself if he’s just stubborn from the start and puts in a “valid” but fake e-mail.
That’s a good question Michel, I’d have to look into how to define allowable Unicode characters with PCRE.
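One way around the Unicode question might be to convert the name to its Punycode form first and then validate the resulting ASCII. The sketch below is Python, not PHP, and `to_ascii_domain` is a made-up helper name. It relies on Python’s built-in `idna` codec, which follows the older IDNA 2003 rules, so treat it as an illustration of the idea rather than a complete answer.

```python
def to_ascii_domain(name: str) -> str:
    """Convert each label of a domain name to its ASCII (xn--) form."""
    # The built-in "idna" codec implements IDNA 2003, not the newer IDNA 2008.
    return name.encode("idna").decode("ascii")


print(to_ascii_domain("bücher.de"))  # xn--bcher-kva.de
```

The converted labels contain only letters, digits and dashes, so they fit the label part of a pattern like the one in the post.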
Thanks for the tip Bramus. I prefer to stick with functions that are native to PHP 4 for the sake of backwards compatibility.
I think a few people are missing that this RegExp validates the format of a domain name, not an email address.
Cor blimey Shaun, that’s one helluva RegEx, definitely bookmarked.
Haha. I needed this like, now! So funny. Thanks Shaun!
So I guess with that regex you wouldn’t allow someone to run Mint on a website by IP address only (i.e. without a domain name)?
Correct, you can’t purchase a Mint license for an IP address and as a result you can’t install Mint using just an address. But once installed, Mint will run on a site served from an IP address.
But that’s really moot since this regular expression validates the format of a domain name. An IP address is not a domain name. It’s an IP address.
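If you ever needed to tell the two apart explicitly, a small check like this would do it. It is a Python sketch with a hypothetical function name, `is_ip_address`, built on the standard `ipaddress` module. The expression in the post already rejects dotted IPv4 addresses on its own, since the last label has to be one of the listed TLDs.

```python
import ipaddress


def is_ip_address(host: str) -> bool:
    """Return True for an IPv4 or IPv6 literal, False for anything else."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```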
Fair enough; IP addresses are not the same as domain names.
I just like to think of them as valid substitutes (though it’s really the other way around).