tagsToLowerCase()

Mike Davidson came to me yesterday with a request for a JavaScript function. Given an HTML string with uppercase tags and attributes and mixed case attribute values and tag contents, return that string with lowercased tags and attributes but leave the attribute values and tag contents alone. (Certain browsers return the innerHTML of elements in this malformed way—which can be a problem if the source is headed to a textarea and ultimately saved to a database.)

He did some serious googling to no avail before calling in The Wolf. Satisfied with the result, he suggested I post my solution.

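function tagsToLowerCase(html)
{
    // Remove whitespace around "=" when it sits between a letter and a quote
    html = html.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');
    // Collect tag openings ("<P", "</P") and attribute names (" HREF=")
    var parts = html.match(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi);
    if (parts)
    {
        // Longest first, so a short part like "<A" cannot rewrite the start of "<ABBR"
        parts.sort(function (a, b) { return b.length - a.length; });
        for (var i = 0; i < parts.length; i++)
        {
            var part = parts[i];
            html = html.replace(new RegExp(part, 'g'), part.toLowerCase());
        }
    }
    return html;
}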

First thing the function does is cinch up any space around attribute equals signs. Next it matches the beginning of opening and closing tags as well as attribute names. Finally it loops through those matches, replacing each one in the original HTML with its lowercased equivalent. The resulting function is quote agnostic: it works whether attribute values are wrapped in single or double quotes.
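To make those three steps concrete, here they are traced on a small sample string. This is an illustrative sketch: the input and the variable names are made up, and it reuses the post's regular expressions written out with explicit backslashes. The `<A HREF='/x'>` tag uses single quotes to show the quote-agnostic behavior.

```javascript
// Sample input in the shape some browsers return from innerHTML
var html = '<P CLASS = "Intro">Hello <A HREF=\'/x\'>World</A></P>';

// Step 1: remove spaces around "=" when it sits between a letter and a quote
var cinched = html.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');

// Step 2: collect tag openings ("<P", "</A") and attribute names (" CLASS=")
var parts = cinched.match(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi);

// Step 3: replace every occurrence of each part with its lowercased form
var result = cinched;
for (var i = 0; i < parts.length; i++) {
  result = result.replace(new RegExp(parts[i], 'g'), parts[i].toLowerCase());
}

console.log(cinched);               // <P CLASS="Intro">Hello <A HREF='/x'>World</A></P>
console.log(JSON.stringify(parts)); // ["<P"," CLASS=","<A"," HREF=","</A","</P"]
console.log(result);                // <p class="Intro">Hello <a href='/x'>World</a></p>
```

Attribute values ("Intro", '/x') and text ("Hello", "World") come through untouched because neither pattern in step 2 can match them.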

Author
Shaun Inman
Posted
February 5th, 2006 at 9:33 am
Categories
JavaScript
Comments
019 (Now closed)

019 Comments

001

Forgive my ignorance, but when would this be useful? When working on other people’s bad code?

Author
oliver taylor
Posted
Feb 5th, 2006 6:13 am
002

It would be useful when working with an inline HTML editor that receives its content from an existing element’s innerHTML property. In certain browsers, Safari for example, even if the physical source code is properly formed with lowercase tags and attribute names, accessing the innerHTML property of any element will return malformed code with uppercase tags and attribute names.

Author
Shaun Inman
Posted
Feb 5th, 2006 6:17 am
003

By the way, it should be known that I keep my Wolf function requests to a minimum these days. For any youngsters in the audience, Googling is generally a much better way to solve problems like these. Maybe my Googling skills are in decline, but I looked for more than an hour and found nothing. I even tried The Dutch Wolf first but he was unsuccessful (something about being drunk and not coherent enough to grapple with regexes at the time).

Author
Mike D.
Posted
Feb 5th, 2006 7:49 am
004

The final loop can be reduced/simplified to: return html.replace(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi, function($0, $1) { return $1.toLowerCase(); })
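Dropped into a complete function, that one-liner might look like the sketch below. It assumes an engine whose String.prototype.replace accepts a function as the replacement argument (standard in modern JavaScript). It keeps the post's space-cinching step and lowercases each match in a single pass, so matches that share a prefix, such as `<A` and `<ABBR`, cannot interfere with each other.

```javascript
// One-pass variant: the replacement function lowercases each match in place.
// Assumes String.prototype.replace accepts a function as its second argument.
function tagsToLowerCase(html) {
  // Remove whitespace around "=" between a letter and an opening quote
  html = html.replace(/([a-z])\s*(=)\s*("|')/gi, '$1$2$3');
  // Lowercase tag openings and attribute names; values and text are untouched
  return html.replace(/(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi, function (match) {
    return match.toLowerCase();
  });
}

console.log(tagsToLowerCase('<DIV ID = "Main"><ABBR TITLE="HTML">HTML</ABBR><A HREF="/X">Go</A></DIV>'));
// <div id="Main"><abbr title="HTML">HTML</abbr><a href="/X">Go</a></div>
```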

Author
Björn Graf
Posted
Feb 5th, 2006 8:12 am
005

Not if you want it to work in Safari (which is really what this is all about). Safari replaces the matches with the string value of your anonymous function. Not very helpful.
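Since support for function replacements is the sticking point, one option is to test for it at runtime and fall back to the loop only where needed. This is a sketch: `supportsReplaceCallback` and `lowercaseTagParts` are hypothetical names, and the check relies on the behavior described above, where an engine without support inserts the function's string value instead of calling it.

```javascript
// Returns true when String.prototype.replace invokes a replacement function.
// An engine that stringifies the function instead would splice the
// function's source text into the result, so the comparison fails.
function supportsReplaceCallback() {
  return 'a'.replace(/a/, function () { return 'b'; }) === 'b';
}

// Hypothetical wrapper: take the one-pass callback route when it works,
// otherwise fall back to the post's match-then-loop approach.
function lowercaseTagParts(html) {
  var pattern = /(<\/?[a-z][a-z0-9]*| [a-z]+=)/gi;
  if (supportsReplaceCallback()) {
    return html.replace(pattern, function (m) { return m.toLowerCase(); });
  }
  var parts = html.match(pattern) || [];
  // Longest first, so a short part like "<A" cannot rewrite the start of "<ABBR"
  parts.sort(function (a, b) { return b.length - a.length; });
  for (var i = 0; i < parts.length; i++) {
    html = html.replace(new RegExp(parts[i], 'g'), parts[i].toLowerCase());
  }
  return html;
}

console.log(lowercaseTagParts('<P CLASS="Intro">Hi</P>')); // <p class="Intro">Hi</p>
```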

Author
Shaun Inman
Posted
Feb 5th, 2006 8:16 am
006

Hmm.. I was working around the exact same problem (in Safari) and used a php function instead. But I took a completely different route—walking through the tag character by character, keeping track of whether I was in an attribute or not. I also applied

The performance improvement in mine is probably negligible with modern computers. Yours is certainly prettier.

Now I have to think about where that action makes more sense, in the browser or on the server.
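Chris’s PHP isn’t shown, but the idea of walking the markup one character at a time while tracking whether you are inside an attribute value translates to a small state machine. This is a JavaScript sketch of that idea, not his implementation. `tagsToLowerCaseByWalking` is a hypothetical name, and it assumes markup without comments or `<script>`/`<style>` bodies containing `<`. Unlike the regex approach, a `>` inside a quoted value does not confuse it.

```javascript
// Lowercases tag and attribute names by walking the string one character
// at a time. Quoted and unquoted attribute values and text are copied as-is.
function tagsToLowerCaseByWalking(html) {
  var out = '';
  var state = 'text'; // 'text' | 'tag' | 'value' | 'quoted' | 'unquoted'
  var quote = '';     // quote character that opened the current value
  for (var i = 0; i < html.length; i++) {
    var c = html.charAt(i);
    if (state === 'text') {
      // Outside any tag: copy text, watch for the start of a tag
      if (c === '<') state = 'tag';
      out += c;
    } else if (state === 'tag') {
      // Inside a tag name or attribute name: lowercase it
      if (c === '=') state = 'value';
      else if (c === '>') state = 'text';
      out += c.toLowerCase();
    } else if (state === 'value') {
      // Just after "=": skip spaces, then decide quoted vs unquoted
      if (c === '"' || c === "'") { state = 'quoted'; quote = c; }
      else if (c === '>') state = 'text';
      else if (!/\s/.test(c)) state = 'unquoted';
      out += c;
    } else if (state === 'quoted') {
      // Inside a quoted value: copy verbatim until the matching quote
      if (c === quote) state = 'tag';
      out += c;
    } else {
      // Inside an unquoted value: copy until whitespace or end of tag
      if (c === '>') state = 'text';
      else if (/\s/.test(c)) state = 'tag';
      out += c;
    }
  }
  return out;
}

console.log(tagsToLowerCaseByWalking('<DIV TITLE = "A>B">X</DIV>')); // <div title = "A>B">X</div>
```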

Author
Chris Renner
Posted
Feb 5th, 2006 9:05 am
007

Oh come on, I wasn’t drunk. Just very tired, and with somewhat higher levels of caffeine and alcohol running through my veins. And I had a good excuse.

Author
Mark Wubben
Posted
Feb 5th, 2006 10:16 am
008

I remember doing something really similar for the purposes of getting the innerHTML to look right. Ultimately, I ended up adapting Wubben’s “importNode” function from his site (it takes a reference to an HTML node):

function cleanedInnerHTML(oNode) {
  var oFront = "";
  var oTail = "";

  if (oNode.nodeType == 1) {
    // element: rebuild the opening tag with a lowercased name
    var tagname = "<" + oNode.nodeName.toLowerCase();
    for (var i = 0; i < oNode.attributes.length; i++) {
      // lowercase the attribute name, keep the value as-is
      tagname += " " + oNode.attributes[i].name.toLowerCase();
      tagname += '="' + oNode.attributes[i].value + '"';
    }
    tagname += ">";

    // opening tag goes on the end of the front
    oFront = oFront + tagname;

    // matching close tag goes on the tail
    oTail = "</" + oNode.nodeName.toLowerCase() + ">" + oTail;
  } else if (oNode.nodeType == 3) {
    // text adds to the front, skipping whitespace-only nodes
    if (!oNode.nodeValue.match(/^\s+$/)) {
      oFront = oFront + oNode.nodeValue;
    }
  }

  if (oNode.hasChildNodes()) {
    for (var oChild = oNode.firstChild; oChild; oChild = oChild.nextSibling) {
      // recurse into each child
      oFront = oFront + cleanedInnerHTML(oChild);
    }
  }

  return oFront + oTail;
}

The results were then kicked over into flashvars. Its purpose was to pass an entire UL navigation structure into Flash as proper XML (which Flash could easily deconstruct to build its menu dynamically).
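For output headed into flashvars as proper XML, two details matter that cleanedInnerHTML leaves out: attribute values and text are concatenated without escaping, and FlashVars use query-string syntax, so a raw `&` would split the variable. A minimal sketch; `escapeXml`, `toFlashVars`, and the `menu` variable name are hypothetical.

```javascript
// Escapes characters that would break well-formed XML when a text node
// or attribute value is concatenated into markup. "&" must go first.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// FlashVars use name=value&name=value syntax, so the finished XML is
// URL-encoded before it becomes the value of the hypothetical "menu" variable.
function toFlashVars(xml) {
  return 'menu=' + encodeURIComponent(xml);
}

var item = '<li title="' + escapeXml('Q&A "Live"') + '">' + escapeXml('Q&A') + '</li>';
console.log(item); // <li title="Q&amp;A &quot;Live&quot;">Q&amp;A</li>
```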

Author
Jakob Heuser
Posted
Feb 5th, 2006 11:25 am
009

Well, my anonymous function works in Safari 2.0.3 and I do not have access to 1.x :]

Author
Björn Graf
Posted
Feb 5th, 2006 8:32 pm
010

I don’t know about 1.x, but it doesn’t even work in 2.0.2.

Author
Shaun Inman
Posted
Feb 6th, 2006 3:36 am
011

It would seem more practical to do this on the server rather than with JavaScript… but hey, if you say it’s useful… alrighty then.

Author
Dustin Diaz
Posted
Feb 6th, 2006 9:36 am
012

Thanks, that’s a handy function to have.

Author
Rob Botlon
Posted
Feb 6th, 2006 9:42 am
013

First thing the function does is cinch up any space around attribute equal signs.

Actually, the first line of this routine cinches up the space around any equals sign followed by a double or single quote, not just those within attributes. So if you run this against a code example like:

$foo = "string value";

it’ll cinch the space around that equals sign, too.
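One way around that is to apply the cinching only inside tags. A sketch, assuming attribute values never contain a literal `>` (the tag pattern stops at the first one). `cinchAttributeEquals` is a hypothetical name, and it relies on a replacement function, which the Safari versions discussed above reportedly did not support.

```javascript
// Cinch whitespace around "=" only inside tags, so text content such as
// '$foo = "string value";' is left alone.
function cinchAttributeEquals(html) {
  // Each match is one complete tag, from "<" to the first ">"
  return html.replace(/<[^>]*>/g, function (tag) {
    return tag.replace(/\s*=\s*("|')/g, '=$1');
  });
}

console.log(cinchAttributeEquals('<CODE CLASS = "php">$foo = "string value";</CODE>'));
// <CODE CLASS="php">$foo = "string value";</CODE>
```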

Author
John Gruber
Posted
Feb 6th, 2006 5:20 pm
014

Well, regexes can be expensive, and innerHTML is evil. So, how about a pure DOM solution:

//    escape characters that would otherwise be read back as markup
function escapeText(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

//    get the source of a node
function getNodeSource(node) {
    var source = "";
    //    normalize tag case
    var tag = node.nodeName.toLowerCase();
    //    attributes for tag, normalizing attribute name case
    var att_str = "";
    if (node.attributes.length > 0) {
        for (var j = 0; j < node.attributes.length; j++) {
            att_str += ' ' + node.attributes[j].name.toLowerCase() + '="' + escapeText(node.attributes[j].value).replace(/"/g, '&quot;') + '"';
        }
    }
    //    source for child nodes ("innerHTML")
    if (node.childNodes.length > 0) {
        for (var i = 0; i < node.childNodes.length; i++) {
            //    element nodes
            if (node.childNodes[i].nodeType == 1) {
                source += getNodeSource(node.childNodes[i]);            //    DEBUG-RECURSION
            }
            //    text nodes; escaped so "<" and "&" in text stay text
            else if (node.childNodes[i].nodeType == 3) {
                source += escapeText(node.childNodes[i].nodeValue);
            }
            //    comment nodes; broken in Safari... sigh
            else if (node.childNodes[i].nodeType == 8) {
                source += '<!--' + node.childNodes[i].nodeValue + '-->';
            }
        }
    }

    return '<' + tag + att_str + '>' + source + '<' + '/' + tag + '>';
}

Seems a bit snappier than your original for large nodes. Only tested on Firefox (fine) and Safari because that’s what I have handy. Unfortunately, Safari doesn’t understand comment nodes (node.nodeType == 8 or node.nodeName == “#comment”), so on Safari this will strip comments, but who comments their markup ;)

Author
Jeff Zerger
Posted
Feb 7th, 2006 9:59 pm
015

Whoops.. just looked at Jakob’s comment, which is doing pretty much the same thing as mine. Sorry ‘bout that!

Author
Jeff Zerger
Posted
Feb 7th, 2006 10:05 pm
016

I wrote something slightly related in HTML to XHTML.

Author
Robert Nyman
Posted
Feb 8th, 2006 5:09 am
017

While that is a great function to use for the task, I agree with Jakob and Jeff. It’s just another reason why innerHTML should not be used. Sure it’s easy and convenient, but isn’t it lazy? And how long will it continue to be around? Mike, is this just another one of those things that you’ll change when it stops working and not do it right the first time around? And please don’t take this as rude, I’m genuinely curious as to why you’re using innerHTML.

Author
Zack Gilbert
Posted
Feb 13th, 2006 1:24 pm
018

Zack:

Mike, is this just another one of those things that you’ll change when it stops working and not do it right the first time around?

Ummm, I’m sorry, do I have a reputation for not doing things right that I don’t know about? Or are you just speaking in general terms? I don’t think I’ve ever coded anything that has “stopped working” because of browser evolution.

Whether innerHTML or DOM doesn’t matter so much to me, really. innerHTML is not going away any time soon… like any time in the next five years at least. It may be “going away” from an “uncool to use” standpoint, but not from an “actually works” standpoint. Part of the reason I needed this code to begin with was to deal with the inadequacies of a spellchecking component we just launched. I guarantee you that the spellchecker will be updated/replaced long before innerHTML stops being a viable fix. That said, I do recognize the long-term benefits of DOM methods vs. RegEx methods.

Author
Mike D.
Posted
Feb 14th, 2006 7:31 am
019

Mike,

Cool. That answers it. Thanks.

And what I was referring to was from your article, March to Your Own Standard where you said:

If I only had a few small errors on a few random pages around my site, I could easily miss the day when “the big switchover” happens and wind up with broken pages I don’t know about. And since this code is in the form of a server-side include, I can freely remove it with a few clicks.

And no, I didn’t mean that you weren’t doing things “right”. I was just wondering if using innerHTML was another one of those tests (referred to in your article). You do amazing work (which is why I read your blog regularly) and I am just curious as to your reasoning behind it.

Author
Zack Gilbert
Posted
Feb 14th, 2006 1:17 pm