Bill Brown

I love regular expressions. Okay, I love the challenge of crafting regular expressions. I do not enjoy reading regular expressions that I have not created or, really, even the ones I do create. But give me a problem and tell me to make a regular expression to match things and I am all over it.

A co-worker wanted a regular expression that would turn unlinked URLs in text into HTML links and fix linked URLs whose href lacked a protocol. In other words, if "" appeared in some text, it needed to be replaced with <a href=""></a>, and <a href="">some link text</a> needed to turn into <a href="">some link text</a>.

My first pass was a monster regular expression that handled both situations, but I couldn't get the replacement string to account for the link text that already existed in the invalid URL example. I also couldn't adequately cover the situation where other attributes came before the href attribute. So scrap that one.

This is what I came up with after separating it into two replacement passes. I share it with you both as a testament to my regular expression abilities (good or bad, you decide) and because this situation seems like one that might come up pretty frequently.

Regular expression | Replacement string
 | <a href="http://${domain}">${domain}</a>
href="(?<domain>www\.[^"]+)" | href="http://${domain}"
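
For anyone who wants to try the two passes, here is a rough Python sketch. The second pattern is the href expression above, rewritten in Python's named-group syntax ((?P<domain>...) in the pattern and \g<domain> in the replacement). The first pattern is not my original expression. BARE_URL is a hypothetical stand-in written for this example, and it only handles bare URLs that start with "www." and are not already sitting inside an attribute or link text. The function names and the www.example.com sample URL are made up for illustration too.

```python
import re

# Pass 2 pattern from the second row above, with the named group rewritten
# in Python syntax: (?<domain>...) becomes (?P<domain>...).
HREF_WITHOUT_PROTOCOL = re.compile(r'href="(?P<domain>www\.[^"]+)"')

# Hypothetical stand-in for pass 1 (an assumption, not the original
# expression): a "www." URL that is not immediately preceded by a quote,
# '>', '/', '.', or a word character, so existing links are left alone.
BARE_URL = re.compile(r'(?<![">/.\w])(?P<domain>www\.[^\s<>"]+)')


def link_bare_urls(text):
    """Pass 1: wrap bare www. URLs in an anchor element."""
    return BARE_URL.sub(r'<a href="http://\g<domain>">\g<domain></a>', text)


def add_protocol(text):
    """Pass 2: prepend http:// to href values that start with www."""
    return HREF_WITHOUT_PROTOCOL.sub(r'href="http://\g<domain>"', text)


def fix_links(text):
    """Run both passes in order."""
    return add_protocol(link_bare_urls(text))


if __name__ == "__main__":
    print(fix_links('See www.example.com for details'))
    print(fix_links('<a class="x" href="www.example.com">some link text</a>'))
```

Because the second pass only looks at the href attribute itself, it does not care which attributes come before it, which is the case the single monster expression could not cover. The stand-in first pass is deliberately naive: it will swallow trailing punctuation and skip a URL that directly follows a tag such as <p>.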