I love regular expressions. Okay, I love the challenge of crafting regular expressions. I do not enjoy reading regular expressions that I have not created or, really, even the ones I do create. But give me a problem and tell me to make a regular expression to match things and I am all over it.
A co-worker wanted a regular expression to turn unlinked URLs in text into HTML links and to correct linked URLs that lacked a protocol into valid URLs. In other words, if "www.google.com" appeared in some text, it needed to be replaced with `<a href="http://www.google.com">www.google.com</a>`,
and `<a href="www.google.com">some link text</a>`
needed to turn into `<a href="http://www.google.com">some link text</a>`.
My first pass was a monster regular expression that handled both situations, but I couldn't get the replacement string to account for the fact that there was already link text in the invalid URL example. And I couldn't adequately cover the situation where there were attributes before the `href` attribute. So scrap that one.
This is what I came up with after separating it into two replacement passes. I share it with you both as a testament to my regular expression abilities (good or bad, you decide) and because this situation seems like one that might come up pretty frequently.
| Regular expression | Replacement string |
|---|---|
| `(?<=\s\|^)(?<domain>www\.[^\s]+)(?=\s)` | `<a href="http://${domain}">${domain}</a>` |
| `href="(?<domain>www\.[^"]+)"` | `href="http://${domain}"` |
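
If you want to try the two passes outside a regex tester, here is a rough Python sketch. It is a port under a few assumptions, not the exact expressions from the table: Python's `re` module writes named groups as `(?P<domain>...)` and refers to them in the replacement as `\g<domain>`, and it only accepts fixed-width lookbehinds, so `(?<=\s|^)` is written as the equivalent `(?<!\S)`. I also swapped the trailing `(?=\s)` for `(?!\S)` so that a URL at the very end of the text still matches; that is my tweak, not part of the original pattern. The function name `linkify` is made up for illustration.

```python
import re

# Pass 1: bare "www." URLs sitting in running text.
# (?<!\S) means "not preceded by a non-space", i.e. preceded by whitespace
# or the start of the text; (?!\S) is the same idea for the trailing edge.
BARE_URL = re.compile(r'(?<!\S)(?P<domain>www\.\S+)(?!\S)')
BARE_URL_REPLACEMENT = r'<a href="http://\g<domain>">\g<domain></a>'

# Pass 2: href attributes whose value starts with "www." (no protocol).
HREF_NO_PROTOCOL = re.compile(r'href="(?P<domain>www\.[^"]+)"')
HREF_REPLACEMENT = r'href="http://\g<domain>"'


def linkify(text: str) -> str:
    """Run both replacement passes, bare URLs first, then broken hrefs."""
    text = BARE_URL.sub(BARE_URL_REPLACEMENT, text)
    return HREF_NO_PROTOCOL.sub(HREF_REPLACEMENT, text)


if __name__ == "__main__":
    print(linkify("see www.google.com for more"))
    print(linkify('<a href="www.google.com">some link text</a>'))
```

The order of the passes is safe in this sketch because the links produced by the first pass already carry `http://`, so the second pass leaves them alone. One thing neither pass handles is trailing punctuation: in "visit www.google.com." the period ends up inside the link.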