How does Facebook recognize URLs?

Recently I helped @olinicola match URLs with a regular expression in order to find links shared on a Facebook page through the APIs. After several attempts we found one on Daring Fireball that covers Facebook's behavior almost perfectly (see the post there for a better explanation of how it works).

This is the “extended” version, which covers URIs with any scheme:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

There is also a “simpler” version that matches only web addresses: http://*, https://*, www.*, and bare domains followed by a slash (such as example.com/).

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
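As a rough illustration of how the simpler pattern can be used, here is a minimal Python sketch. The pattern is copied verbatim from above; the function name extract_urls and the sample text are my own assumptions for the example, not part of Gruber's post or of any Facebook API.

```python
import re

# The "simpler" pattern from the post, copied verbatim. A raw triple-quoted
# string keeps the backslashes and both quote characters intact.
URL_PATTERN = re.compile(
    r"""(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)


def extract_urls(text):
    """Return every URL found in text (hypothetical helper for this example).

    Group 1 wraps the whole URL; the inner groups only exist to handle
    balanced parentheses, so finditer + group(1) is used instead of findall,
    which would return a tuple of all groups for each match.
    """
    return [match.group(1) for match in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Sample text is made up for illustration.
    sample = (
        "Check http://example.com/path, and www.test.org. "
        "Also (see https://en.wikipedia.org/wiki/Regex_(disambiguation))."
    )
    for url in extract_urls(sample):
        print(url)
```

Trailing punctuation such as the comma and the final period is left out of the matches, while the balanced parentheses inside the Wikipedia-style link are kept. Be aware that the nested quantifiers in this pattern can backtrack heavily on long runs of trailing punctuation, so it is best applied to short pieces of text such as individual posts.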

Fortunately, both are in the public domain. Many thanks to John Gruber.