Sunday, 17 January 2010

A URL matching regex in Python — any problems?

Can anyone see any flaws in this regex for real-world URLs?

>>> import re
>>> text = 'and now that was a URL'  # renamed from str to avoid shadowing the built-in
>>> urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&#+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
>>> urls


It looks like it works for me, but you never know...  Comments from @HD42 would be highly appreciated =)
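One way to look for flaws is to run the pattern against a few awkward inputs. The sketch below is only that: the helper name `find_urls`, the sample strings, and the placeholder domain example.com are my own assumptions for illustration, not part of the original test. The pattern itself is copied unchanged.

```python
import re

# The pattern from the post, unchanged, compiled once for reuse.
URL_RE = re.compile(
    r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&#+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
)


def find_urls(text):
    """Hypothetical helper: return every substring the post's pattern matches."""
    return URL_RE.findall(text)


# Hypothetical sample inputs; example.com is a placeholder domain.

# A URL surrounded by spaces is picked up cleanly.
plain = find_urls('see http://example.com/page for details')
# -> ['http://example.com/page']

# A sentence-ending full stop is swallowed, because '.' is an allowed character.
trailing = find_urls('Go to http://example.com/page.')
# -> ['http://example.com/page.']

# '~' is not covered by any alternative, so the match stops right before it.
tilde = find_urls('http://example.com/~user/index.html')
# -> ['http://example.com/']

# Inside a character class, '$-_' is a range (from '$' to '_'), which
# happens to include '<' and '>', so the closing bracket is swallowed.
angle = find_urls('<http://example.com/page>')
# -> ['http://example.com/page>']
```

So on these inputs the regex copes with the simple case but keeps trailing punctuation, truncates URLs containing `~`, and treats `$-_` as a character range rather than as three literal characters. Whether that matters depends on the text you feed it.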
