Monday, 19 October 2009

Probably, the most horrible UNIX command I've ever written

pv enwiki.xml | grep -C1 '<title>' | tr -d ' ' | sed 's/<title>//' | sed 's/<\/title>//' |
sed 's/<id>//' | sed 's/<\/id>//' | sed 's/<page>//' | tr -d ' ' | grep -v '^$' |
tr '\n' '\t' | tr '-' '\n' | grep -v '^$' > title_id.txt

I know it's probably about as efficient as spaghetti code, but hey! It works and does what I want it to!
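
For comparison, here is a rough sketch of how the same title/id extraction might look as a single awk pass. It is not the command I actually ran: the function name extract_title_id is made up for illustration, and it assumes each <title> and <id> element sits on its own line, with the page's <id> being the first one after its <title>. Unlike the pipeline above, it keeps spaces and hyphens inside titles.

```shell
# Hypothetical helper (not part of the original pipeline):
# reads MediaWiki XML on stdin, prints "title<TAB>id" for each page.
extract_title_id() {
  awk '
    /<title>/ {
      # Strip everything up to and including <title>, and from </title> on.
      title = $0
      sub(/^.*<title>/, "", title)
      sub(/<\/title>.*$/, "", title)
      want_id = 1          # the next <id> seen belongs to this page
      next
    }
    want_id && /<id>/ {
      id = $0
      sub(/^.*<id>/, "", id)
      sub(/<\/id>.*$/, "", id)
      print title "\t" id
      want_id = 0          # ignore later <id> tags (revision, contributor)
    }
  '
}
```

With that defined, something like `pv enwiki.xml | extract_title_id > title_id.txt` should produce one tab-separated title and id per line.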