A collection of small text corpora of interesting data.

WWW: https://github.com/gaborcsardi/rcorpora
