It has been a more-busy-than-usual week on campus, and I've had a pretty packed conference schedule. I've been much too far behind on New Year's Resolution:
Ship Copy.
So, in it's most-purest most-rawest most-honest form, you can find a number of raw transcripts in the new repo: decause/raw
So, What does that look like?
.
├── libreplanet
│ └── 2015
│ ├── closingkeynote-libreplanet-karensandler.txt
│ ├── debnicholson-friday.txt
│ ├── freesoftwareawards.txt
│ └── highpriorityprojs-libreplanet-friday.txt
├── pycon
│ └── 2015
│ ├── ninapresofeedback.txt
│ ├── pycon-day2-keynotes.txt
│ ├── pyconedusummit2015.txt
│ └── scherer-pycon-ansible-day2.txt
└── RIT
└── 2015
├── biella-astra-raw.txt
├── biella-molly-guest-lecture.txt
└── molly-sauter-where-is-the-digital-street.txt
6 directories, 13 files
$ wc -w libreplanet/2015/*.txt
2287 libreplanet/2015/closingkeynote-libreplanet-karensandler.txt
139 libreplanet/2015/debnicholson-friday.txt
586 libreplanet/2015/freesoftwareawards.txt
2297 libreplanet/2015/highpriorityprojs-libreplanet-friday.txt
5309 total
$ wc -w pycon/2015/*.txt
106 pycon/2015/ninapresofeedback.txt
2233 pycon/2015/pycon-day2-keynotes.txt
1844 pycon/2015/pyconedusummit2015.txt
63 pycon/2015/scherer-pycon-ansible-day2.txt
4246 total
$ wc -w RIT/2015/*.txt
4521 RIT/2015/biella-astra-raw.txt
2489 RIT/2015/biella-molly-guest-lecture.txt
1964 RIT/2015/molly-sauter-where-is-the-digital-street.txt
8974 total
18529 total total
18,529 words, or, just over 41 pages total of raw text.
There is a flaw in my workflow. Though there is some utility in a raw transcript, really it is mostly when delivered in real-time. After the fact, there is much post-production work to be done, like spell checking. Even after, if there is a video, then the transcript is partial, and incomplete. This is bothersome to many potential downstream consumers of raw text. So where does that leave us?
I've played with word_cloud before within my decause/presignaug for building
presidential inauguration visualizations last year. Since then, word_cloud has
gotten much more sophisticated--now using scikitlearn, and numpy, and providing the ability to fit word clouds within images!
pip install cython firstsudo yum install freetype-devel (probably not necessary, since this is alleviated by pointing at a diff .ttf typeface...)FONT_PATH within word_cloud.pyGIMP didn't display the color encoding in the file statusbar when you opened things :) )pycon.py and pycon-greys.py.