Podatus is a data visualization of the text of novels. Sentence and part-of-speech frequencies are plotted as streamgraphs, where the thickness of the graph represents frequency. The streamgraphs are oriented either vertically or horizontally depending on the need to scale: the list of all books vs. the details of one book. I don’t know how many books will ultimately be added to Podatus, perhaps very many. As for the book-detail page, the vertical orientation leaves room for other layers of information, for notations and the content of the book itself, hopefully enabling further discovery.
Wordnik defines the word ‘podatus’ as follows: podatus n. In medieval musical notation, a ligature which represents an ascending step or skip.
The idea of ascending (or descending) is seen throughout this data. After all, authors are wise to use varying sentence lengths as one of their tools to keep interest and influence the pace of the narrative. Graphically speaking, the streamgraphs of this project are a bit musical (not to mention future hopes of mine to add a more direct musical relationship to this visualization).
Ever since reading and thoroughly enjoying the works of James Joyce 1, I have been inspired to learn more about the layers of construction in narrative, from basic levels onward. Obviously, this visualization of parts of speech begins at the basic level. But I have to start somewhere. It is my hope to keep building on this platform and use it as a tool kit for further studies of the structure of texts, layering parts of speech with mood, meter and meaning.
One of my early goals of this project was to compare the writing structure across multiple books. Here, I’ve tried to do that with the books-list page, focusing on individual chapters; maybe I will need further focus to do justice to this idea.
Several sources 2 suggest that the pace of action in a narrative is inversely correlated with sentence length: short sentences strung together, for example, make for an increased pace of action in the story. Conversely, longer sentences slow the pace. Ultimately, I want to ask questions about authors of certain genres and time periods. For example, how has the structure of the novel changed over time? How does academic writing look when compared with literature?
Podatus is built with Ruby on Rails 4.2 and D3.js, which constructs the streamgraphs from data loaded dynamically as JSON. Sentence-based CSV data is constructed using NLTK (Python). The majority of the text uses the Kaffeesatz typeface by Yanone, made available for free through Google Fonts. The Podatus project source is hosted on GitHub.
The e-texts were chosen from the top 25 downloaded books on Gutenberg.org; among these 25, I focused on British authors. The plain text (UTF-8) renditions of the books were downloaded, then marked up with a simple convention to indicate where the book text begins and ends, as well as where each chapter begins. Without this markup, the preamble and Gutenberg licensing language would become part of the analysis. The markup also supports other annotations, which can later become part of the book’s depiction on the web site. This markup is done manually in a text editor.
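To make the markup step concrete, here is a minimal sketch of how text marked up this way might be split into chapters. The marker strings (`@@BOOK-START`, `@@BOOK-END`, `@@CHAPTER`) and the function name `split_chapters` are hypothetical, invented for illustration; the actual convention used by Podatus is not described here and may well differ.

```python
# Hypothetical marker strings: assumptions for this sketch, not the
# project's actual markup convention.
BOOK_START = "@@BOOK-START"
BOOK_END = "@@BOOK-END"
CHAPTER = "@@CHAPTER"


def split_chapters(marked_up_text):
    """Return a list of (chapter_title, chapter_text) pairs.

    Everything before BOOK_START and after BOOK_END (preamble, licensing
    language) is ignored, as is any text before the first chapter marker.
    """
    chapters = []
    inside_book = False
    title, lines = None, []

    for line in marked_up_text.splitlines():
        stripped = line.strip()
        if stripped == BOOK_START:
            inside_book = True
        elif stripped == BOOK_END:
            break
        elif not inside_book:
            continue
        elif stripped.startswith(CHAPTER):
            # A new chapter begins: store the previous one, if any.
            if title is not None:
                chapters.append((title, "\n".join(lines).strip()))
            title = stripped[len(CHAPTER):].strip()
            lines = []
        elif title is not None:
            lines.append(line)

    # Store the last chapter once the end marker (or end of file) is reached.
    if title is not None:
        chapters.append((title, "\n".join(lines).strip()))
    return chapters
```

Keeping the markers on lines of their own is what makes manual markup in a text editor practical: the parser only has to compare whole lines, and the book text itself is never altered.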
Once downloaded and marked up, each book is processed by tools based on NLTK, which parse the text and identify parts of speech, ultimately creating chapter- and sentence-based CSV data files, which are subsequently loaded into the Rails database.
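As a rough illustration of this processing step, the sketch below turns one chapter's text into sentence-based CSV rows: one row per sentence, with a word count and a count per part-of-speech tag. The column names and the function names (`sentence_rows`, `rows_to_csv`) are assumptions made for this example, not the project's actual schema or code. The two `nltk_*` helpers use NLTK's standard `sent_tokenize`, `word_tokenize` and `pos_tag` functions, which require NLTK and its tokenizer and tagger data to be installed.

```python
import csv
import io
from collections import Counter


def sentence_rows(chapter_text, split_sentences, tag_words):
    """Yield one dict per sentence: its index, word count and tag counts.

    split_sentences(text) -> list of sentence strings
    tag_words(sentence)   -> list of (word, tag) pairs
    """
    for index, sentence in enumerate(split_sentences(chapter_text), start=1):
        tagged = tag_words(sentence)
        # Keep only tokens containing a letter or digit, so that
        # punctuation tokens are not counted as words.
        words = [(w, t) for w, t in tagged if any(ch.isalnum() for ch in w)]
        counts = Counter(tag for _, tag in words)
        yield {"sentence": index, "words": len(words), **counts}


def rows_to_csv(rows, tag_columns):
    """Serialise rows to CSV text with a fixed set of tag columns.

    Tags missing from a sentence are written as 0; tags not listed in
    tag_columns are dropped.
    """
    buffer = io.StringIO()
    fieldnames = ["sentence", "words", *tag_columns]
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval=0, extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def nltk_split_sentences(text):
    """Split text into sentences with NLTK (imported lazily)."""
    import nltk
    return nltk.sent_tokenize(text)


def nltk_tag_words(sentence):
    """Tokenize and part-of-speech tag one sentence with NLTK."""
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(sentence))
```

With NLTK available, a chapter could be converted with `rows_to_csv(sentence_rows(text, nltk_split_sentences, nltk_tag_words), ["NN", "VB", "JJ"])`, where the tag list is again only an example. Passing the splitter and tagger in as arguments keeps the counting logic independent of NLTK, so it can be checked with simple stand-in functions.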