Wheaton College Norton, Massachusetts


Tool Archive

This page summarizes our previous tools. For the most recent upgrade to our tool set, see the summary of and links to Lexos (Summer 2015).

Summer 2012 -- First cut at online tools:

An early design decision was to develop three independent tools rather than try to build one big app. In large part, this was driven by the fact that the second and most sophisticated tool in the pipeline, diviText, was based on technology that was new to most of our undergraduate programming team (e.g., ExtJS and jQuery).
The pipeline of three online tools enables you to first "scrub" (clean) your text(s), then cut a single text into chunks, and finally build dendrograms (trees) that show the relationships within and between chunks of your text(s).

  • scrubber v1.0 -- strip tags, remove stop words, apply lemma list: prepare text for diviText
  • diviText v1.2.1 -- cut texts into chunks in one of three ways, count words, .zip results
  • treeView v1.2 -- build a dendrogram and save the output as .pdf or phyloXML
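
As a rough illustration of the scrubbing step, the sketch below strips tags, removes stop words, and applies a lemma list, in that order. It is not the scrubber v1.0 source: the function name scrub, its parameters, the lowercasing step, and the simple letters-only tokenization are all assumptions made for this example.

```python
import re

def scrub(text, stop_words=frozenset(), lemmas=None):
    """Hypothetical scrubbing sketch: strip tags, drop stop words, apply lemmas."""
    lemmas = lemmas or {}
    # Replace anything that looks like an SGML/XML/HTML tag with a space.
    no_tags = re.sub(r"<[^>]+>", " ", text)
    # Assumption: lowercase, then keep runs of letters (any alphabet) as words.
    words = re.findall(r"[^\W\d_]+", no_tags.lower())
    # Drop stop words first, then map each remaining form to its lemma.
    kept = [lemmas.get(word, word) for word in words if word not in stop_words]
    return " ".join(kept)

cleaned = scrub(
    "<p>The kings and the King spoke</p>",
    stop_words={"the", "and"},
    lemmas={"kings": "king", "spoke": "speak"},
)
print(cleaned)
```

The stop-word set and the lemma dictionary here are toy values; in practice they would come from lists you supply for your own language and period.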

Tutorial Videos

To get a feel for the types of lexomics analyses that you can do, follow along with our Project Videos, starting with "The Story of Daniel", "Reading Dendrograms", etc.


Download the software for these three open-source tools:

Advanced (offline, in progress) tools:

  • trueTree v1.0 -- cluster validation: just how good is that clade?
  • topWords v1.0 -- find significant discriminating words between clades

Summer 2011 -- Early command-line scripts:

Prior to developing our online tools, we wrote this suite of command-line scripts. The scripts convert data into the formats needed for your experimental analyses of texts, including statistical summaries of word usage across selected groups (or chunks) of texts, authorship attribution techniques, and clustering and classification methods.

Note: For each script to work properly, you must download the whole suite of scripts.

  • Cut text into chunks
    This handy script "cuts" texts into user-specified chunks. For example, you could cut the poem of Daniel into ten 450-word chunks; subsequent scripts will treat each of these chunks as an independent text.
    cutter.zip - ReadMe
  • Merge the counts into one file
    This script can be used either after you've created a Virtual Manuscript or following as3_countWords. The main goal of this script, in addition to collecting some statistics on your collection of texts, is to merge the counts into one file in preparation for further analysis, for example, in R (see below).
    mergeWordCounts.zip - ReadMe
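
The two steps above can be sketched in a few lines of Python. This is an illustration only, not the code shipped in cutter.zip or mergeWordCounts.zip: the function names cut_into_chunks and merge_counts are hypothetical, words are split on whitespace, and a short final chunk is kept as-is, which may differ from how the actual scripts treat leftovers.

```python
from collections import Counter

def cut_into_chunks(text, chunk_size):
    """Split a text into consecutive chunks of chunk_size words each."""
    words = text.split()
    # Assumption: the last chunk may hold fewer than chunk_size words.
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

def merge_counts(chunks):
    """Merge per-chunk word counts into one table (one row per chunk)."""
    counts = [Counter(chunk) for chunk in chunks]
    # Shared, sorted vocabulary so every row has the same columns.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that a chunk does not contain.
    rows = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, rows

chunks = cut_into_chunks("a b a c b a d", 3)
vocabulary, rows = merge_counts(chunks)
print(vocabulary)
for row in rows:
    print(row)
```

For the Daniel example you would call cut_into_chunks with chunk_size=450. Writing vocabulary as a header line followed by rows to a comma-separated file gives a single table that R can read.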
