Wheaton College Norton, Massachusetts
Lexomics

Tools

Online tools:

This pipeline of three online tools enables you to first "scrub" (clean) your text(s), then cut a single text into chunks, and finally build dendrograms (trees) that show the relationships within and between chunks of your text(s).

  • scrubber v1.0 -- strip tags, remove stop words, apply lemma list: prepare text for diviText
  • diviText v1.2.1 -- cut texts into chunks in one of three ways, count words, .zip results
  • treeView v1.2 -- build a dendrogram and save the output as .pdf or phyloXML
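
To make the scrubbing step concrete, here is a minimal Python sketch of the three operations listed for scrubber (strip tags, remove stop words, apply a lemma list). It is an illustration only, not the scrubber source: the function name `scrub`, its parameters, and the sample line are hypothetical, and the real tool may order or implement these steps differently.

```python
import re

def scrub(text, stop_words=frozenset(), lemmas=None):
    """Hypothetical helper: strip tags, lowercase, drop stop words, apply lemmas."""
    lemmas = lemmas or {}
    # Replace anything that looks like an <...> tag with a space.
    no_tags = re.sub(r"<[^>]+>", " ", text)
    # Keep runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", no_tags.lower())
    # Drop stop words first, then map each remaining word to its lemma if one is listed.
    return [lemmas.get(w, w) for w in words if w not in stop_words]

# Made-up sample input, stop-word list, and lemma list.
sample = "<l n='1'>The kings were bold</l>"
tokens = scrub(sample, stop_words={"the"}, lemmas={"kings": "king", "were": "be"})
print(tokens)
```

The resulting word list is the kind of cleaned input that a chunking step would then divide.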

Tutorials and transcripts for these tools can be found here.

Download the software for these three open-source tools:

Advanced (offline, in progress) tools:

  • trueTree v1.0 -- cluster validation ... just how good is that clade?
  • topWords v1.0 -- find significant discriminating words between clades


Early command-line scripts:

Prior to developing our online tools, we wrote this suite of command-line scripts, which converts data into the formats needed for your experimental analyses of texts. These analyses include statistical summaries of word usage across selected groups (or chunks) of texts, authorship attribution techniques, and clustering and classification methods.

Note: In order for each script to work properly you must download the whole suite of scripts.

  • Cut text into chunks
    This handy script "cuts" texts into user-specified chunks. For example, you could cut the poem of Daniel into ten 450-word chunks; subsequent scripts will treat each of these chunks as an independent text.
    cutter.zip - ReadMe
  • Merge the counts into one file
    This script can be used either after you've created a Virtual Manuscript or following as3_countWords. Besides collecting some statistics on your collection of texts, its main goal is to merge the counts into one file in preparation for further analysis, for example, in R (see below).
    mergeWordCounts.zip - ReadMe
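
The two steps above can be sketched in a few lines of Python. This is an illustrative approximation, not the code in cutter.zip or mergeWordCounts.zip: the function names `cut_into_chunks` and `merge_counts` are hypothetical, the sample text is made up, and the sketch simply lets the last chunk run short, which may differ from how the actual scripts handle leftover words.

```python
from collections import Counter

def cut_into_chunks(words, chunk_size):
    """Split a word list into consecutive chunks of chunk_size words.
    Assumption: the final chunk is allowed to be shorter."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

def merge_counts(chunks):
    """Count words per chunk and merge the counts into one table:
    one row per chunk, one column per word in the shared vocabulary."""
    counts = [Counter(chunk) for chunk in chunks]
    vocab = sorted(set().union(*counts))
    header = ["chunk"] + vocab
    rows = [[f"chunk{n}"] + [c[w] for w in vocab]
            for n, c in enumerate(counts, start=1)]
    return header, rows

# Made-up eight-word text, cut into two four-word chunks.
words = "the king and the queen and the court".split()
chunks = cut_into_chunks(words, 4)
header, rows = merge_counts(chunks)
print(header)
for row in rows:
    print(row)
```

Each row of the merged table treats one chunk as an independent text, so the table could be written out as a single delimited file and loaded into a statistics package for clustering.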
