Wheaton College Norton, Massachusetts
Lexomics

Software

Online Tools

(0) Scrubber (alpha) -- strip tags, remove stop words, apply lemma list: prepare text for (1) diviText
(1) diviText -- cut texts into segments in one of three ways, count words, .zip results
(2) treeView (beta) -- build a dendrogram and save as .pdf or phyloXML
(3) cluster validation (in progress) -- just how "good" is that dendrogram?

Analysis Software (as)

This suite of software converts your data into the formats needed for experimental analyses of texts, including statistical summaries of word usage across selected groups (or chunks) of texts, authorship attribution techniques, and clustering and classification methods.


Click a puzzle piece for a further description and downloadable content.

Note: For each script to work properly, you must download the whole suite of scripts.
Click Here to Download Suite of Scripts
  • piece2 Cut Text into Chunks
    This handy script "cuts" texts into user-specified chunks. For example, you could cut the poem of Daniel into ten 450-word chunks; subsequent scripts will treat each of these chunks as an independent text.

    cutter.zip - ReadMe

  • piece4a Merge the Counts into One File
    This script can be used either after you've created a Virtual Manuscript or following as3_countWords. Besides collecting some statistics on your collection of texts, its main goal is to merge the counts into one file in preparation for further analysis, for example in R (see below).

    mergeWordCounts.zip - ReadMe
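
To make the two steps above concrete, here is a minimal Python sketch of the idea: cut a text into fixed-size word chunks, count the words in each chunk, and merge the counts into one table. It is not the code from cutter.zip or mergeWordCounts.zip. The function names, the whitespace tokenization, the lowercasing, the shorter final chunk, and the tab-separated layout are all assumptions made for illustration.

```python
from collections import Counter


def cut_into_chunks(text, chunk_size):
    """Split text on whitespace into consecutive chunks of chunk_size words.

    Assumption: the last chunk is simply shorter if the words do not
    divide evenly; the real cutter script may handle this differently.
    """
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def count_words(chunk):
    """Count each word in one chunk (lowercased, an illustrative choice)."""
    return Counter(word.lower() for word in chunk)


def merge_counts(counts_by_name):
    """Merge per-chunk counts into one table.

    Returns a header row followed by one row per chunk; a word that is
    absent from a chunk gets a count of 0.
    """
    vocabulary = sorted(set().union(*counts_by_name.values()))
    rows = [["chunk"] + vocabulary]
    for name, counts in counts_by_name.items():
        rows.append([name] + [counts.get(word, 0) for word in vocabulary])
    return rows


def to_tsv(rows):
    """Render the merged table as tab-separated text."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the log"
    chunks = cut_into_chunks(sample, 5)
    counts = {f"chunk{i + 1}": count_words(c) for i, c in enumerate(chunks)}
    print(to_tsv(merge_counts(counts)))
```

A tab-separated table with one row per chunk and one column per word is a layout that R can read with `read.table(..., header = TRUE, sep = "\t")`, which is why the sketch ends there.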


Additional Scripts
