Click a question below to learn more. Be sure to send us any additional questions; we'd like to answer them here.
Just what is lexomics?
The term "lexomics" was originally coined to describe the computer-assisted detection of "words" (short sequences of bases) in genomes. When applied to literature, as we do here, lexomics is the analysis of the frequency, distribution, and arrangement of words in large-scale patterns. More specifically, with our current suite of tools, we segment one or more texts, count the number of times each word appears in each segment (or chunk), and then apply cluster analysis to build dendrograms (branching diagrams or trees) that show relationships between the chunks. Our Project Videos introduce some of the problems we have been working on, and our online demos show us applying our tools to those tasks.
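To make the counting step concrete, here is a minimal Python sketch of segmenting a text, counting words per chunk, and measuring how far apart two chunks are. It is only an illustration, not the code behind our tools: the function names, the fixed chunk size, and the choice of Euclidean distance on relative frequencies are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def chunk_words(text, size):
    """Split a text into consecutive chunks of `size` words (last chunk may be shorter)."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def relative_frequencies(chunk):
    """Map each word in a chunk to its share of the chunk's total word count."""
    counts = Counter(chunk)
    total = len(chunk)
    return {word: count / total for word, count in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two word-frequency profiles."""
    vocabulary = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in vocabulary))


# Toy text (hypothetical); real texts would be far longer.
text = "the cat sat on the mat the dog sat on the log a bird flew over a tree"
chunks = chunk_words(text, 6)
profiles = [relative_frequencies(c) for c in chunks]

# Distance between every pair of chunks; cluster analysis starts from these.
distances = {
    (i, j): euclidean(profiles[i], profiles[j])
    for i, j in combinations(range(len(profiles)), 2)
}
closest = min(distances, key=distances.get)
print("Most similar chunks:", closest)
```

A dendrogram is built by repeatedly joining the closest chunks (or groups of chunks), so the pair reported here would form the first, lowest branch of the tree.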
Will lexomics replace me as a scholar?
Nope. Lexomics, like most computational text-mining tools, is what John Burrows calls a "middle game" technique. You (the scholar) have much scholarship to do before running computational tools (for example, collecting texts and forming hypotheses about where you might segment them), and after using the tools you have much work to do to interpret the results and form new hypotheses. The computer is just a tool "in the middle", albeit a very powerful one that today's scholars of texts want in their arsenal.
Where do I start?
If you are new to lexomics, we strongly recommend that you review our tutorial materials.
What browser should I use?
On MacOS, Windows, or Linux, we recommend that you use the Chrome browser; Firefox also works. We do not recommend Internet Explorer (IE), since we have not tested our tools on it.
You have three tools. What is the correct order to use them?
Our three tools are intended to work in a sequential pipeline: one, two, three. First, use scrubber to "clean" up your texts. Second, diviText will accept the output from scrubber and will cut up a text (or texts) into chunks and/or merge more than one set of text chunks with another. Last, treeView will accept the output from diviText and produce a dendrogram showing relationships between your (chunks of) texts.
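The one-two-three order can be sketched in a few lines of Python. The functions below are hypothetical stand-ins named after the three stages, not the actual scrubber, diviText, or treeView code, and the specific choices (stripping everything but letters, Manhattan distance, single-linkage merging) are assumptions made for illustration.

```python
import re
from collections import Counter


def scrub(text):
    """Stage 1 stand-in: lowercase the text and replace non-letters with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def divide(text, size):
    """Stage 2 stand-in: cut the scrubbed text into chunks of `size` words."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def distance(a, b):
    """Manhattan distance between the relative word frequencies of two chunks."""
    fa, fb = Counter(a), Counter(b)
    return sum(abs(fa[w] / len(a) - fb[w] / len(b)) for w in set(fa) | set(fb))


def build_tree(chunks):
    """Stage 3 stand-in: merge the closest clusters until one nested tuple remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    # Each cluster is (subtree, list of member chunk indices).
    clusters = [(i, [i]) for i in range(len(chunks))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(distance(chunks[i], chunks[j])
                        for i in clusters[x][1] for j in clusters[y][1])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


raw = "The cat sat on the mat. The dog sat on the log! A bird flew over a tree?"
tree = build_tree(divide(scrub(raw), 6))
print(tree)
```

The nested tuple plays the role of the dendrogram: chunk numbers grouped in the innermost parentheses were joined first.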
What languages (other than Old English, Latin, etc.) does lexomic analysis work with?
We have worked hard on the initial tool in our pipeline, scrubber, so that it will handle texts of almost any file type (raw .txt, Unicode, .sgml, .html, .xml, even .docx) and language.
How reliable are lexomic results? Is there any way to verify their reliability?
Excellent question. Every set of chunks from a text or texts will produce a dendrogram. Your question really asks how we can assign confidence measures for any and all clades (groups) in the dendrogram. We are currently prototyping a new tool (not yet available) called "trueTree" that performs a bootstrapping method that assigns levels of confidence to our dendrograms.
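For readers curious about what bootstrapping means in this setting, here is a generic Python sketch of the idea. It does not describe how trueTree works; the resampling scheme (redrawing each chunk's words with replacement), the distance measure, the function names, and the toy chunks are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def distance(a, b):
    """Manhattan distance between the relative word frequencies of two chunks."""
    fa, fb = Counter(a), Counter(b)
    return sum(abs(fa[w] / len(a) - fb[w] / len(b)) for w in set(fa) | set(fb))


def closest_pair(chunks):
    """Indices of the two chunks with the smallest distance between them."""
    pairs = combinations(range(len(chunks)), 2)
    return min(pairs, key=lambda p: distance(chunks[p[0]], chunks[p[1]]))


def bootstrap_support(chunks, replicates=200, seed=0):
    """Fraction of resampled data sets in which the original closest pair recurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    original = closest_pair(chunks)
    hits = 0
    for _ in range(replicates):
        # Redraw each chunk's words with replacement, keeping its length.
        resampled = [rng.choices(chunk, k=len(chunk)) for chunk in chunks]
        if closest_pair(resampled) == original:
            hits += 1
    return original, hits / replicates


# Toy chunks (hypothetical); real chunks would hold hundreds or thousands of words.
chunks = [
    "the king rode to the hall and the king spoke".split(),
    "the king came to the hall and the queen spoke".split(),
    "a ship sailed to the sea and a storm rose".split(),
]
pair, support = bootstrap_support(chunks)
print("Closest pair:", pair, "support:", support)
```

A support value near 1 suggests the grouping survives small perturbations of the data, while a low value suggests it may be an accident of the particular words in the chunks.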
Is lexomic analysis in general, and are dendrograms in particular, unique to your research group?
No. Scholars have been counting words, clustering, and classifying texts for years. Our contribution is to build tools that make it relatively easy for scholars to run cluster-analysis experiments and produce dendrograms. This includes our group and our undergraduate students!