Wheaton College Norton, Massachusetts

Our software work has shifted from building tools (e.g., DNA Dictionary, genome browsers) to machine learning classification experiments (LeBlanc et al. 2012, 2013).

Sharing a slice of experimental time: a Suite of Scripts

Note: For each script to work properly, you must download the whole suite of scripts and save them in a common directory.
  • Cut Genomes into Chunks

    cutter.pl ("Script #1") is the first of a suite of scripts designed to assist in the analysis of DNA. It breaks a large DNA sequence into smaller chunks of a user-determined size.

    cutter.zip - ReadMe

  • Frequency Counts of Motifs

    This script assumes that cutter.pl has already been run. It goes through all the files created by cutter.pl that match the data type specified on the command line, counts the number of times each unique l-mer (a motif of length l) appears in the genome, along with its reverse complement, and writes the results to a series of .xls files, one for each combination of l-mer size and input file.

    countMotifs.zip - ReadMe

  • Prepare Data for R

    This script takes the various motif counts created by the motifCounts.pl script, combines them into a single .xls file for use in statistical analysis, and adds some additional metadata.

    prepare4R.zip - ReadMe
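
The three steps above (cut, count, combine) can be sketched in a few lines. This is a minimal illustration in Python, not the Perl code from the suite. The function names, the handling of a short final chunk, the pooling of each l-mer with its reverse complement under one key, and the tab-separated layout are all assumptions made for the example.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes upper-case A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cut_into_chunks(sequence, chunk_size):
    """Split a sequence into consecutive chunks; the last chunk may be shorter (assumption)."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def reverse_complement(motif):
    """Complement every base, then reverse the string."""
    return motif.translate(COMPLEMENT)[::-1]


def count_lmers(chunk, l):
    """Count overlapping l-mers, pooling each l-mer with its reverse complement.

    Pooling under the alphabetically smaller of the two strings is an assumption;
    the suite's script may report the two strands differently.
    """
    counts = Counter()
    for i in range(len(chunk) - l + 1):
        lmer = chunk[i:i + l]
        counts[min(lmer, reverse_complement(lmer))] += 1
    return counts


def build_table(chunks, l):
    """Combine per-chunk counts into tab-separated rows: one row per chunk, one column per motif."""
    per_chunk = [count_lmers(chunk, l) for chunk in chunks]
    motifs = sorted(set().union(*per_chunk))
    rows = ["chunk\t" + "\t".join(motifs)]
    for index, counts in enumerate(per_chunk, start=1):
        rows.append(f"{index}\t" + "\t".join(str(counts[m]) for m in motifs))
    return rows


if __name__ == "__main__":
    chunks = cut_into_chunks("ACGTACGTAC", 4)
    print("\n".join(build_table(chunks, 2)))
```

A table with one row per chunk and one column per motif is a shape that R can read directly as a data frame, which is why the sketch ends with that layout.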

Additional Scripts

  • extractGroupPhylum

    This script queries a database to gather metadata about the bugs in the data directory. Data gathered includes each organism's reference sequence, superkingdom, group, genus, species, strain, oxygen requirements, habitat, temperature range, and pathogenicity data.

    extractGroupPhylum.zip - ReadMe
