Software

Our software work has shifted from building tools (e.g., DNA Dictionary, genome browsers) to machine learning classification experiments (LeBlanc et al. 2012, 2013).

Sharing a Slice of Experimental Time: A Suite of Scripts

Note: For each script to work properly, you must download the whole suite and save all of the scripts in a common directory.

Piece 1: Extract Genomes from a Local Database

This script queries a database, retrieves every organism stored on the server, and collects some basic information about each one.

extract_from_dB.zip – ReadMe (pdf)
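The retrieval step described above can be sketched as follows. This is not the lab's script: the table name organisms, its columns, and the helper list_organisms are hypothetical, and a toy in-memory SQLite database stands in for the real server.

```python
import sqlite3

def list_organisms(conn):
    """Return (name, sequence length) for every organism, sorted by name.

    Assumes a hypothetical table organisms(name TEXT, sequence TEXT).
    """
    return conn.execute(
        "SELECT name, length(sequence) FROM organisms ORDER BY name"
    ).fetchall()

# Toy in-memory database (illustrative values only) so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organisms (name TEXT, sequence TEXT)")
conn.executemany(
    "INSERT INTO organisms VALUES (?, ?)",
    [("organism_B", "GGCCAATT"), ("organism_A", "ATGCGTACGTTA")],
)
organisms = list_organisms(conn)
for name, length in organisms:
    print(f"{name}\t{length} bp")
```

Against a real server, only the connection and the SQL would change; the "basic information" collected per organism depends on the actual schema.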

Piece 2: Cut Genomes into Chunks

cutter.pl (“Script #1”) is the second in this suite of scripts for DNA analysis. It splits a large DNA sequence into smaller chunks whose size is set by the user.

cutter.zip – ReadMe (pdf)
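The chunking step can be illustrated with a short sketch. It assumes consecutive, non-overlapping chunks and offers an optional flag for dropping a short final chunk; how cutter.pl itself treats overlaps and leftover bases is not stated here, and the function name cut_into_chunks is hypothetical.

```python
def cut_into_chunks(sequence, chunk_size, keep_remainder=True):
    """Split a DNA sequence into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    # The last chunk may be shorter than chunk_size; optionally discard it.
    if not keep_remainder and chunks and len(chunks[-1]) < chunk_size:
        chunks.pop()
    return chunks

chunks = cut_into_chunks("ATGCGTACGTTAGC", 5)
print(chunks)  # ['ATGCG', 'TACGT', 'TAGC']
```

Dropping the remainder keeps every chunk the same length, which can matter if counts from different chunks are compared directly.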

Piece 3: Frequency Counts of Motifs

This script assumes that cutter.pl has already been run. It reads every file created by cutter.pl that matches the data type specified on the command line, counts how many times each unique l-mer (a DNA word of length l) appears in the genome together with its reverse complement, and writes the results to a series of .xls files, one for each combination of l-mer size and input file.

countMotifs.zip – ReadMe (pdf)
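A minimal sketch of l-mer counting follows. It assumes that "together with its reverse complement" means each l-mer is counted on both the input strand and the reverse-complement strand, so a word and its reverse complement always end up with equal totals. It also assumes that words containing ambiguous bases such as N are skipped. Both choices, and the names count_lmers and reverse_complement, are illustrative rather than taken from countMotifs.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def count_lmers(seq, l):
    """Count every l-mer on the given strand and on its reverse complement."""
    seq = seq.upper()
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - l + 1):
            word = strand[i:i + l]
            # Skip words containing ambiguous bases such as N.
            if set(word) <= VALID:
                counts[word] += 1
    return counts

counts = count_lmers("AAC", 2)
print(sorted(counts.items()))  # [('AA', 1), ('AC', 1), ('GT', 1), ('TT', 1)]
```

Here "AA" and "TT" are reverse complements of each other, as are "AC" and "GT", which is why each pair receives the same count.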

Piece 4: Prepare Data for R

This script takes the motif counts created by motifCounts.pl, combines them into a single .xls file for statistical analysis, and adds some additional metadata.

prepare4R.zip – ReadMe (pdf)
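The merging step might look something like the sketch below, which assumes each input is a mapping from l-mer to count. It writes one tab-delimited row per input file, with metadata columns followed by one column per l-mer. The function name combine_counts, the metadata layout, and the choice of zero-filling missing l-mers are all assumptions for illustration.

```python
import csv
import io

def combine_counts(count_tables, metadata):
    """Merge per-file l-mer counts into one wide, tab-delimited table.

    count_tables: {file_name: {lmer: count}}
    metadata:     {file_name: {column: value}}  (hypothetical extra columns)
    """
    lmers = sorted({w for table in count_tables.values() for w in table})
    meta_cols = sorted({c for m in metadata.values() for c in m})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", *meta_cols, *lmers])
    for name in sorted(count_tables):
        meta = metadata.get(name, {})
        # l-mers absent from a file are written as 0 so every row has the same width.
        writer.writerow([name,
                         *(meta.get(c, "") for c in meta_cols),
                         *(count_tables[name].get(w, 0) for w in lmers)])
    return out.getvalue()

tables = {"chunk2.txt": {"AA": 3, "AC": 1}, "chunk1.txt": {"AC": 2, "GT": 5}}
meta = {"chunk1.txt": {"genus": "genus_X"}, "chunk2.txt": {"genus": "genus_Y"}}
table = combine_counts(tables, meta)
print(table)
```

A tab-delimited table of this shape can be loaded in R with read.delim, giving one row per input and one column per feature.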

Additional Scripts

extractGroupPhylum

This script queries a database to gather metadata about the organisms ("bugs") in the data directory. The data gathered include each organism’s reference sequence, superkingdom, group, genus, species, strain, oxygen requirements, habitat, temperature range, and pathogenicity.

extractGroupPhylum.zip – ReadMe (pdf)
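The metadata lookup can be sketched as a parameterized query restricted to the organisms that actually have files in the data directory. The file-naming convention (organism name plus an extension such as .fna), the metadata table and its columns, and the helpers organisms_in_dir and fetch_metadata are hypothetical. The sketch uses a toy SQLite database and a temporary directory.

```python
import os
import sqlite3
import tempfile

def organisms_in_dir(data_dir):
    """List organism names from file names, assuming <organism>.<ext>."""
    return sorted({
        os.path.splitext(f)[0]
        for f in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, f))
    })

def fetch_metadata(conn, names):
    """Look up metadata rows only for the named organisms."""
    if not names:
        return []
    placeholders = ",".join("?" * len(names))  # e.g. "?,?" for two names
    query = ("SELECT name, superkingdom, genus, oxygen_requirement FROM metadata "
             f"WHERE name IN ({placeholders}) ORDER BY name")
    return conn.execute(query, list(names)).fetchall()

# Toy database and data directory, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (name TEXT, superkingdom TEXT, "
             "genus TEXT, oxygen_requirement TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", [
    ("org_A", "Bacteria", "genus_X", "aerobic"),
    ("org_B", "Archaea", "genus_Y", "anaerobic"),
    ("org_C", "Bacteria", "genus_Z", "aerobic"),
])
with tempfile.TemporaryDirectory() as data_dir:
    for org in ("org_A", "org_B"):
        open(os.path.join(data_dir, org + ".fna"), "w").close()
    rows = fetch_metadata(conn, organisms_in_dir(data_dir))
for row in rows:
    print("\t".join(row))
```

Using placeholders rather than pasting names into the SQL string keeps unusual characters in organism names from breaking the query.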