

Vicky Li Horst ’14 (see her previous work in our group below) moved to the Broad Institute.


We offered another iteration of our interdisciplinary course DNA (COMP/BIO 242). See DNA (Comp/Bio 242) Syllabus (pdf). Students script in Python to explore the wonders of genomes as texts. Threads include the human microbiome project, ethical implications of personalized genomic medicine, reviewing a 23andMe personalized report, computational experiments, and scientific writing.

Vicky Li ’14 and Francine Camacho ’14, Bioinformatics majors, explored the metagenome of the guts of pill bugs (Porcellio scaber) found in Norton and Cape Cod, MA.

Chris DeMolles ’13 refactored his suite of Python modules to segment and count L-mers (L = 4-12) and the number of Inverted Repeats (IRs) as stock for machine learning algorithms to identify (potentially) new small RNAs.
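The two kinds of counts described above can be sketched in a few lines of Python. This is only an illustration and not Chris's modules: the function names, the default stem length, and the maximum loop size are assumptions made for the example. An inverted repeat is taken here to mean a short stem followed, after a loop of bounded size, by the stem's reverse complement.

```python
from collections import Counter

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_lmers(seq, min_len=4, max_len=12):
    """Count every overlapping L-mer for each L in [min_len, max_len]."""
    seq = seq.upper()
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(seq) - length + 1):
            counts[seq[start:start + length]] += 1
    return counts


def count_inverted_repeats(seq, stem_len=4, max_loop=10):
    """Count (left stem, loop size) pairs whose right stem is the
    reverse complement of the left stem.

    stem_len and max_loop are hypothetical parameters for this sketch.
    """
    seq = seq.upper()
    total = 0
    for start in range(len(seq) - 2 * stem_len + 1):
        target = reverse_complement(seq[start:start + stem_len])
        for loop in range(max_loop + 1):
            right = start + stem_len + loop
            if right + stem_len > len(seq):
                break  # the right stem would run off the end
            if seq[right:right + stem_len] == target:
                total += 1
    return total
```

Counts like these could then be arranged into one feature vector per sequence before being handed to a classifier.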


Mengyang (Vicky) Li ’14 and Chris DeMolles ’13 started a new project for finding small RNAs. In the Fall, Mengyang, a Bioinformatics major, gave a series of lectures to the group on microbial RNAs. Chris prototyped SVM (Support Vector Machine) software to classify sequences of DNA as [RNA vs. non-coding, non-RNA]. Chris’ software includes scripts to automatically check for new entries and then “scrape” needed files from NCBI’s human microbiome project.


Donald Bass ’12 completed an honors thesis in computer science under Mike Kahn (Statistics) and LeBlanc on cluster validation techniques. As of Fall 2012, Donald is at Northeastern University studying for his PhD in Computer Science.

Kelsey Hichens ’13 and Emily Baldwin ’14 enrolled in a semester of research (COMP 499 – Genomics Research) with LeBlanc and Dyer. The research experience, entitled “Detecting Horizontal Transfer in Microbial Genomes,” led to the simHT software for simulating horizontal transfer between chunks for various microbes. The two presented their results and software at CCSCNE 2012 in April.



Emily Baldwin ’10 walked into LeBlanc’s office as a first year student (before the semester started) and said, “I want to work with you in genomics.” Emily plowed through our Perl book.


Donald Bass ’12 upgraded the pipeline to help ensure that all the scripts were both documented and cross-platform. He also rewrote several algorithms to give more accurate results, fixed severe bugs in two scripts that search the PubMed database and generate random genomes, and wrote a script to generate abstract chromosomes.

Nicholas Faulconer ’12 spent the first week familiarizing himself with the “old” scripts implemented by Neil Kathok ’10. He fixed/adjusted/updated Scripts 1 and 4, and Seed.pl. In order for seed.pl to run and be updated, he learned how to run a MySQL server.


Matthew Brousseau ’11 revised Perl scripts to be cross-platform, updated the documentation in the scripts and pods, and moved the Genomics site over to WordPress to look all nicey-nice.

Brandon Waltz ’11 revised and updated documentation on all the experiment scripts and incorporated this website into the overall structure of the Wheaton web using WordPress.

Neil Kathok ’10 wrote Perl scripts to test all the examples in our book and ported each example to the book’s companion website. Neil has been building a new database of microbial genomes and an associated suite of scripts to automatically seed the database when new microbial genomes become available at NCBI.



Matt Brown ’10 joined our weekly journal clubs in 2007 and 2008. He led SQL tutorials for the group as we began a new database design.


Throughout the fall of 2007 and spring of 2008, the Genomics Group consisted of students Neil Kathok ’10, Evan Ferri ’08, and Matt Brown ’10. Neil, Evan, and Matt worked with Professors Dyer, Kahn, and LeBlanc to update the group’s microbial database and to design experiments that use genomic signatures to investigate relationships between microbial plasmids and pathogenicity.


Christina Nelson ’11 presented some of our ongoing work on horizontal transfer to the President’s Commission during the Fall semester. Her poster was entitled ‘Words and Rules in Horizontal Transfer.’ Neil Kathok ’10 finished a new database of 700+ microbial bugs and their associated metadata.


Evan Ferri ’08, a biology major, was quickly recruited to join our group after finishing our Perl DNA course. Evan kept the software types honest as we searched for genomic signatures between plasmids and their hosts during 2007 and early 2008. Evan plans to go on to medical school.


While on year-long sabbaticals, Betsey and Mark joined colleague Mike Kahn (Statistics) in a series of experiments to classify 200+ bacteria and archaea genomes by temperature regime (hyperthermophile, thermophile, or mesophile). Mike introduced the group to the statistical programming language “R” where Mark and Mike developed models using CART (Classification and Regression Trees). The work led to a publication in the journal Archaea 2:159-167, 2007.


Neil Kathok ’10 joined the Genomics Group in the Spring of 2007 to help test all of the Perl routines to appear in our new book, Perl for Exploring DNA. Neil also created a website which will appear on the book’s Oxford University Press companion website to allow users of the text to download all of the Perl examples.


Robbie Grossman ’07 was a computer science major who lent considerable expertise in database design and in Perl scripts that automatically seed our database of microbial genomes.


Sarah Milewski ’07 was a computer science major. Sarah implemented routines to seed our new microbial database.


Mark and Betsey made inroads in reaching the national community of computer science professors as they presented workshops at CCSC-Eastern (Iona College, NY) and SIGCSE (Houston, TX). With “pre-doc” Greg Williams ’02 as a consultant, Mark and Betsey shipped the manuscript for their new book, ‘Perl for Exploring Biological Sequences’ (Oxford University Press, due out in early 2007). The group received additional funding and a one-year extension on their NSF grant (NSF DUE # 0340761) to help finish their book.


Sarah Milewski ’07 and Robbie Grossman ’07 designed and prototyped a database to store the entire chromosomal and plasmid DNA for over 300 prokaryotic genomes. Greg Williams ’02 helped implement a final version. The database is a huge move forward for the group as it holds metadata on each organism (e.g., salinity, mobility, temperature range, etc.), thus expanding the range of questions we can pose of microbial genomes.


Under the leadership of Wheaton Biology Professor Shawn McCafferty, Mark and Nguni Phakela ’06 developed some software to help Shawn with his research in phylogenetic analyses. In particular, Nguni and Mark implemented routines to compute patristic distances of a variable number of organisms.
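For readers unfamiliar with the term, the patristic distance between two organisms is the sum of the branch lengths on the path that connects them in a phylogenetic tree. The sketch below illustrates the idea in Python. It is not the routines Nguni and Mark wrote. The tree encoding (a dictionary mapping each node to its parent and branch length) and all function names are assumptions for this example, and the tree is assumed to be rooted with every leaf reachable from the root.

```python
from itertools import combinations


def path_to_root(tree, node):
    """Map node and each of its ancestors to their distance from node.

    tree maps child -> (parent, branch_length); the root has no entry.
    """
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def patristic_distance(tree, a, b):
    """Sum branch lengths from a and b up to their nearest common ancestor."""
    up_from_a = path_to_root(tree, a)
    total = 0.0
    node = b
    while node not in up_from_a:
        parent, length = tree[node]
        total += length
        node = parent
    return total + up_from_a[node]


def all_patristic_distances(tree, leaves):
    """Pairwise distances for any number of leaves."""
    return {(a, b): patristic_distance(tree, a, b)
            for a, b in combinations(leaves, 2)}


# Hypothetical four-node tree: A and B share parent X; X and C share root R.
example_tree = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("R", 0.5), "C": ("R", 3.0)}
example_distances = all_patristic_distances(example_tree, ["A", "B", "C"])
```

Because the pairwise table is built from `combinations`, the same call works for a variable number of organisms.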

Nguni Phakela ’06 was Wheaton’s first double major in computer science and biochemistry. Nguni prototyped some of our first experimental software in phylogenetics under the guidance of Biology Professor Shawn McCafferty.



Steve Benz finished his senior year helping Betsey upgrade the favGene tool. Betsey presented some of her work at ASM. Mark taught and did research at the University of Wollongong (UoW), Australia for the year. He presented the group’s work on favGene at the International Conference on Bioinformatics in Auckland, New Zealand. Mark worked with Paul White (UoW) ’06 on a set of routines to automate experimental runs that test authorship attribution techniques, typically used on English texts, in the context of identifying the kingdom, genus, and species of a given sequence of DNA.

Stephen Benz ’05 was a computer science major and biochemistry minor and now has a Ph.D. Steve was our lead programmer for two years in 2004 and 2005. Steve worked from the west coast during the summer of 2002 completing the redesign of our GUI for the Motif Lexicon v2.0. Steve was funded for the 2002-2003 academic year and worked on the design and implementation of regular expressions and software to search for putative zinc-finger binding sites.


Chris Wilbur ’05 biology major. Chris put together a DNA helix model, a beautiful addition to our lab.


In a year-long project, Steve Benz ’05, Robby Grossman ’07, and Nguni Phakela ’06 worked on version 2.0 of the “favorite gene” project. favGene is an application that stores genomes in a MySQL database on the backend to facilitate a user (via Perl scripts on the frontend) in selecting and assembling the DNA sequences in the upstream, downstream, and/or genic regions of their “favorite” set of genes. Additional Perl scripts allow the user to either take home their sequences or search just those regions for various motifs using regular expressions. Steve and Robby presented their work at the northeast regional conference on computer science education (CCSCNE 2004) at Union College in April 2004. As part of a new NSF grant (NSF DUE # 0340761), Robby and Steve will work during the summer of 2004 on favGene.
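The selection step that favGene performs can be illustrated with a small Python sketch. This is not the favGene code, which used Perl scripts over a MySQL database. Here the chromosome is a plain string, and the function names, the 0-based coordinates, the `flank` parameter, and the TATA-like pattern are all assumptions chosen for the example. Only forward-strand genes are handled.

```python
import re


def gene_regions(chromosome, gene_start, gene_end, flank=500):
    """Slice the upstream, genic, and downstream regions of one gene.

    Coordinates are 0-based with gene_end exclusive (an assumption of
    this sketch); flank is the number of bases kept on each side.
    """
    upstream_start = max(0, gene_start - flank)
    return {
        "upstream": chromosome[upstream_start:gene_start],
        "genic": chromosome[gene_start:gene_end],
        "downstream": chromosome[gene_end:gene_end + flank],
    }


def find_motif(region, pattern):
    """Return (offset, matched text) for each non-overlapping regex match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, region)]


# Toy chromosome: 10 bases upstream, a 9-base "gene", 10 bases downstream.
toy_chromosome = "CCTATAAAGG" + "ATGAAATAA" + "GGTATATACC"
toy_regions = gene_regions(toy_chromosome, 10, 19, flank=10)
upstream_hits = find_motif(toy_regions["upstream"], r"TATA[AT]A")
downstream_hits = find_motif(toy_regions["downstream"], r"TATA[AT]A")
```

Restricting the search to the chosen regions, rather than scanning the whole genome, is what lets a user ask whether a motif is associated with their "favorite" set of genes.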

Patrick Sagui ’04 computer science major. Patrick is a jack-of-all-boxes. Pat ported our Perl and C++ CGIs to a new Linux box over the summer of 2002 and helped with the upgrade of the Motif Lexicon to v2.0.

Pete Cahalan ’04 scripted in C++ and Perl during the summer of 2003 as the Motif Lexicon moved to v3.0. Pete and Brian left scripts and directions for handling new organisms that are added to the Motif Lexicon.

Brian Donorfio ’04 moved the Motif Lexicon to v3.0 during the summer of 2003, working alongside Pete. The v3.0 release they shipped adds the ability to handle multiple organisms.

Jonah Cool ’04 worked with Steve Benz on the favGene application. Jonah researched the literature for Zinc Finger transcription factors and their associated DNA binding sites. Jonah now has his Ph.D. in Biology.



By January of 2003, Greg Williams ’02 had set up a MySQL database (with worm, yeast, and E. coli genomes) and written Perl utilities that search and/or create files of DNA sequence from the upstream, downstream, and genic regions of a family of genes (favGene v1.0). At Dickinson College in Carlisle, PA, on March 21st and 22nd, Dyer, B. and LeBlanc, M. presented Genomics in the Undergraduate Curriculum.

The following month, April, Professors LeBlanc and Dyer presented Teaching together: A three-year case study in genomics at the Northeastern Conference on Computing in Small Colleges. The paper was published in The Journal of Computing Sciences in Colleges, v18(5), April 2003, p85-95. The abstract of Adam Villa’s (’03) Searching DNA Neighborhoods was published in the same journal, April 2003, p245.

During the entire 2002-2003 academic year, Steve Benz ’05 and Jonah Cool ’04 worked jointly on designing regular expressions and Perl programs to search for putative zinc finger binding sites. Steve and Jonah presented their research at the annual conference of Computer Science in Colleges at Rhode Island College, RI, April 2003. Their presentation for the Northeastern Conference on Computing in Small Colleges (2003) was entitled Using Regular Expressions to Locate Putative Zinc Finger Binding Sites.

Adam Villa ’03 completed an honors thesis in computer science in the spring. Adam’s thesis was entitled “Supporting Exploratory Analyses of Gene Regulation in Localized DNA Neighborhoods.” Adam presented his research at the annual conference of Computer Science in Colleges at Rhode Island College, RI, April 2003.

Also that April, Benz, S. ’05 and Cool, J. ’04 published Using Regular Expressions to Locate Putative Zinc Finger Binding Sites in The Journal of Computing Sciences in Colleges.

Austin Jordan ’03 computer science major. Austin worked as a consultant during the summer of 2003 on a redesign of the Genomics Research Group web site.

Adam Villa ’03, computer science major, honors thesis student, and French minor. Adam started with the group in January 2001. He first helped prototype an HTML-wrapper for the software framework for our Motif Lexicon. Adam completed an honors thesis in genomics subtitled “Giving DNA a Trie” where he implemented the “Neighborhood” CGI in the Motif Lexicon. Adam now has his PhD in computer science and is a professor at Providence College.

Martin Baron ’03 double major: computer science and mathematics. Martin put the finishing touches on the statistical module in the summer of 2002 and prototyped a “relateds” module as we shipped version 2.0 of the Motif Lexicon.

Nick Doolittle ’03 computer science major who was a lead on the implementation of a C++ class for handling the caching of results for the Motif Lexicon in January 2001. He started a new module to find all repeats of various configurations with loops of any length in January 2002.

Jon Lister ’03 dual math/computer science major. Jon created a database of biology and computer science programs in the northeast in support of our NSF workshops on incorporating genomics into the undergraduate curriculum.

Adam Villa ’03 presented Searching DNA Neighborhoods at the Northeastern Conference on Computing in Small Colleges (2003). The abstract was also published in The Journal of Computing Sciences in Colleges. In June, Dyer, B. and LeBlanc, M. presented “A Second Workshop for Professors teaching Undergraduate Biology or Computer Science with an interest in incorporating ‘Genomics’ (the analysis of DNA sequences) into their curricula” (NSF DUE #0126643).


From July 28th through the 30th, in Boulder, CO, at MathFest 2003, Professor LeBlanc will present a two-day short course on ‘Reading the Book of Life: How Bioinformatics Makes Sense of Molecular Messages’ – Moving Research to the Classroom: Linking courses in Biology and Computer Science.



During January break, Nick Doolittle ’03 started a new module to find all repeats of any length with unlimited loop size. Missy Kimball ’02 integrated a new module to give the statistical likelihood of a requested motif. This module was the result of earlier work by Andrea Christoforou ’01. Greg Williams ’02 wrote a very cool “Etymology” module which searches PubMed for all published abstracts for one’s requested motif. In the spring, Greg Williams ’02 enrolled in COMP 499 – Genomics Research – and implemented a suite of filters for motifs in the regulatory regions of some of the genes involved in making the flagella in bacteria. His version of the “favorite gene project” is due in the Fall of 2002.

Martin and Pat moved the motif lexicon to version 2.0, implementing a number of enhancements to the GUI and statistics module. Martin added a new “relateds” module for the lexicon while Pat ported our CGIs to a new Linux box. In the August 19, 2002, v17 issue of The Scientist, Dyer, B. and LeBlanc, M. published free user-friendly genomics software.

Martin Baron ’03 and Patrick Sagui ’04 were Mars Fellows for the summer.

Missy Kimball ’02 dual mathematics and computer science major. A jack of all trades, you could usually find her proving abstract theorems, working on programs in C, or belting out tunes with the Whims, the a cappella group. Missy prototyped the initial motif lexicon in Perl and C and added a new statistical module in January 2002.

Trevor Agnitti ’02 dual computer science and physics major. In addition to his contributions in the genomics group, he was known for his hallway scootering and a wide selection of really spiffy shirts.

Greg Williams ’02 was a computer science major and now has his Ph.D. Greg sported wild hair and served as our hacker supreme for years. He reviewed the entire manuscript of our Perl book (Oxford, 2007), serving as our in-house Perl expert. He also helped finish off a database of microbial genomes.

At the 14th International Genome Sequencing and Analysis Conference, on October 4th in Boston, MA, Professors Mark LeBlanc and Betsy Dyer along with students Baron, M. ’03, Christoforou, A. ’01, Doolittle, N. ’03, Kimball, M. ’02, Villa, A. ’03, Williams, G. ’02 presented The DNA Motif Lexicon — cataloguing and annotating genomes.

Glen returns for a party and checks out our latest software.


In the Winter 2002 issue of Cell Biology Education (101-104), Betsy Dyer and Mark LeBlanc published a workshop report – Incorporating Genomics Research into Undergraduate Curricula. On December 14th, Professors Mark LeBlanc and Betsy Dyer presented Collaborations in Genomics – Connecting Courses in Genetics and Computer Science at the workshop New Paradigms in Teaching Introductory and Cell Biology at the Annual Meeting of the American Society for Cell Biology in San Francisco, CA. Two days later, on December 16th, Professors LeBlanc and Dyer along with Steve Benz ’05 and Jonah Cool ’04 attended The 42nd Annual Meeting of the American Society for Cell Biology to present Towards a DNA Dictionary.


During January break, the group implemented a Motif Lexicon for C. elegans (Nathan Buggia ’01 Computer Science, Melissa Kimball ’02 Mathematics & Computer Science, Nick Doolittle ’03 Computer Science, Adam Villa ’03 Computer Science). During the spring semester, Andrea Christoforou ’01, a double major in Chemistry and Mathematics, worked on counting problems in genomics under Professor of Mathematics Shelly Leibowitz and Professor of Biology Betsey Dyer. Andrea and Melissa Kimball ’02, a double major in Computer Science and Mathematics, presented their research projects in genomics at the annual conference of Computer Science in Colleges at Middlebury College, VT, April 2001.

Professors Dyer and LeBlanc were awarded a grant from the National Science Foundation (NSF DUE 0126643) to encourage other faculty in biology and computer science to work together and incorporate genomics into their curriculums.


Nathan Buggia ’01 computer science major. When he wasn’t found clutching a copy of von Heijne’s Sequence Analysis in Molecular Biology, he was probably cranking out some C++ or Perl as he served coffee at the Lyon’s Den. Nathan left the group a software framework for the motif lexicon to facilitate continued student development on the project.

Andrea Christoforou ’01 double-major in Mathematics and Chemistry and a Fulbright Scholar studying bioinformatics in England. Andrea worked on closed form equations to compute the likelihood of finding all possible repeats.

Jen Tobin ’01 Biochemistry major. Jen searched MedLine for all possible current functions for intergenic motifs of length L=4bp.


During the fall semester, Greg Williams ’02 searched for the top-10 motifs of any length from local neighborhoods as well as genome-wide in C. elegans. Adam Villa ’03 added a new module to the motif lexicon for searching “neighborhoods”.


In the spring of 2000, Glen Aspeslagh finished the “Favorite Gene Project” with his thesis work entitled: Software for Locating Potential Regulatory Motifs in the Promotors of the Kreb’s Cycle Genes of Caenorhabditis elegans.

In the summer Trevor Agnitti ’02 and Melissa Kimball ’02 were funded as Mars Fellows and implemented a prototype of what has become our Motif Lexicon.
Results from initial experimental runs by Nathan Buggia ’01 and Glen Aspeslagh ’00 were published: LeBlanc, M., G. Aspeslagh, N. Buggia, B. Dyer (2000). An Annotated Catalog of Inverted Repeats of Caenorhabditis elegans Chromosomes III and X, with Observations Concerning Odd/Even Biases and Conserved Motifs, Genome Research, v10(9):1381-1392, Cold Spring Harbor Laboratory Press, September 2000. In the fall semester, Nathan Buggia enrolled in an independent research course entitled: “System design engineering for data storage and analysis in genomics research.” Nate’s work set the foundation for the Jan. 2001 system redesign.

Glen Aspeslagh ’00 computer science major. When he wasn’t working on some C++ dynamic programming or Perl data viewers, he could be found programming his Palm Pilot. Glen implemented the first browser tool on our site and started the “Favorite Gene Project” with his thesis work on repeats in the Krebs cycle.


Wheaton’s first genomics group started in January 1999. It consisted of Professor of Biology Betsey Dyer, Professor of Computer Science Mark LeBlanc, Glen Aspeslagh ’00, Nathan Buggia ’01, and a team of computer science students who set up our new NSF-funded lab.

Our initial search and browse tools went online in the fall of 1999.



In the beginning… Mark and Betsey experiment with their first “linking” of a computer science course and a biology course.