The WHALE lab explores artificial intelligence research in the areas of machine learning and knowledge discovery in databases. We are interested in problems that humans, from infants to skilled analysts, solve proficiently but at which robots and algorithms remain outmatched.
Mining Massive Time Series Databases
Increasingly, the information stored in database systems is not highly structured (e.g., webpage text, bank transactions). Rather, databases contain massive amounts of real-valued, unstructured data. The sources of these data are diverse -- security cameras, fetal heart rate monitors, sensor buoys floating in the ocean, stock tickers. Yet humans cannot meaningfully interpret all of these data because of the sheer volume collected. The wealth of data, combined with the resulting cognitive overload, presents an opportunity to develop novel machine learning and data mining methods. In the lab, we explore motif discovery in large time series databases, and we now focus particularly on the broader task of knowledge discovery in robotic and vital signs data.
- Using Modified Multivariate Bag-of-Words Models to Classify Physiological Data, P. Ordóñez, T. Armstrong, T. Oates, and J. Fackler. Proceedings of the IEEE 11th International Conference on Data Mining Workshops, 2011
- Unsupervised Discovery of Motifs Under Amplitude Scaling and Shifting in Time Series Databases, with E. Drewniak. Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition, 2011
Improved Pipeline for Cognitive Robotics
Computational approaches to language acquisition typically decompose the problem into learning specific components (e.g., syntax, lexicon, phonology) and treat them as isolated tasks. These simplifications have led to disconnected solutions and to algorithms with no expectation of grounded inputs. In our earlier work, we created an architecture for bootstrapping together different language learning components. We looked at ways in which different learners, typically arranged as levels in a hierarchy (e.g., phoneme level, word level, phrase level), could share information, akin to the ways that human learners acquire language. More recently, we have been exploring ways to seed this architecture with information from the learner's environment. We are developing methods to discover the mapping between sensor experiences and the discrete symbols used to initialize the bootstrap algorithms.
- Unsupervised Discovery of Phoneme Boundaries in Multi-Speaker Continuous Speech, T. Armstrong and S. Antetomaso. Proceedings of the Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, 2011
- An Architecture for Bootstrapping Lexical Semantics and Grammatical Structures, T. Armstrong and T. Oates. Proceedings of the IEEE International Conferences on Web Intelligence and Intelligent Agent Technology Workshop on Learning, Agents and Formal Languages, 2011