Overview: The Machine Learning for Data Science laboratory focuses on the development of machine learning models and algorithms for addressing a variety of challenging problems in the emerging areas of computational social science, computational ecology, computational behavioral science and computational medicine. The lab is located within the College of Information and Computer Sciences at the University of Massachusetts Amherst. The lab was co-founded in 2012 by Prof. Benjamin Marlin, Prof. Daniel Sheldon and Prof. Hanna Wallach. Prof. Brendan O'Connor joined the lab in 2014.

Openings: We are always interested in hearing from MS/PhD and PhD students who would like to join the lab. The desired educational background includes calculus, linear algebra, probability and statistics, artificial intelligence, algorithms, and programming. Prior experience with machine learning, numerical optimization or Bayesian statistics is a plus. Relevant prior research experience is highly valued. Candidates should apply directly to the College of Information and Computer Sciences and should clearly indicate their interest in the MLDS lab in their personal statement. Information about the UMass CS graduate program is available here.

Recent Publications

Natarajan, A., G. Angarita, E. Gaiser, R. Malison, D. Ganesan, and B. M. Marlin, "Domain Adaptation Methods for Improving Lab-to-field Generalization of Cocaine Detection using Wearable ECG", Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2016), Heidelberg, Germany, 2016.
Bernstein, G., and D. R. Sheldon, "Consistently Estimating Markov Chains with Noisy Aggregate Data", AISTATS, Cadiz, Spain, March 2016.
Natarajan, A., K. S. Xu, and B. Eriksson, "Detecting Divisions of the Autonomic Nervous System Using Wearables", 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Florida, USA, 2016.
Jacek, N., M.-C. Chiu, B. Marlin, and E. J. B. Moss, "Assessing the Limits of Program-Specific Garbage Collection Performance", Programming Language Design and Implementation, 2016.

We consider the ultimate limits of program-specific garbage collector performance for real programs. We first characterize the GC schedule optimization problem using Markov Decision Processes (MDPs). Based on this characterization, we develop a method of determining, for a given program run and heap size, an optimal schedule of collections for a non-generational collector. We further explore the limits of performance of a generational collector, where it is not feasible to search the space of schedules to prove optimality. Still, we show significant improvements with Least Squares Policy Iteration, a reinforcement learning technique for solving MDPs. We demonstrate that there is considerable promise to reduce garbage collection costs by developing program-specific collection policies.

Sadasivam, R. S., S. L. Cutrona, R. L. Kinney, B. M. Marlin, K. M. Mazor, S. C. Lemon, and T. K. Houston, "Collective-Intelligence Recommender Systems: Advancing Computer Tailoring for Health Behavior Change Into the 21st Century", Journal of Medical Internet Research, vol. 18, 2016.

What is the next frontier for computer-tailored health communication (CTHC) research? In current CTHC systems, study designers who have expertise in behavioral theory and mapping theory into CTHC systems select the variables and develop the rules that specify how the content should be tailored, based on their knowledge of the targeted population, the literature, and health behavior theories. In collective-intelligence recommender systems (hereafter recommender systems) used by Web 2.0 companies (eg, Netflix and Amazon), machine learning algorithms combine user profiles and continuous feedback ratings of content (from themselves and other users) to empirically tailor content. Augmenting current theory-based CTHC with empirical recommender systems could be evaluated as the next frontier for CTHC.

Winner, K., and D. Sheldon, "Probabilistic Inference with Generating Functions for Poisson Latent Variable Models", Advances in Neural Information Processing Systems, Barcelona, Spain, 2016.