Overview: The Machine Learning for Data Science laboratory develops machine learning models and algorithms to address challenging problems in the emerging areas of computational social science, computational ecology, computational behavioral science, and computational medicine. The lab is located within the College of Information and Computer Sciences at the University of Massachusetts Amherst. The lab was co-founded in 2012 by Prof. Benjamin Marlin, Prof. Daniel Sheldon, and Prof. Hanna Wallach. Prof. Brendan O'Connor joined the lab in 2014.

Openings: We are always interested in hearing from MS/PhD and PhD students who would like to join the lab. The desired educational background includes calculus, linear algebra, probability and statistics, artificial intelligence, algorithms, and programming. Prior experience with machine learning, numerical optimization or Bayesian statistics is a plus. Relevant prior research experience is highly valued. Candidates should apply directly to the College of Information and Computer Sciences and should clearly indicate their interest in the MLDS lab in their personal statement. Information about the UMass CS graduate program is available here.

Recent Publications

Dadkhahi, H., and B. Marlin, "Learning Tree-Structured Detection Cascades for Heterogeneous Networks of Embedded Devices", 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017. fp0911-dadkhahia.pdf

In this paper, we present a new approach to learning cascaded classifiers for use in computing environments that involve networks of heterogeneous, resource-constrained, low-power embedded compute and sensing nodes. We present a generalization of the classical linear detection cascade to the case of tree-structured cascades, where different branches of the tree execute on different physical compute nodes in the network. Different nodes have access to different features, as well as to potentially different computation and energy resources. We concentrate on the problem of jointly learning the parameters for all of the classifiers in the cascade, given a fixed cascade architecture and a known set of costs required to carry out the computation at each node. To accomplish the objective of jointly learning all detectors, we propose a novel approach to combining classifier outputs during training that better matches the hard cascade setting in which the learned system will be deployed. This work is motivated by research in the area of mobile health, where energy-efficient real-time detectors integrating information from multiple wireless on-body sensors and a smartphone are needed for real-time monitoring and the delivery of just-in-time adaptive interventions. We evaluate our framework on mobile sensor-based human activity recognition and mobile health detector learning problems.
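To illustrate the hard-cascade setting the abstract refers to, here is a minimal sketch of cascade inference (not the paper's learning algorithm): each node applies a cheap classifier to the features it can observe and either rejects the example early or forwards it to a downstream node. All node names, feature indices, and thresholds below are illustrative assumptions.

```python
# Minimal sketch of hard-cascade inference across networked nodes.
# Early rejection at a cheap upstream node saves the computation and
# energy cost of the more expensive downstream node.

class CascadeNode:
    def __init__(self, feature_idx, threshold, child=None):
        self.feature_idx = feature_idx  # which feature this node observes
        self.threshold = threshold      # reject if the score falls below this
        self.child = child              # next node in this branch, if any

    def predict(self, x):
        score = x[self.feature_idx]
        if score < self.threshold:
            return 0  # early rejection: downstream nodes never run
        if self.child is None:
            return 1  # final node in the branch accepts
        return self.child.predict(x)  # forward to the downstream node

# Illustrative two-node branch: a wrist-sensor node gates a more
# expensive phone-side detector.
phone_node = CascadeNode(feature_idx=1, threshold=0.8)
wrist_node = CascadeNode(feature_idx=0, threshold=0.5, child=phone_node)

print(wrist_node.predict([0.3, 0.9]))  # rejected at the wrist node -> 0
print(wrist_node.predict([0.7, 0.9]))  # passes both nodes -> 1
```

In a tree-structured cascade, a node would hold several children and route examples down different branches; the joint learning problem in the paper is to fit all node classifiers together under the per-node computation costs.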

Dadkhahi, H., M. F. Duarte, and B. M. Marlin, "Out-of-Sample Extension for Dimensionality Reduction of Noisy Time Series", IEEE Transactions on Image Processing, vol. 26, no. 11: IEEE, pp. 5435–5446, 2017. 1606.08282.pdf

This paper proposes an out-of-sample extension framework for a global manifold learning algorithm (Isomap) that uses temporal information in out-of-sample points in order to make the embedding more robust to noise and artifacts. Given a set of noise-free training data and its embedding, the proposed framework extends the embedding to a noisy time series. This is achieved by adding a spatio-temporal compactness term to the optimization objective of the embedding. To the best of our knowledge, this is the first method for out-of-sample extension of manifold embeddings that leverages the timing information available for the extension set. Experimental results demonstrate that our out-of-sample extension algorithm renders a more robust and accurate embedding of sequentially ordered image data in the presence of various types of noise and artifacts when compared with other timing-aware embeddings. Additionally, we show that an out-of-sample extension framework based on the proposed algorithm outperforms the state of the art in eye-gaze estimation.
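The core idea of trading off a data term against a temporal-compactness term can be sketched in a toy form. This is an illustrative simplification, not the paper's objective: each noisy point is pulled toward the embedding of its nearest training neighbour while being kept close to the previous embedded point, with a greedy closed-form per-step update. The function name and the `lam` weight are assumptions for the sketch.

```python
import numpy as np

def embed_sequence(X_train, Y_train, X_seq, lam=1.0):
    """Greedy toy sketch: embed a noisy sequence X_seq given training
    points X_train with known embedding Y_train, balancing a
    nearest-neighbour data term against temporal smoothness."""
    Y_seq = []
    prev = None
    for x in X_seq:
        # data term: embedding of the nearest (noise-free) training point
        nn = np.argmin(np.linalg.norm(X_train - x, axis=1))
        target = Y_train[nn]
        if prev is None:
            y = target
        else:
            # minimise ||y - target||^2 + lam * ||y - prev||^2, which
            # has the closed-form solution below
            y = (target + lam * prev) / (1.0 + lam)
        Y_seq.append(y)
        prev = y
    return np.array(Y_seq)
```

With `lam = 0` this reduces to plain nearest-neighbour out-of-sample extension; larger `lam` smooths the embedded trajectory, which is the intuition behind robustness to per-frame noise.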

Rostaminia, S., A. Mayberry, D. Ganesan, B. Marlin, and J. Gummeson, "iLid: Low-power Sensing of Fatigue and Drowsiness Measures on a Computational Eyeglass", Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 2: ACM, pp. 23, 2017. ubicomp17-ilid.pdf

The ability to monitor eye closures and blink patterns has long been known to enable accurate assessment of fatigue and drowsiness in individuals. Many measures of the eye are known to be correlated with fatigue, including coarse-grained measures like the rate of blinks as well as fine-grained measures like the duration of blinks and the extent of eye closures. Despite a plethora of research validating these measures, we lack wearable devices that can continually and reliably monitor them in the natural environment. In this work, we present a low-power system, iLid, that can continually sense fine-grained measures such as blink duration and Percentage of Eye Closures (PERCLOS) at high frame rates of 100 fps. We present a complete solution including design of the sensing, signal processing, and machine learning pipeline; implementation on a prototype computational eyeglass platform; and extensive evaluation under many conditions including illumination changes, eyeglass shifts, and mobility. Our results are very encouraging, showing that we can detect blinks, blink duration, eyelid location, and fatigue-related metrics such as PERCLOS with errors of only a few percent.
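PERCLOS, the fatigue metric named in the abstract, is conventionally defined as the fraction of time within a window that the eye is at least 80% closed. A minimal sketch over per-frame eye-openness estimates (1.0 = fully open), assuming the conventional 80%-closure threshold rather than any value from the paper:

```python
def perclos(openness, closed_threshold=0.2):
    """Fraction of frames in which eye openness is at or below the
    threshold, i.e. the eye is ~80% or more closed (PERCLOS)."""
    closed = [o <= closed_threshold for o in openness]
    return sum(closed) / len(closed)

# At 100 fps, a 10-second window would be 1000 openness values; here a
# tiny toy window is used.
print(perclos([1.0, 0.9, 0.1, 0.05, 1.0]))  # 2 of 5 frames closed -> 0.4
```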

Adams, R. J., and B. M. Marlin, "Learning Time Series Detection Models from Temporally Imprecise Labels", The 20th International Conference on Artificial Intelligence and Statistics, 2017. adams17a.pdf

In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research, where human annotators are tasked with labeling the occurrence of very short-duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives, including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned.
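One ingredient of such a framework can be sketched as follows: under an assumed timing-noise model, a noisy event timestamp is converted into soft per-frame label weights rather than being trusted as exact. The Gaussian noise model, the `sigma` value, and the function name below are illustrative assumptions, not details from the paper.

```python
import math

def soft_labels(n_frames, stamp, sigma=2.0):
    """Weight each frame by the (normalized) probability that the true
    event occurred there, given a noisy stamp whose offset from the true
    time is modeled as Gaussian with standard deviation sigma."""
    w = [math.exp(-0.5 * ((t - stamp) / sigma) ** 2) for t in range(n_frames)]
    total = sum(w)
    return [x / total for x in w]

weights = soft_labels(11, stamp=5, sigma=2.0)
# mass peaks at the reported stamp and decays with temporal distance
```

A detector trained against such soft weights, instead of a hard label at the reported stamp, hedges against the annotator's timing error; the choice of noise model is what the abstract's "different noise models" refers to.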

Bernstein, G., and D. R. Sheldon, "Consistently Estimating Markov Chains with Noisy Aggregate Data", AISTATS, Cadiz, Spain, March 2016.
Natarajan, A., K. S. Xu, and B. Eriksson, "Detecting Divisions of the Autonomic Nervous System Using Wearables", 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Florida, USA, 2016. natarajan_embc_16.pdf