Prospective Students

I am looking for highly motivated PhD-track students to work on multiple funded research projects starting in Fall 2018. Preferred qualifications include prior research experience in machine learning as an undergraduate or MS-level RA; a strong course background in multivariate calculus, probability and statistics, linear algebra, and numerical analysis; and a strong interest in working on interdisciplinary research bridging machine learning, mobile and distributed systems, and health and behavioral science. Application instructions can be found here: https://www.cics.umass.edu/admissions/application-instructions.

Research Interests

My research interests lie at the intersection of artificial intelligence, machine learning, and statistics. I am particularly interested in hierarchical graphical models and approximate inference/learning techniques including dynamic programming, Markov Chain Monte Carlo and variational Bayesian methods. My current research has a particular emphasis on models and algorithms for multivariate time series data and explores both probabilistic and neural network-based models and their combination.

Thanks to awards from ARL, IARPA, NSF and NIH, my current application focus is on machine learning-based analytics for mobile and wearable sensor data, as well as electronic health records data. I am also interested in large-scale, real-time, heterogeneous distributed machine learning systems that bridge mobile and embedded computing with cloud-based systems including distributed prediction cascades and distributed real-time active learning. My research group collaborates widely with researchers in mobile and distributed computing, mobile health, behavioral science, and medicine.

In the past, I have worked on a broad range of applications including collaborative filtering and ranking, unsupervised structure discovery and feature induction, object recognition and image labeling, and natural language processing, and I continue to consult on projects in these areas.

Recent Publications

Adams, Roy J., and Benjamin M. Marlin. "Learning Time Series Detection Models from Temporally Imprecise Labels." The 20th International Conference on Artificial Intelligence and Statistics. 2017.

In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research where human annotators are tasked with labeling the occurrence of very short duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned.

Dadkhahi, Hamid, and Benjamin Marlin. "Learning Tree-Structured Detection Cascades for Heterogeneous Networks of Embedded Devices." 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017.

In this paper, we present a new approach to learning cascaded classifiers for use in computing environments that involve networks of heterogeneous and resource-constrained, low-power embedded compute and sensing nodes. We present a generalization of the classical linear detection cascade to the case of tree-structured cascades where different branches of the tree execute on different physical compute nodes in the network. Different nodes have access to different features, as well as access to potentially different computation and energy resources. We concentrate on the problem of jointly learning the parameters for all of the classifiers in the cascade given a fixed cascade architecture and a known set of costs required to carry out the computation at each node. To accomplish the objective of joint learning of all detectors, we propose a novel approach to combining classifier outputs during training that better matches the hard cascade setting in which the learned system will be deployed. This work is motivated by research in the area of mobile health where energy-efficient real-time detectors integrating information from multiple wireless on-body sensors and a smartphone are needed for real-time monitoring and the delivery of just-in-time adaptive interventions. We evaluate our framework on mobile sensor-based human activity recognition and mobile health detector learning problems.

Kumar, Santosh, and others. "Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K)." IEEE Pervasive Computing. 16 (2017): 18-22.

The Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K) is enabling the collection of high-frequency mobile sensor data for the development and validation of novel multisensory biomarkers and sensor-triggered interventions.

Dadkhahi, Hamid, Marco F. Duarte, and Benjamin M. Marlin. "Out-of-Sample Extension for Dimensionality Reduction of Noisy Time Series." IEEE Transactions on Image Processing. 26 (2017): 5435-5446.

This paper proposes an out-of-sample extension framework for a global manifold learning algorithm (Isomap) that uses temporal information in out-of-sample points in order to make the embedding more robust to noise and artifacts. Given a set of noise-free training data and its embedding, the proposed framework extends the embedding for a noisy time series. This is achieved by adding a spatio-temporal compactness term to the optimization objective of the embedding. To the best of our knowledge, this is the first method for out-of-sample extension of manifold embeddings that leverages timing information available for the extension set. Experimental results demonstrate that our out-of-sample extension algorithm renders a more robust and accurate embedding of sequentially ordered image data in the presence of various noise and artifacts when compared with other timing-aware embeddings. Additionally, we show that an out-of-sample extension framework based on the proposed algorithm outperforms the state of the art in eye-gaze estimation.

Rostaminia, Soha, Addison Mayberry, Deepak Ganesan, Benjamin Marlin, and Jeremy Gummeson. "iLid: Low-power Sensing of Fatigue and Drowsiness Measures on a Computational Eyeglass." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 1 (2017): 23.

The ability to monitor eye closures and blink patterns has long been known to enable accurate assessment of fatigue and drowsiness in individuals. Many measures of the eye are known to be correlated with fatigue including coarse-grained measures like the rate of blinks as well as fine-grained measures like the duration of blinks and the extent of eye closures. Despite a plethora of research validating these measures, we lack wearable devices that can continually and reliably monitor them in the natural environment. In this work, we present a low-power system, iLid, that can continually sense fine-grained measures such as blink duration and Percentage of Eye Closures (PERCLOS) at high frame rates of 100 fps. We present a complete solution including design of the sensing, signal processing, and machine learning pipeline; implementation on a prototype computational eyeglass platform; and extensive evaluation under many conditions including illumination changes, eyeglass shifts, and mobility. Our results are very encouraging, showing that we can detect blinks, blink duration, eyelid location, and fatigue-related metrics such as PERCLOS with less than a few percent error.

Nguyen, Thai, Roy J. Adams, Annamalai Natarajan, and Benjamin M. Marlin. "Parsing Wireless Electrocardiogram Signals with the CRF-CFG Model." Conference on Uncertainty in Artificial Intelligence Machine Learning for Health Workshop. 2016.

Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task.

Chiu, Meng-Chieh, Benjamin Marlin, and Eliot Moss. "Real-Time Program-Specific Phase Change Detection for Java Programs." 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. 2016.

It is well known that programs tend to have multiple phases in their execution. Because phases have an impact on micro-architectural features such as caches and branch predictors, they are relevant to program performance and energy consumption. They are also relevant to detecting whether a program is executing as expected or is encountering unusual or exceptional conditions, a software engineering and program monitoring concern. We offer here a method for real-time phase change detection in Java programs. After applying a training protocol to a program of interest, our method can detect phase changes at run time for that program with good precision and recall (compared with a “ground truth” definition of phases) and with small performance impact (average less than 2%). We also offer improved methodology for evaluating phase change detection mechanisms. In sum, our approach offers the first known implementation of real-time phase detection for Java programs.

Nguyen, Thai, Roy J. Adams, Annamalai Natarajan, and Benjamin M. Marlin. "Parsing Wireless Electrocardiogram Signals with Context Free Grammar Conditional Random Fields." IEEE Wireless Health. 2016.

Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task.

Funded Projects

[2017-2020] Enhancing Context-Awareness and Personalization for Intensively Adaptive Smoking Cessation Messaging Interventions. See NSF award listing.

[2017-2022] Alliance for IoBT Research on Evolving Intelligent Goal-driven Networks (IoBT-REIGN) (with Prashant Shenoy, UMass PI. UIUC prime to ARL.). See ARL and UMass Amherst press releases, and the IoBT website.

[2017-2020] mPerf: A Theory-driven Approach to Model and Predict Everyday Job Performance Using Mobile Sensors (with Deepak Ganesan, UMass PI. U. Memphis prime to IARPA). See project website.

[2014-2018] Center of Excellence for Mobile Sensor Data to Knowledge (with Santosh Kumar, U. Memphis, PI). See center website.

[2014-2019] NSF CAREER: Machine Learning for Complex Health Data Analytics.

[2013-2016] Accurate and Computationally Efficient Predictors of Java Memory Resource Consumption (with Eliot Moss, PI).

[2012-2015] SensEye: An Architecture for Ubiquitous, Real-Time Visual Context Sensing and Inference (with Deepak Ganesan, PI).

[2012-2015] Patient Experience Recommender System for Persuasive Communication Tailoring (with Tom Houston, UMMS, PI).

[2012-2014] Foresight and Understanding from Scientific Exposition (with Andrew McCallum, PI, and Raytheon BBN Technologies).