Research Interests

My research interests lie at the intersection of artificial intelligence, machine learning, and statistics. I am particularly interested in hierarchical graphical models and approximate inference and learning techniques, including dynamic programming, Markov chain Monte Carlo, and variational Bayesian methods. My current research emphasizes models and algorithms for multivariate time series data and explores both probabilistic and neural network-based models, as well as their combination.

Thanks to awards from ARL, IARPA, NSF, and NIH, my current application focus is on machine learning-based analytics for mobile and wearable sensor data, as well as electronic health records data. I am also interested in large-scale, real-time, heterogeneous distributed machine learning systems that bridge mobile and embedded computing with cloud-based systems, including distributed prediction cascades and distributed real-time active learning. My research group collaborates widely with researchers in mobile and distributed computing, mobile health, behavioral science, and medicine.

In the past, I have worked on a broad range of applications including collaborative filtering and ranking, unsupervised structure discovery and feature induction, object recognition and image labeling, and natural language processing, and I continue to consult on projects in these areas.

Recent Publications

Li, Steven Cheng-Xian, and Benjamin M. Marlin. "Classification of Sparse and Irregularly Sampled Time Series with Mixtures of Expected Gaussian Kernels and Random Features." 31st Conference on Uncertainty in Artificial Intelligence. 2015.

This paper presents a kernel-based framework for classification of sparse and irregularly sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, and they make the data difficult to handle with standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose to first re-represent each time series through the Gaussian process (GP) posterior it induces under a GP regression model. We then define kernels over the space of GP posteriors and apply standard kernel-based classification. Our primary contributions are (i) the development of a kernel between GPs based on the mixture of kernels between their finite marginals, (ii) the development and analysis of extensions of random Fourier features for scaling the proposed kernel to large-scale data, and (iii) an extensive empirical analysis of both the classification performance and scalability of our proposed approach.
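The expected-kernel idea behind contribution (ii) can be illustrated with random Fourier features. The sketch below is a minimal illustration, not the paper's implementation: it assumes a squared-exponential base kernel with lengthscale `lengthscale`, and assumes each time series has already been summarized by the mean and covariance of its GP posterior at a fixed set of reference times (the GP regression step is not shown). It relies on the identity that for x ~ N(m, S), E[cos(w^T x + b)] = exp(-0.5 * w^T S w) * cos(w^T m + b), so the dot product of two such feature vectors is an unbiased Monte Carlo estimate of the expected Gaussian kernel between two independent Gaussian marginals. It covers a single marginal only, not the mixture over marginals; all function names are hypothetical.

```python
import math
import random

def sample_rff_params(dim, num_features, lengthscale, seed=0):
    """Frequencies w ~ N(0, I / lengthscale^2) and phases b ~ U[0, 2*pi),
    i.e. random Fourier features for a squared-exponential kernel."""
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return ws, bs

def expected_rff(mean, cov, ws, bs):
    """Features of a Gaussian N(mean, cov): E_x[sqrt(2/D) * cos(w.x + b)].

    Uses E[cos(w.x + b)] = exp(-0.5 * w^T cov w) * cos(w.mean + b)."""
    scale = math.sqrt(2.0 / len(ws))
    d = len(mean)
    feats = []
    for w, b in zip(ws, bs):
        w_mean = sum(w[i] * mean[i] for i in range(d))
        w_cov_w = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
        feats.append(scale * math.exp(-0.5 * w_cov_w) * math.cos(w_mean + b))
    return feats

def approx_expected_kernel(feats_a, feats_b):
    """Unbiased estimate of E[k(x, y)] for independent x and y."""
    return sum(a * b for a, b in zip(feats_a, feats_b))
```

In one dimension the estimate can be checked against the closed form (1 + v / lengthscale^2)^(-1/2) * exp(-mu^2 / (2 * (lengthscale^2 + v))), where mu is the difference of the two means and v is the sum of their variances. Because each series maps to a fixed-length feature vector, any linear classifier can then be trained on these features.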

Huang, Haibin, Evangelos Kalogerakis, and Benjamin Marlin. "Analysis and synthesis of 3D shape families via deep-learned generative models of surfaces." Symposium on Geometry Processing. 2015.

We present a method for joint analysis and synthesis of geometrically diverse 3D shape families. Our method first learns part-based templates such that an optimal set of fuzzy point and part correspondences is computed between the shapes of an input collection based on a probabilistic deformation model. In contrast to previous template-based approaches, the geometry and deformation parameters of our part-based templates are learned from scratch. Based on the estimated shape correspondence, our method also learns a probabilistic generative model that hierarchically captures statistical relationships of corresponding surface point positions and parts as well as their existence in the input shapes. A deep learning procedure is used to capture these hierarchical relationships. The resulting generative model is used to produce control point arrangements that drive shape synthesis by combining and deforming parts from the input collection. The generative model also yields compact shape descriptors that are used to perform fine-grained classification. Finally, it can also be coupled with the probabilistic deformation model to further improve shape correspondence. We provide qualitative and quantitative evaluations of our method for shape correspondence, segmentation, fine-grained classification, and synthesis. Our experiments demonstrate better correspondence and segmentation results than previous state-of-the-art approaches.

Mayberry, Addison, Yamin Tun, Pan Hu, Duncan Smith-Freedman, Deepak Ganesan, Benjamin Marlin, and Christopher Salthouse. "CIDER: Enabling Robustness-Power Tradeoffs on a Computational Eyeglass." 21st Annual International Conference on Mobile Computing and Networking. 2015.

The human eye offers a fascinating window into an individual's health, cognitive attention, and decision making, but we lack the ability to continually measure these parameters in the natural environment. The challenges lie in: a) handling the complexity of continuous high-rate sensing from a camera and processing the image stream to estimate eye parameters, and b) dealing with the wide variability in illumination conditions in the natural environment. This paper explores the power-robustness tradeoffs inherent in the design of a wearable eye tracker and proposes a novel staged architecture that enables graceful adaptation across the spectrum of real-world illumination. We propose CIDER, a system that operates in a highly optimized low-power mode under indoor settings by using a fast Search-Refine controller to track the eye, but detects when the environment switches to more challenging outdoor sunlight and switches models to operate robustly under this condition. Our design is holistic and tackles a) power consumption in digitizing pixels, estimating pupillary parameters, and illuminating the eye via near-infrared, b) error in estimating pupil center and pupil dilation, and c) model training procedures that involve zero effort from a user. We demonstrate that the system can estimate pupil center with an error of less than two pixels, and pupil diameter with an error of one pixel (0.22mm). Our end-to-end results show that we can operate at power levels of roughly 7mW at a 4Hz eye tracking rate, or roughly 32mW at rates upwards of 250Hz.

Saleheen, Nazir, Amin Ali, Syed Monowar Hossain, Hillol Sarker, Soujanya Chatterjee, Benjamin Marlin, Emre Ertin, Mustafa al'Absi, and Santosh Kumar. "puffMarker: A Multi-Sensor Approach for Pinpointing the Timing of First Lapse in Smoking Cessation." 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2015.

Smoking is the leading cause of preventable deaths. Mobile technologies can help to deliver just-in-time interventions to abstinent smokers and assist them in resisting urges to lapse. Doing so, however, requires identification of high-risk situations that may lead an abstinent smoker to relapse. In this paper, we propose an explainable model for detecting smoking lapses in newly abstinent smokers using a respiration sensor and 6-axis inertial sensors worn on the wrists. Our novel method first identifies windows of data that represent the hand at the mouth, then classifies each window as puff or non-puff. On the training data, the model achieves a recall rate of 98% at a false positive rate of 1.5%. When the model is applied to the data collected from 13 abstainers, the false positive rate is 0.3/hour. Among 15 lapsers, the model is able to pinpoint the timing of first lapse in 13 participants.

Kumar, S., et al. "Center of excellence for mobile sensor Data-to-Knowledge (MD2K)." Journal of the American Medical Informatics Association 22.6 (2015): 1137-1142.

Mobile sensor data-to-knowledge (MD2K) was chosen as one of 11 Big Data Centers of Excellence by the National Institutes of Health, as part of its Big Data-to-Knowledge initiative. MD2K is developing innovative tools to streamline the collection, integration, management, visualization, analysis, and interpretation of health data generated by mobile and wearable sensors. The goal of the big data solutions being developed by MD2K is to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk. The research conducted by MD2K is targeted at improving health through early detection of adverse health events and by facilitating prevention. MD2K will make its tools, software, and training materials widely available and will also organize workshops and seminars to encourage their use by researchers and clinicians.

Iyengar, Srinivasan, Sandeep Kalra, Anushree Ghosh, David Irwin, Prashant Shenoy, and Benjamin Marlin. "iProgram: Inferring Smart Schedules for Dumb Thermostats." 10th Annual Women in Machine Learning Workshop. 2015.

Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median of 44.28 minutes across 100 homes when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes from the optimal schedule. Further, iProgram yields a daily energy saving of 0.42kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills.
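To make the schedule-mismatch metric concrete, here is a brute-force sketch for a 1-day thermostat with a single setback ("away") interval per day. It assumes occupancy has already been inferred from smart meter data as one boolean per 15-minute slot, and it defines mismatch as the minutes where the schedule's occupied/away state disagrees with occupancy. These modeling choices and all function names are hypothetical illustrations, not iProgram's actual formulation.

```python
SLOT_MINUTES = 15                          # assumed schedule granularity
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES    # 96 slots per day

def slot(hour, minute=0):
    """Index of the slot that begins at hour:minute."""
    return (hour * 60 + minute) // SLOT_MINUTES

def mismatch_minutes(occupied, away_start, away_end, slot_minutes=SLOT_MINUTES):
    """Minutes where the schedule disagrees with (inferred) occupancy.

    The schedule treats slots in [away_start, away_end) as 'away' (setback)
    and every other slot as 'occupied'."""
    total = 0
    for t, occ in enumerate(occupied):
        scheduled_occupied = not (away_start <= t < away_end)
        if scheduled_occupied != occ:
            total += slot_minutes
    return total

def best_single_setback(occupied, slot_minutes=SLOT_MINUTES):
    """Exhaustively search all single away intervals.

    Returns (mismatch, start, end); start == end == 0 means 'no setback'."""
    n = len(occupied)
    best = (mismatch_minutes(occupied, 0, 0, slot_minutes), 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            m = mismatch_minutes(occupied, start, end, slot_minutes)
            if m < best[0]:
                best = (m, start, end)
    return best
```

Under these assumptions, a home that is empty from 9am to 5pm accrues 120 minutes of mismatch against the default 8am-6pm setback, while the search recovers the 9am-5pm interval exactly. Exhaustive search is cubic in the number of slots, which is cheap at 96 slots per day.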

Li, Steven Cheng-Xian, and Benjamin M. Marlin. "Collaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series." NIPS Time Series Workshop. 2015.

Collaborative Multi-Output Gaussian Processes (COGPs) are a flexible tool for modeling multivariate time series. They induce correlation across outputs through the use of shared latent processes. While past work has focused on the computational challenges that result from a single multivariate time series with many observed values, this paper explores the problem of fitting the COGP model to collections of many sparse and irregularly sampled multivariate time series. This work is motivated by applications to modeling physiological data (heart rate, blood pressure, etc.) in Electronic Health Records (EHRs).
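The way shared latent processes couple outputs can be seen in the prior covariance they induce. Assuming the usual COGP form, in which each output is a weighted sum of shared latent GPs plus an output-specific GP, cov(f_i(s), f_j(t)) = sum_q w_iq * w_jq * k_q(s, t) + [i == j] * h_i(s, t). The sketch below builds this covariance directly over a list of irregularly timed (output, time) observations, so unaligned sampling across outputs needs no special handling. Squared-exponential kernels and all names are assumptions for illustration; no inference or learning is shown.

```python
import math

def se_kernel(s, t, lengthscale):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-0.5 * ((s - t) / lengthscale) ** 2)

def cogp_covariance(obs, weights, shared_ls, indiv_ls):
    """Prior covariance over observations obs = [(output_index, time), ...].

    weights[i][q]: weight of shared latent process q on output i.
    shared_ls[q]:  lengthscale of shared latent process q.
    indiv_ls[i]:   lengthscale of the individual process of output i."""
    n = len(obs)
    K = [[0.0] * n for _ in range(n)]
    for a, (i, s) in enumerate(obs):
        for b, (j, t) in enumerate(obs):
            # Shared processes correlate every pair of outputs.
            shared = sum(weights[i][q] * weights[j][q] * se_kernel(s, t, ls)
                         for q, ls in enumerate(shared_ls))
            # Individual processes contribute only within the same output.
            indiv = se_kernel(s, t, indiv_ls[i]) if i == j else 0.0
            K[a][b] = shared + indiv
    return K
```

For example, with one shared process and weights 1.0 and 0.5, an observation of output 0 at time 0.0 and an observation of output 1 at time 0.5 still have nonzero covariance, which is what allows one physiological signal (say, heart rate) to inform estimates of another (say, blood pressure).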

Adams, Roy J., Edison Thomaz, and Benjamin M. Marlin. "Hierarchical Nested CRFs for Segmentation and Labeling of Physiological Time Series." NIPS Workshop on Machine Learning in Healthcare. 2015.

In this paper, we address the problem of nested hierarchical segmentation and labeling of time series data. We present a hierarchical span-based conditional random field framework for this problem that leverages higher-order factors to enforce the nesting constraints. The framework can incorporate a variety of additional factors, including higher-order cardinality factors. This research is motivated by hierarchical activity recognition problems in the field of mobile health (mHealth). We show that the specific model of interest in the mHealth setting supports exact MAP inference in quadratic time. Learning is accomplished in the structured support vector machine framework. We show positive results on real and synthetic data sets.
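Span-based models admit exact MAP inference by dynamic programming over segment end points. The sketch below is a generic single-level semi-Markov segmentation, not the hierarchical nested CRF from the paper: it assumes a hypothetical `span_score(start, end, label)` function that returns in constant time (for example, via prefix sums), and it omits label-transition and nesting factors. Under those assumptions it runs in O(T^2 * L) time for T time steps and L labels, i.e., quadratic in the sequence length.

```python
def semi_markov_map(T, labels, span_score):
    """Exact MAP segmentation of positions 0..T-1 into labeled spans.

    span_score(start, end, label) scores the half-open span [start, end).
    Returns (best_score, [(start, end, label), ...])."""
    best = [float("-inf")] * (T + 1)   # best[e]: best score of a segmentation of [0, e)
    best[0] = 0.0
    back = [None] * (T + 1)            # back[e]: (start, label) of the last span
    for end in range(1, T + 1):
        for start in range(end):
            for y in labels:
                s = best[start] + span_score(start, end, y)
                if s > best[end]:
                    best[end] = s
                    back[end] = (start, y)
    # Follow back-pointers from the end of the sequence.
    segments = []
    end = T
    while end > 0:
        start, y = back[end]
        segments.append((start, end, y))
        end = start
    return best[T], segments[::-1]
```

A hierarchical model would add an outer level whose spans must contain whole inner spans; that constraint is what the paper's higher-order factors enforce, and it is not modeled here.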

Funded Projects

[2017-2020] Enhancing Context-Awareness and Personalization for Intensively Adaptive Smoking Cessation Messaging Interventions. See NSF award listing.

[2017-2022] Alliance for IoBT Research on Evolving Intelligent Goal-driven Networks (IoBT-REIGN) (with Prashant Shenoy, UMass PI; UIUC prime to ARL). See ARL and UMass Amherst press releases, and the IoBT website.

[2017-2020] mPerf: A Theory-driven Approach to Model and Predict Everyday Job Performance Using Mobile Sensors (with Deepak Ganesan, UMass PI; U. Memphis prime to IARPA). See project website.

[2014-2018] Center of Excellence for Mobile Sensor Data to Knowledge (with Santosh Kumar, U. Memphis, PI). See center website.

[2014-2019] NSF CAREER: Machine Learning for Complex Health Data Analytics.

[2013-2016] Accurate and Computationally Efficient Predictors of Java Memory Resource Consumption (with Eliot Moss, PI).

[2012-2015] SensEye: An Architecture for Ubiquitous, Real-Time Visual Context Sensing and Inference (with Deepak Ganesan, PI).

[2012-2015] Patient Experience Recommender System for Persuasive Communication Tailoring (with Tom Houston, UMMS, PI).

[2012-2014] Foresight and Understanding from Scientific Exposition (with Andrew McCallum, PI, and Raytheon BBN Technologies).