Research Interests

My research interests lie at the intersection of artificial intelligence, machine learning, and statistics. I am particularly interested in hierarchical graphical models and approximate inference/learning techniques including Markov Chain Monte Carlo and variational Bayesian methods. My current research has a particular emphasis on models and algorithms for multivariate time series data. Thanks to recent awards from NSF and NIH, my current applied work focuses on machine learning-based analytics for clinical and mobile health (mHealth) data. In the past, I have worked on a broad range of applications including collaborative filtering and ranking, unsupervised structure discovery and feature induction, object recognition and image labeling, and natural language processing, and I continue to consult on projects in these areas.

Recent Publications

Nguyen, Thai, Roy J. Adams, Annamalai Natarajan, and Benjamin M. Marlin. "Parsing Wireless Electrocardiogram Signals with Context Free Grammar Conditional Random Fields." IEEE Wireless Health. 2016.

Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals as they go about their daily activities in natural environments. However, extracting reliable higher-level inferences from these raw data streams remains a key data analysis challenge. In this paper, we focus on the specific case of the analysis of data from wireless electrocardiogram (ECG) sensors. We present a new robust probabilistic approach to ECG morphology extraction using conditional random field context free grammar models, which have traditionally been applied to parsing problems in natural language processing. We introduce a robust context free grammar for parsing noisy ECG data, and show significantly improved performance on the ECG morphological labeling task.

Natarajan, Annamalai, Gustavo Angarita, Edward Gaiser, Robert Malison, Deepak Ganesan, and Benjamin Marlin. "Domain Adaptation Methods for Improving Lab-to-field Generalization of Cocaine Detection using Wearable ECG." 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 2016.

Mobile health research on illicit drug use detection typically involves a two-stage study design where data to learn detectors is first collected in lab-based trials, followed by a deployment to subjects in a free-living environment to assess detector performance. While recent work has demonstrated the feasibility of wearable sensors for illicit drug use detection in the lab setting, several key problems can limit lab-to-field generalization performance. For example, lab-based data collection often has low ecological validity, the ground-truth event labels collected in the lab may not be available at the same level of temporal granularity in the field, and there can be significant variability between subjects. In this paper, we present domain adaptation methods for assessing and mitigating potential sources of performance loss in lab-to-field generalization and apply them to the problem of cocaine use detection from wearable electrocardiogram sensor data.

Li, Steven Cheng-Xian, and Benjamin M. Marlin. "A Scalable End-to-end Gaussian Process Adapter for Irregularly Sampled Time Series Classification." Advances in Neural Information Processing Systems. 2016.

We present a general framework for classification of sparse and irregularly sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose an uncertainty-aware classification framework based on a special computational layer we refer to as the Gaussian process adapter that can connect irregularly sampled time series data to any black-box classifier learnable using gradient descent. We show how to scale up the required computations based on combining the structured kernel interpolation framework and the Lanczos approximation method, and how to discriminatively train the Gaussian process adapter in combination with a number of classifiers end-to-end using backpropagation.
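The core idea of mapping an irregularly sampled series onto a fixed-dimensional representation can be illustrated with a minimal sketch. It assumes a zero-mean Gaussian process with a squared-exponential kernel and computes only the exact posterior mean on a regular grid; it does not include the uncertainty propagation, structured kernel interpolation, Lanczos approximation, or end-to-end training described in the abstract. All function names and parameter values here are hypothetical and chosen for illustration.

```python
import math

def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two time points (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior_mean(times, values, grid, noise=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP at the grid points.

    times/values: irregularly spaced observations of one series.
    grid: fixed reference time points shared by every series.
    """
    K = [[rbf(s, t, length_scale) + (noise if i == j else 0.0)
          for j, t in enumerate(times)]
         for i, s in enumerate(times)]
    alpha = solve(K, values)  # (K + noise * I)^{-1} y
    return [sum(rbf(g, t, length_scale) * a for t, a in zip(times, alpha))
            for g in grid]

# Two series with different lengths and sampling times map to vectors
# of the same length, which a standard classifier could then consume.
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
features_a = gp_posterior_mean([0.2, 1.9, 3.7], [0.5, -0.1, 0.8], grid)
features_b = gp_posterior_mean([0.0, 0.7, 2.5, 3.1], [1.0, 0.4, -0.8, 0.3], grid)
```

Using the posterior mean alone discards the uncertainty that the framework in the abstract is designed to retain, so this sketch should be read as the simplest baseline version of the interpolation step rather than as the adapter itself.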

Sadasivam, Rajani Shankar, Erin M. Borglund, Roy Adams, Benjamin M. Marlin, and Thomas K. Houston. "Impact of a Collective Intelligence Tailored Messaging System on Smoking Cessation: The Perspect Randomized Experiment." Journal of Medical Internet Research. 18.11 (2016): e285:1-13.


Outside health care, content tailoring is driven algorithmically using machine learning compared to the rule-based approach used in current implementations of computer-tailored health communication (CTHC) systems. A special class of machine learning systems (“recommender systems”) are used to select messages by combining the collective intelligence of their users (ie, the observed and inferred preferences of users as they interact with the system) and their user profiles. However, this approach has not been adequately tested for CTHC.

Our aim was to compare, in a randomized experiment, a standard, evidence-based, rule-based CTHC (standard CTHC) to a novel machine learning CTHC: Patient Experience Recommender System for Persuasive Communication Tailoring (PERSPeCT). We hypothesized that PERSPeCT will select messages of higher influence than our standard CTHC system. This standard CTHC was proven effective in motivating smoking cessation in a prior randomized trial of 900 smokers (OR 1.70, 95% CI 1.03-2.81).

PERSPeCT is an innovative hybrid machine learning recommender system that selects and sends motivational messages using algorithms that learn from message ratings from 846 previous participants (explicit feedback), and the prior explicit ratings of each individual participant. Current smokers (N=120) aged 18 years or older, English speaking, with Internet access were eligible to participate. These smokers were randomized to receive either PERSPeCT (intervention, n=74) or standard CTHC tailored messages (n=46). The study was conducted between October 2014 and January 2015. By randomization, we compared daily message ratings (mean of smoker ratings each day). At 30 days, we assessed the intervention’s perceived influence, 30-day cessation, and changes in readiness to quit from baseline.

The proportion of days when smokers agreed/strongly agreed (daily rating ≥4) that the messages influenced them to quit was significantly higher for PERSPeCT (73%, 23/30) than standard CTHC (44%, 14/30, P=.02). Among less educated smokers (n=49), this difference was even more pronounced for days strongly agree (intervention: 77%, 23/30; comparison: 23%, 7/30, P<.001). There was no significant difference in the frequency with which PERSPeCT randomized smokers agreed or strongly agreed that the intervention influenced them to quit smoking (P=.07) and use nicotine replacement therapy (P=.09). Among those who completed follow-up, 36% (20/55) of PERSPeCT smokers and 32% (11/34) of the standard CTHC group stopped smoking for one day or longer (P=.70).

Compared to standard CTHC with proven effectiveness, PERSPeCT outperformed in terms of influence ratings and resulted in similar cessation rates.

Hiatt, Laura, Roy Adams, and Benjamin Marlin. "An Improved Data Representation for Smoking Detection with Wearable Respiration Sensors." IEEE Wireless Health. 2016.

Late breaking extended abstract.

Iyengar, Srinivasan, Sandeep Kalra, Anushree Ghosh, David Irwin, Prashant Shenoy, and Benjamin Marlin. "iProgram: Inferring Smart Schedules for Dumb Thermostats." 10th Annual Women in Machine Learning Workshop. 2015.

Heating, ventilation, and air conditioning (HVAC) accounts for over 50% of a typical home's energy usage. A thermostat generally controls HVAC usage in a home to ensure user comfort. In this paper, we focus on making existing "dumb" programmable thermostats smart by applying energy analytics on smart meter data to infer home occupancy patterns and compute an optimized thermostat schedule. Utilities with smart meter deployments are capable of immediately applying our approach, called iProgram, to homes across their customer base. iProgram addresses new challenges in inferring home occupancy from smart meter data where i) training data is not available and ii) the thermostat schedule may be misaligned with occupancy, frequently resulting in high power usage during unoccupied periods. iProgram translates occupancy patterns inferred from opaque smart meter data into a custom schedule for existing types of programmable thermostats, e.g., 1-day, 7-day, etc. We implement iProgram as a web service and show that it reduces the mismatch time between the occupancy pattern and the thermostat schedule by a median value of 44.28 minutes (out of 100 homes) when compared to a default 8am-6pm weekday schedule, with a median deviation of 30.76 minutes off the optimal schedule. Further, iProgram yields a daily energy saving of 0.42 kWh on average across the 100 homes. Utilities may use iProgram to recommend thermostat schedules to customers and provide them estimates of potential energy savings in their energy bills.
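The abstract does not define mismatch time precisely. One plausible reading, sketched below, counts the minutes in a day during which the thermostat schedule disagrees with actual occupancy (comfort mode while the home is empty, or setback mode while it is occupied). The per-minute boolean representation, the function names, and the example occupancy times are all assumptions made for illustration; they are not taken from the paper.

```python
MINUTES_PER_DAY = 24 * 60

def day_pattern(away_start, away_end, minutes_per_day=MINUTES_PER_DAY):
    """Per-minute pattern that is True outside [away_start, away_end).

    Used both for occupancy (True = someone is home) and for a 1-day
    thermostat schedule (True = comfort setpoint is active).
    """
    return [not (away_start <= m < away_end) for m in range(minutes_per_day)]

def mismatch_minutes(occupied, comfort):
    """Count minutes where occupancy and the schedule disagree."""
    if len(occupied) != len(comfort):
        raise ValueError("patterns must cover the same number of minutes")
    return sum(1 for o, c in zip(occupied, comfort) if o != c)

# Hypothetical home that is empty from 9:00 to 17:30.
occupancy = day_pattern(9 * 60, 17 * 60 + 30)

# Default schedule from the abstract: setback from 8am to 6pm.
default_schedule = day_pattern(8 * 60, 18 * 60)

# A schedule tailored to the inferred occupancy.
tailored_schedule = day_pattern(9 * 60, 17 * 60 + 30)

default_mismatch = mismatch_minutes(occupancy, default_schedule)
tailored_mismatch = mismatch_minutes(occupancy, tailored_schedule)
```

In this toy case the default schedule is in setback mode for 60 minutes in the morning and 30 minutes in the evening while the home is occupied, so a tailored schedule removes 90 minutes of mismatch. Real occupancy would have to be inferred from smart meter data, which this sketch does not attempt.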

Li, Steven Cheng-Xian, and Benjamin M. Marlin. "Collaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series." NIPS Time Series Workshop. 2015.

Collaborative Multi-Output Gaussian Processes (COGPs) are a flexible tool for modeling multivariate time series. They induce correlation across outputs through the use of shared latent processes. While past work has focused on the computational challenges that result from a single multivariate time series with many observed values, this paper explores the problem of fitting the COGP model to collections of many sparse and irregularly sampled multivariate time series. This work is motivated by applications to modeling physiological data (heart rate, blood pressure, etc.) in Electronic Health Records (EHRs).

Adams, Roy J., Edison Thomaz, and Benjamin M. Marlin. "Hierarchical Nested CRFs for Segmentation and Labeling of Physiological Time Series." NIPS Workshop on Machine Learning in Healthcare. 2015.

In this paper, we address the problem of nested hierarchical segmentation and labeling of time series data. We present a hierarchical span-based conditional random field framework for this problem that leverages higher-order factors to enforce the nesting constraints. The framework can incorporate a variety of additional factors including higher order cardinality factors. This research is motivated by hierarchical activity recognition problems in the field of mobile Health (mHealth). We show that the specific model of interest in the mHealth setting supports exact MAP inference in quadratic time. Learning is accomplished in the structured support vector machine framework. We show positive results on real and synthetic data sets.

Recent Funded Projects

[2016-2017] Improved Systems for Real-World Pervasive Human Sensing (with Deepak Ganesan, PI. DCS Cor. prime to ARL.).

[2014-2018] Center of Excellence for Mobile Sensor Data to Knowledge (with Santosh Kumar, U. Memphis, PI). See center website.

[2014-2019] NSF CAREER: Machine Learning for Complex Health Data Analytics.

[2013-2016] Accurate and Computationally Efficient Predictors of Java Memory Resource Consumption (with Eliot Moss, PI).

[2012-2015] SensEye: An Architecture for Ubiquitous, Real-Time Visual Context Sensing and Inference (with Deepak Ganesan, PI).

[2012-2015] Patient Experience Recommender System for Persuasive Communication Tailoring (with Tom Houston, UMMS, PI).

[2012-2014] Foresight and Understanding from Scientific Exposition (with Andrew McCallum, PI, and Raytheon BBN Technologies).