Irregular Sampling

Li, Steven Cheng-Xian, and Benjamin M. Marlin A scalable end-to-end Gaussian process adapter for irregularly sampled time series classification. Advances in Neural Information Processing Systems., 2016. Abstractli-nips2016.pdf

We present a general framework for classification of sparse and irregularly-sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose an uncertainty-aware classification framework based on a special computational layer we refer to as the Gaussian process adapter that can connect irregularly sampled time series data to to any black-box classifier learnable using gradient descent. We show how to scale up the required computations based on combining the structured kernel interpolation framework and the Lanczos approximation method, and how to discriminatively train the Gaussian process adapter in combination with a number of classifiers end-to-end using backpropagation.

Li, Steven Cheng-Xian, and Benjamin M. Marlin Classification of Sparse and Irregularly Sampled Time Series with Mixtures of Expected Gaussian Kernels and Random Features. 31st Conference on Uncertainty in Artificial Intelligence., 2015. Abstractli-uai2015.pdf

This paper presents a kernel-based framework for classification of sparse and irregularly sampled time series. The properties of such time series can result in substantial uncertainty about the values of the underlying temporal processes, while making the data difficult to deal with using standard classification methods that assume fixed-dimensional feature spaces. To address these challenges, we propose to first re-represent each time series through the Gaussian process (GP) posterior it induces under a GP regression model. We then define kernels over the space of GP posteriors and apply standard kernel-based classification. Our primary contributions are (i) the development of a kernel between GPs based on the mixture of kernels between their finite marginals, (ii) the development and analysis of extensions of random Fourier features for scaling the proposed kernel to large-scale data, and (iii) an extensive empirical analysis of both the classification performance and scalability of our proposed approach.

Li, Steven Cheng-Xian, and Benjamin M. Marlin. "Collaborative Multi-Output Gaussian Processes for Collections of Sparse Multivariate Time Series,." NIPS Time Series Workshop. 2015. Abstractli-nips-ts2015.pdf

Collaborative Multi-Output Gaussian Processes (COGPs) are a flexible tool for modeling multivariate time series. They induce correlation across outputs through the use of shared latent processes. While past work has focused on the computational challenges that result from a single multivariate time series with many observed values, this paper explores the problem of fitting the COGP model to collections of many sparse and irregularly sampled multivariate time series. This work is motivated by applications to modeling physiological data (heart rate, blood pressure, etc.) in Electronic Health Records (EHRs).

Marlin, Benjamin M., David C. Kale, Robinder G. Khemani, and Randall C. Wetzel. "Unsupervised pattern discovery in electronic health care data using probabilistic clustering models." IHI. 2012. 389-398. Abstractehr_clustering_ihi2012_paper.pdf

Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.