Clustering

Chiu, Meng-Chieh, Benjamin Marlin, and Eliot Moss. "Real-Time Program-Specific Phase Change Detection for Java Programs." 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools. 2016.

It is well-known that programs tend to have multiple phases in their execution. Because phases have impact on micro-architectural features such as caches and branch predictors, they are relevant to program performance and energy consumption. They are also relevant to detecting whether a program is executing as expected or is encountering unusual or exceptional conditions, a software engineering and program monitoring concern. We offer here a method for real-time phase change detection in Java programs. After applying a training protocol to a program of interest, our method can detect phase changes at run time for that program with good precision and recall (compared with a “ground truth” definition of phases) and with small performance impact (average less than 2%). We also offer improved methodology for evaluating phase change detection mechanisms. In sum, our approach offers the first known implementation of real-time phase detection for Java programs.

Marlin, Benjamin M., David C. Kale, Robinder G. Khemani, and Randall C. Wetzel. "Unsupervised pattern discovery in electronic health care data using probabilistic clustering models." IHI. 2012. 389-398.

Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.