Export 51 results:
Sort by: Author Type
Kae, Andrew, Erik Learned-Miller, and Benjamin M. Marlin The Shape-Time Random Field for Semantic Video Labeling. 2014 IEEE Conference on Computer Vision and Pattern Recognition., 2014. Abstractstrf_cvpr14.pdf

We propose a novel discriminative model for semantic labeling in videos by incorporating a prior to model both the shape and temporal dependencies of an object in video. A typical approach for this task is the conditional random field (CRF), which can model local interactions among adjacent regions in a video frame. Recent work [16, 14] has shown how to incorporate a shape prior into a CRF for improving labeling performance, but it may be difficult to model temporal dependencies present in video by using this prior. The conditional restricted Boltzmann machine (CRBM) can model both shape and temporal dependencies, and has been used to learn walking styles from motion- capture data. In this work, we incorporate a CRBM prior into a CRF framework and present a new state-of-the-art model for the task of semantic labeling in videos. In particular, we explore the task of labeling parts of complex face scenes from videos in the YouTube Faces Database (YFDB). Our combined model outperforms competitive baselines both qualitatively and quantitatively.

Natarajan, Annamalai, Abhinav Parate, Edward Gaiser, Gustavo Angarita, Robert Malison, Benjamin M. Marlin, and Deepak Ganesan. "Detecting cocaine use with wearable electrocardiogram sensors." UbiComp. 2013. 123-132. Abstractcocaine_ubicomp13_paper.pdf

Ubiquitous physiological sensing has the potential to profoundly improve our understanding of human behavior, leading to more targeted treatments for a variety of disorders. The long term goal of this work is development of novel computational tools to support the study of addiction in the context of cocaine use. The current paper takes the first step in this important direction by posing a simple, but crucial question: Can cocaine use be reliably detected using wearable electrocardiogram (ECG) sensors? The main contributions in this paper include the presentation of a novel clinical study of cocaine use, the development of a computational pipeline for inferring morphological features from noisy ECG waveforms, and the evaluation of feature sets for cocaine use detection. Our results show that 32mg/70kg doses of cocaine can be detected with the area under the receiver operating characteristic curve levels above 0.9 both within and between-subjects.

Parate, Abhinav, Meng-Chieh Chiu, Deepak Ganesan, and Benjamin M. Marlin. "Leveraging graphical models to improve accuracy and reduce privacy risks of mobile sensing." MobiSys. 2013. 83-96. Abstractsensing_mobisys13_paper.pdf

The proliferation of sensors on mobile phones and wearables has led to a plethora of context classifiers designed to sense the individual's context. We argue that a key missing piece in mobile inference is a layer that fuses the outputs of several classifiers to learn deeper insights into an individual's habitual patterns and associated correlations between contexts, thereby enabling new systems optimizations and opportunities. In this paper, we design CQue, a dynamic bayesian network that operates over classifiers for individual contexts, observes relations across these outputs across time, and identifies opportunities for improving energy-efficiency and accuracy by taking advantage of relations. In addition, such a layer provides insights into privacy leakage that might occur when seemingly innocuous user context revealed to different applications on a phone may be combined to reveal more information than originally intended. In terms of system architecture, our key contribution is a clean separation between the detection layer and the fusion layer, enabling classifiers to solely focus on detecting the context, and leverage temporal smoothing and fusion mechanisms to further boost performance by just connecting to our higher-level inference engine. To applications and users, CQue provides a query interface, allowing a) applications to obtain more accurate context results while remaining agnostic of what classifiers/sensors are used and when, and b) users to specify what contexts they wish to keep private, and only allow information that has low leakage with the private context to be revealed. We implemented CQue in Android, and our results show that CQue can i) improve activity classification accuracy up to 42%, ii) reduce energy consumption in classifying social, location and activity contexts with high accuracy(>90%) by reducing the number of required classifiers by at least 33%, and iii) effectively detect and suppress contexts that reveal private information.

Parate, Abhinav, Matthias Böhmer, David Chu, Deepak Ganesan, and Benjamin M. Marlin. "Practical prediction and prefetch for faster access to applications on mobile phones." UbiComp. 2013. 275-284. Abstractprefetch_ubicomp13_paper.pdf

Mobile phones have evolved from communication devices to indispensable accessories with access to real-time content. The increasing reliance on dynamic content comes at the cost of increased latency to pull the content from the Internet before the user can start using it. While prior work has explored parts of this problem, they ignore the bandwidth costs of prefetching, incur significant training overhead, need several sensors to be turned on, and do not consider practical systems issues that arise from the limited background processing capability supported by mobile operating systems. In this paper, we make app prefetch practical on mobile phones. Our contributions are two-fold. First, we design an app prediction algorithm, APPM, that requires no prior training, adapts to usage dynamics, predicts not only which app will be used next but also when it will be used, and provides high accuracy without requiring additional sensor context. Second, we perform parallel prefetch on screen unlock, a mechanism that leverages the benefits of prediction while operating within the constraints of mobile operating systems. Our experiments are conducted on long-term traces, live deployments on the Android Play Market, and user studies, and show that we outperform prior approaches to predicting app usage, while also providing practical ways to prefetch application content on mobile phones.

Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. "Relation Extraction with Matrix Factorization and Universal Schemas." HLT-NAACL. 2013. 74-84. Abstractuniv-schema_naacl13_paper.pdf

Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing struc- tured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all in- volved schemas (surface form predicates as in OpenIE, and relations in the schemas of pre- existing databases). This schema has an al- most unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database of such schema we present matrix factorization models that learn latent feature vectors for en- tity tuples and relations. We show that such latent models achieve substantially higher accuracy than a traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs such as Freebase, we are able to reason about unstructured and structured data in mutually-supporting ways. By doing so our approach outperforms state-of-the-art distant supervision.

Marlin, Benjamin M., Roy J. Adams, Rajani Sadasivam, and Thomas K. Houston Towards Collaborative Filtering Recommender Systems for Tailored Health Communications. AMIA 2013 Annual Symposium., 2013. Abstractcthc_recsys13_paper.pdf

The goal of computer tailored health communications (CTHC) is to promote healthy behaviors by sending messages tailored to individual patients. Current CTHC systems collect baseline patient “profiles” and then use expert-written, rule-based systems to target messages to subsets of patients. Our main interest in this work is the study of collaborative filtering-based CTHC systems that can learn to tailor future message selections to individual patients based explicit feedback about past message selections. This paper reports the results of a study designed to collect explicit feedback (ratings) regarding four aspects of messages from 100 subjects in the smoking cessation support domain. Our results show that most users have positive opinions of most messages and that the ratings for all four aspects of the messages are highly correlated with each other. Finally, we conduct a range of rating prediction experiments comparing several different model variations. Our results show that predicting future ratings based on each user’s past ratings contributes the most to predictive accuracy.

Khan, Mohammad Emtiyaz, Shakir Mohamed, Benjamin M. Marlin, and Kevin P. Murphy. "A Stick-Breaking Likelihood for Categorical Data Analysis with Latent Gaussian Models." AISTATS. 2012. 610-618. Abstractsblgm-aistats2012-paper.pdf

The development of accurate models and efficient algorithms for the analysis of multivariate categorical data are important and long-standing problems in machine learning and computational statistics. In this paper, we focus on modeling categorical data using Latent Gaussian Models (LGMs). We propose a novel stick-breaking likelihood function for categorical LGMs that exploits accurate linear and quadratic bounds on the logistic log-partition function, leading to an effective variational inference and learning framework. We thoroughly compare our approach to existing algorithms for multinomial logit/probit likelihoods on several problems, including inference in multinomial Gaussian process classification and learning in latent factor models. Our extensive comparisons demonstrate that our stick-breaking model effectively captures correlation in discrete data and is well suited for the analysis of categorical data.

Marlin, Benjamin M., David C. Kale, Robinder G. Khemani, and Randall C. Wetzel. "Unsupervised pattern discovery in electronic health care data using probabilistic clustering models." IHI. 2012. 389-398. Abstractehr_clustering_ihi2012_paper.pdf

Bedside clinicians routinely identify temporal patterns in physiologic data in the process of choosing and administering treatments intended to alter the course of critical illness for individual patients. Our primary interest is the study of unsupervised learning techniques for automatically uncovering such patterns from the physiologic time series data contained in electronic health care records. This data is sparse, high-dimensional and often both uncertain and incomplete. In this paper, we develop and study a probabilistic clustering model designed to mitigate the effects of temporal sparsity inherent in electronic health care records data. We evaluate the model qualitatively by visualizing the learned cluster parameters and quantitatively in terms of its ability to predict mortality outcomes associated with patient episodes. Our results indicate that the model can discover distinct, recognizable physiologic patterns with prognostic significance.

Marlin, Benjamin M., and Nando de Freitas. "Asymptotic Efficiency of Deterministic Estimators for Discrete Energy-Based Models: Ratio Matching and Pseudolikelihood." UAI. 2011. 497-505. Abstractestimators_uai11_paper.pdf

Standard maximum likelihood estimation cannot be applied to discrete energy-based models in the general case because the computation of exact model probabilities is intractable. Recent research has seen the proposal of several new estimators designed specifically to overcome this intractability, but virtually nothing is known about their theoretical properties. In this paper, we present a generalized estimator that unifies many of the classical and recently proposed estimators. We use results from the standard asymptotic theory for M-estimators to derive a generic expression for the asymptotic covariance matrix of our generalized estimator. We apply these results to study the relative statistical efficiency of classical pseudolikelihood and the recently-proposed ratio matching estimator.

Duvenaud, David K., Benjamin M. Marlin, and Kevin P. Murphy. "Multiscale Conditional Random Fields for Semi-supervised Labeling and Classification." CRV. 2011. 371-378. Abstractmultiscale_crv11_paper.pdf

Motivated by the abundance of images labeled only by their captions, we construct tree-structured multiscale conditional random fields capable of performing semi supervised learning. We show that such caption-only data can in fact increase pixel-level accuracy at test time. In addition, we compare two kinds of tree: the standard one with pair wise potentials, and one based on noisy-or potentials, which better matches the semantics of the recursive partitioning used to create the tree.

Swersky, Kevin, Marc'Aurelio Ranzato, David Buchman, Benjamin M. Marlin, and Nando de Freitas. "On Autoencoders and Score Matching for Energy Based Models." ICML. 2011. 1201-1208. Abstractautoencoders_icml11_paper.pdf

We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.

Marlin, Benjamin M., Mohammad Emtiyaz Khan, and Kevin P. Murphy. "Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models." ICML. 2011. 633-640. Abstractpiecewise_lgm_icml11_paper.pdf

Bernoulli-logistic latent Gaussian models (bLGMs) are a useful model class, but accurate parameter estimation is complicated by the fact that the marginal likelihood contains an intractable logistic-Gaussian integral. In this work, we propose the use of fixed piecewise linear and quadratic upper bounds to the logistic-log-partition (LLP) function as a way of circumventing this intractable integral. We describe a framework for approximately computing minimax optimal piecewise quadratic bounds, as well a generalized expectation maximization algorithm based on using piecewise bounds to estimate bLGMs. We prove a theoretical result relating the maximum error in the LLP bound to the maximum error in the marginal likelihood estimate. Finally, we present empirical results showing that piecewise bounds can be significantly more accurate than previously proposed variational bounds.

Marlin, Benjamin M., Richard S. Zemel, Sam T. Roweis, and Malcolm Slaney. "Recommender Systems, Missing Data and Statistical Model Estimation." IJCAI. 2011. 2686-2691. Abstractmissing_data_ijcai11_paper.pdf

The goal of rating-based recommender systems is to make personalized predictions and recommendations for individual users by leveraging the preferences of a community of users with respect to a collection of items like songs or movies. Recommender systems are often based on intricate statistical models that are estimated from data sets containing a very high proportion of missing ratings. This work describes evidence of a basic incompatibility between the properties of recommender system data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. We discuss the implications of this problem and describe extended modelling and evaluation frameworks that attempt to circumvent it. We present prediction and ranking results showing that models developed and tested under these extended frameworks can significantly outperform standard models.

Marlin, Benjamin M., Kevin Swersky, Bo Chen, and Nando de Freitas. "Inductive Principles for Restricted Boltzmann Machine Learning." AISTATS. 2010. 509-516. Abstract

Recent research has seen the proposal of several new inductive principles designed specifically to avoid the problems associated with maximum likelihood learning in models with intractable partition functions. In this paper, we study learning methods for binary restricted Boltzmann machines (RBMs) based on ratio matching and generalized score matching. We compare these new RBM learning methods to a range of existing learning methods including stochastic maximum likelihood, contrastive divergence, and pseudo-likelihood. We perform an extensive empirical evaluation across multiple tasks and data sets.

Swersky, Kevin, Bo Chen, Benjamin M. Marlin, and Nando de Freitas. "A tutorial on stochastic approximation algorithms for training Restricted Boltzmann Machines and Deep Belief Nets." ITA. 2010. 80-89. Abstract

In this study, we provide a direct comparison of the Stochastic Maximum Likelihood algorithm and Contrastive Divergence for training Restricted Boltzmann Machines using the MNIST data set. We demonstrate that Stochastic Maximum Likelihood is superior when using the Restricted Boltzmann Machine as a classifier, and that the algorithm can be greatly improved using the technique of iterate averaging from the field of stochastic approximation. We further show that training with optimal parameters for classification does not necessarily lead to optimal results when Restricted Boltzmann Machines are stacked to form a Deep Belief Network. In our experiments we observe that fine tuning a Deep Belief Network significantly changes the distribution of the latent data, even though the parameter changes are negligible.

Khan, Mohammad Emtiyaz, Benjamin M. Marlin, Guillaume Bouchard, and Kevin P. Murphy. "Variational bounds for mixed-data factor analysis." NIPS. 2010. 1108-1116. Abstract

We propose a new variational EM algorithm for fitting factor analysis models with mixed continuous and categorical observations. The algorithm is based on a simple quadratic bound to the log-sum-exp function. In the special case of fully observed binary data, the bound we propose is significantly faster than previous variational methods. We show that EM is significantly more robust in the presence of missing data compared to treating the latent factors as parameters, which is the approach used by exponential family PCA and other related matrix-factorization methods. A further benefit of the variational approach is that it can easily be extended to the case of mixtures of factor analyzers, as we show. We present results on synthetic and real data sets demonstrating several desirable properties of our proposed method.

Moghaddam, Baback, Benjamin M. Marlin, Mohammad Emtiyaz Khan, and Kevin P. Murphy. "Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models." NIPS. 2009. 1285-1293. Abstract

In this paper we make several contributions towards accelerating approximate Bayesian structural inference for non-decomposable GGMs. Our first contribution is to show how to efficiently compute a BIC or Laplace approximation to the marginal likelihood of non-decomposable graphs using convex methods for precision matrix estimation. This optimization technique can be used as a fast scoring function inside standard Stochastic Local Search (SLS) for generating posterior samples. Our second contribution is a novel framework for efficiently generating large sets of high-quality graph topologies without performing local search. This graph proposal method, which we call "Neighborhood Fusion" (NF), samples candidate Markov blankets at each node using sparse regression techniques. Our final contribution is a hybrid method combining the complementary strengths of NF and SLS. Experimental results in structural recovery and prediction tasks demonstrate that NF and hybrid NF/SLS out-perform state-of-the-art local search methods, on both synthetic and real-world data sets, when realistic computational limits are imposed.

Marlin, Benjamin M., and Richard S. Zemel. "Collaborative prediction and ranking with non-random missing data." RecSys. 2009. 5-12. Abstract

A fundamental aspect of rating-based recommender systems is the observation process, the process by which users choose the items they rate. Nearly all research on collaborative filtering and recommender systems is founded on the assumption that missing ratings are missing at random. The statistical theory of missing data shows that incorrect assumptions about missing data can lead to biased parameter estimation and prediction. In a recent study, we demonstrated strong evidence for violations of the missing at random condition in a real recommender system. In this paper we present the first study of the effect of non-random missing data on collaborative ranking, and extend our previous results regarding the impact of non-random missing data on collaborative prediction.

Marlin, Benjamin M., Mark W. Schmidt, and Kevin P. Murphy. "Group Sparse Priors for Covariance Estimation." UAI. 2009. 383-392. Abstract

Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure.

Marlin, Benjamin M., and Kevin P. Murphy. "Sparse Gaussian graphical models with unknown block structure." ICML. 2009. 89. Abstract

Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior that prefers sparse graphs. In this paper, we use the stochastic block model as a prior. This prefer graphs that are blockwise sparse, but unlike previous work, it does not require that the blocks or groups be specified a priori. The resulting problem is no longer convex, but we devise an efficient variational Bayes algorithm to solve it. We show that our method has better test set likelihood on two different datasets (motion capture and gene expression) compared to independent L1, and can match the performance of group L1 using manually created groups.

Marlin, Benjamin M., Richard S. Zemel, Sam T. Roweis, and Malcolm Slaney. "Collaborative Filtering and the Missing at Random Assumption." UAI. 2007. 267-275. Abstract

Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.

Marlin, Benjamin, and Godfried Toussaint. "Constructing convex 3-polytopes from two triangulations of a polygon." Computational Geometry. 28 (2004): 41-47. AbstractDownload Paper

Guibas conjectured that given a convex polygon P in the xy-plane along with two triangulations of it, T1 and T2 that share no diagonals, it is always possible to assign height values to the vertices of P such that P∪T1∪T2 becomes a convex 3-polytope. Dekster found a counter example but left open the questions of deciding if a given configuration corresponds to a convex 3-polytope, and constructing such realizations when they exist. This paper presents a characterization of realizable configurations for Guibas' conjecture based on work from the area of polytope convexity testing. Our approach to the decision and construction problems is a reduction to a linear-inequality feasibility problem. The approach is also related to methods used for deciding if an arbitrary triangulation of a point set is a regular triangulation. We show two reductions, one based directly on a global convexity condition resulting in number of inequalities that is quadratic in the number of vertices of P, and one based on an equivalent local convexity condition resulting in a linear number of inequalities.

Marlin, Benjamin M., and Richard S. Zemel. "The multiple multiplicative factor model for collaborative filtering." ICML. 2004. Abstract

We describe a class of causal, discrete latent variable models called Multiple Multiplicative Factor models (MMFs). A data vector is represented in the latent space as a vector of factors that have discrete, non-negative expression levels. Each factor proposes a distribution over the data vector. The distinguishing feature of MMFs is that they combine the factors' proposed distributions multiplicatively, taking into account factor expression levels. The product formulation of MMFs allow factors to specialize to a subset of the items, while the causal generative semantics mean MMFs can readily accommodate missing data. This makes MMFs distinct from both directed models with mixture semantics and undirected product models. In this paper we present empirical results from the collaborative filtering domain showing that a binary/multinomial MMF model matches the performance of the best existing models while learning an interesting latent space description of the users.

Boutilier, Craig, Richard S. Zemel, and Benjamin M. Marlin. "Active Collaborative Filtering." UAI. 2003. 98-106. Abstract

Collaborative filtering (CF) allows the preferences of multiple users to be pooled to make recommendations regarding unseen products. We consider in this paper the problem of online and interactive CF: given the current ratings associated with a user, what queries (new ratings) would most improve the quality of the recommendations made? We cast this terms of expected value of information (EVOI); but the online computational cost of computing optimal queries is prohibitive. We show how offline prototyping and computation of bounds on EVOI can be used to dramatically reduce the required online computation. The framework we develop is general, but we focus on derivations and empirical study in the specific case of the multiple-cause vector quantization model.

Marlin, Benjamin M. "Modeling User Rating Profiles For Collaborative Filtering." NIPS. 2003. Abstract

In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model, and LDA, but has clear advantages over each.