Latent Variable Models

Adams, Roy J., Rajani S. Sadasivam, Kavitha Balakrishnan, Rebecca L. Kinney, Thomas K. Houston, and Benjamin M. Marlin. "PERSPeCT: Collaborative Filtering for Tailored Health Communications." Proceedings of the 8th ACM Conference on Recommender Systems. RecSys '14. New York, NY, USA: ACM, 2014. 329-332. Abstractperspect-recsys14.pdf


The goal of computer tailored health communications (CTHC) is to elicit healthy behavior changes by sending motivational messages personalized to individual patients. One prominent weakness of many existing CTHC systems is that they are based on expert-written rules and thus have no ability to learn from their users over time. One solution to this problem is to develop CTHC systems based on the principles of collaborative filtering, but this approach has not been widely studied. In this paper, we present a case study evaluating nine rating prediction methods for use in the Patient Experience Recommender System for Persuasive Communication Tailoring, a system developed for use in a clinical trial of CTHC-based smoking cessation support interventions.

Khan, Mohammad Emtiyaz, Shakir Mohamed, Benjamin M. Marlin, and Kevin P. Murphy. "A Stick-Breaking Likelihood for Categorical Data Analysis with Latent Gaussian Models." AISTATS. 2012. 610-618. Abstractsblgm-aistats2012-paper.pdf

The development of accurate models and efficient algorithms for the analysis of multivariate categorical data are important and long-standing problems in machine learning and computational statistics. In this paper, we focus on modeling categorical data using Latent Gaussian Models (LGMs). We propose a novel stick-breaking likelihood function for categorical LGMs that exploits accurate linear and quadratic bounds on the logistic log-partition function, leading to an effective variational inference and learning framework. We thoroughly compare our approach to existing algorithms for multinomial logit/probit likelihoods on several problems, including inference in multinomial Gaussian process classification and learning in latent factor models. Our extensive comparisons demonstrate that our stick-breaking model effectively captures correlation in discrete data and is well suited for the analysis of categorical data.

Marlin, Benjamin M., Mohammad Emtiyaz Khan, and Kevin P. Murphy. "Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models." ICML. 2011. 633-640. Abstractpiecewise_lgm_icml11_paper.pdf

Bernoulli-logistic latent Gaussian models (bLGMs) are a useful model class, but accurate parameter estimation is complicated by the fact that the marginal likelihood contains an intractable logistic-Gaussian integral. In this work, we propose the use of fixed piecewise linear and quadratic upper bounds to the logistic-log-partition (LLP) function as a way of circumventing this intractable integral. We describe a framework for approximately computing minimax optimal piecewise quadratic bounds, as well a generalized expectation maximization algorithm based on using piecewise bounds to estimate bLGMs. We prove a theoretical result relating the maximum error in the LLP bound to the maximum error in the marginal likelihood estimate. Finally, we present empirical results showing that piecewise bounds can be significantly more accurate than previously proposed variational bounds.

Marlin, Benjamin M., Richard S. Zemel, Sam T. Roweis, and Malcolm Slaney. "Recommender Systems, Missing Data and Statistical Model Estimation." IJCAI. 2011. 2686-2691. Abstractmissing_data_ijcai11_paper.pdf

The goal of rating-based recommender systems is to make personalized predictions and recommendations for individual users by leveraging the preferences of a community of users with respect to a collection of items like songs or movies. Recommender systems are often based on intricate statistical models that are estimated from data sets containing a very high proportion of missing ratings. This work describes evidence of a basic incompatibility between the properties of recommender system data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. We discuss the implications of this problem and describe extended modelling and evaluation frameworks that attempt to circumvent it. We present prediction and ranking results showing that models developed and tested under these extended frameworks can significantly outperform standard models.

Marlin, Benjamin M., and Kevin P. Murphy. "Sparse Gaussian graphical models with unknown block structure." ICML. 2009. 89. Abstract

Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior that prefers sparse graphs. In this paper, we use the stochastic block model as a prior. This prefer graphs that are blockwise sparse, but unlike previous work, it does not require that the blocks or groups be specified a priori. The resulting problem is no longer convex, but we devise an efficient variational Bayes algorithm to solve it. We show that our method has better test set likelihood on two different datasets (motion capture and gene expression) compared to independent L1, and can match the performance of group L1 using manually created groups.

Marlin, Benjamin M., and Richard S. Zemel. "Collaborative prediction and ranking with non-random missing data." RecSys. 2009. 5-12. Abstract

A fundamental aspect of rating-based recommender systems is the observation process, the process by which users choose the items they rate. Nearly all research on collaborative filtering and recommender systems is founded on the assumption that missing ratings are missing at random. The statistical theory of missing data shows that incorrect assumptions about missing data can lead to biased parameter estimation and prediction. In a recent study, we demonstrated strong evidence for violations of the missing at random condition in a real recommender system. In this paper we present the first study of the effect of non-random missing data on collaborative ranking, and extend our previous results regarding the impact of non-random missing data on collaborative prediction.

Marlin, Benjamin M., Mark W. Schmidt, and Kevin P. Murphy. "Group Sparse Priors for Covariance Estimation." UAI. 2009. 383-392. Abstract

Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure.

Marlin, Benjamin M., Richard S. Zemel, Sam T. Roweis, and Malcolm Slaney. "Collaborative Filtering and the Missing at Random Assumption." UAI. 2007. 267-275. Abstract

Rating prediction is an important application, and a popular research topic in collaborative filtering. However, both the validity of learning algorithms, and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.

Marlin, Benjamin M., and Richard S. Zemel. "The multiple multiplicative factor model for collaborative filtering." ICML. 2004. Abstract

We describe a class of causal, discrete latent variable models called Multiple Multiplicative Factor models (MMFs). A data vector is represented in the latent space as a vector of factors that have discrete, non-negative expression levels. Each factor proposes a distribution over the data vector. The distinguishing feature of MMFs is that they combine the factors' proposed distributions multiplicatively, taking into account factor expression levels. The product formulation of MMFs allow factors to specialize to a subset of the items, while the causal generative semantics mean MMFs can readily accommodate missing data. This makes MMFs distinct from both directed models with mixture semantics and undirected product models. In this paper we present empirical results from the collaborative filtering domain showing that a binary/multinomial MMF model matches the performance of the best existing models while learning an interesting latent space description of the users.

Marlin, Benjamin M. "Modeling User Rating Profiles For Collaborative Filtering." NIPS. 2003. Abstract

In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model, and LDA, but has clear advantages over each.