Computer Vision

Kae, Andrew, Erik Learned-Miller, and Benjamin M. Marlin. "The Shape-Time Random Field for Semantic Video Labeling." 2014 IEEE Conference on Computer Vision and Pattern Recognition. 2014.

We propose a novel discriminative model for semantic labeling in videos by incorporating a prior to model both the shape and temporal dependencies of an object in video. A typical approach for this task is the conditional random field (CRF), which can model local interactions among adjacent regions in a video frame. Recent work [16, 14] has shown how to incorporate a shape prior into a CRF for improving labeling performance, but it may be difficult to model temporal dependencies present in video by using this prior. The conditional restricted Boltzmann machine (CRBM) can model both shape and temporal dependencies, and has been used to learn walking styles from motion-capture data. In this work, we incorporate a CRBM prior into a CRF framework and present a new state-of-the-art model for the task of semantic labeling in videos. In particular, we explore the task of labeling parts of complex face scenes from videos in the YouTube Faces Database (YFDB). Our combined model outperforms competitive baselines both qualitatively and quantitatively.

Duvenaud, David K., Benjamin M. Marlin, and Kevin P. Murphy. "Multiscale Conditional Random Fields for Semi-supervised Labeling and Classification." CRV. 2011. 371-378.

Motivated by the abundance of images labeled only by their captions, we construct tree-structured multiscale conditional random fields capable of performing semi-supervised learning. We show that such caption-only data can in fact increase pixel-level accuracy at test time. In addition, we compare two kinds of tree: the standard one with pairwise potentials, and one based on noisy-or potentials, which better matches the semantics of the recursive partitioning used to create the tree.