Session #03: Multimodal Learning with Deep Boltzmann Machines

NOTICE: To resolve upcoming conflicts with meetings and courses, we are changing our meeting time to Fridays at 4:15pm. The location is still the CS Lab (although we may start reserving a room if there are distractions).

Presenter: Thomas Flynn
Date: Friday, January 22, 2016, at 4:15pm
Location: CS Lab – Rm 4435
Materials: JMLR 2014, NIPS 2012, Paper Website (Includes code)
Paper Abstract: We describe a Deep Boltzmann Machine for learning a generative model of data that consists of multiple, diverse input modalities. The model can be used to extract a unified representation that fuses the modalities together, and we find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs, using the states of its latent variables as representations of the input. Even when some modalities are absent, the model can still extract this representation by sampling from the conditional distribution over the missing modalities and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains.
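
To make the "filling in" idea concrete, here is a minimal NumPy sketch (not the authors' code, which is linked from the paper website) of a single-layer bimodal restricted Boltzmann machine: two binary visible groups for image and text features share one hidden layer, and an absent text modality is recovered by Gibbs sampling from its conditional distribution given the observed image. The class name BimodalRBM, the layer sizes, and the random untrained weights are illustrative assumptions only; the full Multimodal DBM in the paper stacks several hidden layers per modality and is trained with approximate maximum likelihood rather than left at random initialization.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BimodalRBM:
    """Toy bimodal RBM: two binary visible groups (image and text features)
    connected to one shared binary hidden layer. Weights are random here,
    purely for illustration; no training is performed."""

    def __init__(self, n_img, n_txt, n_hid):
        self.W_img = rng.normal(0, 0.1, size=(n_img, n_hid))
        self.W_txt = rng.normal(0, 0.1, size=(n_txt, n_hid))
        self.b_txt = np.zeros(n_txt)
        self.b_hid = np.zeros(n_hid)

    def sample_hidden(self, v_img, v_txt):
        # p(h | image, text): hidden units pool evidence from both modalities
        p = sigmoid(v_img @ self.W_img + v_txt @ self.W_txt + self.b_hid)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_text(self, h):
        # p(text | h): conditional distribution over the text units
        p = sigmoid(h @ self.W_txt.T + self.b_txt)
        return p, (rng.random(p.shape) < p).astype(float)

    def infer_missing_text(self, v_img, n_steps=50):
        """Fill in the absent text modality by Gibbs sampling: clamp the
        observed image units and alternate hidden / text updates."""
        _, v_txt = self.sample_text(np.zeros_like(self.b_hid))
        for _ in range(n_steps):
            _, h = self.sample_hidden(v_img, v_txt)
            p_txt, v_txt = self.sample_text(h)
        return p_txt  # probabilities for the missing text units

# Usage: query with an image only, read off the inferred text distribution.
model = BimodalRBM(n_img=784, n_txt=200, n_hid=128)
v_img = (rng.random(784) < 0.5).astype(float)  # stand-in image feature vector
p_txt = model.infer_missing_text(v_img)
print(p_txt[:5])

The same clamping trick runs in the other direction (text-only queries), which is what makes the shared hidden state usable as a unified representation for retrieval from either modality.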
