Rohit Girdhar
Selected
The Llama 3 Herd of Models
State-of-the-Art open-source LLM with multimodal capabilities
Llama 3 team (co-led the video recognition efforts)
PDF
Cite
Code
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
A simple and effective approach to high-quality video generation by learning to animate high-quality images.
Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
PDF
Cite
Demo
ImageBind: One Embedding Space To Bind Them All
One embedding space for six different modalities, enabling zero-shot recognition on all of them!
Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
PDF
Cite
Video
Code
Learning Video Representations from Large Language Models
Leveraging LLMs to auto-annotate videos for representation learning.
Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
PDF
Cite
Colab
Code
Omnivore: A Single Model for Many Visual Modalities
A single model for images, video and single-view 3D.
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
PDF
Cite
Code
Ego4D: Around the World in 3,000 Hours of Egocentric Video
The largest egocentric video dataset.
Kristen Grauman, Andrew Westbury, Rohit Girdhar, et al.
PDF
Cite
Video
Code
Masked-attention Mask Transformer for Universal Image Segmentation
A single architecture that achieves state-of-the-art results in instance, semantic, and panoptic segmentation.
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
PDF
Cite
Code
Anticipative Video Transformer
An autoregressive video transformer architecture for action anticipation in videos.
Rohit Girdhar, Kristen Grauman
PDF
Cite
Code
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
A dataset to evaluate temporal reasoning in video models.
Rohit Girdhar, Deva Ramanan
PDF
Cite
Slides
Video
Code
Video Action Transformer Network
Among the first applications of Transformers to model videos. SOTA results: close 2nd at AVA Challenge, CVPR'18.
Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman
PDF
Cite
Video