Rohit Girdhar
Video
Learning Video Representations from Large Language Models
Leveraging LLMs to auto-annotate videos for representation learning.
Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar
PDF
Cite
Colab
Code
OmniMAE: Single Model Masked Pretraining on Images and Videos
Single self-supervised representation for images and videos.
Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
PDF
Cite
Video
Code
Omnivore: A Single Model for Many Visual Modalities
A single model for images, video, and single-view 3D.
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
PDF
Cite
Code
Ego4D: Around the World in 3,000 Hours of Egocentric Video
The largest egocentric video dataset.
Kristen Grauman, Andrew Westbury, Rohit Girdhar, et al.
PDF
Cite
Video
Code
Mask2Former for Video Instance Segmentation
State-of-the-art video instance segmentation using Mask2Former.
Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing
PDF
Cite
Code
Anticipative Video Transformer
An autoregressive video transformer architecture for action anticipation in videos.
Rohit Girdhar, Kristen Grauman
PDF
Cite
Code
Physical Reasoning Using Dynamics Aware Embeddings
Self-supervised representations for physical reasoning.
Eltayeb Ahmed, Anton Bakhtin, Laurens van der Maaten, Rohit Girdhar
PDF
Cite
Code
Forward Prediction for Physical Reasoning
Forward prediction for the PHYRE benchmark.
Rohit Girdhar, Laura Gustafson, Aaron Adcock, Laurens van der Maaten
PDF
Cite
Code
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
A dataset to evaluate temporal reasoning in video models.
Rohit Girdhar, Deva Ramanan
PDF
Cite
Slides
Video
Code
MetaPix: Few-Shot Video Retargeting
Retargeting human actions from one video to another using only a few target frames.
Jessica Lee, Deva Ramanan, Rohit Girdhar
PDF
Cite
Slides
Video
Code