Rohit Girdhar
Representation
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Self-supervised embedding to learn how actions sound from narrated in-the-wild egocentric videos.
Changan Chen, Ashutosh Kumar, Rohit Girdhar, David Harwath, Kristen Grauman
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
SOTA unsupervised video segmentation using CutLER.
Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Scaling up MAE pre-pretraining, followed by weakly supervised pretraining, leads to strong representations.
Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
CutLER: Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Discovering objects using DINO features and learning an unsupervised detection and segmentation model.
Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
OmniMAE: Single Model Masked Pretraining on Images and Videos
Single self-supervised representation for images and videos.
Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
Omnivore: A Single Model for Many Visual Modalities
A single model for images, video and single-view 3D.
Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
Detecting Twenty-thousand Classes using Image-level Supervision
Leverages image classification data to build an object detector.
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra
Self-Supervised Pretraining of 3D Features on any Point-Cloud
SOTA 3D detection/segmentation results by learning contrastive representations on 3D data.
Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
DistInit: Learning Video Representations Without a Single Labeled Video
Distilling representations from image models to video models.
Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan