About Me

I am a research scientist at Facebook AI Research (FAIR) working on computer vision and machine learning. My current research focuses on modeling people and scenes over (3D) space and time, with applications including 3D recognition, video understanding and physical reasoning. I obtained a PhD from Carnegie Mellon University (CMU), where I worked with Deva Ramanan (here's a link to my dissertation). Before that, I completed a master's at CMU as a Siebel Scholar, working with Martial Hebert, Abhinav Gupta, Kris Kitani and David Fouhey. Earlier still, I was a CS undergrad at IIIT, Hyderabad, working with C. V. Jawahar. I have also been fortunate to work with some amazing people through internships at DeepMind (with Andrew Zisserman, João Carreira and Carl Doersch), Adobe Research (with Josef Sivic and Bryan Russell) and Facebook AI (with Lorenzo Torresani, Georgia Gkioxari and Du Tran).

Interns and Collaborators

I have had the opportunity to mentor and collaborate with some great up-and-coming researchers and students, including those listed below. I am always happy to collaborate with enthusiastic PhD students in computer vision and machine learning who would like to work with me in New York. For summer internships, we typically review applications between November and February. Slots are usually limited, but please reach out if you are interested.

2022 · Yue Zhao · PhD student at University of Texas, Austin
Hosted with Ishan Misra at FAIR

2022 · Xudong Wang · PhD student at University of California, Berkeley
Hosted with Ishan Misra at FAIR

2022 · Kumar Ashutosh · PhD student at University of Texas, Austin
Hosted with Lorenzo Torresani and Kristen Grauman at FAIR

2022 · Zach Chavis · PhD student at University of Minnesota
Hosted with Yixin Lin and Akshara Rai at FAIR

2022 · Kenneth Li · PhD student at Harvard University
Hosted with Xitong Yang, Weiyao Wang and Du Tran at FAIR

2021 · Bowen Cheng · PhD student at University of Illinois, Urbana-Champaign
Hosted with Ishan Misra and Alex Kirillov at FAIR

2021 · Xingyi Zhou · PhD student at University of Texas, Austin
Hosted with Ishan Misra and Armand Joulin at FAIR

2021 · Bahare Fatemi · PhD student at University of British Columbia, Canada
Hosted with Quentin Duval, Michal Drozdzal and Adriana Romero Soriano at FAIR

2021 · Noureldien Hussein · PhD student at University of Amsterdam, The Netherlands
Hosted with Du Tran and Lorenzo Torresani at FAIR

2020 · Zhongzheng (Jason) Ren · PhD student at University of Illinois, Urbana-Champaign
Hosted with Ishan Misra at FAIR

2020 · Zaiwei Zhang · PhD student at University of Texas, Austin
Hosted with Ishan Misra and Armand Joulin at FAIR

2020 · Kexin Yi · PhD student at Harvard University/MIT
Hosted with Laurens van der Maaten at FAIR

2020 · Eltayeb Ahmed · AI Resident at FAIR
Hosted with Anton Bakhtin and Laurens van der Maaten at FAIR

2019 · Jessica Lee · Sophomore at CMU
Hosted with Deva Ramanan at CMU · Now a Barry Goldwater Scholar at CMU

2018 · Bhavan Jasani · Master's student at CMU
Hosted with Deva Ramanan at CMU · Now a Scientist at Amazon

2015 · Xiaofang Wang · Master's student at CMU
Hosted with Kris Kitani at CMU · Now a PhD student at CMU

Preprints

The effectiveness of MAE pre-pretraining for billion-scale pretraining

Mannat Singh*, Quentin Duval*, Kalyan Vasudev Alwala*, ..., Rohit Girdhar, and Ishan Misra
arXiv 2023 · pdf

Publications

ImageBind: One Embedding Space To Bind Them All

Rohit Girdhar*, Alaaeldin El-Nouby*, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra*
CVPR 2023 (Highlighted Paper) · Vancouver, BC · webpage · pdf

Learning Video Representations from Large Language Models

Yue Zhao, Ishan Misra, Philipp Krähenbühl, and Rohit Girdhar
CVPR 2023 (Highlighted Paper) · Vancouver, BC · webpage · pdf

OmniMAE: Single Model Masked Pretraining on Images and Videos

Rohit Girdhar*, Alaaeldin El-Nouby*, Mannat Singh*, Kalyan Vasudev Alwala*, Armand Joulin, and Ishan Misra*
CVPR 2023 · Vancouver, BC · webpage · pdf
Self-Supervised Learning Workshop, NeurIPS 2022 · New Orleans, LA · webpage · pdf

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Xudong Wang, Rohit Girdhar, Stella X. Yu, and Ishan Misra
CVPR 2023 · Vancouver, BC · webpage · pdf

HierVL: Learning Hierarchical Video-Language Embeddings

Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, and Kristen Grauman
CVPR 2023 (Highlighted Paper) · Vancouver, BC · pdf

Detecting Twenty-thousand Classes using Image-level Supervision

Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl and Ishan Misra
ECCV 2022 · Tel Aviv, Israel · webpage · pdf

Omnivore: A Single Model for Many Visual Modalities

Rohit Girdhar*, Mannat Singh*, Nikhila Ravi*, Laurens van der Maaten, Armand Joulin and Ishan Misra*
CVPR 2022 (oral) · New Orleans, LA · webpage · pdf

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Kristen Grauman, Andrew Westbury, Rohit Girdhar*, ... Jitendra Malik
CVPR 2022 (oral) · New Orleans, LA · webpage · pdf
Best paper finalist [link]

Masked-attention Mask Transformer for Universal Image Segmentation

Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov and Rohit Girdhar
CVPR 2022 · New Orleans, LA · webpage · pdf

Mask2Former for Video Instance Segmentation

Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar* and Alexander G. Schwing*
arXiv 2021 · webpage · pdf

Anticipative Video Transformer

Rohit Girdhar and Kristen Grauman
ICCV 2021 · Virtual · webpage
EPIC-Kitchens Workshop, CVPR 2021 · Virtual
Winner of the EPIC-Kitchens CVPR'21 Action Anticipation Challenge

An End-to-End Transformer Model for 3D Object Detection

Ishan Misra, Rohit Girdhar and Armand Joulin
ICCV 2021 (oral) · Virtual · webpage

Learning Self-supervised 3D Features from Single-view Depth Scans

Zaiwei Zhang, Rohit Girdhar, Armand Joulin and Ishan Misra
ICCV 2021 · Virtual · pdf · code
Self-Supervised Learning Workshop, ICML 2021 · Virtual

Physical Reasoning Using Dynamics-Aware Models

Eltayeb Ahmed, Anton Bakhtin, Laurens van der Maaten and Rohit Girdhar
Self-Supervised Learning Workshop, ICML 2021 (oral) · Virtual · webpage

Forward Prediction for Physical Reasoning

Rohit Girdhar, Laura Gustafson, Aaron Adcock and Laurens van der Maaten
Time Series Workshop, ICML 2021 · Virtual · webpage

3D Spatial Recognition without Spatially Labeled 3D

Zhongzheng Ren, Ishan Misra, Alexander G. Schwing and Rohit Girdhar
CVPR 2021 · Virtual · webpage

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Rohit Girdhar and Deva Ramanan
ICLR 2020 (oral) · Addis Ababa, Ethiopia · webpage
Holistic Video Understanding (HVU) workshop, ICCV 2019 (oral) · Seoul, South Korea
Best paper award at HVU Workshop, ICCV 2019

MetaPix: Few-Shot Video Retargeting

Jessica Lee, Deva Ramanan and Rohit Girdhar
ICLR 2020 · Addis Ababa, Ethiopia · webpage
MetaLearn workshop, NeurIPS 2019 (oral) · Vancouver, Canada
One of the top 2 papers (out of 84 submissions) selected for a full oral at MetaLearn, NeurIPS'19

Are we asking the right questions in MovieQA?

Bhavan Jasani, Rohit Girdhar and Deva Ramanan
Closing the Loop Between Vision and Language (CLVL) workshop, ICCV 2019 (spotlight) · Seoul, South Korea · webpage

Video Action Transformer Network

Rohit Girdhar, João Carreira, Carl Doersch and Andrew Zisserman
CVPR 2019 (oral) · Long Beach, CA · webpage

DistInit: Learning Video Representations without a Single Labeled Video

Rohit Girdhar, Du Tran, Lorenzo Torresani and Deva Ramanan
ICCV 2019 · Seoul, South Korea · pdf
Learning from Unlabeled Videos (LUV) Workshop, CVPR 2019 · Long Beach, CA · pdf

A Better Baseline for AVA

Rohit Girdhar, João Carreira, Carl Doersch and Andrew Zisserman
ActivityNet Workshop, CVPR 2018 (oral) · Salt Lake City, UT · pdf
Close second in AVA action recognition challenge

Detect-and-Track: Efficient Pose Estimation in Videos

Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri and Du Tran
CVPR 2018 · Salt Lake City, UT · webpage

Simple, efficient and effective keypoint tracking

Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Deva Ramanan, Manohar Paluri and Du Tran
PoseTrack Workshop, ICCV 2017 (oral) · Venice, Italy · pdf
First in keypoint tracking challenge

Attentional Pooling for Action Recognition

Rohit Girdhar and Deva Ramanan
NeurIPS 2017 · Long Beach, CA · webpage

ActionVLAD: Learning spatio-temporal aggregation for action classification

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic and Bryan Russell
CVPR 2017 · Honolulu, HI · webpage

Binge Watching: Scaling Affordance Learning from Sitcoms

Xiaolong Wang*, Rohit Girdhar* and Abhinav Gupta (* equal contribution)
CVPR 2017 (spotlight) · Honolulu, HI · webpage · pdf · data

Learning a Predictable and Generative Representation for Objects

Rohit Girdhar, David Fouhey, Mikel Rodriguez and Abhinav Gupta
ECCV 2016 (spotlight) · Amsterdam, Netherlands · webpage

Cutting through the clutter: Task-relevant features for image matching

Rohit Girdhar, David Fouhey, Kris Kitani, Abhinav Gupta and Martial Hebert
WACV 2016 · Lake Placid, NY · pdf

Optimizing Storage Intensive Vision Applications to Device Capacity

Rohit Girdhar, Jayaguru Panda and C. V. Jawahar
ACCV 2014 · Singapore · pdf

Posts

Dec 23, 2016 · Compile TensorFlow on CentOS 6

Fun Stuff

Inspired by the amazing work of David Fouhey, I have dabbled in the fine art of joke publications. Here's a taste.

PSYCHO: PerSonalitY CHaracterizatiOn of artificial intelligence

Achal Dave and Rohit Girdhar
SIGBOVIK 2018 · Pittsburgh, PA · pdf