CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

A dataset to evaluate temporal reasoning in video models.

Rohit Girdhar, Deva Ramanan

MetaPix: Few-Shot Video Retargeting

Retargeting videos from only a few examples.

Jessica Lee, Deva Ramanan, Rohit Girdhar

DistInit: Learning Video Representations Without a Single Labeled Video

Distilling representations from image models to video models.

Rohit Girdhar, Du Tran, Lorenzo Torresani, Deva Ramanan

Video Action Transformer Network

Among the first applications of Transformers to video modeling. State-of-the-art results: a close second place in the AVA Challenge at CVPR'18.

Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman

Detect-and-Track: Efficient Pose Estimation in Videos

A human keypoint tracking approach that ranked first in the ICCV 2017 PoseTrack keypoint tracking challenge.

Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran

Attentional Pooling for Action Recognition

Among the first applications of attention for contemporary video/action understanding.

Rohit Girdhar, Deva Ramanan

ActionVLAD: Learning spatio-temporal aggregation for action classification

Aggregating visual features for action recognition.

Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell

Binge Watching: Scaling Affordance Learning from Sitcoms

Learning how humans interact with their environment by watching TV.

Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

Learning a Predictable and Generative Vector Representation for Objects

A single embedding space that serves both for generating and for understanding 3D models.

Rohit Girdhar, David F. Fouhey, Mikel Rodriguez, Abhinav Gupta