Spatial

InstanceDiffusion: Instance-level Control for Image Generation

SOTA instance-conditioned diffusion model for image generation.

Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

SOTA unsupervised video segmentation using CutLER.

Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

CutLER: Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Discovers objects using DINO features and learns an unsupervised detection and segmentation model.

Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra

Detecting Twenty-thousand Classes using Image-level Supervision

Leverages image classification data to build an object detector.

Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra

Mask2Former for Video Instance Segmentation

SOTA video segmentation using Mask2Former.

Bowen Cheng, Anwesa Choudhuri, Ishan Misra, Alexander Kirillov, Rohit Girdhar, Alexander G. Schwing

Masked-attention Mask Transformer for Universal Image Segmentation

A single architecture that is state-of-the-art in instance, semantic, and panoptic segmentation.

Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

3DETR: An End-to-End Transformer Model for 3D Object Detection

First Transformer-based detection architecture for 3D data.

Ishan Misra, Rohit Girdhar, Armand Joulin

3D Spatial Recognition without Spatially Labeled 3D

WyPR can detect and segment objects in a 3D scene without needing any spatial labels at all!

Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar

Video Action Transformer Network

Among the first applications of Transformers to model videos. SOTA results: close 2nd at AVA Challenge, CVPR'18.

Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman

Detect-and-Track: Efficient Pose Estimation in Videos

Human keypoint tracking approach that ranked first in the ICCV 2017 PoseTrack keypoint tracking challenge!

Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran
