paper-conference

Omnivore: A Single Model for Many Visual Modalities
A single model for images, video and single-view 3D.
Forward Prediction for Physical Reasoning
Forward prediction for the PHYRE benchmark.