Deep learning

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks …

Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the …

Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem

Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given …

Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards …

Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. …

Jiwen Yu, Yinhuai Wang, Chen Zhao, Bernard Ghanem, Jian Zhang

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

The ‘pre-training → downstream adaptation’ paradigm presents both new opportunities and challenges for Continual Learning (CL). …

Qiankun Gao, Chen Zhao, Yifan Sun, Teng Xi, Gang Zhang, Bernard Ghanem, Jian Zhang

Large-capacity and Flexible Video Steganography via Invertible Neural Network

Video steganography is the art of unobtrusively concealing secret data in a cover video and then recovering the secret data through a …

Chong Mou, Youmin Xu, Jiechong Song, Chen Zhao, Bernard Ghanem, Jian Zhang

ETAD: Training Action Detection End to End on a Laptop

Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing …

Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles …

Lama Alssum, Juan León Alcázar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. …

Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem