About Me

I am a Research Scientist at King Abdullah University of Science and Technology (KAUST) and Lead of the Video Group in the Image and Video Understanding Lab (IVUL) with Prof. Bernard Ghanem. I obtained my Ph.D. from Peking University (PKU), advised by Prof. Wen Gao and Prof. Siwei Ma. My research has focused on image/video understanding (since my Ph.D.) and image/video compression (during my Ph.D. studies).

I have published 40+ papers in representative journals and conferences in both fields, such as TPAMI, CVPR, ICCV, and ECCV in image/video understanding, and TCSVT, TIP, and DCC in image/video compression. I received a Best Paper Nomination at CVPR 2022, a Best Paper Award at a CVPR 2023 workshop, and the Best Paper Award at NCMT 2015. I have also been awarded the First Prize of the Qualcomm Innovation Fellowship Contest (QInF) (one of only 2 in China) and the Goldman Sachs Global Leaders Award (one of only 26 in mainland China and 150 worldwide).

Interests
  • Image/video understanding
  • Vision-language learning
  • Efficient neural networks
  • Image/video processing
  • Image/video compression
  • Image/video continual learning
Education
  • Ph.D. in Computer Science, 2016

    Peking University (PKU), Beijing, China

  • Research Intern, 2016

    National Institute of Informatics (NII), Tokyo, Japan

  • Joint Ph.D. student, 2012

    University of Washington (UW), Seattle, USA

  • B.Eng. in Software Engineering, 2010

    Sichuan University (SCU), Chengdu, China

News

2024

  • [2024-06-18] I gave a talk on “Towards More Realistic Continual Learning at Scale” as an invited speaker in the CLVision Workshop in CVPR 2024.
  • [2024-06-17] We won first place in 4 challenges at CVPR 2024: EPIC-Kitchens audio-based interaction detection, EPIC-Kitchens action detection, EPIC-Kitchens action recognition, and Ego4D Visual Queries 3D!
  • [2024-06-11] I gave a talk on “Optimizing Memory Efficiency in Pretrained Model Finetuning” in the Berkeley Artificial Intelligence Research (BAIR) Lab, UC Berkeley.
  • [2024-05-05] I gave a lecture in the KAUST CEMSE graduate seminar on “Toward Long-form Video Understanding” as part of KAUST Research Open Week!
  • [2024-03-28] We released OpenTAD, an open-source toolbox for temporal action detection (TAD), comprising 14 methods with 8 datasets.
  • [2024-02-27] 4 papers were accepted to CVPR 2024: Dr2Net, AdaTAD, TGT, and Ego-Exo4D!
  • [2024-02-19] I gave a spotlight talk at the Rising Stars in AI Symposium 2024!

2023

  • [2023-12-15] I gave a talk at the HIT Webinar on “Challenges and innovation for long-form video understanding: compute, algorithm, and data”.
  • [2023-08-08] EgoLoc was selected as an oral presentation at ICCV'23!
  • [2023-08-07] Ego4D was accepted to TPAMI (recommended submission as a CVPR'22 award winner)!
  • [2023-07-14] All three papers (LAE, FreeDoM, EgoLoc) submitted to ICCV'23 were accepted!
  • [2023-06-22] SMILE won the Best Paper Award in CVPRW'23 CLVision!
  • [2023-06-22] We won first place in the CVPR'23 Ego4D VQ3D Challenge!
  • [2023-04-07] ETAD was accepted to CVPRW'23 ECV!
  • [2023-04-04] OWL was accepted to CVPRW'23 L3D-IVU!
  • [2023-03-29] SMILE was accepted to CVPRW'23 CLVision!
  • [2023-02-27] Re2TAL and LF-VSN were accepted to CVPR'23!
  • [2023-02-20] I gave a spotlight talk at the Rising Stars in AI Symposium 2023!

2022

  • [2022-12-02] I lectured in the Artificial Intelligence Bootcamp on behalf of KAUST for Saudi Arabia’s brightest undergraduate students!
  • [2022-07-04] R-DFCIL and EASEE were accepted to ECCV'22!
  • [2022-06-21] Ego4D was selected as a CVPR'22 Best Paper Finalist!
  • [2022-04-18] All Ego4D challenges are live now!
  • [2022-03-29] Ego4D was accepted to CVPR'22 as an oral presentation!
  • [2022-03-29] MAD was accepted to CVPR'22!

2021

  • [2021-11-30] I gave a talk virtually in the computer vision group of University of Bristol on “Detecting Actions in Videos via Graph Convolutional Networks”.
  • [2021-10-15] Ego4D was released, and the paper is on arXiv!
  • [2021-07-23] VSGN was accepted to ICCV'21!
  • [2021-05-20] I was recognized by CVPR’21 as an Outstanding Reviewer!

2020

  • [2020-07-29] ThumbNet was accepted to ACM MM'20!
  • [2020-06-07] We won 2nd place in the HACS'20 Weakly-Supervised Action Detection Challenge!
  • [2020-02-27] G-TAD was accepted to CVPR'20!

2019

  • [2019-10-23] Our paper for the YouTube-8M challenge was accepted as an oral presentation in the ICCV'19 Workshop!
  • [2019-10-12] We missed the gold medal by only 0.0004 in Kaggle’s 3rd YouTube-8M Video Understanding Challenge, ranking 9th/11th out of 283 teams on the public/private leaderboards!

Publications

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Large pretrained models are increasingly crucial in modern computer vision tasks. These models are …
Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
Temporal action localization (TAL) requires long-form reasoning to predict actions of various …
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
International Conference on Computer Vision (ICCV), 2023. [Won first place in the Ego4D VQ3D Challenge 2023; Oral].
With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both …
A Unified Continual Learning Framework with General Parameter-Efficient Tuning
International Conference on Computer Vision (ICCV), 2023.
The ‘pre-training → downstream adaptation’ presents both new opportunities and …
Just a Glimpse: Rethinking Temporal Information for Video Continual Learning
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2023. [Best Paper Award, Oral].
Class-incremental learning is one of the most important settings for the study of Continual …
Ego4D: Around the World in 3,000 Hours of Egocentric Video
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. [Best Paper Nominee, Oral].
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 …
SegTAD: Precise Temporal Action Detection via Semantic Segmentation
European Conference on Computer Vision Workshop (ECCVW), 2022.
Temporal action detection (TAD) is an important yet challenging task in video analysis. Most …
Video Self‑Stitching Graph Network for Temporal Action Localization
IEEE International Conference on Computer Vision (ICCV), 2021.
Short actions are critical and challenging in the task of action localization. We target this problem and propose a video self-stitching graph network (VSGN), which enhances short actions by video self-stitching (VSS) and a cross-scale graph pyramid network (xGPN).

Selected Awards

  • 2024 First place, Ego4D Visual Queries 3D at CVPR 2024
  • 2024 First place, EPIC-Kitchens audio-based interaction detection at CVPR 2024
  • 2024 First place, EPIC-Kitchens action detection at CVPR 2024
  • 2024 First place, EPIC-Kitchens action recognition at CVPR 2024
  • 2023 Best Paper Award, CVPR workshop CLVision
  • 2023 First place, Visual Queries 3D Localization Challenge in Ego4D Workshop at CVPR 2023
  • 2022 First place, Visual Queries 3D Localization Challenge in Ego4D Workshop at ECCV 2022
  • 2021 Outstanding Reviewer, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020 Finalist, MIT Enterprise Forum Saudi Startup Competition
  • 2020 Second place, HACS Temporal Action Localization Challenge
  • 2019 Finalist, Taqadam Startup Accelerator, Saudi Arabia
  • 2016 Outstanding Graduate, Peking University
  • 2016 Scholarship of Outstanding Talent, Peking University
  • 2015 Best Paper Award, National Conference on Multimedia Technology (NCMT)
  • 2012 First Prize, Qualcomm Innovation Fellowship Contest (QInF) (one of only 2 in China)
  • 2012 Outstanding Individual in the Summer Social Practice, Peking University
  • 2010 Outstanding Graduate Leader, Sichuan University
  • 2008 Goldman Sachs Global Leaders Award (one of only 26 in mainland China and 150 worldwide)
  • 2007 National Scholarship (Top 1 out of 329 students), Sichuan University
  • 2007 First‑Class Scholarship (Top 1 out of 329 students), Sichuan University

Contact

Please fill in the following form to leave me a message.