Deep Learning

Beta-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment

CLIP achieves strong zero-shot image-text retrieval by aligning global vision and text representations, yet it falls behind on fine-grained tasks even when fine-tuned on long, …

Fatimah Zohra

• Feb 21, 2026 • 1 min read

Deep Learning

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field …

Shuming Liu

• Mar 2, 2025 • 1 min read

Deep Learning

Effectiveness of Max-Pooling for Fine-Tuning CLIP on Videos

CLIP is a powerful spatial feature extractor trained on a large dataset of image-text pairs. It exhibits strong generalization when extended to other domains and modalities. …

Fatimah Zohra

• Mar 2, 2025 • 1 min read

Deep Learning

Towards Automated Movie Trailer Generation

Movie trailers are an essential tool for promoting films and attracting audiences. However, the process of creating trailers can be time-consuming and expensive. To streamline this …

Dawit Mureja Argaw

• Jun 4, 2024 • 1 min read

Deep Learning

Dr<sup>2</sup>Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly …

Chen Zhao

• Jan 4, 2024 • 1 min read

Deep Learning

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited …

Shuming Liu

• Nov 29, 2023 • 1 min read

Deep Learning

Re<sup>2</sup>TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end …

Chen Zhao

• Jul 25, 2023 • 1 min read

Deep Learning

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are …

Jiwen Yu

• Jul 15, 2023 • 1 min read

Deep Learning

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory …

Jinjie Mai

• Jul 15, 2023 • 1 min read

Deep Learning

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

The 'pre-training → downstream adaptation' paradigm presents both new opportunities and challenges for Continual Learning (CL). Although the recent state-of-the-art in CL is achieved …

Qiankun Gao

• Jul 14, 2023 • 1 min read
