Presented by Martine Hjelkrem-Tan, PhD Research Fellow in the Digital Signal Processing and Image Analysis group at the University of Oslo
In this talk, we discuss how sparsity and sampling can be used to improve modern vision models. First, we show that these principles can help us discover which regions of an image a model chooses to attend to when performing a given task. Second, we show that self-supervised foundation models exhibit unwanted positional noise in their patch tokens, and we propose a simple cleaning method. Finally, we discuss how these findings can help guide future frameworks for training foundation models for vision.
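The abstract does not specify how the patch-token cleaning works. As a hedged illustration only (not the speaker's method), one simple idea for removing an additive, position-dependent artifact is to estimate it as the per-position mean of patch tokens over many images and subtract it, since image-specific content averages out while the shared positional component remains:

```python
import numpy as np

# Illustrative sketch with synthetic data (an assumption, not the talk's method):
# patch tokens = image-specific content + a positional artifact shared
# across all images at each patch position.
rng = np.random.default_rng(0)

n_images, n_patches, dim = 512, 196, 64
content = rng.normal(size=(n_images, n_patches, dim))           # varies per image
positional_noise = 2.0 * rng.normal(size=(1, n_patches, dim))   # shared across images

tokens = content + positional_noise

# Estimate the positional component as the per-position mean over images,
# then subtract it from every image's tokens.
pos_estimate = tokens.mean(axis=0, keepdims=True)
cleaned = tokens - pos_estimate

# With enough images, the estimate approaches the injected artifact,
# because the zero-mean content averages out.
err = np.abs(pos_estimate - positional_noise).mean()
```

This per-position mean subtraction is only one plausible baseline; the actual method proposed in the talk may differ.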