Abstract

Teaser image

A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We underline the relevance of our approach by applying PASTEL to important robot perception use cases from autonomous driving and agricultural robotics. In extensive experiments, we demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
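The feature-driven selection of unlabeled images for self-training can be illustrated with a minimal sketch. This is not the paper's exact scheme: the max-cosine-similarity scoring, the function name, and the pooled per-image feature vectors are illustrative assumptions.

```python
import numpy as np


def select_similar_images(labeled_feats, unlabeled_feats, k=2):
    """Hypothetical sketch: rank unlabeled images by feature similarity.

    Each image is represented by one pooled foundation-model feature
    vector. We score every unlabeled image by its maximum cosine
    similarity to any labeled image and keep the top-k candidates.
    """
    a = labeled_feats / np.linalg.norm(labeled_feats, axis=1, keepdims=True)
    b = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    sim = b @ a.T                 # (num_unlabeled, num_labeled) cosine matrix
    scores = sim.max(axis=1)      # best match per unlabeled image
    return np.argsort(-scores)[:k]


# Toy example: two labeled feature vectors, three unlabeled candidates.
labeled = np.eye(2)
unlabeled = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
top = select_similar_images(labeled, unlabeled, k=2)
```

The first two unlabeled vectors align with the labeled ones and are selected; the anti-aligned third is rejected.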

Technical Approach

Overview of our approach

Test-time overview of PASTEL illustrating the panoptic fusion scheme. For simplicity, we focus on car and road classes after step (1). The overall module is comprised of the following steps: (1) Overlapping multi-scale predictions; (2) Conversion of soft boundary map to an affinity matrix; (3) Boundary denoising; (4) Extraction of “stuff” to “thing” boundaries; (5) Class majority voting within enclosed areas; (6) Connected component analysis (CCA); (7) Filters on “thing” classes; (8) Filters on “stuff” classes; (9) Recursive two-way normalized cut to separate connected instances; (10) Nearest neighbors-based hole filling of pixels with the ignore class.
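Two of the steps above, converting the soft boundary map to a pixel affinity (2) and extracting instance candidates via connected component analysis (6), can be sketched as follows. This is a simplified illustration, not the released implementation: the 0.5 boundary threshold and the function names are assumptions.

```python
import numpy as np
from scipy import ndimage


def boundary_to_mask(boundary: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Step (2), simplified: pixels below the soft-boundary threshold are
    treated as mutually connected interior pixels."""
    return (boundary < thresh).astype(np.uint8)


def instance_candidates(mask: np.ndarray):
    """Step (6): connected component analysis; each enclosed region
    becomes one instance candidate (4-connectivity by default)."""
    labels, num = ndimage.label(mask)
    return labels, num


# Toy example: a vertical boundary ridge splits the image into two regions.
boundary = np.zeros((8, 8))
boundary[:, 4] = 1.0
mask = boundary_to_mask(boundary)
labels, num = instance_candidates(mask)
```

In the full module, components that still contain multiple touching instances are further split by the recursive two-way normalized cut of step (9).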

Video

Code

A software implementation of this project is available in our GitHub repository for academic use and is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Niclas Vödisch, Kürsat Petek, Markus Käppeler, Abhinav Valada, and Wolfram Burgard
A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
arXiv preprint arXiv:2405.19035, 2024.

(PDF) (BibTeX)

Authors

Niclas Vödisch

University of Freiburg

Kürsat Petek

University of Freiburg

Markus Käppeler

University of Freiburg

Abhinav Valada

University of Freiburg

Wolfram Burgard

University of Technology Nuremberg

Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program grant No 468878300.