- 25 June, 2023: 💥💥 Our paper "SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Image Segmentation" has been submitted to IEEE Transactions on Image Processing (TIP).
This repository contains the official implementation and checkpoints of the following paper:
SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Image Segmentation
Abstract: This paper presents SSP-SAM, an end-to-end framework that equips the Segment Anything Model (SAM) with grounding ability to tackle Referring Image Segmentation (RIS). Unlike vanilla SAM, which struggles with language inputs, SSP-SAM handles them seamlessly via a Semantic-Spatial Prompt (SSP) encoder, in which images and language are transformed into semantically enriched and spatially detailed prompts. To integrate spatial information into the semantics of the referent, we incorporate both visual and linguistic attention adapters into the SSP encoder. This process highlights the salient objects within the visual features and the discriminative phrases within the linguistic features. Such a design provides enhanced referent features for the prompt generator, leading to high-quality SSPs. Extensive experiments on widely used RIS benchmarks confirm the superiority of our method, which fully leverages SAM's segmentation capabilities. Moreover, we explore the open-vocabulary capability of SSP-SAM on the PhraseCut dataset, where it outperforms existing state-of-the-art RIS methods in open-vocabulary scenarios. The code and models will be available at https://github.com/WayneTomas/SSP-SAM once the manuscript is accepted.
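To illustrate the data flow the abstract describes, here is a minimal, hypothetical NumPy sketch of the SSP encoder: a visual attention adapter attends visual features to the language, a linguistic adapter does the reverse, and a prompt generator pools the fused features into a prompt embedding. All function names, shapes, and the simple mean-pooling prompt generator are illustrative assumptions, not the paper's actual implementation (which is not released yet).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_adapter(queries, context):
    # hypothetical single-head cross-attention: queries attend to context
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (Nq, Nc)
    return softmax(scores) @ context            # (Nq, D)

def ssp_encoder(visual_feats, text_feats):
    """Sketch of a Semantic-Spatial Prompt encoder (illustrative only).

    visual_feats: (N_patches, D) image features
    text_feats:   (N_tokens, D) language features
    Returns a single prompt embedding of shape (D,).
    """
    # visual adapter: highlight salient objects conditioned on the language
    v = visual_feats + attention_adapter(visual_feats, text_feats)
    # linguistic adapter: highlight discriminative phrases conditioned on vision
    t = text_feats + attention_adapter(text_feats, visual_feats)
    # prompt generator (assumed here to be simple mean pooling + averaging)
    return 0.5 * (v.mean(axis=0) + t.mean(axis=0))
```

In the full model, such a prompt embedding would be fed to SAM's prompt-conditioned mask decoder in place of point or box prompts; the pooling step here stands in for the paper's learned prompt generator.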