ZERO: Multi-modal Prompt-based Visual Grounding
Published on arXiv, 2025
Recommended citation: Sangbum Choi and Kyeongryeol Go. (2025). "ZERO: Multi-modal Prompt-based Visual Grounding." arXiv. https://arxiv.org/abs/2507.04270
ZERO is a zero-shot, multi-prompt object detection model designed for robust visual grounding in production environments. It takes an image directly as input, together with user-defined text and visual prompts, and was applied to the CVPR 2025 Foundational Few-Shot Object Detection (FSOD) challenge.