ZERO: Multi-modal Prompt-based Visual Grounding

Published in arXiv, 2025

Recommended citation: Sangbum Choi and Kyeongryeol Go. (2025). "ZERO: Multi-modal Prompt-based Visual Grounding." arXiv. https://arxiv.org/abs/2507.04270

ZERO is a zero-shot, multi-prompt object detection model designed for robust visual grounding in production environments. It takes an image directly as input together with user-defined text and visual prompts, and was applied to the CVPR 2025 Foundational Few-Shot Object Detection challenge.