MultiVer: Zero-Shot Multi-Agent Vulnerability Detection

Shreshth Rajan

arXiv:2602.17875v1 (cross-list)

Abstract: We present MultiVer, a zero-shot multi-agent system for vulnerability detection that achieves state-of-the-art recall without fine-tuning. A four-agent ensemble (security, correctness, performance, style) with union voting achieves 82.7% recall on PyVul, exceeding fine-tuned GPT-3.5 (81.3%) by 1.4 percentage points; it is the first zero-shot system to surpass fine-tuned performance on this benchmark. On SecurityEval, the same architecture achieves a 91.7% detection rate, matching specialized systems. The recall improvement comes at a precision cost: 48.8% precision versus 63.9% for fine-tuned baselines, yielding 61.4% F1. Ablation experiments isolate component contributions: the multi-agent ensemble adds 17 percentage points of recall over single-agent security analysis. These results demonstrate that for security applications where false negatives are costlier than false positives, zero-shot multi-agent ensembles can match and exceed fine-tuned models on the metric that matters most.
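
Union voting, as the abstract describes it, means the ensemble flags a sample if at least one agent flags it. The minimal Python sketch below illustrates that rule and why it can only raise recall relative to any single member: every vulnerable sample caught by one agent is still caught by the ensemble. The `Verdict` class, the `union_vote` and `recall` helpers, and the toy data are hypothetical illustrations, not MultiVer's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One agent's opinion on one code sample (hypothetical structure)."""
    agent: str        # "security", "correctness", "performance" or "style"
    vulnerable: bool  # True if this agent flags the sample


def union_vote(verdicts: list[Verdict]) -> bool:
    """Union voting: flag the sample if ANY agent flags it (logical OR)."""
    return any(v.vulnerable for v in verdicts)


def recall(predictions: list[bool], labels: list[bool]) -> float:
    """Fraction of truly vulnerable samples that were flagged."""
    true_pos = sum(p and y for p, y in zip(predictions, labels))
    return true_pos / sum(labels)


# Toy data (invented, not from the paper): four samples, all truly vulnerable.
labels = [True, True, True, True]
security_only = [True, False, False, True]
correctness_flags = [False, True, False, False]
per_sample_verdicts = [
    [Verdict("security", s), Verdict("correctness", c),
     Verdict("performance", False), Verdict("style", False)]
    for s, c in zip(security_only, correctness_flags)
]
ensemble = [union_vote(vs) for vs in per_sample_verdicts]

print(recall(security_only, labels))  # 0.5
print(recall(ensemble, labels))       # 0.75
```

Because the union can only add flags, it never lowers recall; the same property explains the precision cost, since every false alarm from any agent also survives the vote.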

Executive Summary

The article presents MultiVer, a zero-shot multi-agent system for vulnerability detection that reaches state-of-the-art recall without fine-tuning. Its architecture, a four-agent ensemble (security, correctness, performance, style) combined by union voting, achieves 82.7% recall on the PyVul benchmark, edging out fine-tuned GPT-3.5 (81.3%). The gain comes at a clear precision cost (48.8% versus 63.9% for fine-tuned baselines, for an F1 of 61.4%), so MultiVer's advantage holds on recall rather than across all metrics. The study argues that this trade-off is worthwhile in security applications, where a missed vulnerability (false negative) is typically costlier than a false alarm (false positive). The findings suggest that zero-shot ensembles are a viable alternative to fine-tuned detectors when recall is the priority.

Key Points

  • MultiVer achieves state-of-the-art recall on PyVul without fine-tuning
  • The architecture exceeds fine-tuned GPT-3.5 in recall (82.7% vs. 81.3%), but at lower precision (48.8% vs. 63.9% for fine-tuned baselines)
  • Zero-shot multi-agent ensembles can match and exceed fine-tuned models on recall

Merits

Improved Recall

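MultiVer's recall on PyVul (82.7%) exceeds that of fine-tuned GPT-3.5 (81.3%) by 1.4 percentage points, and the multi-agent ensemble adds 17 percentage points of recall over single-agent security analysis. High recall is valuable in security applications, where missed vulnerabilities are costly.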

No Fine-Tuning Required

The zero-shot approach needs no task-specific training data or fine-tuning runs, which lowers the cost of adopting it. Inference cost is a separate matter: each sample is analyzed by four agents rather than one.

Demerits

Precision Trade-off

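The architecture's precision is 48.8%, compared with 63.9% for fine-tuned baselines, so roughly half of the samples it flags are false positives. This may be a concern in applications where false alarms are costly, for example when every alert requires manual review.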
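
The reported F1 of 61.4% follows from the reported precision and recall as their harmonic mean, and the share of flagged samples that are false alarms is simply one minus precision. A short Python check; the `f1` helper is an illustrative assumption, not part of MultiVer:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.488, 0.827  # MultiVer on PyVul, as reported in the abstract
print(f"F1 = {f1(precision, recall):.3f}")          # F1 = 0.614
# Fraction of flagged samples that are false positives (false discovery rate).
print(f"False-alarm share = {1 - precision:.3f}")   # False-alarm share = 0.512
```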

Limited Generalizability

The evaluation covers two benchmarks, PyVul and SecurityEval; whether the results hold for other programming languages, codebases, or benchmarks remains to be tested.

Expert Commentary

The article shows that a zero-shot multi-agent ensemble can reach state-of-the-art recall on PyVul (82.7%) and a 91.7% detection rate on SecurityEval, matching or exceeding fine-tuned and specialized systems on these measures. The ablation result, a 17-percentage-point recall gain from the ensemble over a single security agent, suggests that agents focused on correctness, performance, and style catch vulnerabilities that a security-only analysis misses. The cost is precision: at 48.8%, roughly one in two alerts is a false positive, and the 1.4-point recall margin over fine-tuned GPT-3.5 is small. Further work is needed to confirm that these results generalize beyond the two benchmarks studied. If they do, zero-shot ensembles offer a practical option for vulnerability detection when fine-tuning data or compute is unavailable and missed vulnerabilities are the dominant risk.

Recommendations

  • Further research should focus on reducing the precision cost of zero-shot multi-agent ensembles, for example by testing aggregation rules stricter than union voting.
  • The study's findings should be tested and validated on a broader range of benchmarks and domains to ensure the generalizability of the results.
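
One direction for the first recommendation is an aggregation rule stricter than union voting. The Python sketch below is a hypothetical illustration, not something evaluated in the paper: it generalizes union voting to "flag if at least k of n agents agree" (k = 1 is union voting) and runs it on invented toy data to show the direction of the trade-off. The `k_of_n_vote` and `precision_recall` helpers and the sample data are assumptions made for illustration.

```python
def k_of_n_vote(flags: list[bool], k: int) -> bool:
    """Flag a sample if at least k agents flag it; k=1 is union voting."""
    return sum(flags) >= k


def precision_recall(preds: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Return (precision, recall) for boolean predictions and labels."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: per-sample flags from four agents, plus ground truth.
samples = [
    ([True, True, False, False], True),
    ([True, False, False, False], True),
    ([False, True, True, False], True),
    ([False, False, False, True], False),
    ([True, False, False, False], False),
    ([False, False, False, False], False),
]
labels = [y for _, y in samples]

for k in (1, 2):
    preds = [k_of_n_vote(flags, k) for flags, _ in samples]
    p, r = precision_recall(preds, labels)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, k = 1 yields precision 0.60 and recall 1.00, while k = 2 yields precision 1.00 and recall 0.67: requiring agreement removes the false alarms but also drops one true vulnerability. Whether a similar shift would occur on PyVul is an open question.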
