AIC CTU@AVerImaTeC: dual-retriever RAG for image-text fact checking
arXiv:2602.15190v1. Abstract: In this paper, we present our 3rd place system in the AVerImaTeC shared task, which combines our last year's retrieval-augmented generation (RAG) pipeline with a reverse image search (RIS) module. Despite its simplicity, our system delivers competitive performance with a single multimodal LLM call per fact-check at just $0.013 on average using GPT5.1 via the OpenAI Batch API. Our system is also easy to reproduce and tweak, consisting of only three decoupled modules - a textual retrieval module based on similarity search, an image retrieval module based on API-accessed RIS, and a generation module using GPT5.1 - which is why we suggest it as an accessible starting point for further experimentation. We publish its code and prompts, as well as our vector stores and insights into the scheme's running costs and directions for further improvement.
Executive Summary
The article presents the system that placed 3rd in the AVerImaTeC shared task. It extends the authors' retrieval-augmented generation (RAG) pipeline from the previous year with a reverse image search (RIS) module. The system is notable for its simplicity, low cost, and ease of reproduction, and consists of three decoupled modules: a textual retrieval module based on similarity search, an image retrieval module based on API-accessed RIS, and a generation module using GPT5.1. It needs a single multimodal LLM call per fact-check, at an average cost of $0.013 via the OpenAI Batch API. The authors publish the code, prompts, and vector stores, along with insights into running costs and directions for improvement.
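The three-module layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `Claim`, `Evidence`, `fact_check`, and the stub modules are all hypothetical, and the stubs stand in for the real similarity search, RIS API, and GPT5.1 call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    image_paths: List[str]


@dataclass
class Evidence:
    source: str   # "text" or "image"
    content: str  # a retrieved passage, or a page found via reverse image search


# Each module is a plain callable, so any one can be swapped without touching the others.
TextRetriever = Callable[[str], List[Evidence]]
ImageRetriever = Callable[[str], List[Evidence]]
Generator = Callable[[Claim, List[Evidence]], str]


def fact_check(
    claim: Claim,
    retrieve_text: TextRetriever,
    retrieve_images: ImageRetriever,
    generate: Generator,
) -> str:
    """Gather evidence from both retrievers, then make one generation call."""
    evidence = list(retrieve_text(claim.text))
    for path in claim.image_paths:
        evidence.extend(retrieve_images(path))
    # A single generation call per fact-check, mirroring the abstract's description.
    return generate(claim, evidence)


# Hypothetical stand-ins for the real modules, used only to show the wiring.
def stub_text_retriever(query: str) -> List[Evidence]:
    return [Evidence("text", f"passage about: {query}")]


def stub_image_retriever(path: str) -> List[Evidence]:
    return [Evidence("image", f"page containing: {path}")]


def stub_generator(claim: Claim, evidence: List[Evidence]) -> str:
    return f"verdict based on {len(evidence)} evidence items"


if __name__ == "__main__":
    claim = Claim("Example claim", ["photo.jpg"])
    print(fact_check(claim, stub_text_retriever, stub_image_retriever, stub_generator))
```

Because the modules only meet inside `fact_check`, replacing one retriever or the generator would not require changes elsewhere, which is the property the summary refers to as "decoupled".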
Key Points
- ▸ The system combines a RAG pipeline with a RIS module for fact-checking.
- ▸ It achieves competitive performance at an average cost of $0.013 per fact-check, using one multimodal GPT5.1 call via the OpenAI Batch API.
- ▸ The system is easy to reproduce and tweak, consisting of three decoupled modules.
- ▸ The authors publish the code, prompts, and vector stores for further experimentation.
Merits
Cost-Efficiency
The system delivers competitive performance with a single multimodal LLM call per fact-check, averaging $0.013, which keeps it affordable for large-scale use and for repeated experimentation.
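To make the reported figure concrete, the arithmetic can be sketched as below. Only the $0.013 average comes from the abstract; the claim counts and the budget are hypothetical, and actual costs will vary with prompt length and pricing.

```python
AVG_COST_PER_CHECK_USD = 0.013  # average reported in the abstract


def estimated_cost(n_claims: int, avg_cost: float = AVG_COST_PER_CHECK_USD) -> float:
    """Rough total cost in USD for n_claims fact-checks at the average rate."""
    if n_claims < 0:
        raise ValueError("n_claims must be non-negative")
    return round(n_claims * avg_cost, 2)


def checks_within_budget(budget_usd: float, avg_cost: float = AVG_COST_PER_CHECK_USD) -> int:
    """Rough number of fact-checks that fit in a budget at the average rate."""
    if budget_usd < 0 or avg_cost <= 0:
        raise ValueError("budget must be non-negative and avg_cost positive")
    return int(budget_usd / avg_cost)


if __name__ == "__main__":
    print(estimated_cost(1000))       # hypothetical batch of 1,000 claims
    print(checks_within_budget(100))  # hypothetical 100 USD budget
```

At the reported average, a hypothetical batch of 1,000 claims would cost roughly 13 USD.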
Simplicity and Reproducibility
The system's modular design and published code make it easy to reproduce and tweak, encouraging further research and development.
Multimodal Capability
Combining textual retrieval with reverse image search lets the system gather evidence for both the text and the image of a claim before the single generation call.
Demerits
Limited Generalization
The system's performance may be specific to the AVerImaTeC shared task, and its effectiveness in other contexts or with different datasets is not thoroughly explored.
Dependency on External APIs
Both the image retrieval module and the generation module rely on external APIs, which could introduce latency and make the system dependent on third-party availability, affecting its robustness.
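One common way to soften this kind of dependency is to wrap the external call in retries with a fallback, so a failed image lookup degrades the fact-check to text-only evidence instead of aborting it. The sketch below is an assumption about how that could look, not something the article describes; `retrieve_with_fallback` and its parameters are hypothetical.

```python
import time
from typing import Callable, List, Optional


def retrieve_with_fallback(
    fetch: Callable[[str], List[str]],
    query: str,
    retries: int = 2,
    delay_s: float = 0.0,
    fallback: Optional[List[str]] = None,
) -> List[str]:
    """Call an external retrieval function; return fallback results if it keeps failing."""
    for attempt in range(retries + 1):
        try:
            return fetch(query)
        except Exception:
            # Broad catch for illustration only; real code should catch the
            # specific network or API errors raised by the client in use.
            if attempt < retries and delay_s > 0:
                time.sleep(delay_s)
    # All attempts failed: degrade gracefully instead of raising.
    return list(fallback) if fallback is not None else []


if __name__ == "__main__":
    def always_down(query: str) -> List[str]:
        raise ConnectionError("service unavailable")

    # With no fallback supplied, the pipeline would continue with no image evidence.
    print(retrieve_with_fallback(always_down, "photo.jpg"))
```

The trade-off is that a silent fallback can hide outages, so in practice the failure would usually also be logged or counted.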
Potential Bias in Retrieval
The similarity search and RIS modules may introduce biases, which could affect the accuracy and fairness of the fact-checking process.
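To see where such bias can enter, it helps to look at what a similarity search does: it ranks stored documents by closeness to the query embedding and keeps only the top few, so anything ranked below the cutoff never reaches the generator. The sketch below assumes cosine similarity over toy vectors; the article does not specify the metric or embedding model, and `cosine` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D "embeddings"
    for index, score in top_k([1.0, 0.0], docs, k=2):
        print(index, round(score, 3))
```

In this toy example the second document is orthogonal to the query and is dropped by the cutoff, which illustrates how the choice of embedding and of `k` shapes the evidence the generator sees.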
Expert Commentary
The article makes a significant contribution to fact-checking research by combining retrieval-augmented generation with reverse image search. The system's low cost and simplicity make it attractive both for practical applications and as a baseline for further research. However, its reliance on external APIs and the potential biases in its retrieval modules require further investigation. The authors' decision to publish the code and prompts is commendable, as it fosters transparency and encourages community engagement. This work lays a solid foundation for future multimodal fact-checking systems, provided the identified limitations are addressed to ensure robustness and fairness. Beyond the technical results, the research also bears on practical deployment and on policy discussions about misinformation and AI regulation.
Recommendations
- ✓ Further research should explore the system's performance across diverse datasets and contexts to assess its generalizability.
- ✓ Future work should address potential biases in the retrieval modules to enhance the system's fairness and accuracy.