A Lightweight LLM Framework for Disaster Humanitarian Information Classification
arXiv:2602.12284v1 (cross-listed)
Abstract: Timely classification of humanitarian information from social media is critical for effective disaster response. However, deploying large language models (LLMs) for this task faces challenges in resource-constrained emergency settings. This paper develops a lightweight, cost-effective framework for disaster tweet classification using parameter-efficient fine-tuning. We construct a unified experimental corpus by integrating and normalizing the HumAID dataset (76,484 tweets across 19 disaster events) into a dual-task benchmark: humanitarian information categorization and event type identification. Through systematic evaluation of prompting strategies, LoRA fine-tuning, and retrieval-augmented generation (RAG) on Llama 3.1 8B, we demonstrate that: (1) LoRA achieves 79.62% humanitarian classification accuracy (+37.79% over zero-shot) while training only ~2% of parameters; (2) QLoRA enables efficient deployment with 99.4% of LoRA performance at 50% memory cost; (3) contrary to common assumptions, RAG strategies degrade fine-tuned model performance due to label noise from retrieved examples. These findings establish a practical, reproducible pipeline for building reliable crisis intelligence systems with limited computational resources.
Executive Summary
The article presents a lightweight framework for classifying disaster-related humanitarian information using parameter-efficient fine-tuning. It integrates and normalizes the HumAID dataset (76,484 tweets across 19 disaster events) into a dual-task benchmark covering humanitarian information categorization and event type identification, then evaluates prompting strategies, LoRA fine-tuning, and retrieval-augmented generation (RAG) on Llama 3.1 8B. The study finds that LoRA reaches 79.62% humanitarian classification accuracy while training only ~2% of parameters, that QLoRA retains 99.4% of LoRA performance at 50% of the memory cost, and that RAG degrades the fine-tuned model's performance because retrieved examples introduce label noise. The result is a practical, reproducible pipeline for crisis intelligence systems in resource-constrained settings.
Key Points
- ▸ LoRA fine-tuning achieves 79.62% humanitarian classification accuracy (+37.79% over zero-shot) while training only ~2% of parameters.
- ▸ QLoRA enables efficient deployment with 99.4% of LoRA performance at 50% of the memory cost.
- ▸ RAG strategies degrade fine-tuned model performance due to label noise from retrieved examples.
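The headline figures can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that the abstract does not confirm: that the "+37.79%" gain over zero-shot is measured in percentage points, and that "99.4% of LoRA performance" refers to the same accuracy metric. The function names are hypothetical and the derived values are illustrative only.

```python
# Figures quoted in the abstract.
LORA_ACCURACY = 79.62        # % humanitarian classification accuracy with LoRA
GAIN_OVER_ZERO_SHOT = 37.79  # ASSUMPTION: percentage points, not relative gain
QLORA_RELATIVE = 0.994       # QLoRA keeps 99.4% of LoRA performance


def implied_zero_shot(lora_acc: float, gain_points: float) -> float:
    """Zero-shot accuracy implied by an absolute (percentage-point) gain."""
    return round(lora_acc - gain_points, 2)


def implied_qlora(lora_acc: float, relative: float) -> float:
    """QLoRA accuracy implied if the 99.4% applies to the accuracy metric."""
    return round(lora_acc * relative, 2)


if __name__ == "__main__":
    zero_shot = implied_zero_shot(LORA_ACCURACY, GAIN_OVER_ZERO_SHOT)
    qlora = implied_qlora(LORA_ACCURACY, QLORA_RELATIVE)
    print(f"implied zero-shot accuracy: {zero_shot:.2f}%")
    print(f"implied QLoRA accuracy:     {qlora:.2f}%")
```

Under these assumptions the implied zero-shot accuracy is 41.83% and the implied QLoRA accuracy is roughly 79.14%. If the gain were instead relative, the implied baseline would differ, so these numbers should be checked against the paper's tables.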
Merits
Innovative Approach
The framework addresses the critical need for resource-efficient models in disaster response, leveraging parameter-efficient fine-tuning techniques.
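To make the parameter-efficiency claim concrete, the sketch below counts the parameters that LoRA adds to a single weight matrix. LoRA freezes the original d_out x d_in weight and trains two low-rank factors, which together hold rank * (d_in + d_out) parameters. The matrix shape and rank used in the example are hypothetical and are not the paper's configuration.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Ratio of LoRA parameters to the frozen full-rank weight they adapt."""
    return lora_added_params(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    # HYPOTHETICAL shape and rank, chosen only for illustration.
    d_in, d_out, rank = 4096, 4096, 16
    added = lora_added_params(d_in, d_out, rank)
    frac = trainable_fraction(d_in, d_out, rank)
    print(f"added parameters: {added:,}")
    print(f"fraction of the adapted matrix: {frac:.4%}")
```

The fraction grows linearly with the rank and with the number of adapted matrices, so the overall share of trainable parameters in a full model depends on which layers are targeted and at what rank.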
Comprehensive Evaluation
The study systematically compares prompting strategies, LoRA fine-tuning, and RAG on the same model (Llama 3.1 8B) and the same dual-task benchmark, so the reported results are directly comparable.
Practical Applications
The findings offer a practical pipeline for deploying reliable crisis intelligence systems in emergency settings.
Demerits
Dataset Limitations
The reliance on a single dataset (HumAID, 76,484 tweets across 19 disaster events) may limit how well the findings generalize to other disaster events or to social media data from other platforms.
Model Specificity
The study focuses on the Llama 3.1 8B model, which may not be representative of all large language models, potentially limiting the broader applicability of the results.
RAG Performance Degradation
RAG degraded the fine-tuned model's performance, which the authors attribute to label noise from retrieved examples. This points to an unresolved challenge in combining retrieval with fine-tuned classifiers and requires further investigation.
Expert Commentary
The study addresses a practical gap: deploying large language models for disaster humanitarian information classification when compute is scarce. The LoRA and QLoRA results show that a fine-tuned Llama 3.1 8B can reach 79.62% humanitarian classification accuracy while training only ~2% of parameters, and that QLoRA retains 99.4% of that performance at 50% of the memory cost, which matters for real-world emergency deployments. The finding that RAG degrades the fine-tuned model runs contrary to common assumptions; the authors attribute it to label noise from retrieved examples, and further work is needed to understand and mitigate that effect. Reliance on a single dataset and a single model may limit generalizability, but the framework remains a valuable, reproducible pipeline for building crisis intelligence systems. Future research should test these techniques on more diverse datasets and models to establish how robust the framework is.
Recommendations
- ✓ Further research should investigate the generalizability of the findings across different datasets and large language models to ensure broader applicability.
- ✓ Exploration of advanced techniques to mitigate label noise in retrieval-augmented generation could enhance the performance of fine-tuned models in disaster information classification.
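As one illustration of what mitigating label noise in retrieval could look like, the sketch below keeps retrieved examples only when their labels largely agree, and otherwise falls back to prompting without retrieved context. This heuristic is an assumption made for illustration and is not a method from the paper; the function name, the threshold, and the label strings are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (tweet text, label)


def filter_retrieved(examples: List[Example], min_agreement: float = 0.6) -> List[Example]:
    """Keep only majority-label examples, or nothing if agreement is too low.

    ASSUMPTION: low label agreement among retrieved neighbours is treated as
    a sign of label noise, so the caller should skip retrieval in that case.
    """
    if not examples:
        return []
    counts = Counter(label for _, label in examples)
    top_label, top_count = counts.most_common(1)[0]
    if top_count / len(examples) < min_agreement:
        return []  # too noisy: prompt the model without retrieved examples
    return [(text, label) for text, label in examples if label == top_label]


if __name__ == "__main__":
    # HYPOTHETICAL retrieved neighbours with illustrative label names.
    consistent = [("tweet a", "donation"), ("tweet b", "donation"), ("tweet c", "casualties")]
    noisy = [("tweet a", "donation"), ("tweet b", "casualties"), ("tweet c", "damage")]
    print(filter_retrieved(consistent))
    print(filter_retrieved(noisy))
```

Whether such a filter would recover the accuracy lost to RAG is an open question that would need to be tested on the benchmark.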