
POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs

arXiv:2603.16045v1 Announce Type: new Abstract: Small language models (sLLMs) are increasingly deployed on-device, where imperfect user prompts--typos, unclear intent, or missing context--can trigger factual errors and hallucinations. Existing automatic prompt optimization (APO) methods were designed for large cloud LLMs and rely on search that often produces long, structured instructions; when executed under an on-device constraint where the same small model must act as optimizer and solver, these pipelines can waste context and even hurt accuracy. We propose POaaS, a minimal-edit prompt optimization layer that routes each query to lightweight specialists (Cleaner, Paraphraser, Fact-Adder) and merges their outputs under strict drift and length constraints, with a conservative skip policy for well-formed prompts. Under a strict fixed-model setting with Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct, POaaS improves both task accuracy and factuality while representative APO baselines degrade them, and POaaS recovers up to +7.4% under token deletion and mixup. Overall, per-query conservative optimization is a practical alternative to search-heavy APO for on-device sLLMs.

Executive Summary

This article proposes POaaS, a minimal-edit prompt-optimization layer for on-device small language models (sLLMs) that improves accuracy and reduces hallucinations. POaaS routes each query to lightweight specialists (Cleaner, Paraphraser, Fact-Adder) and merges their outputs under strict drift and length constraints, skipping well-formed prompts entirely. In a strict fixed-model setting with Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct, POaaS improves both task accuracy and factuality while representative automatic prompt optimization (APO) baselines degrade them, and it recovers up to +7.4% under token-deletion and mixup corruptions. Overall, this per-query conservative optimization offers a practical alternative to search-heavy APO for on-device sLLMs.
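The token-deletion and mixup corruptions mentioned above can be sketched as simple prompt perturbations. The functions below are illustrative assumptions, not the paper's actual evaluation code; in particular, the 15% deletion rate and two adjacent swaps are made-up parameters for demonstration.

```python
import random

def token_deletion(prompt: str, rate: float = 0.15, seed: int = 0) -> str:
    """Drop each whitespace token independently with probability `rate`."""
    rng = random.Random(seed)
    kept = [t for t in prompt.split() if rng.random() >= rate]
    return " ".join(kept) if kept else prompt

def token_mixup(prompt: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Swap `n_swaps` random pairs of adjacent tokens to scramble word order."""
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(n_swaps):
        if len(tokens) < 2:
            break
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return " ".join(tokens)

print(token_deletion("what is the capital of France"))
print(token_mixup("what is the capital of France"))
```

Robustness is then measured by comparing task accuracy on clean prompts against accuracy on prompts passed through corruptions like these, with and without the optimization layer in front of the solver.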

Key Points

  • POaaS is designed to optimize prompts for on-device sLLMs, reducing hallucinations and improving accuracy.
  • POaaS uses lightweight specialists and strict constraints to optimize prompts, making it a practical alternative to search-heavy APO methods.
  • POaaS outperforms representative APO baselines in a fixed-model setting, demonstrating its effectiveness in improving task accuracy and factuality.
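The route-then-merge control flow in the points above can be sketched as follows. This is a minimal, hypothetical rendering of the abstract's description: the specialist names come from the paper, but the well-formedness heuristic, the Jaccard-based drift measure, and the numeric thresholds are all assumptions for illustration.

```python
MAX_EXTRA_TOKENS = 16   # length constraint: cap on added tokens (assumed value)
MAX_DRIFT = 0.3         # drift constraint: max allowed lexical drift (assumed value)

def is_well_formed(prompt: str) -> bool:
    """Conservative skip policy: leave apparently well-formed prompts untouched."""
    return prompt.endswith(("?", ".")) and prompt[:1].isupper()

def drift(original: str, edited: str) -> float:
    """Token-set Jaccard distance as a simple stand-in for semantic drift."""
    a, b = set(original.split()), set(edited.split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def optimize(prompt: str, specialists) -> str:
    """Route the prompt through specialists, keeping only constraint-respecting edits."""
    if is_well_formed(prompt):
        return prompt  # skip policy: no edit for well-formed input
    candidate = prompt
    for specialist in specialists:  # e.g. Cleaner, Paraphraser, Fact-Adder
        edited = specialist(candidate)
        too_long = len(edited.split()) - len(prompt.split()) > MAX_EXTRA_TOKENS
        if drift(prompt, edited) <= MAX_DRIFT and not too_long:
            candidate = edited  # merge: accept the edit only within the constraints
    return candidate

# Toy specialist standing in for an sLLM-backed Cleaner module.
cleaner = lambda p: p.replace("teh", "the")
print(optimize("what is teh capital of France", [cleaner]))
```

The key design choice the abstract emphasizes is conservatism: edits that drift too far from the user's wording, or that inflate the prompt, are rejected rather than merged, which is what keeps the layer safe when the same small model serves as both optimizer and solver.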

Merits

Improved Accuracy and Factuality

POaaS improves both task accuracy and factuality in settings where representative APO baselines degrade them, making it an attractive option for on-device sLLMs.

Efficient and Practical

POaaS's conservative per-query optimization avoids the search overhead of existing APO pipelines, and its lightweight specialists fit within the tight context budgets of on-device sLLMs.

Demerits

Evaluation Limited to a Fixed-Model Setting

The effectiveness of POaaS is demonstrated only under a strict fixed-model, on-device setting, so it remains unclear whether its conservative edits help when a larger or separate optimizer model is available.

Dependence on Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct

The evaluation of POaaS relies heavily on Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct, which may not be representative of all on-device sLLMs.

Expert Commentary

The article's findings and POaaS's design demonstrate a deep understanding of the challenges and limitations of on-device sLLMs. By providing a practical alternative to search-heavy APO methods, POaaS presents a promising solution for enhancing the performance of on-device sLLMs. However, further research is needed to explore the generalizability of POaaS to other on-device sLLMs and to investigate its potential scalability.

Recommendations

  • Future research should focus on evaluating POaaS's effectiveness in a broader range of on-device sLLMs and exploring its potential applications in other AI-powered systems.
  • Developers and researchers should consider integrating POaaS-like solutions into on-device sLLMs to improve accuracy and reduce hallucinations, enhancing the overall user experience.
