
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization

Takato Yasuno

arXiv:2603.18037v1. Abstract: This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments (1k–5k samples) identify n=4,000 as optimal, where test-set NLL reaches minimum (1.127) before overfitting at 5k samples. Stage 2 (Compare fine-tuned SLMs): Comparing four Japanese LLMs shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models (Qwen2.5-7B). Stage 3 (Quantization): Llama-3 architectures improve under Q4_K_M quantization, while GQA architectures degrade severely (Qwen2.5: -0.280 points). Production recommendation: Swallow-8B Q4_K_M achieves 2.830/3 score, 8.9 s/question, 4.9 GB size. The methodology generalizes to low-resource technical domains and provides actionable guidance for compact Japanese specialist LMs on consumer hardware.
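To make the training setup concrete, below is a minimal QLoRA sketch in the spirit of the paper, using Hugging Face transformers, bitsandbytes, and peft. The base-model ID, LoRA rank, and target modules are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical QLoRA setup in the spirit of the paper; the base-model ID and
# all hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1"  # assumed base model

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (the "LoRA" part)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters are trained
```

From here the model trains like any causal LM; only the adapter weights receive gradients, which is what makes fine-tuning an 8B model feasible on consumer hardware.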

Executive Summary

This article presents a systematic methodology for building domain-specific Japanese small language models (SLMs) with QLoRA fine-tuning. The authors address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. The study identifies 4,000 training samples as optimal, the point where held-out NLL bottoms out before overfitting sets in, and finds that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models such as Qwen2.5-7B. It also shows that Llama-3 architectures improve under Q4_K_M quantization while GQA architectures degrade severely. The findings offer actionable guidance for deploying compact Japanese specialist LMs on consumer hardware and generalize to other low-resource technical domains.
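The scale-selection criterion is straightforward to reproduce: fine-tune at each scale, compute mean per-token NLL on a held-out test set, and keep the scale at the minimum. A hedged sketch, assuming Hugging Face checkpoints at hypothetical paths:

```python
# Sketch of the Stage-1 selection criterion: mean per-token negative
# log-likelihood (NLL) on a held-out test set, swept over training scales.
# Checkpoint paths and the tiny test list below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

test_texts = [
    "(held-out domain question/answer text)",  # replace with the real test set
]

@torch.no_grad()
def mean_test_nll(model, tokenizer, texts):
    """Average per-token NLL over held-out strings."""
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
        out = model(ids, labels=ids)  # loss = mean next-token cross-entropy
        n = ids.numel() - 1           # number of predicted tokens
        total_nll += out.loss.item() * n
        total_tokens += n
    return total_nll / total_tokens

# Evaluate checkpoints fine-tuned on 1k..5k samples and keep the NLL minimum
# (the paper reports the minimum, 1.127, at n=4,000 before overfitting at 5k).
nlls = {}
for n in (1000, 2000, 3000, 4000, 5000):
    ckpt = f"checkpoints/swallow-qlora-{n}"  # hypothetical checkpoint path
    model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")
    tok = AutoTokenizer.from_pretrained(ckpt)
    nlls[n] = mean_test_nll(model, tok, test_texts)
print(nlls, "-> best scale:", min(nlls, key=nlls.get))
```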

Key Points

  • Optimal training scale for Japanese SLMs is identified as 4,000 samples
  • Llama-3 models with Japanese continual pre-training outperform multilingual models
  • Q4_K_M quantization improves Llama-3 architectures but severely degrades GQA architectures such as Qwen2.5

Merits

Strength in Methodology

The study presents a systematic and replicable methodology for building domain-specific Japanese SLMs, addressing key questions in the field.

Practical Relevance

The research provides actionable guidance for compact Japanese specialist LMs on consumer hardware, with significant implications for industry and academia.

Generalizability

The study's findings generalize to low-resource technical domains, expanding the applicability of the methodology.

Demerits

Limitation in Model Selection

The study focuses on Llama-3 and GQA models, limiting the generalizability of the findings to other architectures.

Quantization Methodology

The study relies on Q4_K_M quantization, which may not be the most efficient or effective method for all domains.
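For context on what the recommended Q4_K_M deployment looks like in practice, here is a minimal inference sketch with llama-cpp-python against a local GGUF build. The model path and prompt are hypothetical placeholders; the quantization itself is produced offline with llama.cpp's tooling.

```python
# Minimal Q4_K_M inference sketch with llama-cpp-python, loosely matching the
# paper's production setting (Swallow-8B, ~4.9 GB, ~8.9 s/question). The GGUF
# path and the example question are assumptions, not the authors' artifacts.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/swallow-8b-q4_k_m.gguf",  # hypothetical local file
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers if a GPU is available
)

prompt = "防水塗装の標準的な施工手順を説明してください。"  # example domain question
start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.2)
elapsed = time.perf_counter() - start

print(out["choices"][0]["text"])
print(f"latency: {elapsed:.1f} s/question")
```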

Expert Commentary

This study makes a solid contribution to language model development by turning questions that are usually settled ad hoc, namely how much training data is enough, which base model to start from, and how quantization interacts with architecture, into a staged, replicable recipe for domain-specific Japanese SLMs. The finding that quantization effects are architecture-dependent, with Llama-3 variants improving under Q4_K_M while Qwen2.5 degrades sharply, is particularly useful for practitioners targeting consumer hardware. Two caveats temper the conclusions: the evaluation covers only Llama-3-derived and GQA-based models, so the architecture-specific claims may not extend to other designs, and the exclusive reliance on Q4_K_M leaves open whether other quantization schemes would show the same pattern.

Recommendations

  • Future studies should explore the applicability of the methodology to other architectures and domains.
  • Researchers should investigate alternative quantization techniques to improve model efficiency and effectiveness.

Sources

  • arXiv:2603.18037v1: Takato Yasuno, "Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization." https://arxiv.org/abs/2603.18037