Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya
arXiv:2604.04937v1 Announce Type: new Abstract: Large language models produce fluent text but struggle with systematic reasoning, often hallucinating confident but unfounded claims. When Apple researchers added irrelevant context to mathematical problems, LLM performance degraded by 65% (Apple Machine Learning Research), exposing brittle pattern-matching beneath apparent reasoning. This epistemic gap, the inability to ground claims in traceable evidence, limits AI reliability in domains requiring justification. We introduce Pramana, a novel approach that teaches LLMs explicit epistemological methodology by fine-tuning on Navya-Nyaya logic, a 2,500-year-old Indian reasoning framework. Unlike generic chain-of-thought prompting, Navya-Nyaya enforces structured 6-phase reasoning: SAMSHAYA (doubt analysis), PRAMANA (evidence source identification), PANCHA AVAYAVA (5-member syllogism with universal rules), TARKA (counterfactual verification), HETVABHASA (fallacy detection), and NIRNAYA (ascertainment distinguishing knowledge from hypothesis). This integration of logic and epistemology provides cognitive scaffolding absent from standard reasoning approaches. We fine-tune Llama 3.2-3B and DeepSeek-R1-Distill-Llama-8B on 55 Nyaya-structured logical problems (constraint satisfaction, Boolean SAT, multi-step deduction). Stage 1 achieves 100% semantic correctness on held-out evaluation despite only 40% strict format adherence, revealing that models internalize reasoning content even when structural enforcement is imperfect. Ablation studies show format prompting and temperature critically affect performance, with optimal configurations differing by stage. We release all models, datasets, and training infrastructure on Hugging Face to enable further research on epistemic frameworks for AI reasoning.
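The six-phase structure named in the abstract can be sketched as a simple data schema. The class and field names below are illustrative assumptions for this review, not the authors' released training format:

```python
from dataclasses import dataclass, field

# Canonical ordering of the six Navya-Nyaya phases named in the abstract.
PHASES = [
    "SAMSHAYA",        # doubt analysis
    "PRAMANA",         # evidence source identification
    "PANCHA_AVAYAVA",  # 5-member syllogism with universal rules
    "TARKA",           # counterfactual verification
    "HETVABHASA",      # fallacy detection
    "NIRNAYA",         # ascertainment (knowledge vs. hypothesis)
]

@dataclass
class NyayaTrace:
    """One model response decomposed into the six reasoning phases."""
    phases: dict = field(default_factory=dict)

    def is_complete(self) -> bool:
        # Strict adherence: every phase present and non-empty.
        return all(self.phases.get(p, "").strip() for p in PHASES)

trace = NyayaTrace(phases={p: "..." for p in PHASES})
print(trace.is_complete())  # True: all six phases are filled
```

A schema like this makes "strict format adherence" a mechanically checkable property rather than a judgment call.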
Executive Summary
The article introduces Pramana, a novel framework for fine-tuning large language models (LLMs) using Navya-Nyaya logic, an ancient Indian epistemological system, to address LLMs' inherent limitations in systematic reasoning and epistemic reliability. By structuring reasoning into six phases—doubt analysis, evidence identification, syllogism, counterfactual verification, fallacy detection, and ascertainment—Pramana aims to improve semantic correctness and reasoning traceability. Empirical results show 100% semantic correctness on held-out evaluations despite only 40% strict format adherence, highlighting the potential of integrating formal logic frameworks into modern AI systems. The authors release models, datasets, and infrastructure to foster further research in epistemic AI reasoning.
Key Points
- ▸ LLMs struggle with systematic reasoning and epistemic grounding, as evidenced by performance degradation when irrelevant context is introduced (e.g., 65% drop in mathematical problem-solving).
- ▸ Pramana leverages Navya-Nyaya logic, a 2,500-year-old Indian reasoning framework, to impose a structured 6-phase epistemological methodology on LLMs, addressing the lack of cognitive scaffolding in standard reasoning approaches.
- ▸ Empirical validation with Llama 3.2-3B and DeepSeek-R1-Distill-Llama-8B models shows 100% semantic correctness on held-out evaluations, with ablation studies revealing the critical role of format prompting and temperature tuning in performance optimization.
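The PANCHA AVAYAVA phase refers to the classical five-member Nyaya syllogism. The encoding below is an illustrative sketch using the tradition's stock fire-on-the-hill example, not the paper's training representation:

```python
from dataclasses import dataclass

@dataclass
class PanchaAvayava:
    """The five members of a classical Nyaya syllogism."""
    pratijna: str   # thesis to be proved
    hetu: str       # reason / evidence
    udaharana: str  # universal rule with a supporting example
    upanaya: str    # application of the rule to this case
    nigamana: str   # conclusion restating the thesis as established

    def members(self) -> list:
        return [self.pratijna, self.hetu, self.udaharana,
                self.upanaya, self.nigamana]

# The stock example from the Nyaya tradition.
fire = PanchaAvayava(
    pratijna="There is fire on the hill.",
    hetu="Because there is smoke on the hill.",
    udaharana="Wherever there is smoke there is fire, as in a kitchen.",
    upanaya="The hill has smoke pervaded by fire.",
    nigamana="Therefore there is fire on the hill.",
)
assert len(fire.members()) == 5
```

Unlike a two-premise Aristotelian syllogism, the udaharana member bundles the universal rule with a concrete supporting instance, which is precisely the evidence-grounding step the abstract emphasizes.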
Merits
Innovative Integration of Ancient and Modern Epistemology
The article uniquely bridges millennia-old Navya-Nyaya logic with contemporary AI reasoning challenges, offering a culturally diverse and philosophically rigorous approach to addressing LLM limitations.
Empirical Rigor and Reproducibility
The authors provide robust empirical validation, including ablation studies and open-source release of models, datasets, and infrastructure, enabling reproducibility and further research.
Addressing Core LLM Vulnerabilities
Pramana directly targets the epistemic gap in LLMs—hallucinations and brittle reasoning—by embedding structured epistemological scaffolding, a critical advancement for reliable AI deployment.
Demerits
Limited Generalizability of Training Data
The reliance on only 55 Nyaya-structured logical problems may constrain the models' performance on broader, real-world reasoning tasks, raising questions about scalability and domain adaptation.
Imperfect Structural Adherence vs. Semantic Correctness Trade-off
While models achieve 100% semantic correctness despite only 40% strict format adherence, the long-term reliability of reasoning without strict structural enforcement remains uncertain.
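The 100%-semantic versus 40%-strict gap implies two separate scoring functions over the same output. A minimal sketch of how such a split metric might be computed follows; the phase-header matching and the `ANSWER:` extraction marker are assumptions for illustration, not the paper's evaluation code:

```python
import re

PHASE_HEADERS = ["SAMSHAYA", "PRAMANA", "PANCHA AVAYAVA",
                 "TARKA", "HETVABHASA", "NIRNAYA"]

def strict_format_ok(output: str) -> bool:
    """All six phase headers present, in the canonical order."""
    positions = [output.find(h) for h in PHASE_HEADERS]
    return all(p >= 0 for p in positions) and positions == sorted(positions)

def semantically_correct(output: str, gold_answer: str) -> bool:
    """Content check: does the final answer match, regardless of structure?"""
    match = re.search(r"ANSWER:\s*(.+)", output)  # assumed answer marker
    return bool(match) and match.group(1).strip() == gold_answer

out = "SAMSHAYA: ...\nPRAMANA: ...\nNIRNAYA: ...\nANSWER: SAT"
print(strict_format_ok(out), semantically_correct(out, "SAT"))  # False True
```

The example output skips three phases, so it fails the strict check while still answering correctly, exactly the dissociation the paper reports.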
Dependency on Hyperparameter Tuning
Performance is critically dependent on format prompting and temperature settings, suggesting sensitivity to implementation details that may complicate deployment in varied environments.
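Such sensitivity argues for a small configuration sweep before deployment. The sketch below shows the shape of that search with a stubbed evaluation function; the parameter values and the surrogate scoring are illustrative assumptions, not the paper's ablation results:

```python
from itertools import product

def evaluate(temperature: float, format_prompt: bool) -> float:
    """Stub: replace with real held-out accuracy for one configuration."""
    # Illustrative surrogate only: lower temperature and explicit format
    # prompting score higher, mimicking the sensitivity described above.
    return (1.0 - temperature) * (1.0 if format_prompt else 0.7)

grid = product([0.0, 0.3, 0.7], [True, False])
best = max(grid, key=lambda cfg: evaluate(*cfg))
print(best)  # (0.0, True) under this surrogate
```

Since the paper reports that optimal configurations differ by stage, such a sweep would need to be repeated per training stage rather than run once globally.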
Expert Commentary
This article makes a noteworthy contribution to the field of AI reasoning by introducing a philosophically grounded, empirically validated approach to addressing the epistemic limitations of LLMs. The integration of Navya-Nyaya logic is not merely a technical novelty but a challenge to the field's reliance on Western-centric methodologies. The authors' empirical rigor is commendable, particularly the demonstration of 100% semantic correctness despite only 40% strict format adherence, which suggests that the underlying epistemological scaffolding is internalized by the models. However, the reliance on a small, curated dataset of 55 Nyaya-structured problems limits generalizability, and the sensitivity to hyperparameters underscores the need for further research into robust deployment strategies. The open-source release of models and datasets is a laudable step that will likely spur further work. Overall, Pramana offers a promising direction for epistemic AI and invites deeper exploration of formal logic systems within modern machine learning architectures.
Recommendations
- ✓ Expand the training dataset to include a broader range of reasoning problems and domains to enhance generalizability and robustness.
- ✓ Conduct longitudinal studies to assess the long-term reliability of reasoning systems trained on Navya-Nyaya logic, particularly in dynamic real-world environments.
- ✓ Develop standardized benchmarks for evaluating epistemic frameworks in AI, enabling fairer comparisons across different approaches.
- ✓ Explore hybrid models that combine Navya-Nyaya logic with other epistemological systems (e.g., Bayesian reasoning or abductive logic) to create more versatile and adaptive reasoning frameworks.
- ✓ Engage policymakers and ethicists to develop guidelines for the responsible deployment of epistemic AI systems, ensuring alignment with societal values and regulatory requirements.
Sources
Original: arXiv - cs.AI