Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
arXiv:2603.03306v1 Announce Type: cross Abstract: Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is a lack of tests against JSON generation. Though never present in training data, TOON syntax is simple enough to suggest one-shot in-context learning could support accurate generation. The inevitable prompt overhead can be an acceptable trade-off for shorter completions. To test this, we conducted a benchmark creating several test cases with regard to structural complexity, a validation pipeline, and comparing plain JSON generation vs structured output (via constrained decoding) JSON generation vs TOON one-shot in-context learning generation. JSON structured output was included to establish a minimum token budget baseline and to set a starting point for future experiments testing TOON constrained decoding inference enforcement. Key findings: TOON shows promising accuracy/token consumption ratio for in-domain generation tasks, though this advantage is often reduced by the "prompt tax" of instructional overhead in shorter contexts. Plain JSON generation shows the best one-shot and final accuracy, even compared with constrained decoding structured output, where the only significant advantage is the lowest token usage as a trade-off for slightly decreased accuracy overall and significant degradation for some models. Notably, for simple structures, this "lowest token usage" of constrained decoding outperformed even TOON, hinting that TOON enforcing via frameworks such as xgrammar may not yield the desired results. Furthermore, the results suggest a scaling hypothesis: TOON's true efficiency potential likely follows a non-linear curve, shining only beyond a specific point where cumulative syntax savings amortize the initial prompt overhead.
Executive Summary
This article presents a comparative benchmark of Token-Oriented Object Notation (TOON) and JSON in terms of token usage and generation accuracy, evaluating TOON's potential to replace JSON as a serialization format for passing structured data to Large Language Models (LLMs). Key findings indicate that TOON offers a promising accuracy-to-token-consumption ratio for in-domain generation tasks, but that this advantage is often eroded by the 'prompt tax' of its instructional overhead. Plain JSON generation achieves the best one-shot and final accuracy, while constrained decoding (structured output) achieves the lowest token usage at the cost of decreased accuracy. The authors also propose a scaling hypothesis: TOON's efficiency gains may follow a non-linear curve, paying off only once cumulative syntax savings amortize the fixed prompt overhead.
Key Points
- TOON shows a promising accuracy-to-token-consumption ratio for in-domain generation tasks
- Plain JSON generation demonstrates superior one-shot and final accuracy
- Constrained decoding structured output achieves the lowest token usage, but at the cost of decreased accuracy
- TOON's true efficiency potential may follow a non-linear curve
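To make the token-savings claim concrete, the sketch below serializes the same payload as JSON and as TOON's tabular array form (field header declared once, then one comma-separated row per item). This is a simplified illustrative encoder, not the reference TOON implementation: it assumes a uniform array of flat objects and performs no escaping, and character counts stand in for tokenizer-specific token counts.

```python
import json

def toon_encode(key, rows):
    """Simplified TOON-style encoder for a uniform array of flat objects.

    Emits the tabular form: key[N]{f1,f2}: followed by one
    comma-separated row per item. No escaping or nesting is handled.
    """
    fields = list(rows[0])
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "viewer"},
    {"id": 3, "name": "Carol", "role": "editor"},
]

json_text = json.dumps({"users": users})
toon_text = toon_encode("users", users)

# TOON states each field name once, while JSON repeats every key per item,
# so the TOON rendering is markedly shorter for uniform arrays.
print(len(json_text), len(toon_text))
```

The saving grows with the number of rows, since JSON's per-item key repetition is exactly the redundancy TOON's header factors out.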
Merits
Comprehensive Benchmarking
The study provides a thorough comparison of TOON and JSON, evaluating their performance in terms of token usage and generation accuracy.
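A benchmark of this kind needs a per-completion scoring step: parse the model's output and check it against a gold answer. The paper's actual pipeline is not published in this summary, so the function below is a hypothetical minimal version for the plain-JSON track, scoring both syntactic validity and exact structural match.

```python
import json

def score_completion(completion: str, gold: dict) -> dict:
    """Hypothetical validation step: parse a model's JSON completion and
    check structural equality against the expected (gold) object."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        # Malformed output fails both checks.
        return {"valid": False, "exact_match": False}
    return {"valid": True, "exact_match": parsed == gold}

print(score_completion('{"id": 1}', {"id": 1}))   # both checks pass
print(score_completion('not json', {"id": 1}))    # fails to parse
```

Separating validity from exact match matters here: constrained decoding guarantees the first by construction, so the interesting comparison between tracks is on the second.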
In-Depth Analysis
The authors conduct a detailed analysis of the results, identifying key findings and implications for the use of TOON and JSON in LLMs.
Demerits
Methodological Limitations
The study relies on a specific set of test cases and validation pipeline, which may not be representative of all possible scenarios.
Overemphasis on Token Usage
The study's focus on token usage may lead to an oversimplification of the complexities involved in LLMs and data serialization.
Expert Commentary
The study is a valuable contribution to the ongoing debate about serialization formats for LLMs. Its results should nonetheless be interpreted with caution: they may be shaped by the specific test cases and validation pipeline used, and the emphasis on token usage risks oversimplifying the broader trade-offs in LLM data serialization. Even so, the findings carry significant implications for the development of token-efficient serialization formats and underline the importance of accounting for the limitations and constraints of LLMs when designing them.
Recommendations
- Future studies should investigate the scalability and generalizability of TOON's performance across different domains and LLM architectures.
- Developers should consider the trade-offs between token usage, generation accuracy, and 'prompt tax' when designing data serialization formats for LLMs.
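The scaling hypothesis reduces to a simple back-of-envelope amortization: a fixed instruction overhead is paid once, and a per-item syntax saving accrues with every serialized item. The numbers below are made up purely for illustration; the paper reports no specific overhead or per-item figures.

```python
import math

def break_even_items(prompt_overhead_tokens: int, savings_per_item: int) -> int:
    """Smallest number of array items at which cumulative per-item token
    savings repay the fixed one-shot instruction overhead."""
    return math.ceil(prompt_overhead_tokens / savings_per_item)

# Illustrative (invented) figures: a 400-token TOON instruction block
# and ~8 tokens saved per serialized item versus JSON.
print(break_even_items(400, 8))  # -> 50
```

Below this break-even point the 'prompt tax' dominates and plain JSON is cheaper end to end; above it, TOON's savings compound with payload size, which is the non-linear payoff the authors hypothesize.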