Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
arXiv:2604.05072v1. Abstract: Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and introducing severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes.
Executive Summary
The article presents HiVG, a hierarchical tokenization framework for Scalable Vector Graphics (SVG) generation that targets two weaknesses of existing autoregressive models: coordinates fragmented into unrelated symbols and unnecessarily long token sequences. Instead of generic byte-level tokenization, HiVG decomposes SVG strings into structured atomic tokens and then compresses executable command-parameter groups into geometry-constrained segment tokens, which shortens sequences while preserving syntactic validity. A Hierarchical Mean-Noise (HMN) initialization strategy injects numerical ordering signals and semantic priors into the new token embeddings, and a curriculum training paradigm that progressively increases program complexity improves learning stability. According to the abstract, experiments on text-to-SVG and image-to-SVG tasks show improved fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes.
Key Points
- ▸ HiVG introduces a hierarchical tokenization framework that decomposes raw SVG strings into atomic and segment tokens, preserving geometric structure and reducing redundancy.
- ▸ The Hierarchical Mean-Noise (HMN) initialization strategy injects numerical ordering and semantic priors into token embeddings, mitigating spatial mismatch.
- ▸ A curriculum training paradigm progressively increases program complexity, enabling stable learning of executable SVG programs.
- ▸ Experiments on text-to-SVG and image-to-SVG tasks show improved generation fidelity, spatial consistency, and sequence efficiency over conventional tokenization schemes.
- ▸ The approach addresses core challenges in autoregressive SVG generation, such as coordinate hallucination and inefficient long-sequence generation.
Merits
Innovative Tokenization Framework
HiVG's hierarchical approach to SVG tokenization effectively captures geometric structures, addressing the limitations of byte-level tokenization in autoregressive models.
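To make the contrast with byte-level tokenization concrete, the following is a minimal sketch of the two-level idea for SVG path data only. It is not the authors' implementation: the regular expression, the `ARITY` table, and the function names `atomic_tokens` and `segment_tokens` are illustrative assumptions, and "geometry-constrained" is interpreted here simply as "each command carries exactly the number of parameters the SVG path grammar requires".

```python
import re

# Hypothetical sketch: these token formats are assumptions for illustration,
# not the vocabulary defined by the HiVG authors.
PATH_TOKEN = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]|-?\d*\.?\d+")

# Number of parameters each SVG path command consumes.
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6,
         "S": 4, "Q": 4, "T": 2, "A": 7, "Z": 0}


def atomic_tokens(path_data: str) -> list[str]:
    """Level 1: keep every command and every number as one whole token."""
    return PATH_TOKEN.findall(path_data)


def segment_tokens(atoms: list[str]) -> list[tuple[str, ...]]:
    """Level 2: group each command with exactly the parameters it needs.

    Implicit command repetition (allowed by SVG) is not handled here.
    """
    segments = []
    i = 0
    while i < len(atoms):
        cmd = atoms[i]
        if cmd.upper() not in ARITY:
            raise ValueError(f"expected a command at atom {i}, got {cmd!r}")
        n = ARITY[cmd.upper()]
        params = atoms[i + 1 : i + 1 + n]
        if len(params) != n or any(p.upper() in ARITY for p in params):
            raise ValueError(f"wrong parameter count for {cmd!r} at atom {i}")
        segments.append((cmd, *params))
        i += 1 + n
    return segments


path = "M10 20 L30.5 40 C1 2 3 4 5 6 Z"
byte_level = list(path.encode("utf-8"))  # one token per byte
atoms = atomic_tokens(path)              # one token per command or number
segments = segment_tokens(atoms)         # one token per executable segment
print(len(byte_level), len(atoms), len(segments))
```

In this toy setting, a coordinate such as `30.5` stays a single unit instead of being split into four byte symbols, and a malformed group is rejected at grouping time rather than surfacing as an unrenderable path.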
Enhanced Spatial Consistency
The HMN initialization strategy and segment tokens preserve spatial relationships, reducing errors like coordinate hallucination and improving fidelity.
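The abstract does not give the HMN formula, so the sketch below is only one plausible reading of "mean plus noise with a numerical ordering signal". It assumes that a new coordinate token starts from the mean of the pretrained embeddings of its constituent characters (the semantic prior), receives small Gaussian noise, and then has its component along one fixed direction overwritten in proportion to its numeric value. The names `mean_noise_init` and `set_order_signal`, the random stand-in embedding table, and all hyperparameters are hypothetical.

```python
import random

rng = random.Random(0)
DIM = 8

# Stand-in for a pretrained embedding table over single-character sub-tokens.
old_table = {ch: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for ch in "0123456789."}


def mean_noise_init(pieces, table, noise_std, rng):
    """Mean of the constituent embeddings plus small Gaussian noise."""
    vecs = [table[p] for p in pieces]
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


def set_order_signal(vec, t, direction, scale):
    """Overwrite the component along a unit `direction` with scale * t."""
    proj = sum(x * d for x, d in zip(vec, direction))
    return [x + (scale * t - proj) * d for x, d in zip(vec, direction)]


direction = [1.0] + [0.0] * (DIM - 1)  # assumed unit-norm ordering axis
coords = list(range(0, 256, 32))        # a few new coordinate tokens
lo, hi = coords[0], coords[-1]

new_table = {}
for c in coords:
    base = mean_noise_init(str(c), old_table, noise_std=0.01, rng=rng)
    t = (c - lo) / (hi - lo)            # normalized numeric position in [0, 1]
    new_table[c] = set_order_signal(base, t, direction, scale=1.0)

# Projection of each new embedding onto the ordering axis.
projections = [sum(x * d for x, d in zip(new_table[c], direction)) for c in coords]
print(projections)
```

Under these assumptions, larger coordinate values sit further along the ordering axis at initialization, while the remaining dimensions still carry the averaged pretrained signal.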
Scalability and Efficiency
The abstract reports improved sequence efficiency and generation quality on both text-to-SVG and image-to-SVG tasks; shorter sequences are what make longer, more complex SVG programs tractable for an autoregressive model.
Curriculum Training Paradigm
Training on progressively more complex programs stabilizes learning, so the model is exposed to simple executable SVG programs before it has to handle intricate ones.
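A curriculum of this kind can be sketched as follows. The paper's actual complexity measure and schedule are not stated in the abstract, so this example assumes that complexity is the number of path commands in a sample and that each stage adds the next-hardest slice of the data to everything seen so far. `complexity` and `curriculum_stages` are hypothetical helper names.

```python
import math
import re

COMMAND = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]")


def complexity(path_data: str) -> int:
    """Assumed complexity proxy: number of path commands in the sample."""
    return len(COMMAND.findall(path_data))


def curriculum_stages(samples, num_stages):
    """Return cumulative training pools, from simplest to full dataset."""
    ranked = sorted(samples, key=complexity)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ranked) * k / num_stages)
        stages.append(ranked[:cutoff])
    return stages


samples = [
    "M0 0 C1 2 3 4 5 6 S7 8 9 10 Q1 1 2 2 T3 3 Z",
    "M0 0 L10 10",
    "M0 0 L10 0 L10 10 Z",
    "M5 5 H10",
]

stages = curriculum_stages(samples, num_stages=2)
for i, pool in enumerate(stages, start=1):
    hardest = max(complexity(s) for s in pool)
    print(f"stage {i}: {len(pool)} samples, hardest has {hardest} commands")
```

Keeping earlier samples in later stages is a design choice made for this sketch; a schedule that replaces easy samples with hard ones would fit the same description.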
Demerits
Complexity Overhead
The hierarchical tokenization and HMN initialization introduce additional computational and algorithmic complexity, which may pose challenges for real-time or resource-constrained applications.
Dependency on SVG Structure
HiVG's effectiveness is contingent on the structural integrity of input SVGs; malformed or overly complex SVGs may not benefit equally from the proposed tokenization.
Limited Generalization to Non-SVG Domains
The framework is tailored specifically for SVG tokenization, limiting its applicability to other vector graphics formats or non-graphical sequence generation tasks.
Expert Commentary
The authors present a compelling case for hierarchical tokenization in SVG generation, addressing a critical gap in current autoregressive models. By decomposing SVGs into atomic and segment tokens, HiVG preserves geometric relationships that are otherwise fragmented in byte-level tokenization. The Hierarchical Mean-Noise initialization is particularly innovative, as it explicitly encodes numerical ordering and semantic priors into token embeddings, mitigating common issues like coordinate hallucination. The curriculum training paradigm further ensures stable learning, a crucial factor for scalable deployment. However, the complexity overhead and dependence on SVG structure may limit immediate adoption in resource-constrained environments. This work not only advances SVG generation but also offers a blueprint for structured tokenization in other domains, making it a significant contribution to both applied machine learning and vector graphics research.
Recommendations
- ✓ Further research should explore combining HiVG with diffusion-based or other non-autoregressive generators to assess compatibility and performance gains in hybrid models.
- ✓ Investigate the applicability of hierarchical tokenization to other vector graphics formats, such as PDF or EPS, to validate the framework's generality.
- ✓ Develop lightweight variants of HiVG for real-time applications, such as web-based SVG generation tools, to broaden practical adoption.
- ✓ Conduct user studies to evaluate the impact of HiVG-generated SVGs on human designers' workflows, ensuring that the technical improvements translate to tangible usability benefits.
- ✓ Collaborate with standardization bodies to incorporate hierarchical tokenization principles into SVG specifications, fostering industry-wide adoption.
Sources
Original: arXiv - cs.LG