
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

Ximing Xing, Ziteng Xue, Zhenxi Li, Weicong Liang, Linqing Wang, Zhantao Yang, Tiankai Hang, Zijin Yin, Qinglin Lu, Chunyu Wang, Qian Yu · April 8, 2026

#cs.LG

arXiv:2604.05072v1

Abstract: Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and introducing severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes.

Executive Summary

The paper presents HiVG, a hierarchical tokenization framework for autoregressive Scalable Vector Graphics (SVG) generation. Instead of generic byte-level tokenization, which splits numerical coordinates into unrelated symbols, HiVG decomposes SVG strings into structured atomic tokens and then compresses executable command-parameter groups into geometry-constrained segment tokens, shortening sequences while preserving syntactic validity. A Hierarchical Mean-Noise (HMN) initialization strategy injects numerical ordering signals and semantic priors into the new token embeddings, and a curriculum training paradigm that progressively increases program complexity stabilizes learning. On text-to-SVG and image-to-SVG tasks, the authors report improved fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes.
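To make the fragmentation problem concrete, the following minimal sketch compares a character-level split (used here as a simple stand-in for byte-level tokenization) with a split that keeps each number whole. The path string and the regular expression are illustrative assumptions, not taken from the paper, and real byte-pair vocabularies merge some digit runs, so the character-level count overstates the effect.

```python
import re

# Hypothetical path data, chosen only for illustration.
path_data = "M 12.5 30 L 100 30 L 100 87.25 Z"

# Character-level view: every digit and decimal point is its own symbol,
# so the model sees "1", "2", ".", "5" rather than the coordinate 12.5.
char_tokens = [ch for ch in path_data if not ch.isspace()]

# Number-aware view: one token per command letter or complete number.
atomic_tokens = re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", path_data)

print(len(char_tokens), char_tokens[:6])
print(len(atomic_tokens), atomic_tokens)
```

On this input the character-level view yields 23 symbols and the number-aware view yields 10, which illustrates the token redundancy the abstract describes.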

Key Points

▸ HiVG introduces a hierarchical tokenization framework that decomposes raw SVG strings into atomic and segment tokens, preserving geometric structure and reducing redundancy.
▸ The Hierarchical Mean-Noise (HMN) initialization strategy injects numerical ordering and semantic priors into token embeddings, mitigating spatial mismatch.
▸ A curriculum training paradigm progressively increases program complexity, enabling stable learning of executable SVG programs.
▸ Experiments on text-to-SVG and image-to-SVG tasks show improved generation fidelity, spatial consistency, and sequence efficiency over conventional tokenization schemes.
▸ The approach addresses core challenges in autoregressive SVG generation, such as coordinate hallucination and inefficient long-sequence generation.

Merits

Innovative Tokenization Framework

HiVG's hierarchical approach to SVG tokenization effectively captures geometric structures, addressing the limitations of byte-level tokenization in autoregressive models.
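The two-level idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `ARITY` table, the function names, and the rule that a segment is one command plus exactly the numbers it requires are all hypothetical stand-ins for the paper's geometry constraints. Only a few absolute path commands are covered, and implicit command repetition is not handled.

```python
import re

# Number of numeric parameters each path command consumes.
# Assumption: a small subset of absolute SVG path commands, for brevity.
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6, "Q": 4, "Z": 0}

def atomic_tokens(path_data):
    """Level 1: split path data into command letters and whole numbers."""
    return re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", path_data)

def segment_tokens(tokens):
    """Level 2: group each command with exactly the parameters it needs."""
    segments, i = [], 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in ARITY:
            raise ValueError(f"unsupported or misplaced token: {cmd!r}")
        n = ARITY[cmd]
        params = tokens[i + 1 : i + 1 + n]
        if len(params) != n or any(p in ARITY for p in params):
            raise ValueError(f"command {cmd} expects {n} numbers")
        segments.append((cmd, *map(float, params)))
        i += 1 + n
    return segments

atoms = atomic_tokens("M 10 20 L 30 40 C 1 2 3 4 5 6 Z")
segments = segment_tokens(atoms)
print(len(atoms), "atomic tokens ->", len(segments), "segment tokens")
print(segments)
```

Because a segment is only emitted when the command's parameter count is satisfied, every segment in this sketch corresponds to an executable path instruction, which is one way to read "preserving syntactic validity".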

Enhanced Spatial Consistency

The HMN initialization strategy and segment tokens preserve spatial relationships, reducing errors like coordinate hallucination and improving fidelity.
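The abstract does not give the HMN formula, so the sketch below is only one plausible reading of "mean plus noise with an ordering signal". It assumes that a new coordinate token starts from the mean of the embeddings of the digits that used to spell it (a semantic prior), receives a linear ramp along one fixed dimension (an ordering signal), and is then perturbed with small Gaussian noise. The toy `DIGIT_EMB` table, the function name, and every parameter are hypothetical.

```python
import random

DIM = 4
_table_rng = random.Random(0)
# Toy stand-in for pretrained digit embeddings. The last dimension is kept at
# zero so the ordering ramp added below is easy to read off.
DIGIT_EMB = {
    str(d): [_table_rng.uniform(-1, 1) for _ in range(DIM - 1)] + [0.0]
    for d in range(10)
}

def init_coordinate_embedding(value, max_value=255, ramp_scale=1.0,
                              noise_std=0.02, seed=0):
    """Initialize an embedding for a non-negative integer coordinate token."""
    noise_rng = random.Random(seed + value)
    pieces = [DIGIT_EMB[ch] for ch in str(value)]
    # Mean term: average the embeddings of the constituent digit tokens.
    vec = [sum(p[d] for p in pieces) / len(pieces) for d in range(DIM)]
    # Ordering term (assumed): larger coordinates move further along one axis.
    vec[-1] += ramp_scale * value / max_value
    # Noise term: small perturbation so tokens with equal means stay distinct.
    return [v + noise_rng.gauss(0.0, noise_std) for v in vec]

low = init_coordinate_embedding(10, noise_std=0.0)
high = init_coordinate_embedding(200, noise_std=0.0)
print(low[-1] < high[-1])
```

With the noise switched off, the last component equals `value / max_value`, so nearby coordinates start with nearby embeddings along that axis. Whether the paper encodes ordering this way is an open assumption.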

Scalability and Efficiency

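The reported experiments show improved sequence efficiency and generation quality on text-to-SVG and image-to-SVG tasks, which suggests that HiVG can scale to more complex SVG programs. The abstract gives no quantitative figures, so the size of the gains cannot be judged from it alone.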

Curriculum Training Paradigm

Training on progressively more complex programs stabilizes learning, which should help models handle increasingly intricate SVG programs.
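A curriculum of this kind can be sketched in a few lines. The complexity measure (number of path commands) and the cumulative staging scheme are assumptions made for illustration; the abstract does not say how the authors define or schedule program complexity. The regular expression counts every letter as a command, so it would miscount numbers written in scientific notation.

```python
import math
import re

def complexity(path_data):
    """Proxy for program complexity: the number of path command letters."""
    return len(re.findall(r"[A-Za-z]", path_data))

def curriculum_stages(samples, num_stages=2):
    """Sort by complexity, then let each stage see a longer prefix."""
    ordered = sorted(samples, key=complexity)
    stages = []
    for s in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * s / num_stages)
        stages.append(ordered[:cutoff])
    return stages

# Hypothetical training samples, from simplest to most complex: a, b, c, d.
a = "M 0 0 Z"
b = "M 0 0 L 1 1 Z"
c = "M 0 0 L 1 0 L 1 1 Z"
d = "M 0 0 C 1 1 2 2 3 3 L 4 4 L 5 5 Z"

stages = curriculum_stages([c, a, d, b], num_stages=2)
for i, stage in enumerate(stages, start=1):
    print(f"stage {i}: {[complexity(p) for p in stage]}")
```

Each stage here keeps the earlier, simpler samples and adds harder ones, so the model is never trained only on the most complex programs. Other schedules, such as replacing rather than accumulating samples, would fit the same description.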

Demerits

Complexity Overhead

The hierarchical tokenization and HMN initialization introduce additional computational and algorithmic complexity, which may pose challenges for real-time or resource-constrained applications.

Dependency on SVG Structure

HiVG's effectiveness is contingent on the structural integrity of input SVGs; malformed or overly complex SVGs may not benefit equally from the proposed tokenization.
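One practical consequence is that inputs may need to be screened before tokenization. The following minimal sketch, which is not part of the paper, uses Python's standard `xml.etree.ElementTree` parser to reject text that is not well-formed XML or whose root element is not `svg`. The function name and the sample strings are hypothetical, and the check says nothing about whether the path data itself is valid.

```python
import xml.etree.ElementTree as ET

def is_well_formed_svg(svg_text):
    """Return True if the text parses as XML and its root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # Namespaced tags look like '{http://www.w3.org/2000/svg}svg'.
    return root.tag.rsplit("}", 1)[-1] == "svg"

good = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M 0 0 L 1 1"/></svg>'
bad = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M 0 0 L 1 1"></svg>'

print(is_well_formed_svg(good), is_well_formed_svg(bad))
```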

Limited Generalization to Non-SVG Domains

The framework is tailored specifically for SVG tokenization, limiting its applicability to other vector graphics formats or non-graphical sequence generation tasks.

Expert Commentary

The authors make a compelling case for hierarchical tokenization in SVG generation, addressing a real gap in current autoregressive models. By decomposing SVGs into atomic and segment tokens, HiVG preserves geometric relationships that byte-level tokenization fragments. The Hierarchical Mean-Noise initialization stands out because it explicitly encodes numerical ordering and semantic priors into the new token embeddings, targeting common failures such as coordinate hallucination. The curriculum training paradigm supports stable learning, which matters for scaling to longer programs. However, the added complexity and the dependence on well-structured SVG input may limit immediate adoption in resource-constrained environments. Although the framework itself is specific to SVG, its approach may serve as a template for structured tokenization of other program-like data, which makes it relevant to both applied machine learning and vector graphics research.

Recommendations

✓ Further research should explore combining HiVG with diffusion-based or other non-autoregressive generators to assess compatibility and performance gains in hybrid models.
✓ Investigate the applicability of hierarchical tokenization to other vector graphics formats, such as PDF or EPS, to validate the framework's generality.
✓ Develop lightweight variants of HiVG for real-time applications, such as web-based SVG generation tools, to broaden practical adoption.
✓ Conduct user studies to evaluate the impact of HiVG-generated SVGs on human designers' workflows, ensuring that the technical improvements translate to tangible usability benefits.
✓ Collaborate with standardization bodies to incorporate hierarchical tokenization principles into SVG specifications, fostering industry-wide adoption.

Sources

Original: arXiv - cs.LG


