Tag: cs.AR

Academic · 1 min

AXELRAM: Quantize Once, Never Dequantize

arXiv:2604.02638v1 Announce Type: new Abstract: We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. …

Yasushi Nishida

26 views Apr 6

Academic · 1 min

Fast NF4 Dequantization Kernels for Large Language Model Inference

arXiv:2604.02556v1 Announce Type: new Abstract: Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. …

Xiangbo Qi, Chaoyi Jiang, Murali Annavaram

13 views Apr 6

Academic · 1 min

QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis

arXiv:2603.14239v1 Announce Type: new Abstract: SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs …

Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu

28 views Mar 17

Academic · 1 min

DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs

arXiv:2603.12269v1 Announce Type: cross Abstract: Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI …

Parth Patne, Mahdi Taheri, Christian Herglotz, Maksim Jenihhin, Milos Krstic, Michael Hübner

31 views Mar 17

Academic · 1 min

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators

arXiv:2603.10026v1 Announce Type: cross Abstract: Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has …

Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu

35 views Mar 12

Academic · 1 min

The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths

arXiv:2603.10030v1 Announce Type: cross Abstract: AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and …

Marco Graziano

33 views Mar 12

Academic · 1 min

Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

arXiv:2603.10100v1 Announce Type: new Abstract: Modern CNNs' high computational demands hinder edge deployment, as traditional "hard" sparsity (skipping mathematical zeros) loses effectiveness in deep layers …

Vishal Shashidhar, Anupam Kumari, Roy P Paily

35 views Mar 12

Academic · 1 min

VeriInteresting: An Empirical Study of Model Prompt Interactions in Verilog Code Generation

arXiv:2603.08715v1 Announce Type: cross Abstract: Rapid advances in language models (LMs) have created new opportunities for automated code generation while complicating trade-offs between model characteristics …

Luca Collini, Andrew Hennesee, Patrick Yubeaton, Siddharth Garg, Ramesh Karri

42 views Mar 11

Academic · 1 min

The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference

arXiv:2603.08960v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models deliver high quality at low training FLOPs, but this efficiency often vanishes at inference. We identify a …

Vignesh Adhinarayanan, Nuwan Jayasena

39 views Mar 11

Academic · 1 min

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

arXiv:2603.09032v1 Announce Type: new Abstract: Scientific machine learning (SciML) is increasingly applied to in-field processing, controlling, and monitoring; however, wide-area sensing, real-time demands, and strict …

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang

39 views Mar 11

Academic · 1 min

Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL

arXiv:2603.09161v1 Announce Type: new Abstract: Learning effective netlist representations is fundamentally constrained by the scarcity of labeled datasets, as real designs are protected by Intellectual …

Siyang Cai, Cangyuan Li, Yinhe Han, Ying Wang

34 views Mar 11

Academic · 1 min

DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data

arXiv:2603.09274v1 Announce Type: new Abstract: Spatiotemporal information is at the core of diverse sensory processing and computational tasks. Feed-forward spiking neural networks can be used …

Jann Krausse, Zhe Su, Kyrus Mama, Maryada, Klaus Knobloch, Giacomo Indiveri, Jürgen Becker

44 views Mar 11
