Tag: cs.DC

Academic · 1 min

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

arXiv:2604.05426v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter …

Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

46 views Apr 8

Academic · 1 min

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

arXiv:2604.05091v1 Announce Type: new Abstract: We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single …

Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye

41 views Apr 8

Academic · 1 min

Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three …

arXiv:2604.02344v1 Announce Type: new Abstract: WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true …

Jędrzej Maczan

40 views Apr 6

Academic · 1 min

Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training

arXiv:2604.02651v1 Announce Type: new Abstract: Graph neural networks (GNNs) are widely used for learning on graph datasets derived from various real-world scenarios. Learning from extremely …

Cunyang Wei, Siddharth Singh, Aishwarya Sarkar, Daniel Nichols, Tisha Patel, Aditya K. Ranjan, Sayan Ghosh, Ali Jannesari, Nathan R. Tallent, Abhinav Bhatele

44 views Apr 6

Academic · 1 min

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe

arXiv:2604.01489v1 Announce Type: new Abstract: High-performance GPU kernels are critical to modern machine learning systems, yet developing efficient implementations remains a challenging, expert-driven process due …

Tara Saba, Anne Ouyang, Xujie Si, Fan Long

62 views Apr 3

Academic · 1 min

FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

arXiv:2604.01762v1 Announce Type: new Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, …

Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang

71 views Apr 3

Academic · 1 min

A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning

arXiv:2603.22465v1 Announce Type: new Abstract: Federated Learning (FL) is constrained by the communication and energy limitations of decentralized edge devices. While gradient sparsification via Top-K …

Emmanouil M. Athanasakos

62 views Mar 25

Academic · 1 min

ARYA: A Physics-Constrained Composable & Deterministic World Model Architecture

arXiv:2603.21340v1 Announce Type: new Abstract: This paper presents ARYA, a composable, physics-constrained, deterministic world model architecture built on five foundational principles: nano models, composability, causal …

Seth Dobrin, Lukasz Chmiel

43 views Mar 24

Academic · 1 min

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

arXiv:2603.18104v1 Announce Type: new Abstract: Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer …

Houston Haynes

63 views Mar 20

Academic · 1 min

MineDraft: A Framework for Batch Parallel Speculative Decoding

arXiv:2603.18016v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are …

Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

69 views Mar 20

Academic · 1 min

Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers

arXiv:2603.12684v1 Announce Type: new Abstract: Federated Clustering (FC) is an emerging and promising solution in exploring data distribution patterns from distributed and privacy-protected data in …

Yue Zhang, Chuanlong Qiu, Xinfa Liao, Yiqun Zhang

50 views Mar 16

Academic · 1 min

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators

arXiv:2603.10026v1 Announce Type: cross Abstract: Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has …

Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu

64 views Mar 12

