DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training
arXiv:2603.19338v1
Abstract: Non-linear activation functions play a pivotal role in on-device inference and training, as they not only consume substantial hardware resources …
Maoyang Xiang, Bo Wang