MaBERT: A Padding-Safe Interleaved Transformer-Mamba Hybrid Encoder for Efficient Extended-Context Masked Language Modeling
arXiv:2603.03001v1 Announce Type: new Abstract: Self-attention encoders such as Bidirectional Encoder Representations from Transformers (BERT) scale quadratically with sequence length, making long-context modeling expensive. …
Jinwoong Kim, Sangjin Park
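The quadratic-scaling claim in the abstract can be made concrete: for a sequence of length n, the attention score matrix QK^T has n × n entries, so doubling the context length quadruples that cost in both memory and compute. A minimal NumPy sketch illustrating this (the `attention_scores` helper is hypothetical, not from the paper):

```python
import numpy as np

def attention_scores(q, k):
    # Raw scaled dot-product scores. q, k have shape (n, d);
    # the (n, n) output is the quadratic term in sequence length.
    return (q @ k.T) / np.sqrt(q.shape[-1])

d = 64
for n in (512, 2048, 8192):
    q = np.random.randn(n, d).astype(np.float32)
    k = np.random.randn(n, d).astype(np.float32)
    s = attention_scores(q, k)
    # The score matrix holds n * n floats: 4x the tokens -> 16x the entries.
    print(f"seq_len={n:5d}  scores shape={s.shape}  entries={s.size:,}")
```

Linear-time sequence models such as Mamba avoid materializing this n × n matrix, which is the motivation for hybrid encoders like the one the paper proposes.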