Multi-Agent Lipschitz Bandits
arXiv:2602.16965v1 Announce Type: new Abstract: We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. …
All Articles
arXiv:2602.16966v1 Announce Type: new Abstract: Scalable Multi-Agent Reinforcement Learning (MARL) is fundamentally challenged by the curse of dimensionality. A common solution is to exploit locality, …
arXiv:2602.16967v1 Announce Type: new Abstract: Grokking -- the abrupt transition from memorization to generalization after prolonged training -- has been linked to confinement on low-dimensional …
arXiv:2602.16977v1 Announce Type: new Abstract: We identify a structural weakness in current large language model (LLM) alignment: modern refusal mechanisms are fail-open. While existing approaches …
arXiv:2602.16980v1 Announce Type: new Abstract: Modern language models exhibit rich internal structure, yet little is known about how privacy-sensitive behaviors, such as personally identifiable information …
arXiv:2602.16994v1 Announce Type: new Abstract: Multi-path speculative decoding accelerates lossless sampling from a target model by using a cheaper draft model to generate a draft …
arXiv:2602.17009v1 Announce Type: new Abstract: Coordinating actions is the most fundamental form of cooperation in multi-agent reinforcement learning (MARL). Successful decentralized decision-making often depends not …
arXiv:2602.17013v1 Announce Type: new Abstract: We establish a rigorous connection between pathwise (reparameterization) and score-function (Malliavin) gradient estimators by showing that both arise from the …
arXiv:2602.17025v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) is effective for training language models on complex reasoning. However, since the objective is defined …
arXiv:2602.17027v1 Announce Type: new Abstract: Scientific discovery pipelines typically involve complex, rigid, and time-consuming processes, from data preparation to analyzing and interpreting findings. Recent advances …
arXiv:2602.17028v1 Announce Type: new Abstract: Detecting anomalies in time-series data is critical in domains such as industrial operations, finance, and cybersecurity, where early identification of …
arXiv:2602.17063v1 Announce Type: new Abstract: Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a …