CRB framework models rising rewards across base arms The authors introduce the Combinatorial Rising Bandit (CRB) framework to capture scenarios where playing a base arm yields an immediate reward and also raises that arm's future rewards, thereby affecting every super arm that shares it—a dependency absent from existing bandit models [1].
CRUCB algorithm offers provable efficiency and strong empirical performance They propose the Combinatorial Rising Upper Confidence Bound (CRUCB) algorithm, prove its efficiency, and demonstrate strong empirical results in both synthetic settings and realistic deep‑reinforcement‑learning environments [1].
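To make the combinatorial UCB idea concrete, here is a minimal sketch of the super-arm selection step in a generic combinatorial UCB-style algorithm. This is an illustrative simplification, not the paper's CRUCB: it omits the rising-reward estimators that CRUCB adds, and the function names (`ucb_index`, `select_super_arm`) and the confidence constant are assumptions for this example.

```python
import math

def ucb_index(mean, count, t, confidence=2.0):
    """Optimistic index for one base arm: empirical mean plus an
    exploration bonus that shrinks as the arm is played more often."""
    if count == 0:
        return float("inf")  # force at least one pull of every base arm
    return mean + math.sqrt(confidence * math.log(t) / count)

def select_super_arm(super_arms, means, counts, t):
    """Pick the super arm (a tuple of base-arm ids) whose summed
    optimistic base-arm indices are highest at round t."""
    scores = {
        sa: sum(ucb_index(means[a], counts[a], t) for a in sa)
        for sa in super_arms
    }
    return max(scores, key=scores.get)
```

Because super arms share base arms, playing one super arm updates the statistics of every other super arm containing those base arms; CRUCB builds on this structure while additionally tracking how each base arm's reward rises with play.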
Rising reward concept applies to robotics, advertising, routing, recommendations The paper highlights practical relevance in domains such as robots that improve through practice, recommendation systems where accumulated interaction history strengthens social influence, network routing, and social advertising—settings where actions have a lasting impact on future rewards [1].
Theoretical analysis yields tight regret bounds for CRUCB Their analysis establishes tight regret bounds for CRUCB, providing theoretical guarantees that complement the empirical findings [1].
Code for CRUCB released publicly on GitHub The implementation is made available at https://github.com/ml-postech/Combinatorial-Rising-Bandits, enabling reproducibility and further research by the community [1].
Paper presented at ICLR 2026, 14th International Conference on Learning Representations The work appears in the proceedings of the 14th ICLR, indicating peer‑reviewed validation within the machine‑learning community [1].