Visual Intelligence Online Seminar #80: Robust RL with Learning Automata: Structure, Timescales, and Mixed Strategies

Abstract

Learning Automata (LA) are lightweight stochastic decision makers that adapt action probabilities from noisy bandit feedback—no environment model required. Many real systems feature resources that deplete with use yet recover with rest, so choosing an action today alters both future reward and access (e.g., budgets, congestion, scheduling). We present a two-timescale LA tailored to this depletion–recovery coupling: a fast loop tracks instantaneous yield and recovery state, while a slow loop adjusts the policy against that quasi-stationary backdrop.

Next, we scale to large action spaces with the Hierarchical Continuous Pursuit Automaton (HCPA): a tree of two-action pursuit learners that achieves ε-optimality with large speedups when the number of actions is large. Finally, we use artificial reflecting barriers to convert the classical, absorbing Linear Reward–Inaction (LR-I) scheme into an ergodic one that converges to mixed Nash equilibria under limited feedback. Together, these three ideas make LA robust, scalable, and well suited to environments where rewards both deplete and recover over time: structure (hierarchies), timescale separation (fast recovery tracking plus slow policy adjustment), and mild constraints (reflecting barriers).
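As a rough illustration of the reflecting-barrier idea, the sketch below shows the classical LR-I update for a two-action automaton, followed by a clamp that keeps the action probability away from 0 and 1. The clamp is only an assumed stand-in for a reflecting barrier and is not the scheme presented in the talk; the names `lri_step`, `run`, `lr`, and `p_min`, and the reward probabilities, are hypothetical.

```python
import random


def lri_step(p, action, reward, lr=0.05, p_min=0.01):
    """One Linear Reward-Inaction update for a two-action automaton.

    p is the probability of choosing action 0. On reward, probability
    mass moves toward the chosen action; on penalty nothing changes
    (the "inaction" part). The final clamp into [p_min, 1 - p_min] is
    an ASSUMPTION standing in for a reflecting barrier.
    """
    if reward:
        if action == 0:
            p = p + lr * (1.0 - p)  # move toward action 0
        else:
            p = p - lr * p          # move toward action 1
    return min(max(p, p_min), 1.0 - p_min)


def run(reward_probs=(0.8, 0.4), steps=5000, seed=0, lr=0.05, p_min=0.01):
    """Simulate the automaton against a stationary two-armed environment."""
    rng = random.Random(seed)
    p = 0.5
    for _ in range(steps):
        action = 0 if rng.random() < p else 1
        reward = rng.random() < reward_probs[action]  # noisy bandit feedback
        p = lri_step(p, action, reward, lr=lr, p_min=p_min)
    return p


if __name__ == "__main__":
    print(run())
```

Without the clamp, p can be absorbed at 0 or 1, which is why plain LR-I cannot settle on a mixed strategy; bounding p away from the corners keeps every action explorable.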

Presented by Anis Yazidi, Associate Professor at the University of Oslo / Professor at the AI Lab, OsloMet
