N-Slot Bandit Lab
Compare exploration-exploitation strategies in a stochastic multi-armed bandit.
Each arm has an unknown Bernoulli reward probability. At each step the agent chooses one arm and observes a reward of 0 or 1.
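A minimal sketch of such an environment, assuming a fixed list of true probabilities (the class and variable names like `BernoulliBandit` and `true_probs` are illustrative, not taken from the lab's source):

```python
import random

class BernoulliBandit:
    """Stochastic bandit: each arm pays 1 with a fixed, unknown probability."""

    def __init__(self, true_probs):
        self.true_probs = list(true_probs)  # hidden from the agent
        self.n_arms = len(self.true_probs)

    def pull(self, arm):
        """Return a Bernoulli reward (0 or 1) for the chosen arm."""
        return 1 if random.random() < self.true_probs[arm] else 0
```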
- Epsilon-Greedy: with probability epsilon pick a uniformly random arm, otherwise exploit the arm with the best empirical mean.
- UCB1: add an optimism bonus to each arm's empirical mean so rarely pulled (uncertain) arms still get tried.
- Thompson Sampling: sample each arm's success probability from a Beta posterior and play the arm with the highest sample (all three are sketched below).
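Minimal sketches of the three policies, assuming per-arm statistics are tracked outside these functions (names such as `counts`, `sums`, `successes`, and `failures` are illustrative, not the lab's actual API):

```python
import math
import random

def epsilon_greedy(counts, sums, epsilon=0.1):
    """With probability epsilon pick a random arm, else the best empirical mean."""
    if random.random() < epsilon:
        return random.randrange(len(counts))
    means = [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]
    return max(range(len(means)), key=means.__getitem__)

def ucb1(counts, sums, t):
    """Pick the arm maximizing empirical mean plus an optimism bonus."""
    for arm, c in enumerate(counts):
        if c == 0:  # play every arm once before the bonus is well defined
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )

def thompson(successes, failures):
    """Sample each arm's success rate from its Beta posterior, play the argmax."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```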
Plots: Average Reward and Cumulative Regret.
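Cumulative regret measures how much expected reward is lost relative to always playing the best arm. A sketch of how it can be computed, assuming access to the true probabilities (which the lab knows but the agent does not):

```python
def cumulative_regret(true_probs, chosen_arms):
    """Expected regret after playing the arms in chosen_arms, in order."""
    best = max(true_probs)
    return sum(best - true_probs[a] for a in chosen_arms)
```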
Arms (True Probabilities)
Edit the per-arm probabilities manually or randomize them, then click Apply.
Run Info
Live readouts, updated as a run progresses:
- Current run: index of the active run out of the total number of runs.
- Step: current step out of the run horizon.
- Avg reward: running mean of the observed rewards.
- Cum regret: cumulative regret accumulated so far.