N-Slot Bandit Lab

Compare exploration-exploitation strategies in a stochastic multi-armed bandit.

Each arm has an unknown Bernoulli reward probability. At each step the agent chooses one arm and observes a reward of 0 or 1.
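
A minimal sketch of this environment in Python (the class and method names are illustrative, not the platform's actual code):

    import random

    class BernoulliBandit:
        """A k-armed bandit; each arm pays 1 with a fixed hidden probability."""

        def __init__(self, probs):
            self.probs = list(probs)          # true success probability per arm
            self.best_prob = max(self.probs)  # benchmark used for regret

        def pull(self, arm):
            # Reward is 1 with probability probs[arm], else 0.
            return 1 if random.random() < self.probs[arm] else 0

    # Example: three arms with hidden probabilities 0.2, 0.5, 0.8.
    bandit = BernoulliBandit([0.2, 0.5, 0.8])
    reward = bandit.pull(2)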

  • Epsilon-Greedy: explores a random arm with probability epsilon, otherwise exploits the arm with the highest empirical mean.
  • UCB1: adds an optimism bonus to each arm's empirical mean that shrinks as the arm is sampled more.
  • Thompson Sampling: samples each arm's success probability from its Beta posterior and plays the largest sample (all three rules are sketched below).
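
A compact sketch of the three selection rules, where counts[i] and sums[i] track the pulls and total reward for arm i, and t is the 1-indexed step number; the function names and the Beta(1, 1) prior are assumptions, not the platform's implementation:

    import math
    import random

    def epsilon_greedy(counts, sums, epsilon=0.1):
        # Explore a uniformly random arm with probability epsilon;
        # otherwise exploit the arm with the highest empirical mean.
        if random.random() < epsilon:
            return random.randrange(len(counts))
        means = [s / c if c > 0 else 0.0 for s, c in zip(sums, counts)]
        return max(range(len(means)), key=means.__getitem__)

    def ucb1(counts, sums, t):
        # Try every arm once, then maximize empirical mean plus an
        # optimism bonus sqrt(2 ln t / n_i) that shrinks with more pulls.
        for arm, c in enumerate(counts):
            if c == 0:
                return arm
        scores = [s / c + math.sqrt(2 * math.log(t) / c)
                  for s, c in zip(sums, counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def thompson(counts, sums):
        # Sample each arm's success probability from its Beta posterior
        # Beta(successes + 1, failures + 1) and play the largest sample.
        samples = [random.betavariate(1 + s, 1 + c - s)
                   for s, c in zip(sums, counts)]
        return max(range(len(samples)), key=samples.__getitem__)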

Average Reward (chart): running mean of the rewards observed so far.

Cumulative Regret (chart): total expected reward lost relative to always playing the best arm, summed over steps.
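
A sketch of how both curves could be computed during a run, reusing the BernoulliBandit sketch above; this uses the expected (true-mean) form of regret, and the platform's charts may define it slightly differently:

    def run_metrics(bandit, choose_arm, steps):
        k = len(bandit.probs)
        counts, sums = [0] * k, [0] * k
        total_reward, regret = 0.0, 0.0
        avg_curve, regret_curve = [], []
        for t in range(1, steps + 1):
            arm = choose_arm(counts, sums, t)
            r = bandit.pull(arm)
            counts[arm] += 1
            sums[arm] += r
            total_reward += r
            # Each step adds the gap between the best arm's true mean
            # and the chosen arm's true mean.
            regret += bandit.best_prob - bandit.probs[arm]
            avg_curve.append(total_reward / t)   # Average Reward chart
            regret_curve.append(regret)          # Cumulative Regret chart
        return avg_curve, regret_curve

For example, run_metrics(bandit, ucb1, 1000) traces UCB1; strategies that ignore the step counter can be wrapped, e.g. lambda c, s, t: thompson(c, s).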

Arms (True Probabilities)

Edit manually or randomize, then click Apply.
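
Randomizing might simply draw each arm's probability uniformly from [0, 1]; this is an assumption about the button's behavior, not documented:

    import random
    probs = [round(random.random(), 2) for _ in range(5)]  # e.g. five arms
    bandit = BernoulliBandit(probs)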

Run Info

Current run: which run is in progress, out of the configured number of runs.
Step: current step within the run, out of the configured horizon.
Avg reward: average reward observed so far in the current run.
Cum regret: cumulative regret accumulated so far.