Softmax Strategy 1. epsilon-greedy strategy 11111 2. UCB strategy 222 3. Softmax strategy 333 4. Gradient strategy 444 References [1] 科学网—【RL系列】Multi-Armed Bandit笔记——Softmax选择策略 - 管金昱的博文 [2] The Epsilon-Greedy Algorithm | James D. McCaffrey