Optimistic Thompson sampling: strategic exploration in bandits and reinforcement learning