Pessimistic-Optimistic Bandit Learning with Applications to Communication Networks