Technical Report: UTEP-CS-18-08


Traditionally, in machine learning, the quality of the result improves steadily with time (usually slowly, but still steadily). However, as we start applying reinforcement learning techniques to solve complex tasks -- such as teaching a computer to play a complex game like Go -- we often encounter a situation in which, for a long time, there is no improvement, and then suddenly the system's efficiency jumps almost to its maximum. A similar phenomenon occurs in human learning, where it is known as the aha-moment. In this paper, we provide a possible explanation for this phenomenon, and show that this explanation leads to the need to reward students not only for their results, but also for their effort.