A few weeks ago, OpenAI attempted a major new milestone in AI development: a (nearly) full game of Dota 2 against some of the best human players.
Although OpenAI Five was defeated by both of its professional opponents, the level of play was high, and at times the matches looked fairly even.
This is remarkable, as the full game of Dota 2 is immensely complex.
Even more impressively, the agent was trained using a relatively simple and very general reinforcement learning algorithm: Proximal Policy Optimization (PPO).
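For readers unfamiliar with PPO, its core idea is a clipped surrogate objective that discourages the new policy from moving too far from the old one in a single update. Here is a minimal NumPy sketch of that objective; the function name and arguments are my own illustrative choices, not OpenAI's implementation:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage A(s, a) for each sample
    eps:       clipping range (0.2 is the commonly cited default)
    """
    unclipped = ratio * advantage
    # Clipping the ratio removes the incentive to push the policy
    # beyond the [1 - eps, 1 + eps] trust region.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum yields a pessimistic (lower-bound) objective.
    return np.minimum(unclipped, clipped).mean()
```

In practice this objective is maximized by gradient ascent over minibatches of rollout data, which is what makes PPO both simple and broadly applicable.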
One of the weaknesses of vanilla deep reinforcement learning is that the policies and values an agent learns are typically tied to a single environment: the one it was trained in.
In other words, it is hard to transfer policies from one setting to another.
This is in sharp contrast to how humans learn new tasks.
We draw heavily on past experience and quickly figure out which combination of the skills we already have works well in a new environment.