This may take a few seconds as the server wakes up.
Average Score on Ten Games: 38.1
This autonomous agent was trained for two hundred iterations using deep Q-learning with experience replay. Observations of human expert gameplay were then used to fine-tune the agent's performance.
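As a rough illustration of Q-learning with experience replay, the sketch below stores transitions in a fixed-size buffer and applies the Q-learning update to sampled batches. It is not the code behind this agent: a lookup table stands in for the deep network, and the names `ReplayBuffer` and `q_update` and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)


def q_update(q, batch, n_actions, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update per transition in the batch.

    q maps (state, action) to a value; a table is used here as a
    hypothetical stand-in for the neural network of deep Q-learning.
    """
    for state, action, reward, next_state, done in batch:
        if done:
            target = reward
        else:
            best_next = max(q[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])


# Usage: record transitions while playing, then learn from random batches.
q_table = defaultdict(float)
replay = ReplayBuffer(capacity=1000)
replay.push(0, 1, 1.0, 1, True)
q_update(q_table, replay.sample(32), n_actions=2)
```

In a deep Q-learning setup the table update would be replaced by a gradient step on the squared difference between the network's prediction and the same target.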
Average Score on Ten Games: 0
This autonomous agent is trained to mimic the original human player. Human state-action observations were used as training data for a policy network that predicts the next action given the current state.
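A minimal sketch of this kind of imitation, assuming states are short numeric feature vectors and actions are integer indices: a softmax classifier is fit to state-action pairs and then picks the most likely action for a new state. The linear model, the function names `train_policy` and `act`, and the toy demonstrations are all hypothetical; the agent's real network and data are not shown on this page.

```python
import math
import random


def softmax(scores):
    """Convert raw scores to probabilities, shifted for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(demos, n_features, n_actions, lr=0.5, epochs=200, seed=0):
    """Fit a linear softmax policy to (state, action) pairs with SGD.

    Returns one weight row per action; the last entry of each row is a bias.
    """
    rng = random.Random(seed)
    weights = [[0.0] * (n_features + 1) for _ in range(n_actions)]
    data = list(demos)
    for _ in range(epochs):
        rng.shuffle(data)
        for state, action in data:
            x = list(state) + [1.0]  # append constant bias input
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            for a in range(n_actions):
                # Cross-entropy gradient: predicted probability minus one-hot label.
                grad = probs[a] - (1.0 if a == action else 0.0)
                for i, v in enumerate(x):
                    weights[a][i] -= lr * grad * v
    return weights


def act(weights, state):
    """Return the action with the highest score for the given state."""
    x = list(state) + [1.0]
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(weights)), key=scores.__getitem__)


# Toy demonstrations: the "human" picks action 1 when the feature is positive.
demos = [((-1.0,), 0), ((-0.5,), 0), ((0.5,), 1), ((1.0,), 1)]
policy = train_policy(demos, n_features=1, n_actions=2)
```

A policy trained this way only sees states the human visited, so it has no guidance once it drifts into unfamiliar situations.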
Average Score on Ten Games: 0
This agent uses inverse Q-learning to estimate the Q-values for a given state. All training is done offline using 497 observations of human gameplay.