An implementation of DeepMind's MuZero algorithm, with improvements.
I am implementing Sampled-MuZero-Reanalyze, with sample-efficiency improvements found here, in JAX on DeepMind’s Acme RL framework, synthesizing these implementations. The goal is to deepen my own understanding and to build something that I and others can use generally, particularly given the algorithm’s outstanding performance in discrete action spaces. (Status: I have fully worked through the two source implementations and have just begun combining them in JAX/Acme.)
Coming Soon!