Reinforcement Learning with Random Delays. Simon Ramstedt, Yann Bouteiller, G. Beltrame, Christopher Pal & Jonathan Binas. ICLR 2021. [arXiv:2010.02966] A low-bias, low-variance value estimator for environments with random action and observation delays. The estimator is used and evaluated with our new Delay-Correcting Actor-Critic algorithm.
Real-Time Reinforcement Learning. Simon Ramstedt & Christopher Pal. NeurIPS 2019. [arXiv:1911.04448] A new framework for Reinforcement Learning in which states and actions evolve simultaneously. It acknowledges that action selection takes time. We introduce the Real-Time Actor-Critic algorithm.
Projects
Robin VLM. 2023. [github.com/cerc-aai/robin] Robin is a software suite to train vision-language models. We released data, training code and weights for an open VLM based on Mistral-7B and SigLIP, implemented using PyTorch and DeepSpeed and trained on eight A100 GPUs on the HessianAI computing cluster.
Uniton. 2021. [github.com/rmst/uniton] [Demo Video] Uniton is an asynchronous RPC framework for the Unity game engine and Python, with the goal of instrumentalizing Unity and making it more useful for non-game applications.
RTRL. 2019. [github.com/rmst/rtrl] Code accompanying our Real-Time Reinforcement Learning paper, with implementations of Real-Time Actor-Critic and Soft Actor-Critic in Python and PyTorch.
Avenue. 2017-2019. [github.com/elementai/avenue] A fast, easy-to-use, high-fidelity car simulator based on the Unity game engine.
DDPG. 2016. [github.com/rmst/ddpg] The first open reproduction of the Deep Deterministic Policy Gradient algorithm by Lillicrap et al. (2015), in both MATLAB, with manually coded gradient computation, and the (at the time) brand-new TensorFlow automatic differentiation framework.