We are developing a new class of reinforcement learning algorithms in which agents adapt to random, unobservable delays. The agent learns a model of the system and uses it to compute distributions over possible future states, which allows it to make better-informed decisions. We use deep neural networks to represent the models.
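
To make the idea concrete, the following is a minimal sketch, not our actual implementation. It makes several simplifying assumptions: the delay affects observations, the state space is small and discrete, the learned model is a tabular transition array standing in for a deep neural network, and the probability of each delay length is known. The names `rollout`, `belief_over_current_state`, `transition`, and `delay_probs` are hypothetical. Given the last observed state and the actions taken recently, it rolls the model forward for each possible delay and mixes the resulting state distributions by the delay probabilities.

```python
import numpy as np


def rollout(state_dist, actions, transition):
    """Push a state distribution through the model for a sequence of actions.

    transition[a, s, s2] is the modelled probability of moving from state s
    to state s2 under action a (a tabular stand-in for a learned model).
    """
    for action in actions:
        state_dist = state_dist @ transition[action]
    return state_dist


def belief_over_current_state(observed_state, action_history, transition, delay_probs):
    """Distribution over the true current state under an unobserved random delay.

    delay_probs[d] is the assumed probability that the observation is d steps
    old. If it is d steps old, the last d actions in action_history (oldest
    first) have been applied since the observed state.
    """
    if len(delay_probs) - 1 > len(action_history):
        raise ValueError("action_history is shorter than the maximum delay")

    n_states = transition.shape[1]
    start = np.zeros(n_states)
    start[observed_state] = 1.0

    belief = np.zeros(n_states)
    for delay, prob in enumerate(delay_probs):
        # Slice from the end; for delay == 0 this is an empty list.
        recent_actions = action_history[len(action_history) - delay:]
        belief += prob * rollout(start, recent_actions, transition)
    return belief


if __name__ == "__main__":
    # Toy model: two states, one action that always flips the state.
    flip = np.array([[[0.0, 1.0], [1.0, 0.0]]])
    print(belief_over_current_state(0, [0, 0], flip, [0.5, 0.3, 0.2]))
```

An agent could then pick the action that maximises expected value under this belief rather than acting on the stale observation alone. With a neural network model, the exact matrix products would typically be replaced by sampling or by propagating a parametric distribution.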