With this project, I aim to investigate new methods for making Reinforcement Learning (RL) more robust. More precisely, I will study how to apply Conformal Prediction to Off-Policy Evaluation (OPE) algorithms, where the reward of a given policy (the target policy) must be estimated from historical data gathered under a different policy (the behavior policy). Most OPE methods come without accuracy or certainty guarantees. The aim is to present a novel OPE method, based on Conformal Prediction, that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift caused by the discrepancy between the target and behavior policies. Experiments carried out on this cluster will validate the theoretical guarantees empirically.
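To illustrate the kind of estimator involved, the following is a minimal sketch (not the proposed method itself) of a weighted conformal-style interval for the target policy's reward. It assumes a calibration set of rewards logged under the behavior policy, together with hypothetical importance ratios w_i = pi_target(a_i | s_i) / pi_behavior(a_i | s_i) that account for the distribution shift; all data below are synthetic and purely illustrative.

```python
import numpy as np

def weighted_conformal_interval(rewards, weights, alpha=0.1):
    """Sketch of a (1 - alpha) interval for the target policy's reward.

    `rewards` are logged under the behavior policy; `weights` are
    hypothetical importance ratios pi_target / pi_behavior used to
    reweight the empirical distribution toward the target policy.
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Self-normalize the importance weights into a probability vector.
    p = w / w.sum()
    # Sort rewards and accumulate the reweighted probability mass.
    order = np.argsort(r)
    r_sorted, p_sorted = r[order], p[order]
    cum = np.cumsum(p_sorted)
    # Weighted empirical quantiles at alpha/2 and 1 - alpha/2.
    lo = r_sorted[np.searchsorted(cum, alpha / 2)]
    hi = r_sorted[np.searchsorted(cum, 1 - alpha / 2)]
    return lo, hi

# Toy usage with synthetic behavior-policy data (illustrative only).
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=0.5, size=1000)
weights = rng.uniform(0.5, 2.0, size=1000)  # stand-in importance ratios
lo, hi = weighted_conformal_interval(rewards, weights, alpha=0.1)
print(f"90% interval for the target policy reward: [{lo:.3f}, {hi:.3f}]")
```

A full method would additionally need the finite-sample coverage argument under covariate shift; this sketch only shows how importance weights reshape the empirical reward distribution before quantiles are taken.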