Deep reinforcement learning with constraints and demonstrations

Research division

Reinforcement Learning (RL) has been successfully applied to a number of problems, such as robotic control, task scheduling, and telecommunications.
During learning, RL agents are generally free to explore all potential behaviors. However, this freedom is not acceptable in many real-world applications, since "free" exploration can lead to dangerous actions that damage the system or even hurt people. In such situations, exploration must be guaranteed to be safe and controlled. The first objective of this thesis is therefore to propose a novel method capable of handling the general constraints that occur commonly in real-world applications (such as discounted cumulative, mean-value, and state-wise constraints). Constraints should be respected throughout the learning process in order to guarantee safety requirements.
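As context for this first objective, safe RL problems are typically formalized as constrained Markov decision processes. A sketch of the three constraint classes mentioned above (the notation here is illustrative; the exact formulation studied in the thesis may differ) is:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\underbrace{\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} c_i(s_t, a_t)\Big] \le d_i}_{\text{discounted cumulative}},
\qquad
\underbrace{\lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T-1} c_j(s_t, a_t)\Big] \le d_j}_{\text{mean value}},
\qquad
\underbrace{c_k(s_t, a_t) \le d_k \;\; \forall t}_{\text{state-wise}}
```

where $\pi$ is the policy, $r$ the reward, $c_i, c_j, c_k$ cost functions, and $d_i, d_j, d_k$ the corresponding thresholds.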
The second objective of this thesis is to accelerate convergence. This is motivated by the fact that the convergence of RL algorithms, when it occurs, is often very slow. One way to speed it up is to take advantage of human knowledge, which usually takes the form of expert demonstrations. The method developed in this thesis will be able to exploit expert demonstrations available at IFPEN, including both measured data and optimal solutions from deterministic optimizations.
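One common way to incorporate demonstrations into RL (a sketch of a standard technique, not necessarily the method that will be developed in this thesis) is to augment the RL objective with an imitation term over a demonstration dataset $\mathcal{D}$:

```latex
\min_{\theta} \; L_{\mathrm{RL}}(\theta) \;+\; \lambda \, \mathbb{E}_{(s, a^{*}) \sim \mathcal{D}} \big[ -\log \pi_{\theta}(a^{*} \mid s) \big]
```

where $\pi_{\theta}$ is the learned policy, $a^{*}$ the expert action, $L_{\mathrm{RL}}$ the base RL loss, and $\lambda$ a weight balancing imitation against reward maximization.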
Moreover, successfully applying an RL algorithm to a real-world application is known to be a challenge. The last objective of this thesis is therefore to make the proposed method user-friendly: easy to apply to real-world applications. The method will be tested on IFPEN applications such as eco-driving, closed-loop control of wind farms, and electrical network control.

Keywords: reinforcement learning, constrained Markov decision processes, optimal control, optimization

  • Academic supervisor    Ana BUSIC (CR), Inria Paris / Département d’Informatique de l’ENS, Université PSL
  • Doctoral School    ED386 DI ENS
  • IFPEN supervisor    Dr. Jiamin ZHU, Control, Signal and System
  • PhD location    Département d’Informatique de l’ENS, Paris, France & IFP Energies nouvelles, Rueil-Malmaison, France   
  • Duration and start date    3 years, starting in fourth quarter 2021
  • Employer    INRIA, Paris, France
  • Academic requirements    University Master degree in relevant disciplines
  • Language requirements    Fluency in English, willingness to learn French
  • Other requirements    Knowledge of informatics, probability/statistics and data science, and optimization/optimal control

To apply, please send your cover letter, your transcripts for levels L3, M1 and M2, and your CV to the IFPEN supervisor indicated above.

About IFP Energies nouvelles

IFP Energies nouvelles is a French public-sector research, innovation and training center. Its mission is to develop efficient, economical, clean and sustainable technologies in the fields of energy, transport and the environment. For more information, see our website.
IFPEN offers a stimulating research environment, with access to first-in-class laboratory infrastructure and computing facilities. All PhD students have access to dedicated seminars and training sessions. For more information, please see our dedicated web pages.