Reinforcement Learning Research

RL Group Dresden

Observe and learn!

Motivation

Optimal control for traffic dynamics

With an ever-growing interest in machine learning applications in a connected world, Reinforcement Learning (RL) occupies a unique position: it can learn and perform tasks in uncertain and complex circumstances. The goal is for an agent to maximize the long-term reward by performing actions in its environment. Typical real-world applications of RL include self-driving vessels, traffic flow optimization, and playing games such as StarCraft or Go at a superhuman level. RL has the potential to outsource error- and accident-prone human labor to a more risk-free and time-efficient automated agent. We are driven by the motivation that RL brings advances in human labor efficiency and, more importantly, human safety.

Projects

Two dimensional traffic simulator

The developed lane-bound and non-lane-bound traffic simulator uses a hierarchical reinforcement learning algorithm to guide traffic agents. We decomposed the control task into separate modules: longitudinal and lateral control policies that govern the movements, and a decision policy that decides on overtaking maneuvers.

Different RL algorithms are used for each module, such as Deep Deterministic Policy Gradient (DDPG) or Twin Delayed DDPG (TD3). The models are trained in a synthetic environment and through semi-supervised learning with actual trajectories. The central goal is to develop a system of fully autonomous vehicles based on reinforcement learning methods.
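The hierarchical decomposition can be sketched as follows. This is a minimal illustration, not the actual simulator code: the policy classes, thresholds, and the 3.5 m lane offset are assumptions standing in for the trained DDPG/TD3 actors.

```python
class LongitudinalPolicy:
    """Low-level policy: maps (gap, speed) to an acceleration command."""
    def act(self, gap, speed):
        # Simple proportional rule standing in for a trained DDPG/TD3 actor,
        # clipped to a plausible acceleration range in m/s^2.
        return max(-3.0, min(2.0, 0.5 * (gap - 2.0 * speed)))

class LateralPolicy:
    """Low-level policy: maps a lateral offset to a steering command."""
    def act(self, lateral_offset):
        return -0.4 * lateral_offset

class DecisionPolicy:
    """High-level policy: decides whether to initiate an overtaking maneuver."""
    def act(self, gap, leader_speed, own_speed):
        return "overtake" if own_speed > leader_speed and gap < 15.0 else "follow"

class HierarchicalAgent:
    """Combines the decision module with the longitudinal/lateral modules."""
    def __init__(self):
        self.longitudinal = LongitudinalPolicy()
        self.lateral = LateralPolicy()
        self.decision = DecisionPolicy()

    def step(self, obs):
        mode = self.decision.act(obs["gap"], obs["leader_speed"], obs["own_speed"])
        accel = self.longitudinal.act(obs["gap"], obs["own_speed"])
        # During an overtake, track the neighboring lane (assumed 3.5 m away).
        target_offset = obs["lateral_offset"] - 3.5 if mode == "overtake" else obs["lateral_offset"]
        steer = self.lateral.act(target_offset)
        return mode, accel, steer

agent = HierarchicalAgent()
obs = {"gap": 10.0, "leader_speed": 20.0, "own_speed": 25.0, "lateral_offset": 0.0}
mode, accel, steer = agent.step(obs)
print(mode, accel, steer)
```

In the actual system, each `act` method would be a trained neural-network actor; only the decomposition itself is the point here.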

Improving RL with real-world trajectories

When training an RL agent for traffic tasks, the agent's behavior depends entirely on the definition of the reward function, which does not guarantee desirable actions. Human demonstrations are often used to improve the agent's behavior or to speed up the learning process. However, these demonstrations are typically generated and recorded in a simulated or restricted real environment, which limits generalization and introduces biases toward the recorded data.

We address a fundamental challenge: developing autonomous car-following RL agents with real-world driving datasets instead of human demonstrations collected in simulated environments. We mix the human experience into the agent's replay buffer and let the agent improve its behavior by extracting human demonstrations from the real world.
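A minimal sketch of the buffer-mixing idea, assuming a fixed per-batch demonstration fraction (the class name and the `demo_fraction` parameter are illustrative; the actual mixing scheme may differ):

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer mixing recorded human transitions with the agent's own.

    Each sampled minibatch draws a fixed share `demo_fraction` from the
    human-demonstration pool and the rest from agent experience.
    """
    def __init__(self, capacity=100_000, demo_fraction=0.25, seed=0):
        self.agent_data = deque(maxlen=capacity)  # overwritten as training proceeds
        self.demo_data = []                       # fixed real-world demonstrations
        self.demo_fraction = demo_fraction
        self.rng = random.Random(seed)

    def add_demo(self, transition):
        self.demo_data.append(transition)

    def add_agent(self, transition):
        self.agent_data.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demo_data))
        n_agent = batch_size - n_demo
        batch = self.rng.sample(self.demo_data, n_demo)
        batch += self.rng.sample(list(self.agent_data), n_agent)
        return batch

buf = MixedReplayBuffer(demo_fraction=0.25)
for i in range(100):
    buf.add_demo(("demo", i))
for i in range(900):
    buf.add_agent(("agent", i))
batch = buf.sample(32)
print(sum(1 for t in batch if t[0] == "demo"))  # 8 of the 32 transitions are demonstrations
```

A transition here would be the usual (state, action, reward, next state, done) tuple; the off-policy learner consumes the mixed batch exactly as it would a purely self-generated one.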

Tackling Overestimation Bias in RL

Even long-standing and widespread Reinforcement Learning algorithms like Q-Learning have severe limitations under specific conditions. One particular issue is the use of the maximization operator during target computation, since the algorithm instantiates the Bellman optimality equation in a sample-based procedure. This maximization frequently leads to exaggerated Q-values of state-action pairs, which are then propagated through subsequent updates.
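The effect can be reproduced in a few lines: even when each individual Q-estimate is unbiased, taking the maximum over noisy estimates yields a positively biased target. The numbers below are purely illustrative.

```python
import random

random.seed(0)

# Three actions whose true Q-values are all 1.0, so the true maximum is 1.0.
true_q = [1.0, 1.0, 1.0]
noise_scale = 0.5  # zero-mean noise on each sampled estimate

# Each noisy estimate is unbiased, but the max over them is not:
# E[max_a Qhat(a)] >= max_a E[Qhat(a)].
trials = 10_000
avg_max = sum(
    max(q + random.uniform(-noise_scale, noise_scale) for q in true_q)
    for _ in range(trials)
) / trials

print(round(avg_max, 2))  # noticeably above the true maximum of 1.0
```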

Early contributions in the literature recognized this limitation, and the most common method to combat this problem is Double Q-Learning. However, this procedure introduces an underestimation bias, and more flexible specifications are needed. We are working on a new set of estimators for the underlying max-μ problem and are simultaneously translating it to the setting where deep neural networks serve as function approximators.
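For reference, a minimal tabular Double Q-Learning update, which decouples action selection from action evaluation (in a full implementation one would randomly choose which of the two tables to update at each step; the tiny example values are arbitrary):

```python
def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.99):
    """One Double Q-Learning step updating table A.

    The argmax action is selected with Q_A but evaluated with Q_B,
    which removes the systematic overestimation of plain Q-Learning.
    """
    best_next = max(q_a[next_state], key=q_a[next_state].get)  # selection via Q_A
    target = reward + gamma * q_b[next_state][best_next]       # evaluation via Q_B
    q_a[state][action] += alpha * (target - q_a[state][action])

# Tiny two-state example with two actions ("l", "r") per state.
q_a = {0: {"l": 0.0, "r": 0.0}, 1: {"l": 2.0, "r": 1.0}}
q_b = {0: {"l": 0.0, "r": 0.0}, 1: {"l": 0.5, "r": 3.0}}

double_q_update(q_a, q_b, state=0, action="r", reward=1.0, next_state=1)
print(q_a[0]["r"])  # 0.1 * (1.0 + 0.99 * 0.5) = 0.1495
```

Note that Q_A selects "l" in state 1, but the target uses Q_B's (lower) value for "l"; a single table would instead have used its own maximum.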

Picture of heterogeneous traffic in India

Calibrating Wiedemann-99 parameters

Drivers show different driving behaviors, such as free-flow driving, following a leader, closing in on a leader, and emergency braking. A microscopic model must replicate the driving behavior in each of these phases, and one of the critical elements of using microscopic models is calibration. We calibrated the various parameters of the Wiedemann-99 model to match observed real traffic behavior.

We are building a Reinforcement Learning-based two-dimensional traffic flow model. The parameters of classical models, such as the Wiedemann parameters, are vital for understanding and evaluating the performance of the developed RL algorithms. We will compare the RL model with classical models to find out if, when, and in which respects they are "better".
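The calibration principle can be sketched with a simple grid search. The Wiedemann-99 equations themselves are not reproduced here; the Intelligent Driver Model serves as a stand-in car-following model, the "observed" gaps are synthetic so the true optimum is known, and only the time-headway parameter is searched.

```python
import math

def idm_accel(v, dv, s, v0=30.0, T=1.5, a=1.5, b=2.0, s0=2.0):
    """Intelligent Driver Model acceleration, used as a simple stand-in
    for a Wiedemann-99 implementation during calibration."""
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a * b))
    return a * (1.0 - (v / v0) ** 4 - (max(s_star, 0.0) / s) ** 2)

def simulate_gaps(T, leader_speed=25.0, follower_v=25.0, gap=30.0,
                  dt=0.5, steps=60):
    """Simulate follower gaps behind a constant-speed leader for a given T."""
    gaps = []
    for _ in range(steps):
        acc = idm_accel(follower_v, follower_v - leader_speed, gap, T=T)
        follower_v = max(0.0, follower_v + acc * dt)
        gap += (leader_speed - follower_v) * dt
        gaps.append(gap)
    return gaps

def rmse(xs, ys):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# "Observed" gaps generated with T = 1.2, so the true optimum is known.
observed = simulate_gaps(T=1.2)

# Grid-search calibration: pick the time headway minimizing the gap RMSE.
candidates = [0.8, 1.0, 1.2, 1.5, 1.8]
best_T = min(candidates, key=lambda T: rmse(simulate_gaps(T=T), observed))
print(best_T)  # 1.2
```

The actual Wiedemann-99 calibration follows the same pattern with its own parameter set and with real trajectory data in place of the synthetic gaps.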

Software

RL Dresden Algorithm Suite [GitHub]

This Suite implements several model-free off-policy deep reinforcement learning algorithms for discrete and continuous action spaces in PyTorch.

Mixed Traffic Web Simulators [mtreiber.de] [traffic-simulation.de]

Fully operational, 2D, JavaScript-based web simulator implementing the "Mixed Traffic flow Model" (MTM). The simulation is intended to demonstrate fully two-dimensional but directed traffic flow and to visualize 2D flow models.

Members

Prof. Dr. Ostap Okhrin

Ostap Okhrin is Professor of Statistics and Econometrics at the Department of Transportation at TU Dresden. His expertise lies in mathematical statistics and data science with applications in transportation and economics.

Dr. Martin Treiber

Martin Treiber is a senior expert in traffic flow models including human and automated driving, bicycle, and pedestrian traffic. He also works in traffic data analysis and simulation (traffic-simulation.de, mtreiber.de/mixedTraffic).

Fabian Hart

Fabian Hart studied electrical engineering at TU Dresden. His main research focuses on modeling two-dimensional traffic based on Deep Reinforcement Learning methods.

Martin Waltz

Martin Waltz studied Industrial Engineering and is now a research associate; his main research focus is (Deep) Reinforcement Learning.

Niklas Paulig

Niklas Paulig is a research associate in the RL Group. His main field of research is the modeling of autonomous inland vessel traffic based on reinforcement learning methods, together with high-performance computing (HPC) implementations of the algorithms currently in use.

Dianzhao Li

Dianzhao Li is a research assistant in the RL Group, focusing on trajectory planning for autonomously driving vehicles with Reinforcement Learning algorithms. He currently mixes human driving datasets with RL in simulated environments to achieve better driving performance.

Ankit Anil Chaudhari

Ankit Chaudhari is currently working on "Enhancing Traffic-Flow Understanding by Two-Dimensional Microscopic Models". His research interests are traffic flow modeling, traffic simulation, mixed traffic flow, machine learning, and reinforcement learning.
