RL Dresden

Reinforcement Learning Research Group



Optimal control for traffic dynamics

With ever-growing interest in machine learning applications in a connected world, Reinforcement Learning (RL) occupies a unique position: it can learn and perform tasks in uncertain and complex circumstances. The goal is for an agent to maximize long-term reward by performing actions in an environment. Typical real-world applications of RL include self-driving vessels, traffic flow optimization, and playing games like StarCraft or Go at a superhuman level. RL has the potential to shift error- and accident-prone human labor to safer, more time-efficient automated agents. We are driven by the conviction that RL brings advances in human labor efficiency and, more importantly, human safety.

RLCDD 2022

On September 15-16, 2022, the "Friedrich List" Faculty of Transport and Traffic Sciences hosted a conference on Reinforcement Learning. The conference was organized by the Reinforcement Learning Group Dresden, was held in a hybrid format with both online and on-site participation, and was attended by over 100 participants from more than 10 countries.

Check out our conference website here.


December 5-6, 2023

Martin Waltz gave a presentation entitled "Local Path Planning in Transportation using Reinforcement Learning" at the 1st Symposium On Lifelong Explainable Robot Learning in Nürnberg.

September 24-28, 2023

Dianzhao Li participated in the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC), where he introduced his work on "Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer".

September 11-14, 2023

Ostap Okhrin and Niklas Paulig participated in the Statistical Week 2023 at TU Dortmund, where Ostap Okhrin gave a talk on "Two-Sample Testing in Reinforcement Learning" and Niklas Paulig presented his work on "Robust Path Following on Rivers Using Bootstrapped Reinforcement Learning".

August 29, 2023

Chair member Ankit Chaudhari gave an interview at the workshop on 'Integrated Engineering for Future Mobility' in Delhi. The workshop aimed at fostering collaboration and generating innovative research ideas related to urban mobility by using a "Design Thinking" approach.

June 8, 2023

Martin Waltz's paper "Spatial-temporal recurrent reinforcement learning for autonomous ships" has been published in Neural Networks.

June 7, 2023

Ostap Okhrin and Dianzhao Li participated in the 2nd Saxon AI Congress (2. Sächsischer KI-Kongress) of the Free State of Saxony. This prestigious event brought together more than 250 distinguished guests from business, science, society, and politics, creating a dynamic platform for discussions on the latest developments and trends in the field of AI. Livestream

May 8, 2023

The chair recently conducted a drone-based traffic data collection effort on the A50 highway in Milan, Italy. With a coordinated fleet of 7 drones flying in a line, traffic was captured over a 1,000 m road section for 130 minutes. Our team member Ankit Chaudhari oversaw the data collection on site.

April 27, 2023

Ankit Chaudhari recently participated in a "design thinking"-based workshop on 'Integrated Engineering for Future Mobility' organized by the German Centre for Research and Innovation (DWIH) in New Delhi, India.

April 25, 2023

Fabian Hart's paper "Vessel-following model for inland waterways based on deep reinforcement learning" has been accepted for publication in the journal Ocean Engineering.


Map of the North Sea area

Steering an autonomous vessel through the North Sea using real shipping routes

Current simulation-based research for autonomous surface vehicles (ASVs) often makes unrealistic assumptions. The effects of environmental disturbances (wind, waves, and currents) are generally neglected, and other traffic participants are treated as non-reactive, linearly moving obstacles. Further, the control actions are typically specified to act directly on the surge force and the yaw moment, neglecting the low-level translation into, e.g., the revolutions per second of a propeller and the rudder angle. Finally, the simulation environments do not consider realistic geometries like rivers, the corresponding water depths, and their impact on the ship's maneuverability.

This work proposes a realistic modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent is responsible for planning a path under consideration of nearby vessels, traffic rules, and the geometry of the waterway. The PF agent is responsible for low-level actuator control while accounting for shallow-water influences on the marine craft and the environmental forces of wind, waves, and currents.
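The two-level split can be sketched as two cooperating controllers: a planner that emits waypoints and a follower that turns them into actuator commands. The following is a minimal illustrative sketch, not the DRL agents themselves; the class names, the evasion rule, and the proportional rudder law are all placeholder assumptions.

```python
import math

class LocalPathPlanner:
    """High-level unit (stands in for the LPP agent): proposes the next
    waypoint toward the goal, nudged away from the closest nearby vessel."""
    def plan(self, pos, goal, obstacles):
        heading = math.atan2(goal[1] - pos[1], goal[0] - pos[0])
        if obstacles:
            ox, oy = min(obstacles, key=lambda o: (o[0]-pos[0])**2 + (o[1]-pos[1])**2)
            if (ox - pos[0])**2 + (oy - pos[1])**2 < 25.0:  # within 5 m: evade
                heading += 0.3                               # arbitrary evasion offset
        step = 1.0                                           # waypoint spacing [m]
        return (pos[0] + step*math.cos(heading), pos[1] + step*math.sin(heading))

class PathFollower:
    """Low-level unit (stands in for the PF agent): maps the waypoint to
    actuator commands -- rudder angle and propeller revolutions."""
    def act(self, pos, yaw, waypoint):
        desired = math.atan2(waypoint[1] - pos[1], waypoint[0] - pos[0])
        err = (desired - yaw + math.pi) % (2*math.pi) - math.pi  # wrap to [-pi, pi]
        rudder = max(-0.6, min(0.6, 1.5*err))  # saturated proportional rule [rad]
        rps = 5.0                              # fixed revolutions per second
        return rudder, rps
```

In the actual framework both units are DRL policies; the point of the sketch is only the interface: the planner never touches actuators, and the follower never reasons about traffic.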

Robust Sim2Real transfer with DRL for autonomous vehicles

Training an RL agent directly in the real world is often impractical and expensive; thus, simulation is an important tool for developing RL agents. However, when transferring a trained agent to the real world, discrepancies between simulation and reality create a reality gap that makes the transfer of RL policies a challenge.

To train the autonomous lane following and overtaking agent, we separate the agent into two modules: a perception module and a control module. The perception module translates the image input into a compact representation of the environment. The control module then uses this representation as part of its observation and outputs control commands for the vehicle. As a result, the trained agent can be transferred from simulation to the real world with only a minor modification of the perception module. See the GitHub repository here.
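The modular idea can be shown in a few lines: only the perception function sees raw pixels, so it is the only piece that must be adapted when moving from simulator to camera. This is a toy sketch under stated assumptions; the brightest-column "lane detector" and the proportional steering rule are placeholders for the learned modules.

```python
import numpy as np

def perception(image):
    """Perception module (placeholder): compress a camera frame into a
    compact state -- here the normalized horizontal offset of the brightest
    column, standing in for the detected lane center."""
    col_brightness = image.mean(axis=0)               # average over rows
    lane_center = int(np.argmax(col_brightness))      # brightest column index
    half_width = image.shape[1] / 2
    offset = (lane_center - half_width) / half_width  # in [-1, 1]
    return np.array([offset])

def control(state):
    """Control module: maps the compact state to a steering command in
    [-1, 1]. A trained DRL policy would replace this proportional rule."""
    return float(np.clip(-0.8 * state[0], -1.0, 1.0))

# Swap `perception` for the real camera pipeline; `control` is reused as-is.
frame = np.zeros((32, 64))
frame[:, 48] = 1.0               # bright "lane marking" right of center
steer = control(perception(frame))   # negative: steer back toward center
```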

Path following validation

Robust path following on rivers for medium-sized cargo ships

Spatial restrictions due to waterway geometry and the resulting challenges, such as high flow velocities or shallow banks, require precise control commands for steering. This is achieved by augmenting the agent's perception with environmental information such as current velocity and direction and water depth.

A bootstrapped Q-learning algorithm is used in combination with a versatile training-environment generator to develop a robust and accurate rudder controller. Validation on the Lower and Middle Rhine in both directions indicates state-of-the-art performance despite stark environmental differences across the scenarios.
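The core of bootstrapped Q-learning is an ensemble of Q-estimates, each trained on a randomly masked subset of the experience, so that their spread reflects uncertainty. A minimal tabular sketch follows; the binary action space, mask probability of 0.8, and ensemble-mean action rule are illustrative assumptions, not the paper's exact configuration.

```python
import random

def bootstrapped_q_learning(transitions, n_heads=5, alpha=0.5, gamma=0.9, seed=0):
    """Tabular sketch: K independent Q-heads, each seeing a given transition
    with probability 0.8 (a Bernoulli bootstrap mask). Actions are 0/1."""
    rng = random.Random(seed)
    heads = [dict() for _ in range(n_heads)]
    for s, a, r, s_next, done in transitions:
        for Q in heads:
            if rng.random() < 0.8:          # mask: each head skips ~20% of data
                best_next = 0.0 if done else max(
                    Q.get((s_next, b), 0.0) for b in (0, 1))
                target = r + gamma * best_next
                q = Q.get((s, a), 0.0)
                Q[(s, a)] = q + alpha * (target - q)
    return heads

def ensemble_action(heads, s):
    """Act greedily w.r.t. the ensemble mean -- one way to combine heads."""
    mean_q = {a: sum(Q.get((s, a), 0.0) for Q in heads) / len(heads)
              for a in (0, 1)}
    return max(mean_q, key=mean_q.get)
```

In the deep variant each head is a network output; the mask plays the same role of decorrelating the heads.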

Two dimensional traffic simulator

The developed lane-bound and non-lane-bound traffic simulator uses a hierarchical reinforcement learning algorithm to guide traffic agents. We decomposed the control task into modules: a longitudinal and a lateral control policy to control movements, and a decision policy that decides on overtaking maneuvers.

Different RL algorithms are used for each module, such as Deep Deterministic Policy Gradient (DDPG) or Twin Delayed DDPG (TD3). The models are trained in a synthetic environment and through semi-supervised learning with actual trajectories. The central goal is to develop a system of fully autonomous vehicles based on reinforcement learning methods.
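The hierarchical decomposition above can be sketched as three plain functions: a decision module choosing whether to overtake, dispatching to separate longitudinal and lateral controllers. This is a hand-written stand-in for the learned policies; the headway threshold, desired speed, and gain values are assumptions for illustration only.

```python
def longitudinal_policy(gap, speed, desired_speed=30.0):
    """Longitudinal module: accelerate toward the desired speed [m/s],
    brake when the gap to the leader closes (stand-in for a DDPG actor)."""
    accel = 0.5 * (desired_speed - speed)
    if gap < 2.0 * speed:              # under a 2 s time headway: brake
        accel = min(accel, -2.0)
    return max(-4.0, min(2.0, accel))  # comfort limits [m/s^2]

def lateral_policy(lane, target_lane):
    """Lateral module: a signed, saturated lane-change rate."""
    return max(-1.0, min(1.0, float(target_lane - lane)))

def decision_policy(gap, speed, lane):
    """Decision module: trigger an overtaking maneuver when the leader is
    close; otherwise keep the lane. Dispatches to the two low-level modules."""
    overtake = gap < 2.0 * speed and lane == 0
    target_lane = 1 if overtake else lane
    return longitudinal_policy(gap, speed), lateral_policy(lane, target_lane)
```

In the simulator each module is a trained network (e.g., DDPG or TD3 for the continuous controls); the sketch shows only how the modules compose.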

Picture of Two dimensional traffic simulator
Small robotic cars

Improving RL with real-world trajectories

When training an RL agent for traffic tasks, the agent's behavior depends entirely on a reward function defined from experience, which does not guarantee optimal actions. Human demonstrations are often used to improve the agent's behavior or to speed up the learning process. However, these demonstrations are usually generated and recorded in a simulated or limited real environment, which restricts generalization and biases the agent toward the data.

We address a fundamental challenge: developing autonomous car-following RL agents with real-world driving datasets instead of human demonstrations in simulated environments. We mix the human experience into the agent's replay buffer and let the agent improve its behavior by exploiting human demonstrations recorded in the real world.
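Mixing demonstrations into the replay buffer amounts to sampling each training batch partly from a fixed demonstration set and partly from the agent's own experience. A minimal sketch, assuming a fixed mixing ratio (the class name and the 25% default are illustrative choices, not the project's exact settings):

```python
import random

class MixedReplayBuffer:
    """Replay buffer that mixes pre-recorded human driving transitions with
    the agent's own experience: each sampled batch draws a fixed fraction
    (human_ratio) from the demonstration data."""
    def __init__(self, human_transitions, human_ratio=0.25, seed=0):
        self.human = list(human_transitions)   # fixed demonstration set
        self.agent = []                        # grows during training
        self.ratio = human_ratio
        self.rng = random.Random(seed)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        n_human = int(batch_size * self.ratio)
        batch = self.rng.choices(self.human, k=n_human)
        batch += self.rng.choices(self.agent, k=batch_size - n_human)
        self.rng.shuffle(batch)                # avoid ordering artifacts
        return batch
```

An off-policy learner (e.g., DDPG) can consume such batches unchanged, since it makes no assumption about which policy generated a transition.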

Tackling Overestimation Bias in RL

Even long-standing and widespread Reinforcement Learning algorithms like Q-Learning have severe limitations under specific conditions. One particular issue is the use of the maximization operator during target computation, since the algorithm instantiates the Bellman optimality equation in a sample-based procedure. This maximization frequently leads to exaggerated Q-values of state-action pairs, which are then propagated through subsequent updates.

Early contributions in the literature recognized this limitation, and the most common method to combat this problem is Double Q-Learning. However, this procedure introduces an underestimation bias, and more flexible specifications are needed. We are working on a new set of estimators for the underlying max-μ problem (estimating the maximum of expected values) and simultaneously translating them to the case where deep neural networks serve as function approximators.
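The bias is easy to demonstrate numerically: when all actions have true value zero and we observe them with noise, the sample maximum is systematically positive, while a Double Q-style split (select the action on one sample set, evaluate it on an independent one) removes the upward bias in this symmetric case. A small Monte Carlo sketch (the sample sizes and trial count are arbitrary illustration choices):

```python
import random

def max_estimators(n_actions=10, n_samples=50, trials=1000, seed=1):
    """All actions have true value 0, observed with N(0,1) noise.
    Returns (biased max estimate, double-estimator estimate)."""
    rng = random.Random(seed)
    single_sum = double_sum = 0.0
    for _ in range(trials):
        # two independent sample means per action, as in Double Q-Learning
        means_a = [sum(rng.gauss(0, 1) for _ in range(n_samples)) / n_samples
                   for _ in range(n_actions)]
        means_b = [sum(rng.gauss(0, 1) for _ in range(n_samples)) / n_samples
                   for _ in range(n_actions)]
        single_sum += max(means_a)                   # standard max: biased upward
        double_sum += means_b[max(range(n_actions),  # select on A, evaluate on B
                                  key=lambda a: means_a[a])]
    return single_sum / trials, double_sum / trials
```

With unequal true action values the double estimator is in turn biased downward, which is exactly the underestimation issue the text refers to.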

Plots showing overestimation bias in RL


RL Dresden Algorithm Suite [GitHub]

This suite implements several model-free off-policy deep reinforcement learning algorithms for discrete and continuous action spaces in PyTorch.


Mixed Traffic Web Simulators [mtreiber.de]

A fully operational, 2D, JavaScript-based web simulator implementing the "Mixed Traffic flow Model" (MTM). The simulation is intended to demonstrate fully two-dimensional but directed traffic flow and to visualize 2D flow models.

Mixed traffic web simulator

Sim2Real Transfer package with Duckiebot [GitHub]

This package includes the training and evaluation code under ROS platform for Sim2Real Transfer with Duckiebot for multiple autonomous driving behaviors.

Sim2Real Transfer package with Duckiebot


Card image cap
Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer

Li, D., & Okhrin, O. (2023). arXiv preprint arXiv:2305.11589

2-Level Reinforcement Learning for Ships on Inland Waterways

Waltz, M., Paulig, N., & Okhrin, O. (2023). arXiv preprint arXiv:2307.16769

A Platform-Agnostic Deep Reinforcement Learning Framework for Effective Sim2Real Transfer in Autonomous Driving

Li, D., & Okhrin, O. (2023). arXiv preprint arXiv:2304.08235

Robust Path Following on Rivers Using Bootstrapped Reinforcement Learning

Paulig, N., & Okhrin, O. (2023). arXiv preprint arXiv:2303.15178

Enhanced method for reinforcement learning based dynamic obstacle avoidance by assessment of collision risk

Hart, F., & Okhrin, O. (2022). arXiv preprint arXiv:2212.04123

Spatial-temporal recurrent reinforcement learning for autonomous ships

Waltz, M., & Okhrin, O. (2022). Neural Networks

Vessel-following model for inland waterways based on deep reinforcement learning

Hart, F., Okhrin, O., & Treiber, M. (2023). Ocean Engineering, 281, 114679

Two-Sample Testing in Reinforcement Learning

Waltz, M., & Okhrin, O. (2022). arXiv preprint arXiv:2201.08078

Missing Velocity Information in Dynamic Obstacle Avoidance based on Deep Reinforcement Learning

Hart, F., Waltz, M., & Okhrin, O. (2021). arXiv preprint arXiv:2112.12465

Formulation and validation of a car-following model based on deep reinforcement learning

Hart, F., Okhrin, O., & Treiber, M. (2021). arXiv preprint arXiv:2109.14268

DDPG car-following model with real-world human driving experience in CARLA

Li, D., & Okhrin, O. (2023). Transportation Research Part C: Emerging Technologies, Volume 147


Prof. Dr. Ostap Okhrin

Ostap Okhrin is Professor of Statistics and Econometrics at the Department of Transportation at TU Dresden. His expertise lies in mathematical statistics and data science with applications in transportation and economics.


Dr. Martin Treiber

Martin Treiber is a senior expert in traffic flow models including human and automated driving, bicycle, and pedestrian traffic. He also works in traffic data analysis and simulation (traffic-simulation.de, mtreiber.de/mixedTraffic).

Niklas Paulig

Niklas Paulig is an RL Group research associate; his main field of research is the modeling of autonomous inland vessel traffic based on reinforcement learning methods, along with HPC implementations of algorithms currently in use.


Martin Waltz

Martin Waltz conducted his studies in Industrial Engineering and is now a research associate, with his main research focus being (Deep) Reinforcement Learning.

Ankit Anil Chaudhari

Ankit Chaudhari is currently working on "Enhancing Traffic-Flow Understanding by Two-Dimensional Microscopic Models". His research interests are traffic flow modelling, traffic simulation, mixed traffic flow, machine learning and reinforcement learning.


Dianzhao Li

Dianzhao Li is a research assistant at RL-Dresden, focusing on trajectory planning for autonomously driving vehicles with reinforcement learning algorithms. He currently mixes human driving datasets with RL in simulated environments to achieve better performance for the vehicles.

Paul Auerbach

Paul Auerbach is a research associate at the Barkhausen Institut and collaborates with RL-Dresden on simulating and solving traffic scenarios with the help of reinforcement learning. He aims to transfer the learned RL models to real-world model cars.


Gong Chen

Gong Chen is a research associate within RL-Dresden, concentrating on applying reinforcement learning to simulate shipping traffic under shallow water conditions.

Former members
Fabian Hart
Hadil Romdhane