With the ever-growing interest in machine learning applications in a connected world, Reinforcement Learning (RL) occupies a unique position: it can learn and perform tasks in uncertain and complex circumstances. The goal is for an agent to maximize long-term reward by performing actions in an environment. Typical real-world applications of RL include self-driving vessels, traffic flow optimization, and playing games like StarCraft or Go at a superhuman level. RL has the potential to shift error- and accident-prone human labor to a safer and more time-efficient automated agent. We are motivated by RL's potential to improve the efficiency and, more importantly, the safety of human labor.
On September 15-16, 2022, the "Friedrich List" Faculty of Transport and Traffic Sciences hosted a conference on Reinforcement Learning. The conference was organized by the Reinforcement Learning Group Dresden and held in a hybrid format, with both online and on-site participation, and was attended by over 100 participants from more than 10 countries.
Check out our conference website here.
Martin Waltz gave a presentation entitled "Local Path Planning in Transportation using Reinforcement Learning" at the 1st Symposium On Lifelong Explainable Robot Learning in Nürnberg.
Dianzhao Li participated in the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC), where he introduced his work on "Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer".
Ostap Okhrin and Niklas Paulig participated in the Statistical Week 2023 at TU Dortmund, where Ostap Okhrin gave a talk on "Two-sample Testing in Reinforcement Learning" and Niklas Paulig presented his work on "Robust Path Following on Rivers Using Bootstrapped Reinforcement Learning".
Chair member Ankit Chaudhari gave an interview at the workshop on 'Integrated Engineering for Future Mobility' in Delhi. The workshop aimed to foster collaboration and generate innovative research ideas related to urban mobility using a "Design Thinking" approach.
Martin Waltz's paper "Spatial-temporal recurrent reinforcement learning for autonomous ships" has been published in Neural Networks.
Ostap Okhrin and Dianzhao Li participated in the 2nd Saxon AI Congress (2. Sächsischer KI-Kongress) of the Free State of Saxony. This prestigious event brought together more than 250 distinguished guests from business, science, society, and politics, creating a dynamic platform for discussions on the latest developments and trends in the field of AI. Livestream
The chair recently conducted a drone-based traffic data collection effort on the A50 highway in Milan, Italy. With a coordinated fleet of seven drones flying in a line, traffic was captured over a 1,000 m road section for 130 minutes. Our team member Ankit Chaudhari oversaw the data collection on site.
Ankit Chaudhari recently participated in a design-thinking-based workshop on 'Integrated Engineering for Future Mobility' organized by the German Centre for Research and Innovation (DWIH) in New Delhi, India.
Fabian Hart's paper "Vessel-following model for inland waterways based on deep reinforcement learning" has been accepted for publication in the journal Ocean Engineering.
Current simulation-based research on autonomous surface vehicles (ASVs) often makes unrealistic assumptions. The effects of environmental disturbances (wind, waves, and currents) are generally neglected, and other traffic participants are treated as non-reactive, linearly moving obstacles. Further, the control actions are typically specified directly as a surge force and a yaw moment, neglecting the low-level translation into, e.g., the revolutions per second of a propeller and the rudder angle. Finally, the simulation environments do not consider realistic geometries such as rivers, the corresponding water depths, and the latter's impact on the ship's maneuverability.
This paper proposes a realistic modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent is responsible for planning a path under consideration of nearby vessels, traffic rules, and the geometry of the waterway. The PF agent is responsible for low-level actuator control while accounting for shallow water influences on the marine craft and the environmental forces of winds, waves, and currents.
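The two-level split can be pictured as a thin interface in which the LPP agent hands waypoints to the PF agent. The following is a minimal, hypothetical sketch: the class names, state fields, and the proportional rudder rule are illustrative placeholders, not the paper's actual DRL policies.

```python
class LocalPathPlanner:
    """High-level LPP stand-in: plans waypoints given own state, nearby
    traffic, and waterway geometry (a DRL policy in the actual framework)."""

    def plan(self, own_state, traffic, waterway):
        # Placeholder logic: head 50 m ahead toward the fairway centerline.
        return [(own_state["x"] + 50.0, waterway["centerline_y"])]


class PathFollower:
    """Low-level PF stand-in: translates the next waypoint into actuator
    commands, here a rudder angle and propeller revolutions per second."""

    def follow(self, own_state, waypoint):
        wx, wy = waypoint
        rudder = 0.1 * (wy - own_state["y"])  # proportional placeholder rule
        rps = 5.0                             # constant revolutions placeholder
        return {"rudder": rudder, "rps": rps}


# The two levels only exchange waypoints, so each DRL agent can be
# trained, validated, and replaced independently.
planner, follower = LocalPathPlanner(), PathFollower()
wp = planner.plan({"x": 0.0, "y": 2.0}, traffic=[], waterway={"centerline_y": 0.0})[0]
cmd = follower.follow({"x": 0.0, "y": 2.0}, wp)
```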
Training an RL agent directly in the real world is often impractical and expensive; simulation is therefore an essential tool for developing RL agents. However, when a trained agent is transferred to the real world, discrepancies between simulation and reality create a reality gap that makes the transfer of RL policies challenging.
To train the autonomous lane-following and overtaking agent, we separate the agent into two modules: a perception module and a control module. The perception module translates the image input into compact information about the environment. The control module then uses this information as part of its observation and outputs control commands for the vehicle. Thus, the trained agent can be transferred from simulation to the real world with only a minor modification of the perception module. See the GitHub repository here.
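The separation can be illustrated with a toy pipeline assuming a one-dimensional compact state. The brightness-difference "perception" and the proportional "control" rule below are stand-ins for the trained networks, not the actual modules:

```python
import numpy as np


def perception(image):
    """Toy perception stand-in: reduce a camera frame to a compact state
    (mean brightness of right half minus left half as a crude lane-offset
    proxy; the real module is a trained network)."""
    h, w = image.shape[:2]
    left = image[:, : w // 2].mean()
    right = image[:, w // 2 :].mean()
    return np.array([right - left])


def control(state, gain=0.5):
    """Toy control stand-in: map the compact state to a steering command in
    [-1, 1]; an RL policy would replace this proportional rule."""
    return float(np.clip(-gain * state[0], -1.0, 1.0))


# Only the perception module must be adapted for sim-to-real transfer;
# the control module consumes the same compact state in both domains.
frame = np.zeros((64, 64))
frame[:, 40:] = 1.0            # bright region on the right half of the frame
steer = control(perception(frame))
```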
Spatial restrictions due to waterway geometry and the resulting challenges, such as high flow velocities or shallow banks, require precise control commands for steering. This is achieved by augmenting the agent's perception with environmental information such as current velocity and direction and water depth.
A test-based bootstrapped Q-learning algorithm is used in combination with a versatile training environment generator to develop a robust and accurate rudder controller. Validation on the Lower and Middle Rhine in both directions indicates state-of-the-art performance despite stark environmental differences across the scenarios.
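The bootstrapping idea can be sketched in tabular form: an ensemble of Q-estimates, each trained on a random Bernoulli-masked subset of transitions, is more robust than a single estimate. The mask probability, learning rate, and ensemble vote below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_heads = 4, 2, 5

# Ensemble of tabular Q-estimates ("heads"); each head sees only a random
# subset of transitions, which is the bootstrapping step.
Q = np.zeros((n_heads, n_states, n_actions))


def update(s, a, r, s2, alpha=0.1, gamma=0.95):
    # Bernoulli mask: each head trains on roughly 50% of transitions.
    mask = rng.random(n_heads) < 0.5
    for k in np.flatnonzero(mask):
        target = r + gamma * Q[k, s2].max()
        Q[k, s, a] += alpha * (target - Q[k, s, a])


def act(s):
    # At evaluation time, average the heads (a majority vote is also common).
    return int(Q[:, s].mean(axis=0).argmax())
```

In state 0, alternating two actions where only action 1 is rewarded quickly makes every head, and hence the ensemble average, prefer action 1.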
The developed lane-bound and non-lane-bound traffic simulator uses a hierarchical reinforcement learning algorithm to guide traffic agents. We decompose the control task into separate modules: a longitudinal and a lateral control policy that govern the vehicle's movements, and a decision policy that decides on overtaking maneuvers.
Different RL algorithms are used for each module, such as the Deep Deterministic Policy Gradient (DDPG) or the Twin Delayed DDPG (TD3). The models are trained in a synthetic environment and through semi-supervised learning with actual trajectories. The central goal is to develop a system of fully autonomous vehicles based on reinforcement learning methods.
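TD3, for instance, refines DDPG by maintaining two critics and building the bootstrap target from the smaller of their two next-state estimates, which curbs overestimation. A one-function sketch of that target computation:

```python
import numpy as np


def td3_target(r, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target used by TD3: taking the minimum of the two
    critics' next-state estimates keeps the bootstrap target conservative."""
    return r + gamma * np.minimum(q1_next, q2_next)


# e.g. reward 1.0 with critic estimates 2.0 and 3.0 -> 1.0 + 0.99 * 2.0
target = td3_target(1.0, 2.0, 3.0)
```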
When training an RL agent for traffic tasks, the agent's behavior depends entirely on the reward function, which is hand-crafted from experience and therefore does not guarantee optimal actions. Human demonstrations are often used to improve the agent's behavior or to speed up the learning process. However, these demonstrations are typically generated and recorded in simulated or restricted real environments, which limits generalization and introduces biases in the data.
We address a fundamental challenge: developing autonomous car-following RL agents with real-world driving datasets instead of human demonstrations in simulated environments. We mix the human experience into the agent's replay buffer and let the agent improve its behavior by learning from human demonstrations recorded in the real world.
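The mixing step can be sketched as a replay buffer that draws a fixed fraction of every batch from the recorded human transitions. The class, its names, and the 25% demonstration fraction below are illustrative assumptions, not the actual implementation:

```python
import random


class MixedReplayBuffer:
    """Sketch of a replay buffer mixing fixed real-world human
    demonstrations with the agent's own transitions (names hypothetical)."""

    def __init__(self, demos, capacity=10_000, demo_fraction=0.25):
        self.demos = list(demos)         # fixed real-world demonstrations
        self.agent = []                  # transitions collected by the agent
        self.capacity = capacity
        self.demo_fraction = demo_fraction

    def add(self, transition):
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)            # drop the oldest agent transition

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, batch_size - n_demo)
        return batch
```

Sampling demonstrations at a fixed rate keeps the real-world driving data influential even as the agent's own experience grows.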
Even long-standing and widespread Reinforcement Learning algorithms like Q-Learning have severe limitations under specific conditions. One particular issue is the use of the maximization operator during target computation, since the algorithm is a sample-based instantiation of the Bellman optimality equation. This maximization frequently leads to exaggerated Q-values of state-action pairs, which are then propagated through subsequent updates.
Early contributions in the literature recognized this limitation, and the most common method to combat the problem is Double Q-Learning. However, this procedure introduces an underestimation bias, and more flexible specifications are needed. We are working on a new set of estimators for the underlying max-μ problem and simultaneously translating them to the case where deep neural networks serve as function approximators.
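The two biases are easy to demonstrate numerically. In the toy sketch below, five actions share the same true value of zero; the single (maximum-of-means) estimator is biased upward, while the Double Q-Learning style cross-estimator is unbiased in this symmetric case (its underestimation appears once the true values differ). All numbers here are illustrative, not from our work:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.zeros(5)            # all five actions have true value 0
n_samples, n_trials = 10, 5000

single, double = [], []
for _ in range(n_trials):
    # Two independent sets of sample means, as in Double Q-Learning.
    a = rng.normal(true_means, 1.0, size=(n_samples, 5)).mean(axis=0)
    b = rng.normal(true_means, 1.0, size=(n_samples, 5)).mean(axis=0)
    single.append(a.max())          # max over one set: overestimates 0
    double.append(b[a.argmax()])    # select with a, evaluate with b

print(f"single-estimator bias: {np.mean(single):+.3f}")  # clearly positive
print(f"double-estimator bias: {np.mean(double):+.3f}")  # close to zero
```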
This suite implements several model-free off-policy deep reinforcement learning algorithms for discrete and continuous action spaces in PyTorch.
This package includes the training and evaluation code under ROS platform for Sim2Real Transfer with Duckiebot for multiple autonomous driving behaviors.
Li, D., & Okhrin, O. (2023). arXiv preprint arXiv:2305.11589
Waltz, M., Paulig, N., & Okhrin, O. (2023). arXiv preprint arXiv:2307.16769
Li, D., & Okhrin, O. (2023). arXiv preprint arXiv:2304.08235
Paulig, N., & Okhrin, O. (2023). arXiv preprint arXiv:2303.15178
Hart, F., & Okhrin, O. (2022). arXiv preprint arXiv:2212.04123
Waltz, M., & Okhrin, O. (2022). Neural Networks
Hart, F., Okhrin, O., & Treiber, M. (2023). Ocean Engineering, 281, 114679
Waltz, M., & Okhrin, O. (2022). arXiv preprint arXiv:2201.08078
Hart, F., Waltz, M., & Okhrin, O. (2021). arXiv preprint arXiv:2112.12465
Hart, F., Okhrin, O., & Treiber, M. (2021). arXiv preprint arXiv:2109.14268
Ostap Okhrin is Professor of Statistics and Econometrics in the Department of Transportation at TU Dresden. His expertise lies in mathematical statistics and data science with applications in transportation and economics.
Martin Treiber is a senior expert in traffic flow models including human and automated driving, bicycle, and pedestrian traffic. He also works in traffic data analysis and simulation (traffic-simulation.de, mtreiber.de/mixedTraffic).
Niklas Paulig is an RL Group research associate whose main field of research is the modeling of autonomous inland vessel traffic based on reinforcement learning methods, as well as HPC implementations of algorithms currently in use.
Martin Waltz studied Industrial Engineering and is now a research associate whose main research focus is (Deep) Reinforcement Learning.
Ankit Chaudhari is currently working on "Enhancing Traffic-Flow Understanding by Two-Dimensional Microscopic Models". His research interests are traffic flow modelling, traffic simulation, mixed traffic flow, machine learning and reinforcement learning.
Dianzhao Li is a research assistant at RL-Dresden, focusing on trajectory planning for autonomous vehicles with reinforcement learning algorithms. He currently combines human driving datasets with RL in simulated environments to achieve better driving performance.
Paul Auerbach is a research associate at the Barkhausen Institut and collaborates with RL-Dresden on simulating and solving traffic scenarios with the help of reinforcement learning. He aims to transfer the learned RL models to real-world model cars.
Gong Chen is a research associate within RL-Dresden, concentrating on applying reinforcement learning to simulate shipping traffic under shallow water conditions.