Motivations, Opportunities and Challenges
Dr. N. Kemal Ure – Director of AI, Eatron Technologies

Part 3: Challenges and Potential Solutions

In the previous part of this series of blog posts on Reinforcement Learning (RL), we looked at the motivations and opportunities in applying RL to real-world autonomous driving problems. In this part, we present some of the key challenges and the corresponding potential solutions.

1. Capturing Real World Dynamics in Simulation

Most RL algorithms are trained in simulators, due to the need for high volumes of data and to avoid executing dangerous maneuvers in real life. However, if the dynamics of the simulator and the real world are mismatched, there can be a significant drop in real-life performance. One promising research direction is to learn generative dynamical models of the real world, such as traffic dynamics, so that we can integrate our simulators with these models to simulate complex real-life situations and reduce the simulation gap. In a sense, data collected from real life would help simulators become more accurate, which in turn would help RL algorithms get better.
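As a minimal sketch of what such a learned dynamics model could look like, the snippet below fits a small neural network to logged real-world transitions (state, action, next state) and exposes it as a drop-in replacement for a simulator's hand-coded step function. The architecture, dimensions, and training loop are illustrative assumptions, not a specific published method.

```python
import torch
import torch.nn as nn

class LearnedDynamics(nn.Module):
    """Illustrative generative dynamics model: predicts the next state
    from the current (state, action) pair, trained on real driving logs."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def fit(model, states, actions, next_states, epochs=200, lr=1e-3):
    """Supervised regression on logged transitions (mean-squared error)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(states, actions), next_states)
        loss.backward()
        opt.step()
    return model

# Placeholder data standing in for real driving logs:
s = torch.randn(1024, 8)                  # e.g. ego + surrounding traffic state
a = torch.randn(1024, 2)                  # e.g. (steering, acceleration)
s_next = s + 0.1 * torch.randn_like(s)    # hypothetical next states
dynamics = fit(LearnedDynamics(8, 2), s, a, s_next)
```

Once trained, the simulator can call `dynamics(state, action)` in place of its hand-coded transition function, so that the rollouts used for RL training reflect real-world behavior more closely.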

2. Capturing Perception Uncertainty in Planning

One of the central challenges in executing planning algorithms on real-life problems is handling perception uncertainty. Most planning algorithms, including RL-based ones, work best when the perception pipeline operates with high precision. Alas, there are many real-world scenarios where perception is partially lost due to poor lighting, occlusion, and other external disturbances. Quantifying the perception uncertainties and feeding this information into the planning loop can enable taking safe actions in critical situations in autonomous driving. This topic is known as uncertainty-aware planning, and it is one of the most promising directions for enabling fully automated driving in the real world.
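One simple way to make this concrete is to estimate uncertainty from the disagreement of an ensemble of perception models and inflate the planner's safety margins accordingly. The sketch below illustrates that idea; the ensemble outputs, the 3-sigma margin, and all numbers are illustrative assumptions rather than a production design.

```python
import numpy as np

def ensemble_estimate(detections):
    """Fuse position estimates from an ensemble of perception models.
    Disagreement across members serves as a proxy for perception uncertainty."""
    detections = np.asarray(detections)   # shape: (n_models, 2) -> (x, y) in meters
    return detections.mean(axis=0), detections.std(axis=0)

def uncertainty_inflated_gap(ego_pos, obstacle_mean, obstacle_std, k=3.0):
    """Distance to the obstacle after inflating it by k standard deviations.
    Higher perception uncertainty -> larger margin -> more cautious plan."""
    margin = k * np.linalg.norm(obstacle_std)
    return np.linalg.norm(obstacle_mean - ego_pos) - margin

# Hypothetical outputs of three perception models for one obstacle:
dets = [[30.1, 1.9], [29.4, 2.3], [31.0, 1.7]]
mean, std = ensemble_estimate(dets)
gap = uncertainty_inflated_gap(np.array([0.0, 2.0]), mean, std)
print(f"uncertainty-inflated gap: {gap:.1f} m")
```

A planner would then commit to an aggressive maneuver, such as a lane change, only if the uncertainty-inflated gap stays positive, and fall back to a conservative action otherwise.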

3. Solving Highly Complex Decision-Making Tasks

Even state-of-the-art RL algorithms start to struggle in environments where rewards are sparse and actions need to be executed over long horizons. For instance, current RL algorithms can handle controlled environments such as highways and race tracks, where decisions are relatively short-term and the dynamics are relatively predictable. On the other hand, navigating an urban environment involves many unpredictable factors, such as road conditions, pedestrians, and accidents. Decisions must also be optimized over longer horizons; for instance, to avoid a traffic jam, we might need to start planning our route at least an hour in advance. It is widely believed that standard RL approaches would not be able to solve these types of complex problems efficiently. A promising direction is developing agents that learn how to decompose complex problems into smaller, more manageable pieces, and then compose the solutions of those simpler problems into a structured solution to the target problem.
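A minimal sketch of this hierarchical idea is shown below: a high-level policy selects sub-goals over long horizons, while low-level policies solve each sub-goal as a short, dense-reward problem. The sub-goal names, the placeholder policies, and the interfaces are all our own illustrative assumptions.

```python
import random

SUBGOALS = ["follow_lane", "merge_left", "approach_intersection", "park"]

def high_level_policy(state):
    """Selects the next sub-goal (placeholder: random choice).
    In practice this would itself be a learned policy over sub-goals,
    optimized for the long-horizon objective."""
    return random.choice(SUBGOALS)

def low_level_policy(state, subgoal):
    """Returns a primitive control action that makes progress on the sub-goal.
    Each sub-goal can be trained as a separate, short-horizon RL problem."""
    return {"steer": 0.0, "accel": 1.0 if subgoal == "follow_lane" else 0.5}

def hierarchical_rollout(env_step, state, horizon=1000, subgoal_len=50):
    """Re-select the sub-goal every `subgoal_len` steps; in between,
    the low-level policy handles the dense, short-term control problem."""
    subgoal = None
    for t in range(horizon):
        if t % subgoal_len == 0:
            subgoal = high_level_policy(state)
        state = env_step(state, low_level_policy(state, subgoal))
    return state

# Usage with a dummy environment transition:
hierarchical_rollout(lambda s, a: s, state={"pos": 0.0})
```

The key benefit is that the high-level policy only has to make a decision every few dozen steps, which effectively shortens the horizon and densifies the reward signal that each level sees.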

4. Safety, Robustness and Performance Evaluation in the Real World

Unlike feedback control systems designed with conventional approaches, deep RL algorithms do not offer theoretical guarantees on robustness and safety. Currently, the only way to check whether an RL algorithm violates safety conditions is stress testing in the simulator. Although this approach can work for small- to medium-scale problems, it is nearly impossible to check all possible failure cases for large-scale problems. An important research direction today is discovering new methods for incorporating tools from optimization and verification theory into the analysis of deep neural networks, so that we can rigorously verify that an RL-driven autonomous car does not violate safety conditions.
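To make the stress-testing idea concrete, the sketch below randomly samples scenario parameters and counts safety violations. It also hints at why the approach does not scale: random sampling gives no coverage guarantee over a high-dimensional scenario space. All scenario parameters, thresholds, and the placeholder safety oracle are illustrative assumptions.

```python
import random

def sample_scenario(rng):
    """Randomly perturb initial conditions (hypothetical parameters)."""
    return {
        "lead_gap_m": rng.uniform(5.0, 60.0),        # gap to the lead vehicle
        "lead_brake_decel": rng.uniform(2.0, 9.0),   # lead braking (m/s^2)
        "road_friction": rng.uniform(0.3, 1.0),      # dry asphalt ~1.0, ice ~0.3
    }

def violates_safety(scenario):
    """Placeholder oracle: in a real pipeline this would roll out the RL
    policy in the simulator and check e.g. a time-to-collision threshold."""
    return scenario["lead_gap_m"] < 8.0 and scenario["road_friction"] < 0.5

def stress_test(n_trials=100_000, seed=0):
    rng = random.Random(seed)
    failures = [s for s in (sample_scenario(rng) for _ in range(n_trials))
                if violates_safety(s)]
    print(f"{len(failures)} / {n_trials} sampled scenarios violated safety")
    return failures

stress_test()
```

Even with a hundred thousand samples, this only probes three scenario dimensions; a realistic urban scene has far more, which is exactly why verification tools for neural networks are such an active research direction.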