- RL: Reinforcement learning (RL) has gained wide recognition for producing agents that can act dynamically in diverse environments. RL typically solves Markov decision problems in which the agent tries to maximize the cumulative reward. Usually, the agent is first trained on a large amount of sample data and then deployed to the real situation. Although it is possible to keep training an RL agent during operation, this is usually not done because the agent's learning performance depends strongly on the exploration vs. exploitation tradeoff (illustrated by the Q-learning sketch after this list). The biggest difference between RL and RBL is that RL does not use any a priori knowledge and therefore always starts learning from scratch. By exploiting the knowledge given to it, RBL can be deployed to a real situation directly and solve the problem right from the start. Even when a situation was unforeseen in the a priori knowledge, RBL can learn to circumvent it and stay operational. On the other hand, RL agents can achieve much better performance than RBL on a given task. During training, RL agents also take exploratory actions, which lead them to acquire new knowledge about the world. RBL only takes exploratory actions when a failure occurs, and only within the scope of the knowledge provided.
- POND: POND provides an interesting approach to solving partially observable and non-deterministic planning problems. It combines different search techniques and heuristics and switches between them dynamically. Furthermore, it employs a base representation of the problem from which other representations, used by the search and the heuristics, can be derived. Compared to RBL, it probably performs better when everything about the problem is known at the start. However, POND lacks the capability to use feedback from the execution to reevaluate its plans, and it is also not able to learn new information. \cite{Bryce2006PONDT}
- FF-Replan: FF-Replan was the winner of the 2004 International Probabilistic Planning Competition. It achieves this by first constructing a deterministic planning problem out of the probabilistic planning problem and replanning whenever it encounters a state that differs from the expected one. The conversion from the probabilistic problem to a deterministic one is done either by single-outcome or by all-outcome determinization: single-outcome chooses one outcome per probabilistic action based on a heuristic, while all-outcome creates a separate deterministic action for each outcome (see the determinize-and-replan sketch after this list). FF-Replan in all-outcome mode is quite similar to our approach. In RBL, we are not concerned with probabilistic actions per se; however, we permit each action to fail without knowing beforehand which action will fail and when. We could emulate such behavior in FF-Replan by adding a fail outcome to every action. However, FF-Replan would still not learn from failures like RBL does and would probably get stuck as soon as reality did not conform to its knowledge (e.g., a wall exists in reality where there is none in the model). \cite{robert2007}
- PRM-RL: PRM-RL combines Probabilistic Roadmaps (PRM) and RL. It first trains an RL agent via Monte Carlo selection on an environment similar to the target environment to obtain an agent that can successfully move in that environment. After this step, the agent is deployed to the target environment, and the PRM builder creates a PRM based on uniform sampling of the agent's movement; only collision-free point-to-point connections are retained in the PRM (see the roadmap sketch after this list). After these two training steps, PRM-RL can successfully navigate the target environment. Although PRM-RL currently lacks the ability to learn in operation like RBL, it is not hard to imagine that the PRM builder could be rerun as soon as the PRM model differs from the environment and thereby signals that a fault has occurred. A benefit of PRM-RL is that it does not need a priori knowledge; it infers everything from training. However, the RL agent and the PRM builder need this training to function properly, whereas RBL can be used without any training at all. \cite{tapia2017}
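To make the exploration vs. exploitation tradeoff from the RL item concrete, the following minimal sketch shows tabular Q-learning with an epsilon-greedy policy on a toy chain environment. The environment, constants, and reward structure are purely illustrative and not taken from any of the cited works.

```python
import random

N_STATES = 6          # states 0..5; reward only when reaching state 5
ACTIONS = [-1, +1]    # move left / move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy deterministic dynamics: reward 1 only at the rightmost state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                     # exploration
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploitation

for episode in range(500):
    state, done = 0, False
    while not done:
        action = choose_action(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best next value
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

print("greedy policy:", [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])
```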
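The next sketch illustrates the all-outcome strategy of FF-Replan under simplifying assumptions: a toy one-dimensional domain and breadth-first search stand in for PDDL and the FF planner. Every probabilistic outcome becomes its own deterministic action, a plan is computed in the determinized domain, and the agent replans whenever the observed state differs from the expected one. All names and the domain itself are illustrative.

```python
import random
from collections import deque

# Probabilistic domain: action name -> list of (probability, successor function)
PROB_ACTIONS = {
    "move": [(0.8, lambda s: s + 1),   # intended outcome: advance one step
             (0.2, lambda s: s)],      # slip: stay in place
}
GOAL = 3

def determinize(prob_actions):
    """All-outcome determinization: one deterministic action per possible outcome."""
    det = {}
    for name, outcomes in prob_actions.items():
        for i, (_, succ) in enumerate(outcomes):
            det[f"{name}#{i}"] = succ
    return det

def plan(start, det_actions):
    """Breadth-first search in the determinized domain (stand-in for the FF planner)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        s, path = frontier.popleft()
        if s == GOAL:
            return path
        for name, succ in det_actions.items():
            nxt = succ(s)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [(name, nxt)]))
    return []

def execute(state):
    det = determinize(PROB_ACTIONS)
    while state != GOAL:
        for name, expected in plan(state, det):
            base = name.split("#")[0]
            probs = [p for p, _ in PROB_ACTIONS[base]]
            succs = [succ for _, succ in PROB_ACTIONS[base]]
            state = random.choices(succs, weights=probs)[0](state)  # sample the real outcome
            if state != expected:   # surprise: observed state differs from the plan
                break               # -> replan from the current state
    return state

print("reached state", execute(0))
```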
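Finally, the roadmap sketch below illustrates the PRM-construction step of PRM-RL under simplifying assumptions: configurations are sampled uniformly in a unit square with one box obstacle, and a straight-line collision check stands in for the rollouts of the trained RL agent that the original method uses to validate point-to-point connections.

```python
import math
import random

OBSTACLE = (0.4, 0.4, 0.6, 0.6)   # axis-aligned box: (xmin, ymin, xmax, ymax)
CONNECT_RADIUS = 0.25             # only try to connect nearby samples

def in_obstacle(p):
    x, y = p
    xmin, ymin, xmax, ymax = OBSTACLE
    return xmin <= x <= xmax and ymin <= y <= ymax

def connection_succeeds(a, b, steps=20):
    """Stand-in connectivity check: sample points along the straight segment."""
    return all(
        not in_obstacle((a[0] + (b[0] - a[0]) * t / steps,
                         a[1] + (b[1] - a[1]) * t / steps))
        for t in range(steps + 1)
    )

def build_prm(n_samples=200):
    nodes = []
    while len(nodes) < n_samples:             # uniform sampling, rejecting obstacle hits
        p = (random.random(), random.random())
        if not in_obstacle(p):
            nodes.append(p)
    edges = {i: [] for i in range(n_samples)}
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            if (math.dist(nodes[i], nodes[j]) <= CONNECT_RADIUS
                    and connection_succeeds(nodes[i], nodes[j])):
                edges[i].append(j)            # retain only collision-free connections
                edges[j].append(i)
    return nodes, edges

nodes, edges = build_prm()
print(len(nodes), "nodes,", sum(len(n) for n in edges.values()) // 2, "collision-free edges")
```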
For the characteristics, we selected "Needs model," "Needs Probabilities," "Needs Training," "Learns during operation," "Performs Exploration," "Failure resilient," and "Guarantees." We consider these the most important characteristics when deciding which approach to use. Our results are presented in Table \ref{356637}, and a detailed description of the characteristics can be found below.