. In our approach, no reward function has to be defined as we only learn from action failures.