The main contribution of this chapter is the representation of a multi-AUV Mine Countermeasures (MCM) planning problem as a Markov Decision Process (MDP) with a very small state space. This allows an optimal solution method to be used with low computation time. As a result, the MDP model has the potential to be applied to an underwater robotic network and to provide real-time output for the agents. Some key points of the model are:
• The state space is mapped from physical space to time to reduce dimensionality.
• The action space is defined so that explicit task allocation for all vehicles reduces the problem from multi-agent to single-agent.
• Unlike in most MDP formulations, the interval between the elements of t is not constant; however, the problem is still discrete and the framework remains applicable.
The model also makes some strong assumptions:
Figure 4.18: Convergence variation (axis labels: Value, Policy; plot labels: orig, G20, G30, G100, RP = 45, RP = 75, gamma = 0.1, gamma = 0.5, gamma = 0.7, speed = 3, swath = 50)
• Using a uniform distribution to represent the number and position of the expected targets.
• Approximate expected shortest path calculation.
• The sensor is treated as a black box with consistent specifications.
• Automatic Target Recognition (ATR) is available.
• No navigation disruptions are observed during the mission: there are no currents that change the vehicles' tracks, increase battery consumption or alter speed.
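One consequence of the uniform-distribution assumption can be sketched in a few lines: if targets are spread uniformly, the expected number of contacts in a surveyed strip is proportional to the strip's area. The function name, the target density, the units and the numbers below are illustrative assumptions and are not taken from the model.

```python
def expected_contacts(density_per_m2: float, speed_mps: float,
                      swath_m: float, duration_s: float) -> float:
    """Expected number of contacts in the strip swept during duration_s,
    assuming targets are uniformly distributed with the given density."""
    swept_area_m2 = speed_mps * swath_m * duration_s
    return density_per_m2 * swept_area_m2


# Hypothetical values: 1 target per 10,000 m^2, 3 m/s, 50 m swath, 10 minutes.
n = expected_contacts(1e-4, 3.0, 50.0, 600.0)
print(f"expected contacts: {n:.2f}")
```

With these hypothetical values the swept area is 90,000 m^2, so roughly nine contacts are expected. A non-uniform minefield model would replace the constant density with an integral over the swept area.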
4.5 Summary
This chapter develops a decision-making method that balances the tradeoffs between a multi-AUV Mine Countermeasures (MCM) mission and Synchronous Rendezvous (SR) scheduling. As a result, the agents can select an optimal action that maximises the desired output of the mission while complying with hard and soft constraints. In the mine-hunting scenario, the goal is to maximise the area searched before the vehicles' batteries are exhausted. One constraint is that the detections found during the search phase need to be revisited. To exchange status and contact data, the vehicles meet multiple times throughout the mission at a Synchronous Rendezvous (SR). The time and place of each SR also need to be decided autonomously. The SR-MDP solves this resource management tradeoff between covering more area, revisiting contacts and scheduling meeting points. It provides an optimal action policy for the agents that takes into account estimates of all possible future states up to the end of the mission.
The main contribution of this chapter is that the SR-MDP model can compute a solution in manageable time, with the potential to reach real-time execution. This is achieved by discretising the action and state spaces without losing vital mission information. The other critical element is that the SR method leaves only a few decision points throughout the mission, which allows the SR-MDP to be defined with a finite horizon.
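To illustrate why a small, time-indexed state space with few decision points is cheap to solve, the following is a minimal sketch of finite-horizon value iteration (backward induction) on a toy problem. It is not the SR-MDP itself: the state (remaining mission time in abstract units), the leg lengths, the rendezvous overhead and the reward are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (all numbers hypothetical): the state is the remaining mission
# time; an action is the length of the next search leg, which ends at a
# rendezvous. Legs differ in length, so decision epochs are unevenly spaced.
MAX_TIME = 6
LEG_LENGTHS = (1, 2, 3)
RENDEZVOUS_COST = 0.5  # assumed fixed overhead per rendezvous


def actions(t: int) -> List[int]:
    """Search legs that still fit into the remaining time t."""
    return [d for d in LEG_LENGTHS if d <= t]


def transition(t: int, d: int) -> List[Tuple[float, int, float]]:
    """Return (probability, next_state, reward) outcomes; deterministic here."""
    return [(1.0, t - d, d - RENDEZVOUS_COST)]


def backward_induction(horizon: int, gamma: float = 1.0):
    """Compute stage-wise optimal values and policies from the last stage back."""
    states = range(MAX_TIME + 1)
    value: Dict[int, float] = {t: 0.0 for t in states}  # value after the last stage
    policy: List[Dict[int, Optional[int]]] = []
    for _ in range(horizon):
        new_value: Dict[int, float] = {}
        stage_policy: Dict[int, Optional[int]] = {}
        for t in states:
            best_q, best_a = 0.0, None  # no feasible action: mission is over
            for d in actions(t):
                q = sum(p * (r + gamma * value[t2])
                        for p, t2, r in transition(t, d))
                if best_a is None or q > best_q:
                    best_q, best_a = q, d
            new_value[t], stage_policy[t] = best_q, best_a
        value = new_value
        policy.insert(0, stage_policy)  # stages are computed last to first
    return value, policy


value, policy = backward_induction(horizon=MAX_TIME)
print(value[MAX_TIME], policy[0][MAX_TIME])
```

In this toy setting the best plan from a full time budget is two long legs, because each rendezvous carries a fixed overhead. The cost of the solve is the number of stages times the number of states times the number of actions, which stays small when both the horizon and the state space are small.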
This work can be extended by using an actual model of the minefield distribution, or by letting the agents learn a model, which turns the problem from planning into reinforcement learning (RL). Another possible improvement is to use a solution method other than Value Iteration. Novel approximate solution methods are available and can significantly reduce the computation time, helping to reach real-time execution. Finally, validating the model, or parts of it, in a real experiment will bring insights into how suitable the input functions are and what can be updated to make the MDP applicable to a real MCM mission.
Chapter 5
Simulation and Experimental Validation
Validating marine robotics algorithms in trials is not always possible because of the high cost of equipment, such as AUVs, surface vehicles or additional infrastructure. Often, to test autonomy features, the vehicles need advanced capabilities in terms of platform design and sensor suite. Buying or building such platforms is expensive. In addition, the physical size of the vehicle or of the area of interest for data collection might require additional resources, such as hiring and managing a ship and an operations crew. This is why much of the work in autonomy is validated in simulation or only partially tested in trials.
5.1 Simulation Environment
Some software architectures serve as ‘robot frameworks’: they provide operating system capabilities, such as managing low-level control and software functionality, and middleware capabilities that bridge the operating system with additional applications. They also provide suitable simulation testbeds for development before deployment. The two most common robot frameworks in the underwater community are the Robot Operating System (ROS), usually with the UWSim extension [105], and MOOS-IvP (Mission Oriented Operating Suite - Interval Programming) [106]. Both are open source. Neptune [107] is another well-accepted framework; it is proprietary and has been adopted mainly by defence institutions. Other open-source frameworks with more limited exposure in the community include Dune [108], Rock [109] and AVA [110].
The tool used for simulation validation in this thesis is MOOS-IvP. It was developed with marine autonomy in mind, and a range of capabilities has subsequently been added. ROS has a similar design and functionality, but a larger community, both in underwater robotics and in other robotics domains.
This section starts with a general overview of the MOOS-IvP tool. Then, the implementation of the Synchronous Rendezvous (SR) method is discussed. Partial validation of features and assumptions of the SR approach is available from trials, and results are presented at the end of the section.