The battery level of the vehicles and the workload, based on the detected targets, are used to represent the MDP state space for the SR-MDP problem. This makes the state space a two-dimensional matrix.
The battery level, or the endurance of the vehicle, can be expressed in time units. Knowing how much operational time it has left, the vehicle can decide whether a task can be postponed and by how much.
The workload is based on the shortest path that visits all known detections; an example is given on the right-hand side of Figure 3.1 in Chapter 3. The workload therefore depends on the number and location of the detections. The path starts at the current SR and ends at the next SR. This assignment is called RI and corresponds to the same task in Chapter 3. Once a target has been visited, it is removed from the list of detections.
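The workload computation described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: detections and SRs are 2D points in metres, the vehicle travels in straight lines at constant speed, and the shortest path is found by brute force over all visiting orders, which is only practical for a handful of detections. The function names are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]

def workload_minutes(current_sr: Point, next_sr: Point,
                     detections: list[Point], speed_m_per_min: float) -> float:
    """Hypothetical RI workload: travel time of the shortest path that starts at
    the current SR, visits every known detection once and ends at the next SR."""
    best = math.inf
    # Brute force over all visiting orders (O(n!)), assumed for illustration only.
    for order in itertools.permutations(detections):
        path = [current_sr, *order, next_sr]
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        best = min(best, length)
    return best / speed_m_per_min

def remove_visited(detections: list[Point], target: Point) -> list[Point]:
    """Once a target is visited, it is removed from the list of detections."""
    return [d for d in detections if d != target]
```

For example, with the current SR at (0, 0), the next SR at (10, 0) and detections at (20, 0) and (5, 0), visiting (5, 0) first gives a 30 m path, which at 10 m/min is a workload of 3 minutes.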
Both the battery (S1) and the workload, or RI, (S2) state spaces are defined in time and are continuous. In order to build an online planner, the MDP needs to be solved over the full 2D state space of positive real numbers, $S = \mathbb{R}^2_{+}$, in a computationally efficient way. How to deal with continuous state spaces is an open research question, and the choice depends on the application and on how much approximation it allows. Value function approximation is a current research topic, for example representing the value function output with neural networks or regression trees [36]. Another solution is a factored representation of the transition probability function by a Bayesian network, where states are grouped together and assigned the same value function [98], [99].
Figure 4.2: Discretised state space
Another consideration for the MDP framework, regarding the state space representation, is the assumption of observability: the agent knows its current and surrounding states, i.e. it has full observability. In real-world problems the agent receives information through noisy sensors, so there is uncertainty about its current state, and it often cannot see the full extent of the state space. This is the case of partial observability. The POMDP model accounts for this; however, it is not taken into account in the development of the SR-MDP method.
The focus is on simplifying the continuous state space in order to reduce the computation required by the planner and make it applicable to real-time operations. The straightforward method to solve the continuous state space problem is to discretise it, as shown in Figure 4.2.
There are two disadvantages to applying discretisation. First, the evaluation within each cell is assumed to be constant. The larger the discretisation intervals, the stronger this assumption becomes; however, reducing the interval increases the computational load. The second disadvantage is the 'curse of dimensionality', a common problem for MDP frameworks [100]. It refers to the exponential growth of computational complexity with the number of state dimensions, which prevents the approach from scaling to large problems. For example, if there are $n$ state space dimensions and each is discretised into $k$ intervals, the resulting state space contains $k^n$ states. For a 2D problem with 100 discrete values per dimension, this results in $100^2 = 10000$ states. However, for a 10D problem it would be $100^{10} = 10^{20}$, which is not computationally feasible at present. The discretised state space method is recommended for small 1D or 2D problems [101], while additional proofs and other approaches are suggested for higher dimensionality [102], [103].
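The growth of the state count with the number of dimensions can be checked directly. The sketch below, with hypothetical helper names, computes $k^n$ and the memory needed to store one 8-byte (float64) value per state, which is an assumed storage format.

```python
def num_discrete_states(k: int, n: int) -> int:
    """Number of cells when each of n state dimensions is split into k intervals."""
    return k ** n

def value_table_bytes(k: int, n: int, bytes_per_value: int = 8) -> int:
    """Memory for one value per cell, assuming 8-byte floats."""
    return num_discrete_states(k, n) * bytes_per_value

# Compare 1D, 2D and 10D state spaces with k = 100 intervals per dimension.
for n in (1, 2, 10):
    print(n, num_discrete_states(100, n), value_table_bytes(100, n))
```

For $k = 100$, the 2D table holds 10 000 values (80 kB), whereas the 10D table would need $8 \times 10^{20}$ bytes, far beyond any practical memory.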
In order to take advantage of the state discretisation method, the MCM problem is represented in the time domain, as described earlier. The location information of the vehicles is contained in the battery state variable, S1. The assumption is that their location is defined by the time the vehicles
spend doing a search task in a lawnmower pattern, with predefined area parameters and constant speed (all environmental disturbances are ignored). The target locations are contained in the workload state variable, S2, as the time needed to reach the detections along the shortest path.
These state representations allow the locations of vehicles and targets to be tracked by time instead of x and y coordinates, so each state variable is reduced from 2D to 1D.
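The idea that a vehicle's position can be recovered from a single time value can be illustrated with a simple lawnmower model. This is a sketch under assumed parameters: a rectangular area swept by lanes parallel to x starting at (0, 0), a fixed lane spacing, straight transits between lanes, and constant speed with no environmental disturbance. The function and parameter names are hypothetical; the actual area parameters used in the SR-MDP are not given in this section.

```python
def lawnmower_position(t_min: float, speed: float, lane_length: float,
                       lane_spacing: float, n_lanes: int) -> tuple[float, float]:
    """Hypothetical (x, y) after t_min minutes of a lawnmower search that starts
    at (0, 0), runs lanes parallel to x and steps lane_spacing in y between lanes."""
    total = n_lanes * lane_length + (n_lanes - 1) * lane_spacing
    # Distance travelled at constant speed, clipped to the full survey length.
    s = min(max(t_min, 0.0) * speed, total)
    cycle = lane_length + lane_spacing
    k = min(int(s // cycle), n_lanes - 1)   # index of the current lane
    r = s - k * cycle                       # distance travelled since lane k started
    forward = (k % 2 == 0)                  # even lanes run +x, odd lanes run -x
    if r <= lane_length:
        x = r if forward else lane_length - r
        return (x, k * lane_spacing)
    # Otherwise the vehicle is on the transit leg between lane k and lane k + 1.
    x = lane_length if forward else 0.0
    return (x, k * lane_spacing + (r - lane_length))
```

For example, at 100 m/min on 1000 m lanes spaced 100 m apart, the vehicle is at (500, 100) after 16 minutes, halfway back along the second lane.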
Selecting a discretisation interval of one minute for both S1 and S2 makes the state space a discrete, finite, ordered sequence. Moving from the continuous space to its discrete approximation, the notation changes to $\bar{s}$ for a specific state and $\bar{S}$ for the full set.
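$$\bar{S}_1 \in [s_0, s_1, \ldots, s_i]^T, \quad i \in [0, 600) \tag{4.4}$$
$$\bar{S}_2 \in [s_0, s_1, \ldots, s_j]^T, \quad j \in [0, 300) \tag{4.5}$$
$$\bar{S} = \bar{s}_{i,j} \in \mathbb{N}^{l \times m}, \quad l = 600,\ m = 300 \tag{4.6}$$
where $\bar{s}_{i,j}$ is the state formed by the corresponding elements of $\bar{S}_1$ and $\bar{S}_2$, respectively. $\bar{S}$ is a set of non-negative integers, represented as a matrix¹.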
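Mapping a continuous battery and workload time onto the 600 × 300 grid of (4.4) to (4.6) can be sketched as below. Assumptions: flooring assigns every value in [k, k + 1) minutes to cell k (the per-cell constant-value assumption), values outside the limits are clipped to the nearest edge cell, and the value table is stored row-major. Whether the battery index counts remaining or elapsed minutes is not fixed by this sketch, and the names are hypothetical.

```python
import math

L_BATTERY = 600    # l: battery states, one per minute
M_WORKLOAD = 300   # m: workload (RI) states, one per minute
DT_MIN = 1.0       # discretisation interval in minutes

def to_state_index(battery_min: float, workload_min: float) -> tuple[int, int]:
    """Map continuous times (minutes) to the cell (i, j) of the l x m grid.
    Out-of-range values are clipped to the edge cells (an assumed policy)."""
    i = min(max(math.floor(battery_min / DT_MIN), 0), L_BATTERY - 1)
    j = min(max(math.floor(workload_min / DT_MIN), 0), M_WORKLOAD - 1)
    return i, j

def flat_index(i: int, j: int) -> int:
    """Row-major position of s_{i,j} in a flattened value table of size l * m."""
    return i * M_WORKLOAD + j
```

The grid holds $600 \times 300 = 180000$ states, so a 12.7-minute battery value and a 3.2-minute workload both fall into cell (12, 3).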
The one-minute discretisation interval is based on the assumption that the state of the underwater system does not change significantly within that time, due to the slow speed of AUVs.
Limiting the battery state, S1, to 600 minutes gives the vehicles 10 hours of endurance in which to finish the mission (specifications based on the SeaCat vehicle; further details are given in Chapter 5).
The limit on the workload state, S2, was selected to keep the state space small. It also sets a practical limit on the number of targets the system can handle. If the three vehicles, with a combined operational time of 3 × 600 minutes, would need to spend more than half of this time (3 × 300 minutes) on the RI task, it is better to perform another coarse search of the area: there may be too many false alarms, and reallocating all of them would bring limited benefit to the mission.
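The rationale for the S2 limit amounts to a simple threshold check. The following sketch encodes it with hypothetical names, assuming the decision is made on the total RI time summed over the three vehicles.

```python
N_VEHICLES = 3
ENDURANCE_MIN = 600   # S1 limit per vehicle (minutes)
RI_LIMIT_MIN = 300    # S2 limit per vehicle (minutes)

def prefer_new_coarse_search(total_ri_minutes: float) -> bool:
    """True when the total RI workload exceeds half the fleet's combined endurance."""
    return total_ri_minutes > N_VEHICLES * RI_LIMIT_MIN

# The S2 limit is half the S1 limit, so 3 x 300 is half of 3 x 600.
assert 2 * RI_LIMIT_MIN == ENDURANCE_MIN
```

With three vehicles the threshold is 900 minutes of RI work; 901 minutes would favour a new coarse search under this rule.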
The states in an MDP can be terminal or non-terminal. A terminal state triggers the termination of the mission. In the simulation, the terminal states are all states that have reached the battery state limit, i.e. the last row of $\bar{S}$, written s(−1, :) in array notation.
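The terminal set can be enumerated directly from the grid dimensions. In the sketch below (hypothetical names), a state is terminal when its battery index is in the last row, matching s(−1, :). How terminal states are valued is not specified in this section, so the sketch only separates them from the states that a value-iteration sweep would update.

```python
L_BATTERY, M_WORKLOAD = 600, 300  # l and m from (4.6)

def is_terminal(i: int, j: int) -> bool:
    """True when s_{i,j} lies in the last battery row, i.e. the battery limit is reached."""
    return i == L_BATTERY - 1

# All terminal states: every workload value j in the last battery row, s(-1, :).
terminal_states = [(L_BATTERY - 1, j) for j in range(M_WORKLOAD)]

# States a value-iteration sweep would update; terminal states are excluded so
# that their (unspecified) terminal values stay fixed.
update_states = [(i, j) for i in range(L_BATTERY) for j in range(M_WORKLOAD)
                 if not is_terminal(i, j)]
```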