At each time step, an agent attempts to determine the optimal action to take given the true state, which depends on the true trust model of the environment. In a fully observable world, the agent would know where it is and whom it can trust. In a partially observable world, the agent must estimate these values.
At a given time step, an agent calculates the expected reward of each of its potential actions over each of the states it may be in. The agent selects the action with the highest estimated reward. After taking the selected action, the agent observes the changes in the environment and, when possible, the actions of the other agents. If the actions of the other agents indicate their trustworthiness, the agent updates its trust model accordingly. Finally, after observing the changed environment, the agent updates its state belief based on its current trust model before deciding on its next action. Figure 3.1 illustrates the cycle of events during TI-POMDP execution.
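The cycle just described can be sketched in Python as a single step function. This is a minimal illustration under stated assumptions, not the TI-POMDP implementation: the names `select_action` and `decision_cycle_step`, the dictionary representation of the belief and trust model, and the callables `act`, `update_trust`, and `update_belief` are hypothetical placeholders for the environment and the two update procedures.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
Belief = Dict[State, float]  # probability assigned to each possible state
Trust = Dict[str, float]     # hypothetical flat trust representation


def select_action(belief: Belief, actions: List[Action],
                  reward: Callable[[State, Action], float]) -> Action:
    """Return the action with the highest belief-weighted expected reward."""
    def expected(action: Action) -> float:
        return sum(p * reward(s, action) for s, p in belief.items())
    return max(actions, key=expected)


def decision_cycle_step(belief: Belief, trust: Trust, actions: List[Action],
                        reward: Callable[[State, Action], float],
                        act: Callable[[Action], object],
                        update_trust: Callable[[Trust, object], Trust],
                        update_belief: Callable[[Belief, Action, object, Trust], Belief],
                        ) -> Tuple[Action, Trust, Belief]:
    """One pass of the cycle: choose, act, update trust, then update belief."""
    action = select_action(belief, actions, reward)
    # Execute the action; `act` returns what the agent observes about the
    # environment and, when visible, the other agents' actions.
    observation = act(action)
    # The trust model is updated first ...
    trust = update_trust(trust, observation)
    # ... and the state belief update then uses the new trust model.
    belief = update_belief(belief, action, observation, trust)
    return action, trust, belief
```

Passing the two updates in as callables fixes their order (act, then trust update, then belief update) while leaving the trust representation itself free to change.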
The trust model, τ, is a component of the state belief and is updated from the observations before the state belief update is performed. That is, the agent first updates its trust model and then updates its state belief distribution.
In the trust model update, if agent a observes agent b commit an untrustworthy act, agent a reduces its trust rating of agent b according to the rules of the trust representation in use (e.g., trust vectors [17] or reputation-based trust [24]). Under the binary trust model, agent a reduces agent b's trust rating from 1 to 0. Additionally, if agent a believes agent c also observed agent b's action, agent a lowers its estimate of agent c's trust rating of b from 1 to 0. The overall effect is that agent a places less trust in agent b and believes that agent c has also lowered its trust in agent b. Agent a uses this trust model in future interactions to decide whether to interact with agent b and to estimate how agent c will interact with agent b. The new trust model and the observations about the new state are required for the state belief update, which then restarts the process.
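The binary update above can be sketched as follows, assuming agent a stores its own ratings in one dictionary and its estimates of other agents' ratings in a second dictionary keyed by (truster, trustee). The class `BinaryTrustModel` and the method `observe_untrustworthy_act` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class BinaryTrustModel:
    """Trust model held by agent `owner`; 1 = trusted, 0 = untrusted."""
    owner: str
    # owner's own rating of each agent (including itself)
    ratings: Dict[str, int] = field(default_factory=dict)
    # (c, b) -> owner's estimate of agent c's rating of agent b
    others_ratings: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def observe_untrustworthy_act(self, offender: str,
                                  witnesses: Iterable[str] = ()) -> None:
        """Apply the binary rule after the owner sees `offender` betray trust.

        `witnesses` are the agents the owner believes also saw the act.
        """
        # The owner's own rating of the offender drops from 1 to 0.
        self.ratings[offender] = 0
        # The owner assumes every other witness lowers its rating of the offender too.
        for c in witnesses:
            if c not in (self.owner, offender):
                self.others_ratings[(c, offender)] = 0


if __name__ == "__main__":
    a = BinaryTrustModel("a", ratings={"a": 1, "b": 1, "c": 1},
                         others_ratings={("c", "b"): 1})
    a.observe_untrustworthy_act("b", witnesses=["c"])
    print(a.ratings["b"], a.others_ratings[("c", "b")])  # 0 0
```

Ratings that do not involve the offender, such as c's rating of a, are left untouched.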
The primary reason to separate the trust model update is to increase the flexibility of the trust model. Handling the trust model update separately allows the underlying trust model to change without affecting the rest of the TI-POMDP framework. The trust model update must provide the agent's current trust rating of itself, its ratings of the other agents, and the other agents' ratings of all of the agents. A vector trust model [17], a reputation-based trust model [22, 24], a multidimensional trust model [18], or another trust model can be implemented to return these ratings as needed, so the model can be chosen according to its applicability to the domain. The ability to use different trust models, however, raises the problem of selecting the appropriate model for a given domain. Comparison testing can determine which model performs best in a specific domain. A more general solution is to run several models in parallel and use a decision process to choose dynamically which model to use at a given time [26].
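One way to realize this separation, sketched here under assumed names, is to have the framework depend only on an abstract interface that returns any truster's rating of any trustee. `TrustModel`, `BinaryTrust`, `CountingReputation`, `willing_to_cooperate`, and the observation format (keys `actor`, `trustworthy`, `witnesses`, where the witnesses include the observing agent) are all hypothetical. `CountingReputation` is a toy stand-in for a reputation-based model, not the models of [22, 24].

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class TrustModel(ABC):
    """Interface the rest of the (hypothetical) framework relies on."""

    @abstractmethod
    def update(self, observation: dict) -> None:
        """Incorporate an observed act; assumed keys: actor, trustworthy, witnesses."""

    @abstractmethod
    def rating(self, truster: str, trustee: str) -> float:
        """The owner's (possibly estimated) rating that `truster` gives `trustee`."""

    def ratings_table(self, agents: List[str]) -> Dict[str, Dict[str, float]]:
        # Self-rating, the owner's ratings of others, and its estimates of the
        # other agents' ratings of all agents, gathered in one table.
        return {i: {j: self.rating(i, j) for j in agents} for i in agents}


class BinaryTrust(TrustModel):
    def __init__(self, agents: List[str]):
        # Everyone starts fully trusted (1.0).
        self._r: Dict[Tuple[str, str], float] = {
            (i, j): 1.0 for i in agents for j in agents}

    def update(self, observation: dict) -> None:
        # An untrustworthy act drops every witness's rating of the actor to 0.
        if not observation["trustworthy"]:
            for w in observation["witnesses"]:
                self._r[(w, observation["actor"])] = 0.0

    def rating(self, truster: str, trustee: str) -> float:
        return self._r[(truster, trustee)]


class CountingReputation(TrustModel):
    """Toy model: rating = (good acts + 1) / (observed acts + 2)."""

    def __init__(self):
        self._good: Dict[Tuple[str, str], int] = {}
        self._seen: Dict[Tuple[str, str], int] = {}

    def update(self, observation: dict) -> None:
        for w in observation["witnesses"]:
            key = (w, observation["actor"])
            self._seen[key] = self._seen.get(key, 0) + 1
            self._good[key] = self._good.get(key, 0) + int(observation["trustworthy"])

    def rating(self, truster: str, trustee: str) -> float:
        key = (truster, trustee)
        return (self._good.get(key, 0) + 1) / (self._seen.get(key, 0) + 2)


def willing_to_cooperate(model: TrustModel, me: str, other: str,
                         threshold: float = 0.5) -> bool:
    # Decision code sees only the interface, so either model can be swapped in.
    return model.rating(me, other) >= threshold
```

Because `willing_to_cooperate` only calls `rating`, replacing `BinaryTrust` with `CountingReputation` (or a model chosen at run time) requires no change to the decision code.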
The second reason to update the trust model separately from the state belief is to reduce the combinatorics of the state belief update. Including the trust model as a complete component of the state multiplies the number of states by the total number of possible trust models an agent can have. In a basic two-agent environment where agents are either completely trustworthy or completely untrustworthy, the state space is multiplied by the number of an agent's potential ratings of itself, its potential ratings of the other agent, and its potential estimates of the other agent's ratings of every other agent. Each of these three ratings is binary, so the state space is needlessly multiplied by a factor of 2 × 2 × 2 = 8. The agent knows its true trust rating, and the other trust ratings are based on prior knowledge and experience. Instead of trying to estimate those values, an agent can use its knowledge to focus on making the best possible decision at the current time.
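The factor of eight follows from counting the trust ratings that would have to be folded into the state. The sketch below assumes, as in the two-agent example, that an agent tracks one rating of itself, one rating of each other agent, and, for each other agent, that agent's ratings of every agent except itself. The function names are hypothetical, and the extension to more than two agents is an illustrative assumption.

```python
from itertools import product


def trust_components(n_agents: int) -> int:
    """Count the trust ratings an agent would track if trust were part of the state."""
    others = n_agents - 1
    # own rating of itself + own ratings of others + each other agent's ratings
    # of every agent except itself
    return 1 + others + others * others


def state_space_multiplier(n_agents: int, trust_levels: int = 2) -> int:
    # Each rating independently takes one of `trust_levels` values.
    return trust_levels ** trust_components(n_agents)


if __name__ == "__main__":
    # Two agents, binary trust: three ratings, 2 * 2 * 2 = 8 combinations.
    combos = list(product((0, 1), repeat=trust_components(2)))
    print(len(combos), state_space_multiplier(2))  # 8 8
```

Under the same counting, three agents would already give 2^7 = 128 trust combinations per environment state.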
The final reason to separate the trust model update is to eliminate fluctuations in an agent's trust model that can lead to a breakdown of trust within the system. If an agent's trust model is a component of its state belief probability distribution, the agent can become extremely unpredictable or uncooperative. The purpose of the trust model is to help the agent choose the most beneficial action for the given state. If the trust model is a component of the state belief, the agent must find the most beneficial action for each possible trust state, calculate the expected reward for each action, and then select the action that leads to the highest expected reward. For instance, consider a binary trust domain in which successful cooperation yields a reward of 10, working alone yields a reward of 1, and being betrayed yields a reward of −100, and suppose a trustworthy agent believes with probability 0.9 that the other agent is trustworthy. The agent's expected reward for cooperating is 0.9 × 10 + 0.1 × (−100) = −1, while its expected reward for working alone is 1. In this situation the agent always chooses to work alone, even though it is 90% confident the other agent is trustworthy, because across all the potential trust states working alone has the highest expected reward.
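The arithmetic of this example can be checked with a few lines of Python. The reward values come from the text; the function names are hypothetical. The sketch also solves for the break-even belief: cooperating beats working alone only when 10p − 100(1 − p) > 1, that is, when p > 101/110 ≈ 0.918, which is why a belief of 0.9 still leads the agent to work alone.

```python
REWARD_COOPERATE = 10.0    # both agents cooperate successfully
REWARD_BETRAYED = -100.0   # the agent cooperates but is betrayed
REWARD_ALONE = 1.0         # the agent works alone


def expected_rewards(p_trustworthy: float) -> dict:
    """Expected reward of each action given P(other agent is trustworthy)."""
    cooperate = (p_trustworthy * REWARD_COOPERATE
                 + (1.0 - p_trustworthy) * REWARD_BETRAYED)
    # Working alone does not depend on the other agent.
    return {"cooperate": cooperate, "work_alone": REWARD_ALONE}


def best_action(p_trustworthy: float) -> str:
    rewards = expected_rewards(p_trustworthy)
    return max(rewards, key=rewards.get)


def break_even_probability() -> float:
    # Solve p * R_c + (1 - p) * R_b = R_a for p.
    return (REWARD_ALONE - REWARD_BETRAYED) / (REWARD_COOPERATE - REWARD_BETRAYED)


if __name__ == "__main__":
    print(round(expected_rewards(0.9)["cooperate"], 6))  # -1.0
    print(best_action(0.9))                             # work_alone
    print(round(break_even_probability(), 3))           # 0.918
```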
The state belief update (Equation 3.1) requires the agent to incorporate its current observations into its previous state belief in an effort to determine its current state. The agent calculates the likelihood of making its current observations in each of the possible states it may have reached, given the distribution over the prior state(s) and the action(s) taken. The previous state belief is then updated based on the observation likelihoods for each state. Once a group of agents has updated their state beliefs, they can select and execute their next actions. The action execution causes a state transition, and the agents receive rewards based on their new state.
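Equation 3.1 is not reproduced in this section, so the sketch below assumes it has the standard POMDP form b′(s′) ∝ O(o | s′, a) Σ_s T(s′ | s, a) b(s), with the current trust model τ passed to the observation model because the belief update uses the newly updated trust model. The function and parameter names, and the choice to condition only the observation model on τ, are assumptions for illustration.

```python
from typing import Callable, Dict, Hashable

State = Hashable
Belief = Dict[State, float]


def belief_update(belief: Belief, action, observation, trust,
                  transition: Callable[[State, object], Dict[State, float]],
                  obs_likelihood: Callable[[object, State, object, object], float],
                  ) -> Belief:
    """Bayes filter step: predict with T, weight by O, then normalize."""
    # Predict: distribute each prior state's probability over its successors.
    predicted: Dict[State, float] = {}
    for s, p in belief.items():
        for s_next, t in transition(s, action).items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * t
    # Correct: weight each successor by the likelihood of the observation there,
    # using the (already updated) trust model.
    weighted = {s: p * obs_likelihood(observation, s, action, trust)
                for s, p in predicted.items()}
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {s: w / total for s, w in weighted.items()}
```

Dividing by `total` plays the role of the normalizing constant, so the returned belief always sums to one.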