As introduced in Section 4.1, the modelled agent society has a set of global parameters designed to control the behaviour of the system. Specifically, these include the agents' initial task parameters (rewards and their assigned durations) and the rate of the penalty charges incurred due to delays. The following sections explain these parameters, together with the agents' capabilities and actions, in detail.
4.2.1.1 Initial Reward and Task Duration
Each agent within the system has an assigned task with a specific initial duration (denoted by $T_{init}$) and a specific initial reward (denoted by $R_{init}$). To keep both the calculations and the interpretation of results simple, the initial durations of all tasks are set to a constant (denoted by $k$). The initial task rewards, on the other hand, are normally distributed within the society with mean $\mu$ and standard deviation $\sigma$. A normal distribution is chosen rather than an unstructured random variation in order to simulate a realistic task distribution within the society: most tasks have rewards clustered around a mean value, with a few exceptionally high or low rewarding ones. Thus, $T_{init}$ and $R_{init}$ are as follows:
$$T_{init} = k \tag{4.4}$$

$$R_{init} \sim \mathcal{N}(\mu, \sigma^2) \tag{4.5}$$
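A minimal sketch of how these initial task parameters could be seeded, assuming Python's standard `random` module as the source of normal draws. The names `Task` and `init_tasks`, and the values in the usage line, are hypothetical and chosen only for illustration. The sketch does not clip the draws, so a large $\sigma$ relative to $\mu$ can yield negative rewards.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    duration: int  # T_init: the same constant k for every agent (Eq. 4.4)
    reward: float  # R_init: normally distributed with mean mu, std sigma


def init_tasks(num_agents: int, k: int, mu: float, sigma: float,
               seed: Optional[int] = None) -> List[Task]:
    """Create one task per agent (hypothetical helper, not from the model)."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    return [Task(duration=k, reward=rng.gauss(mu, sigma))
            for _ in range(num_agents)]


if __name__ == "__main__":
    # Illustrative values only: 3 agents, k = 50 slots, mu = 6000, sigma = 2000.
    for task in init_tasks(num_agents=3, k=50, mu=6000.0, sigma=2000.0, seed=1):
        print(task)
```

Keeping the duration constant while drawing only the reward means that, in this sketch, any difference between agents' tasks comes from the reward alone.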
| Time | a0 | a1 | a2 |
|---|---|---|---|
| Capabilities | c(0,0.9), c(1,0.1) | c(0,0.1), c(1,0.9) | c(0,0.4), c(1,0.5) |
| Initial reward ($R_{init}$) | £6,000 | £4,000 | £10,000 |
| t0 | θ0: c(0,0.5) | θ0: c(1,0.2) | θ0: c(1,0.5) |
| t1 | θ1: c(1,0.3) | θ1: c(0,0.4) | θ1: c(1,0.7) |
| t2 | θ2: c(1,0.1) | θ2: c(0,0.8) | θ2: c(1,0.6) |
| t3 | θ3: c(0,0.9) | θ3: c(0,0.4) | θ3: c(0,0.5) |

Table 4.2: A detailed specification of the sample scenario.
4.2.1.2 Capabilities and Actions
As explained in Section 4.1.1, the context consists of two main elements: the list of actions that the agents are required to perform and the different capabilities they possess to perform them. The following paragraphs introduce these elements in more detail:
Capability: All agents within the domain have an array of capabilities. Each capability has two parameters: (i) a type value ($x$) defining the type of that capability and (ii) a capability level ($d \in [0,1]$) defining the agent's competence in that capability (1 indicates total competence, 0 no competence). Given this, we denote a capability as $c(x,d) : [x, d]$.
Action: Each action has three main parameters: (i) the specified time ($t_i$) at which the action needs to be performed, (ii) the capability type ($x$) required to perform it, and (iii) the minimum capability level ($d_m$) required. Given this, we denote an action as $\theta_i : [t_i, c(x,d_m)]$.
Each agent within the context is seeded with a specified number of such actions (denoted by $T_{init}$; see Section 4.2.1.1). Table 4.2 depicts one such sample scenario for a three-agent context ($a_0$, $a_1$, and $a_2$) with their respective capabilities and actions.
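The capability and action definitions can be sketched as simple records, here populated with the Table 4.2 scenario. The rule in `can_perform` is an assumption (an agent can carry out an action alone if it holds a capability of the required type $x$ at a level of at least $d_m$, with a missing type treated as level 0); the helper names `Action`, `can_perform`, `unmet_actions` and `AGENTS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    time: int         # t_i: time slot at which the action must be performed
    cap_type: int     # x: capability type required
    min_level: float  # d_m: minimum capability level required


def can_perform(capabilities: Dict[int, float], action: Action) -> bool:
    """Assumed rule: required type held at level >= d_m (missing type = 0)."""
    return capabilities.get(action.cap_type, 0.0) >= action.min_level


# Table 4.2: capabilities as {type x: level d}, actions theta_0..theta_3.
AGENTS: Dict[str, Tuple[Dict[int, float], List[Action]]] = {
    "a0": ({0: 0.9, 1: 0.1},
           [Action(0, 0, 0.5), Action(1, 1, 0.3), Action(2, 1, 0.1), Action(3, 0, 0.9)]),
    "a1": ({0: 0.1, 1: 0.9},
           [Action(0, 1, 0.2), Action(1, 0, 0.4), Action(2, 0, 0.8), Action(3, 0, 0.4)]),
    "a2": ({0: 0.4, 1: 0.5},
           [Action(0, 1, 0.5), Action(1, 1, 0.7), Action(2, 1, 0.6), Action(3, 0, 0.5)]),
}


def unmet_actions(name: str) -> List[int]:
    """Time slots of the actions an agent cannot perform with its own capabilities."""
    capabilities, actions = AGENTS[name]
    return [a.time for a in actions if not can_perform(capabilities, a)]


if __name__ == "__main__":
    for agent in AGENTS:
        print(agent, unmet_actions(agent))
```

Under this assumed rule, a0 cannot perform its action at t1 on its own, while a1 and a2 each lack the capability level for their actions at t1, t2 and t3, which illustrates why agents in this scenario would need capabilities held by others.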
4.2.1.3 Penalty Charges
If an agent does not complete its task within the assigned period $T_{init}$, it is penalised for the delay and is thus only eligible to earn a reduced task reward upon completion (refer to Section 4.1). To model this, we introduce a fixed penalty charge for each extended time slot, proportional to the task's initial reward. Thus, the longer an agent takes to complete the task, the more its task reward depreciates, and this depreciation is linear. The rate of depreciation is also inversely proportional to the task's initial time span $T_{init}$ and is controlled via a parameter termed $\mathit{mdf}$ (the maximum delay factor). Figure 4.1 depicts this depreciation, and the penalty per extended time slot is calculated as follows:
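$$
\text{Penalty} =
\begin{cases}
\dfrac{R_{init}}{T_{init} \times \mathit{mdf}} & \text{if } T_{init} < T_{ext} < T_{init} \times \mathit{mdf} \\
0 & \text{if } T_{ext} \le T_{init} \text{ or } T_{ext} \ge T_{init} \times \mathit{mdf}
\end{cases}
\tag{4.6}
$$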
where:
• $T_{ext}$ is the extended task duration taken to achieve the task.
• $T_{init}$ is the initial allocated task duration.
• $R_{init}$ is the assigned task reward.
• $\mathit{mdf}$ is the maximum delay factor, which is a constant for all agents.
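Eq. 4.6 translates directly into a small function. This is a sketch; the function name `penalty` and its argument order are hypothetical, and the values in the usage lines are illustrative.

```python
def penalty(t_ext: int, t_init: int, r_init: float, mdf: float) -> float:
    """Per-slot penalty of Eq. 4.6 for an extended duration t_ext."""
    if t_init < t_ext < t_init * mdf:
        # Within the delay window: a fixed share of the initial reward.
        return r_init / (t_init * mdf)
    # On time (t_ext <= t_init) or past the cut-off (t_ext >= t_init * mdf).
    return 0.0


if __name__ == "__main__":
    print(penalty(51, 50, 10_000, 4))   # 50.0
    print(penalty(50, 50, 10_000, 4))   # 0.0
    print(penalty(200, 50, 10_000, 4))  # 0.0
```

With these values, every slot from 51 to 199 costs 50, and no penalty is charged from slot 200 onward, as the second case of Eq. 4.6 specifies.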
Thus, for example, an agent with a task worth £10,000 spanning 50 time slots, with $\mathit{mdf}$ set to 4, will incur a penalty of £50 (£10,000 / (50 × 4)) for each additional time slot taken to complete the task. If the agent takes more than 200 (50 × 4) slots, its reward would be zero and, thereafter, it will not incur penalties. The choice of a linear model of depreciation was made to keep the calculations simple.

On the other hand, the rationale for charging a penalty proportional to the task's initial reward is to simulate the opportunity cost [Samuelson and Nordhaus, 2001] to the society of delaying tasks that are worth more. In more detail, in a society where resources are constrained, they should ideally be utilised to obtain the maximum benefit for the society. In our context, this means using the limited capabilities to complete tasks so that the society collectively achieves a higher reward. However, self-interested agents always attempt to complete their own task irrespective of its impact on the society. In such a context, if a certain limited resource is used to complete a lower rewarding task in place of a task with a higher reward, the society as a whole suffers a greater negative impact. In other words, using the resource to complete the lower rewarding task has a high opportunity cost for the society, since the higher rewarding task is left uncompleted. A penalty function proportional to the task reward captures this higher opportunity cost: delaying a task with a higher reward incurs a higher penalty than delaying a lower rewarding one.
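The opportunity-cost argument can be made concrete by summing the Eq. 4.6 per-slot penalty over a delay. In this sketch the £4,000 and £10,000 rewards are borrowed from Table 4.2 purely for illustration, while the shared 50-slot duration, $\mathit{mdf} = 4$ and the ten-slot delay are hypothetical; `cumulative_penalty` is likewise a hypothetical helper.

```python
def cumulative_penalty(r_init: float, t_init: int, mdf: float, t_ext: int) -> float:
    """Sum the Eq. 4.6 per-slot penalty over the extended slots t_init+1 .. t_ext."""
    rate = r_init / (t_init * mdf)  # fixed per-slot charge within the delay window
    return sum((rate for t in range(t_init + 1, t_ext + 1) if t < t_init * mdf), 0.0)


if __name__ == "__main__":
    # Same duration, same delay (finishing at slot 60), different rewards.
    print(cumulative_penalty(4_000, 50, 4, 60))   # 200.0
    print(cumulative_penalty(10_000, 50, 4, 60))  # 500.0
```

Under these assumptions, an identical ten-slot delay costs 200 on the lower rewarding task but 500 on the higher rewarding one, so the penalty scales with the value the society forgoes.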