CHAPTER 3. EXPLORATION AND PLANNING
3.3 Planning phase
Explicit learning task (hypotheses)
We assessed to what extent our results extended to another learning condition. As mentioned before, the data come from the same experiment, namely from N = 92 participants who learned about one of three risk–reward structures explicitly (i.e., using a function learning task).
Sometimes people are aware that their primary task is to learn and have the chance to receive explicit feedback about the underlying structure, for example when they learn about the slim chances of winning (or have to compute them) in school, or when a friend, colleague, or doctoral supervisor explicitly tells them the probability of getting a paper accepted in the journal they desire. In laboratory studies on cue–criterion learning (sometimes called “intentional learning”; Wattenmaker, 1991; Whittlesea, 1987), positive relationships have been shown to be easier to learn than negative ones (Klayman, 1988). One of our predictions for the learning phase was that, once the context of the relationship is introduced (i.e., a risk–reward relationship), participants might learn an inverse risk–reward relationship sooner than a positive one.
Explicit learning task (setup)
Here, such a task was implemented as follows: Participants saw one payoff at a time. Their task was to guess the probability associated with the payoff of the gamble (the probability was covered with a “?”, Figure C4). Participants were informed that all gambles were drawn from the same set of gambles. In each round, participants entered their estimate with a mouse click on a rating scale (0–100 E$) and confirmed it with a click on the value. After each guess, participants received feedback about how close their estimate was to the actual value, but not in which direction it deviated. Participants received points for closer estimates (10 points for a deviation of 0, 9 points for a deviation of 1, ..., and no points for deviating by more than 8) and could earn 200 points for learning the association (i.e., reaching the criterion of ±8) in fewer than 30 trials. Correct guesses in the uncorrelated condition were 50% ± 8. Points translated into bonuses in E$ (i.e., 200 points were equal to a bonus of 200 E$, or €2; bonus rules and exchange rate were revealed in the instructions). After five correct guesses, participants moved on to the choice task. After 100 trials, participants moved on to the next part of the experiment irrespective of whether they had reached the criterion. This was the case for almost all participants in the uncorrelated condition.
[Figure C4 image: a gamble with a payoff of 93 E$ and its probability hidden as “? % prob.”, the prompt “Guess the probability!”, and a rating scale from 0% to 100%.]
Figure C4. Explicit learning task.
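The scoring and stopping rules of the explicit learning task can be sketched as follows. This is a minimal illustration and not the experiment's actual code: the function names are hypothetical, the linear rule (10 points minus the deviation) is inferred from the two values given in the text, and the sketch assumes that the five correct guesses had to be consecutive.

```python
def points_for_guess(guess, actual, tolerance=8, max_points=10):
    """Points for one probability estimate (both values on a 0-100 scale).

    Assumption: points fall by one per unit of deviation and drop to zero
    once the deviation exceeds the tolerance.
    """
    deviation = abs(guess - actual)
    if deviation > tolerance:
        return 0
    return max_points - deviation


def trials_until_choice_task(guesses, actuals, tolerance=8,
                             needed_correct=5, max_trials=100):
    """Number of learning trials completed before the choice task starts.

    Assumption: the criterion is `needed_correct` consecutive guesses within
    `tolerance` of the actual value; otherwise learning ends at `max_trials`.
    """
    streak = 0
    completed = 0
    for guess, actual in zip(guesses[:max_trials], actuals[:max_trials]):
        completed += 1
        if abs(guess - actual) <= tolerance:
            streak += 1
        else:
            streak = 0  # a miss resets the run of correct guesses
        if streak == needed_correct:
            break
    return completed


# Uncorrelated condition: the correct answer is always 50 (+/- 8).
print(points_for_guess(47, 50))                          # 7
print(trials_until_choice_task([50] * 100, [50] * 100))  # 5
```

Under these assumptions, a participant who always answers 50 in the uncorrelated condition reaches the criterion after five trials, whereas a participant who never comes within 8 points of the actual value completes all 100 trials.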
Explicit learning task (results)
In contrast to the idea that priors from nonlaboratory environments would aid the learning of an inverse relationship between risks and rewards, the positive risk–reward relationship was learned faster than the negative one (M_pos. = 16.8 trials, M_neg. = 30.13 trials; difference positive vs. negative: b = −13.34, CI = [−24.37, −2.42]). As expected, most participants in the uncorrelated risk–reward condition completed all 100 trials; a few participants hit the criterion earlier by indicating 50% on five consecutive trials (resulting in M = 86.2 trials). As shown below, the probability estimation task at the end of the experiment revealed that participants’ probability estimates reflected the risk–reward structure they had been exposed to previously.
Posttests (results)
Figure C5. Posttests. (A) Participants’ decisions under uncertainty were affected by the risk–reward structures they had been exposed to previously. (B, C) Payoff and probability estimates were influenced by the risk–reward structure from the incidental learning phase, but in the uncorrelated condition they were biased towards an inverse relationship between risks and rewards.
The table shows the coefficients of the probability estimates in panel (C).

Condition                 Slope (β)    Highest Density Interval (β)
Negative, Explicit        −0.90        (−0.93; −0.87)
Positive, Explicit         0.22        (0.15; 0.29)
Uncorrelated, Explicit    −0.49        (−0.54; −0.44)
Table C2. Participants’ probability estimates reflected the risk–reward structure they had been exposed to previously. Participants in the negative condition provided lower probability estimates for gambles with higher payoffs, and participants in the positive condition provided higher estimates for higher payoffs. Participants in the uncorrelated condition also provided lower estimates for higher payoffs, but with a weaker slope than in the negative condition (b_uncorrelated = .32, CI = [.25, .29]; b_positive = 1.11, CI = [1.04, 1.19]; model predicting estimates from the reward × condition interaction, with the negative condition as baseline).
Test phase (descriptive results)
For the test phase, we conducted the same set of analyses as for the incidental learning task. Again, participants in the negative risk–reward condition maximized expected values less than participants in the other two conditions when making the best choice was emphasized (b_unc.>neg. = 0.24, CI = [0.01, 0.47]; b_pos.>neg. = 0.40, CI = [0.17, 0.63]). There were no reliable response time differences across risk–reward conditions under the best instruction (all CIs included 0), but participants in the positive condition spent slightly more time choosing under the fast instruction (b_pos.>neg. = 0.10, CI = [0.03, 0.19]). Since we did not otherwise find robust differences in processing strategies in the fast condition, we do not interpret this result further.
Again, participants in the negative condition inspected fewer attributes than those in the other two conditions (M_neg. = 2.77, CI = [2.55, 3.03]). Participants in the other two conditions seemed to sample information more carefully (M_pos. = 3.27, b_pos.>neg. = 0.50, CI = [0.17, 0.84]; M_unc. = 3.09, b_unc.>neg. = 0.31, CI = [−0.02, 0.65]). Inspecting more attributes was linked to choosing the higher-EV option across conditions (b_best = 0.13, CI = [0.06, 0.19]; b_fast = 0.06, CI = [0.00, 0.13]; all after controlling for individual variation and expected value differences).
[Figure C6 panels, each shown by instruction (Best, Fast) and condition (Negative, Positive, Uncorrelated): (A) p (choose higher EV); (B) Response Times (s); (C) #AOIs inspected; (D) #Transitions (within); (E) p (gaze on payoff).]
Figure C6. Descriptive results for the test phase after explicit learning.
Test phase (computational model)
After explicit learning, the estimated parameters were largely comparable across the three risk–reward environments (most CIs include 0, Figure C7). This means that the learned risk–reward environment affected the subsequent test phase differently from how it did after incidental learning: Specifically, participants in the negative risk–reward condition did not lower their threshold α, i.e., they did not take less time than participants in the other two conditions. Instead, thresholds (Figure C7A) were highest in the positive condition. Moreover, in contrast to the incidental learning conditions, the distribution of gaze impacted evidence accumulation in all conditions (all coefficients > 0), and did so similarly across conditions (all condition-dependent CIs included 0).
[Figure C7 panels, each shown by instruction (Best, Fast) and condition (Negative, Positive, Uncorrelated): (A) Threshold (α); (B) Nondecision time in s (τ); (C) EV Coefficient (δEV); (D) Gaze Coefficient (δgaze).]
Figure C7. Posterior distributions for the group-level parameter estimates.
Lastly, Figure C8 shows that the parameter estimates from the extended drift diffusion model were generally consistent with the choice patterns in the behavioral data.
[Figure C8 panels, each relating p (choose EV) to one DDM parameter for the Best and Fast instructions: (A) EV Coefficient (β1); (B) Gaze Coefficient (β2); (C) Threshold (α).]
Figure C8. Relationship between DDM parameters and behavioral choice results. Each dot represents one participant. (A) Participants who were sensitive to EV differences chose the higher EV option more often. (B) Participants who distributed their attention more evenly (gaze coefficient of 0) chose the higher EV option more often. (C) Participants who set higher thresholds chose the higher EV option more often.
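To illustrate why a higher threshold α goes along with both slower and more accurate choices, the following sketch simulates a simple diffusion process. It is not the extended drift diffusion model fitted here, whose full specification is not given in this section. The function names, the linear combination of EV and gaze differences into a drift rate, the symmetric boundaries at ±α/2, and all numerical values are assumptions made for illustration only.

```python
import random


def drift_rate(delta_ev, delta_gaze, ev_diff, gaze_diff):
    """Assumed drift: weighted sum of the EV and gaze differences."""
    return delta_ev * ev_diff + delta_gaze * gaze_diff


def simulate_ddm(alpha, tau, drift, n_trials=1000, dt=0.01,
                 noise_sd=1.0, seed=0):
    """Simulate a diffusion process with boundaries at +alpha/2 and -alpha/2.

    Evidence starts at 0. Crossing the upper boundary counts as choosing the
    higher-EV option. Returns the proportion of higher-EV choices and the
    mean response time in seconds (decision time plus nondecision time tau).
    """
    rng = random.Random(seed)
    bound = alpha / 2.0
    step_sd = noise_sd * dt ** 0.5  # noise scales with the square root of dt
    n_higher_ev = 0
    total_rt = 0.0
    for _ in range(n_trials):
        evidence, elapsed = 0.0, 0.0
        while abs(evidence) < bound:
            evidence += drift * dt + rng.gauss(0.0, step_sd)
            elapsed += dt
        if evidence > 0:
            n_higher_ev += 1
        total_rt += tau + elapsed
    return n_higher_ev / n_trials, total_rt / n_trials


# Hypothetical parameter values, chosen only to show the qualitative pattern.
v = drift_rate(delta_ev=0.2, delta_gaze=0.3, ev_diff=2.0, gaze_diff=0.3)
for alpha in (1.5, 3.0):
    p_higher_ev, mean_rt = simulate_ddm(alpha=alpha, tau=0.3, drift=v)
    print(f"alpha = {alpha}: p(higher EV) = {p_higher_ev:.2f}, "
          f"mean RT = {mean_rt:.2f} s")
```

With a positive drift, raising the threshold in this sketch increases both the proportion of higher-EV choices and the mean response time, which is the qualitative pattern described for panel (C).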
Test phase (discussion)
What is the conceptual difference between explicit and incidental learning? One way to understand these results is to think back to what participants could learn in the incidental learning conditions when pricing gambles from different risk–reward environments. They can learn about two elements of risk–reward environments:
First, they can learn about the functional form of the risk–reward structure (are risks and rewards positively related, negatively related, or uncorrelated?). Second, they can also learn about the expected value distributions in a given environment, with a negative risk–reward structure being composed of many options with similar values (e.g., $20 with p = .8 and $80 with p = .2). This resembles the structure of the environment outside the lab, in which EVs across options are typically identical in monetary domains with a pay–to–play structure. For instance, the probability of obtaining a reward when there is a $1 pay–to–play fee is given by p = 1/(1 + gain). This mechanism is found, for example, in roulette, and gives rise to risks and rewards being inversely related through a power law. In the current experiments, we exposed people to a linear relationship between risks and rewards, and thus intermediate values (50 E$) had slightly higher EVs than high and low values.
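The two environments can be compared with a short calculation. The sketch below assumes that "gain" denotes the net winnings of a fair bet, so that p · gain − (1 − p) · fee = 0 yields p = fee/(fee + gain), which reduces to p = 1/(1 + gain) for a $1 fee. For the linear case it assumes p = 1 − payoff/100, which matches the two example gambles in the text but is not necessarily the exact function used in the experiment. All function names are hypothetical.

```python
def fair_probability(gain, fee=1.0):
    """Winning probability that makes a pay-to-play bet fair (net EV of 0)."""
    return fee / (fee + gain)


def net_expected_value(gain, probability, fee=1.0):
    """Expected net outcome: win `gain` with `probability`, else lose the fee."""
    return probability * gain - (1.0 - probability) * fee


def linear_probability(payoff, max_payoff=100.0):
    """Assumed linear negative structure: p = 1 - payoff / max_payoff."""
    return 1.0 - payoff / max_payoff


# Pay-to-play structure: every option has the same (zero) net EV.
for gain in (1, 2, 35):
    p = fair_probability(gain)
    is_fair = abs(net_expected_value(gain, p)) < 1e-12
    print(f"gain {gain}: p = {p:.3f}, net EV is zero: {is_fair}")

# Linear structure: EV = payoff * p peaks at intermediate payoffs.
for payoff in (20, 50, 80):
    p = linear_probability(payoff)
    print(f"payoff {payoff} E$: p = {p:.1f}, EV = {payoff * p:.1f} E$")
```

Under these assumptions, the power-law rule keeps EVs identical across options, while the linear rule gives the 50 E$ gamble a higher EV than the 20 E$ and 80 E$ gambles.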
While the functional form can be learned equally well—if not better—in an explicit learning task, there is an additional step involved in then inferring what that means for EV distributions in a given choice environment. It may be the perception of similar EV distributions that led participants in the negative incidental condition to set lower thresholds and sample less information. This is consistent with other work (Leuker et al., 2018b) which indicates that negative risk–reward environments can elicit “EV surprise” for oddballs with higher expected values than usually experienced.