Learning and Decision Making in Social Contexts: Neural and Computational Models

Our concluding chapter summarizes our contributions to the field of social cognition and suggests directions for further research. Our models lay a foundation for studying increasingly sophisticated forms of social cognition in future work.

Introduction

Social Cognition

Given the pervasiveness of social interaction described above, it is easy to see how a wide variety of cognitive operations fall into this category. We believe that by focusing on the adaptive nature of social intelligence, we can understand the cognitive mechanisms underlying a wide range of social phenomena across many disciplines within social psychology.

Computational Models

We believe that three features in particular characterize a good computational model of social cognition: biological realism, functional capacity, and explainability. If a model contains enough parameters and mechanisms, it can reproduce any data set without necessarily advancing our understanding of social cognition.

Functional Neuroanatomy

The Value Based Framework
Value Estimation
Value Integration
Value Updating
Value Modulation
Action Selection
Working Memory
Summary

In the following sections, we describe how value-based decision making is organized in the human brain. In the value-based framework, the brain tracks the expected value of performing an action (the standard value estimate), observes the impact of the action on the environment, and evaluates that outcome in relation to the individual's current goals.

Neural Engineering Framework

Representation, Transformation, and Dynamics
Optimizing or Learning Encoders and Decoders
Two Example Networks: Working Memory and Action Selection

State space transformations in the NEF are specified by describing the dynamics of the state variables. Functional decoders d^f are used to calculate the negative of the current memory representation, f̂(x) = −x̂, so that diff sees an effective input of u(t) − x̂(t).
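The difference-driven dynamics described above can be sketched in plain Python, without spiking neurons. This is only an illustration under stated assumptions: the names `simulate_memory`, `gain`, and `gate_open` are hypothetical, and the gating signal is not part of the passage above; it is included only so that the stored value can be held after the input is removed.

```python
def simulate_memory(inputs, gate_open, dt=0.001, gain=10.0):
    """Euler-integrate a memory state x driven by diff = u(t) - x(t).

    inputs    -- sequence of input values u(t), one per time step
    gate_open -- sequence of booleans (assumed gating signal, hypothetical)
    gain      -- assumed rate at which the memory follows the difference
    """
    x = 0.0
    trace = []
    for u, is_open in zip(inputs, gate_open):
        diff = u - x                 # input plus the decoded negative of memory
        if is_open:
            x += dt * gain * diff    # memory integrates the difference
        trace.append(x)              # when the gate is closed, x is held
    return trace


# Load the value 0.8 for two seconds, then remove the input and close the gate.
inputs = [0.8] * 2000 + [0.0] * 1000
gate_open = [True] * 2000 + [False] * 1000
trace = simulate_memory(inputs, gate_open)
```

Because the memory is driven by the difference rather than by the raw input, the update shrinks to zero as the stored value approaches the input, so the state settles at u instead of growing without bound.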

Thesis Outline

Introduction

Fear Conditioning Protocols and Terminology
Neuroanatomy
Function

This represents a single “pairing” of CS and US; typical fear conditioning experiments consist of multiple pairings to strongly reinforce the association (although single-trial learning is possible in some animals in some contexts). As acquisition continues, the animal learns that the CS predicts the onset of the US and begins to show a fear response to the CS itself.

Computational Models

Minimal Functional Model
Anatomically Detailed Model

The CS signal has dimensionality dimCS (default value is 3) and is presented for a duration of 1 second, followed by 1 second of silence (all zeros). CTX-”, and the CS is presented without an accompanying US. BLA's CTX response begins at zero but steadily decreases as errorCTX-safe drives PES learning.
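To illustrate how an error signal can drive a decoded response downward from zero, here is a minimal sketch of a PES-style decoder update. It assumes the common sign convention (error = actual minus target, so decoders move against the error). How errorCTX-safe is actually computed is not specified in this passage, so the sketch simply supplies a constant positive error; the names `pes_step`, `decode`, and `learning_rate`, and the activity values, are hypothetical.

```python
def pes_step(decoders, activities, error, learning_rate=1e-4):
    """One PES-style update: each decoder moves against error * activity."""
    n = len(activities)
    return [d - (learning_rate / n) * error * a
            for d, a in zip(decoders, activities)]


def decode(decoders, activities):
    """Decoded response: weighted sum of neural activities."""
    return sum(d * a for d, a in zip(decoders, activities))


# Fixed presynaptic activities (Hz) while the context is present (made-up values).
activities = [10.0, 40.0, 0.0, 25.0]
decoders = [0.0] * len(activities)   # the response starts at zero

responses = []
for _ in range(5):
    responses.append(decode(decoders, activities))
    decoders = pes_step(decoders, activities, error=1.0)
```

Silent neurons (activity 0) keep their decoders unchanged, so only the neurons that are active in the context contribute to the learned response.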

Results

Fear Expression
Neural Responses
External Activation or Inactivation
Fear Generalization

In CTX+, this CTX-induced fear response combines with CS-induced fear responses in the BLApyr (and CeLon) to produce maximal levels of freezing. We also found that BLApyr inhibition during acquisition attenuated fear responses (row 4, column 2), presumably because contextual fear associations were not properly learned. Fig. 2.18 (right panel) shows how the model's fear responses to the CS* vary as a function of this similarity.

Discussion

Biological Realism
Functional Capacity
Comparison to Other Computational Models
Empirical Validation
Summary

As the dimensionality of the input stimulus increases, fear generalization curves retain their sigmoidal shape, regardless of the degree of pattern separation in the model. Finally, our learning rules are implemented online based on errors computed elsewhere in the network. The study of fear conditioning in AMY has a long history of theoretical and computational work.

Conclusion

In our simulations, we presented the US simultaneously with the CS; the simultaneous presence of these signals was required for learning in our model. Our model also does not show effects such as secondary conditioning (pairing the US with CS1, then pairing CS2 with CS1, induces a fear response to CS2), spontaneous recovery (contextual safety associations are slowly forgotten over time), reinstatement (presentation of the US in CTX- eliminates contextual safety associations), or rapid retrieval (replay of previously paired CS-US stimuli leads to faster learning than a new CS-US pair). Our model learns that CTX- is safe when the CS is present but the US is not.

Introduction

To study the biological foundations of the human brain and design biologically inspired cognitive algorithms, we need models that integrate biophysical details and cognitive abilities. Our goal is to show that osNEF can be used to construct different functional neural networks from different biologically detailed components. Second, to demonstrate a concrete cognitive application, we construct a biologically detailed model of working memory in the PFC that performs an idealized memory task.

Oracle-Supervised NEF

Target Tuning Curves
Online Learning Rule for Encoders and Decoders
Optimizing Synaptic Time Constants
Neuron Models

In the oracle flow, x(t) passes through filters (rectangles), which convolve x(t) with a filter h(t), and nodes (diamonds), where state space transformations are applied. 3.2, the input signal drives a population of pre-neurons according to the standard NEF encoding: x(t) is converted to a synaptic current that drives the dynamics in the neuron model. When smoothing spikes for the purpose of encoder learning (Eq. 3.2) or pre-pop synapsing, the choice of time constant makes little difference, as long as it sufficiently smooths the spike noise (e.g., tr > 10 ms).
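The filtering step can be sketched as a discrete-time convolution with an exponential low-pass filter. The exponential form of h(t), the function name `lowpass_filter`, and the 100 Hz test spike train are assumptions made for illustration; the passage above only states that signals are convolved with some filter h(t).

```python
import math


def lowpass_filter(signal, tau=0.01, dt=0.001):
    """Convolve a sampled signal with h(t) = exp(-t / tau) / tau (assumed form).

    Implemented recursively: each output is a decayed copy of the previous
    output plus a fraction of the new input sample.
    """
    decay = math.exp(-dt / tau)
    y = 0.0
    out = []
    for s in signal:
        y = decay * y + (1.0 - decay) * s
        out.append(y)
    return out


# A regular 100 Hz spike train; each spike has area 1 (height 1/dt for one step).
dt = 0.001
spikes = [1.0 / dt if i % 10 == 0 else 0.0 for i in range(2000)]

smoothed = lowpass_filter(spikes, tau=0.05, dt=dt)
steady = smoothed[1000:]
mean_rate = sum(steady) / len(steady)   # close to the 100 Hz firing rate
```

A longer time constant trades responsiveness for smoothness: the filtered trace ripples less around the underlying rate but takes longer to follow changes in it.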

Results

Representation
Computation
Application

The network architecture is shown in Fig. 3.5; the target function is calculated between detailed neuron populations pop1 and pop2. Errors are higher than in Fig. 3.6, as expected given the greater difficulty of the calculation. By doing this, we effectively "unroll" the iteration, but still use osNEF to train network parameters given a dynamic input signal.

Discussion

Biological Plausibility
Cognitive Capacity
Usability
Comparison to Other Methods

In the full-FORCE method [49], the recurrent activities of a neural network are trained using a parallel target-generating network driven by the desired output of the system. In their paper, the authors show that a recursive least-squares optimization process, which compares the target activities with the activities of the task-performing network, can be used to train recurrent weights in the latter network and reproduce a wide variety of dynamics. These NEF-style techniques can be used to construct models that encode rolling windows of input history, and the resulting neural activities closely resemble the responses of time cells in the cortex [249].

Conclusion

Author's note: some of the content in this chapter was previously published as a paper in the Proceedings of the 42nd Annual Conference of the Cognitive Science Society [55].

Introduction

Many natural and artificial DM tasks have some form of SAT, so it is not surprising that the brain's DM systems have evolved to accommodate this trade-off. Urgency affects DM if an agent is rewarded for acting quickly, which can occur if the size of the reward depends on the decision time, or if multiple actions (and associated rewards) are allowed within a fixed time window. It monitors the accumulated evidence and the passage of time, which together control a gate that inhibits action selection until the model's decision criteria are met.

Background

Cognitive Tasks
Neuroanatomy
Theoretical Models

Finally, sequential sampling tasks are largely decoupled from sensory processing, meaning that individual differences in perceptual abilities do not confound the analysis of the SAT. Numerous extensions of the DDM have sought to increase its cognitive and neural realism, for example by capturing the effects of urgency and uncertainty. Other theoretical models provide an alternative account of DM that addresses some of the weaknesses of the DDM.

Model

Then the gate receives an input based on the elapsed time t in the current trial; this linearly increasing signal is multiplied by the constant −wtime, which represents the strength of the time urgency. First, the value passed from the accumulator to the action is based on the signed difference between the estimates of the integrated values: this value must exceed a dynamic threshold imposed by the gate before the action is activated and makes a selection. Second, the dynamic threshold calculated after the transition is affected by the confidence of the choice, which depends on the absolute difference between the estimates of the integrated values: when wdelta > 0, large differences in the accumulated evidence will thus lead to faster decisions.
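The gating logic above can be sketched as a threshold that falls with elapsed time and with the absolute difference in accumulated evidence. This is an algebraic illustration, not the neural implementation: the additive form of the threshold, the `base` parameter, the default weights, and the function names are all assumptions; only the roles of wtime and wdelta come from the text.

```python
def gate_threshold(t, value_a, value_b, base=1.0, w_time=0.1, w_delta=0.2):
    """Dynamic threshold: lowered by time urgency and by choice confidence."""
    urgency = w_time * t                            # grows linearly with elapsed time
    confidence = w_delta * abs(value_a - value_b)   # grows with evidence separation
    return base - urgency - confidence


def select_action(t, value_a, value_b, **gate_params):
    """Return "A" or "B" once the signed difference clears the gate, else None."""
    diff = value_a - value_b
    if abs(diff) > gate_threshold(t, value_a, value_b, **gate_params):
        return "A" if diff > 0 else "B"
    return None


early = select_action(0.0, 0.5, 0.4)     # weak evidence, early in the trial
late = select_action(9.5, 0.5, 0.4)      # same evidence, but urgency has built up
decisive = select_action(0.0, 0.2, 1.2)  # strong evidence for B, even at t = 0
```

In this sketch the same small evidence difference that is blocked early in a trial passes the gate later on, while a large difference passes immediately, which is how the two weights trade speed against accuracy.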

Results

Dynamics
Individual Behavior
Speed Accuracy Tradeoff

The accumulator maintains an accurate estimate of the integrated value over time (blue and orange lines, left panel), and the accumulator–action link calculates the signed difference between these estimates (pink line, right panel). Percentages of responses for each histogram bin are shown on the y-axis, and mean accuracies are given in the plot legends. For each row, we created a population of agents with different values of the specified variable, and then plotted their average number of characters sampled against their average accuracy (y-axis).

Discussion

In terms of cognitive realism, our model included various mechanisms that dynamically regulate the speed and accuracy of DM. In this chapter, we repeatedly showed that each of the parameters of our model contributed independently to the SAT, both in the general case and for the optimized agents. However, we did not investigate whether simpler versions of our model could adequately reproduce the trends we identified in the human data.

Conclusion

Finally, there are some behavioral implications from empirical studies that would be worth examining using our model. The structure of our model should accommodate different mechanisms related to cognitive bias: in fact, we have already simulated a value bias mechanism via the S parameter. Overall, using our model to investigate the idiosyncratic properties of human DM and test hypotheses about their neural bases is worthwhile.

Introduction

In many ways, the NEF network we present in this chapter is the culmination of the models we have developed in previous chapters. We begin this chapter by reviewing the mathematics of RL and related functional neuroanatomy. To gain a broader understanding of how RL can be realized in cognitive systems, we design and simulate three classes of agents using three different cognitive frameworks: deep neural networks, ACT-R, and NEF.

Background

Reinforcement Learning
Q-learning
Social Value Orientation and the Trust Game

It then estimates Q(s, a) by combining R(s, a) with an estimate of the value of possible future states, Q(s′, a∗), where a∗ is any available action in the new state s′. Human behavior in TG has been widely studied from several perspectives, including personal and cultural analyses [115], neural and cognitive analyses [47], determinants of prosocial tendencies [236], and more. Behavior in TG has a clear prosocial component distinct from maximizing personal rewards: while investor behavior is most closely correlated with repayment expectations and perceived trustworthiness, trustee behavior is.
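The value estimate described at the start of this passage can be sketched as a tabular Q-learning update. This is the textbook form, offered as an illustration rather than as the agents' actual implementation: it assumes the future value is taken from the best available action in s′, and the learning rate `alpha`, discount `gamma`, function name, and state labels are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Move Q(s, a) toward R(s, a) + gamma * max over a* of Q(s', a*)."""
    # Value of the best action available in the new state (0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)           # unseen state-action pairs start at 0
q[("round2", "give")] = 2.0      # pretend a later state is already valued

updated = q_update(q, "round1", "keep", reward=1.0,
                   next_state="round2", next_actions=["give", "keep"])
terminal = q_update(q, "round5", "keep", reward=3.0,
                    next_state=None, next_actions=[])
```

Passing an empty list of next actions treats the transition as terminal, so the update target reduces to the immediate reward.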

Methods

Social Value Orientation and the Reward Function
Deep Q-Network
Instance-Based Learner
Neural Engineering Framework Agent
Human Experiment
Participant SVO
Simulated Opponents

In TG, the state contains two pieces of information: the current round of the game (1-5) and the number of coins available (0-30). To represent the state of the environment in our network, we translate our external input (round number and available coins) into a special high-dimensional vector called a spatial semantic pointer (SSP). During phase one, the agent perceives the current state of the environment s′ and chooses an action a′.
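One way to picture the state encoding is the following sketch, which builds an SSP-like vector by scaling the Fourier phases of two unitary axis vectors (one for the round, one for the coins) and combining them. The dimensionality, the random phase sampling, and all function names are assumptions for illustration; the passage above does not specify how the SSPs are constructed.

```python
import cmath
import math
import random


def make_axis_phases(dim, rng):
    """Draw Fourier phases for one unitary axis vector (hypothetical helper)."""
    phases = [0.0] * dim
    for k in range(1, (dim + 1) // 2):
        theta = rng.uniform(-math.pi, math.pi)
        phases[k] = theta           # positive frequency
        phases[dim - k] = -theta    # mirrored frequency keeps the vector real
    return phases


def encode_state(round_number, coins, round_phases, coin_phases):
    """Bind axis_round ** round_number with axis_coins ** coins."""
    dim = len(round_phases)
    # Raising a unitary vector to a power scales its Fourier phases;
    # binding by circular convolution adds the phases of the two factors.
    coeffs = [cmath.exp(1j * (pr * round_number + pc * coins))
              for pr, pc in zip(round_phases, coin_phases)]
    # Naive inverse DFT back to a real-valued vector.
    return [
        sum(c * cmath.exp(2j * math.pi * k * n / dim)
            for k, c in enumerate(coeffs)).real / dim
        for n in range(dim)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
DIM = 64                                    # assumed dimensionality
round_phases = make_axis_phases(DIM, rng)
coin_phases = make_axis_phases(DIM, rng)

state_a = encode_state(2, 10, round_phases, coin_phases)  # round 2, 10 coins
state_c = encode_state(5, 30, round_phases, coin_phases)  # round 5, 30 coins
```

Every encoded state has unit length, so the dot product between two states can be read directly as a similarity measure between game situations.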

Empirical Results

Learning Trajectories
Eliminating Non-learners
Final Strategies

Our first dependent variable is score, or the number of coins participants earned in each round of the game (0–30 coins). Each plot shows the distribution of generosity of the indicated population of participants in the last three games of the TG experiment. We only see differences in two of the conditions: prosocial participants are significantly more generous when playing against a greedy investor and against a generous trustee.

Simulated Results

RL Agents Learn Optimal Strategies
Learning Trajectories Under Cognitive Constraints
High-level Trends

First, we compared the generosity of proself and prosocial agents in the last three games of the experiment. Using this procedure, we were able to find population parameters that better reproduced the distribution of human behavior in the final games. As expected, proself agents were more likely to keep all available coins in the final round for a larger payoff, while prosocial agents were more likely to continue their generous behavior in the final round.

Discussion

Summary of Contributions
Summary of Limitations and Future Work

This agent is simply a neural network that takes the current state of the environment as input and evaluates candidate actions. The complexity and plausibility of the NEF agent introduced many challenges that need to be addressed in future work. First, our agents represented the state of the world by encoding the current round of the TG and the number of coins available.

Conclusion

Although we have argued that our architectures and representations are sufficiently general to apply to other forms of social cognition, these claims need to be tested. Then we need to teach our agents to play other social games, like the Prisoner's Dilemma. Finally, we want to train our agents to play against other learning agents in tournament-style competitions, and apply the techniques of multi-agent reinforcement learning to study how social norms evolve in a society full of independent, cognitively plausible learners.