Our concluding chapter summarizes our contributions to the field of social cognition and suggests directions for further research. Our models lay a foundation for studying increasingly sophisticated forms of social cognition in future work.
Introduction
Social Cognition
Given the pervasiveness of social interaction described above, it is easy to see how a wide variety of cognitive operations fall into this category. We believe that by focusing on the adaptive nature of social intelligence, we can understand the cognitive mechanisms underlying a wide range of social phenomena across many disciplines within social psychology.
Computational Models
We believe that three features in particular characterize a good computational model of social cognition: biological realism, functional capacity, and explainability. If a model contains enough parameters and mechanisms, it can reproduce any data set without necessarily advancing our understanding of social cognition.
Functional Neuroanatomy
- The Value Based Framework
- Value Estimation
- Value Integration
- Value Updating
- Value Modulation
- Action Selection
- Working Memory
- Summary
In the following sections, we describe how value-based decision making is organized in the human brain. In the value-based framework, the brain tracks the expected value of performing an action (the standard value estimate), observes the impact of the action on the environment, and evaluates that outcome in relation to the individual's current goals.
Neural Engineering Framework
- Representation, Transformation, and Dynamics
- Optimizing or Learning Encoders and Decoders
- Two Example Networks: Working Memory and Action Selection
State space transformations in the NEF are specified by describing the dynamics of the state variables. Functional decoders $d^f$ are used to calculate the negative of the current memory representation, $\hat{f}(x) = -\hat{x}$, so that diff sees an effective input of $u(t) - \hat{x}(t)$.
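The role of the difference signal can be illustrated at the level of state variables, without spiking neurons. The sketch below is a minimal illustration, assuming the memory simply integrates the difference u(t) − x̂(t) with a hypothetical gain; the function name, gain, and time step are not taken from the thesis.

```python
import numpy as np

def simulate_memory(u, dt=0.001, gain=10.0):
    """Euler-integrate dx/dt = gain * (u(t) - x(t)) for a scalar input u.

    `gain` and `dt` are illustrative assumptions, not values from the source.
    """
    x = np.zeros(len(u))
    for k in range(1, len(u)):
        diff = u[k - 1] - x[k - 1]          # the quantity the diff population represents
        x[k] = x[k - 1] + dt * gain * diff  # memory integrates the difference
    return x

u = np.ones(2000) * 0.5       # hold a constant input of 0.5 for 2 simulated seconds
x_hat = simulate_memory(u)    # memory estimate converges toward the input
```

Because the difference shrinks as the memory approaches the input, the stored value settles at u and stops changing once the two agree.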
Thesis Outline
Introduction
- Fear Conditioning Protocols and Terminology
- Neuroanatomy
- Function
This represents a single “pairing” of CS and US; typical fear conditioning experiments consist of multiple pairings to strongly reinforce the association (although single-trial learning is possible in some animals in some contexts). As acquisition continues, the animal learns that the CS predicts the onset of the US and begins to show a fear response to the CS itself.
Computational Models
- Minimal Functional Model
- Anatomically Detailed Model
The CS signal has dimensionality dimCS (default value is 3) and is presented for a duration of 1 second, followed by 1 second of silence (all zeros). CTX-”, and the CS is presented without an accompanying US. BLA's CTX response begins at zero but steadily decreases as errorCTX-safe drives PES learning.
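The stimulus schedule described above (1 second of CS followed by 1 second of zeros) can be sketched as a simple function of time. This is only an illustration: the function name, the choice of a random unit vector for the CS, and the repeating schedule are assumptions rather than details confirmed by the text.

```python
import numpy as np

def cs_signal(t, cs_vector, on_time=1.0, off_time=1.0):
    """Return the CS vector while the stimulus is on, and all zeros otherwise."""
    period = on_time + off_time
    if (t % period) < on_time:
        return cs_vector
    return np.zeros_like(cs_vector)

rng = np.random.default_rng(0)
dim_cs = 3                          # default dimensionality stated in the text
cs = rng.standard_normal(dim_cs)    # hypothetical CS identity
cs /= np.linalg.norm(cs)            # normalize to unit length (an assumption)
```

Calling `cs_signal(0.5, cs)` returns the CS vector, while `cs_signal(1.5, cs)` returns zeros, matching the presentation and silence intervals.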
Results
- Fear Expression
- Neural Responses
- External Activation or Inactivation
- Fear Generalization
In CTX+, this CTX-induced fear response combines with CS-induced fear responses in the BLApyr (and CeLon) to produce maximal levels of freezing. We also found that BLApyr inhibition during acquisition attenuated fear responses (row 4, column 2), presumably because contextual fear associations were not properly learned. Fig. 2.18 (right panel) shows how the model's fear responses to the CS* vary as a function of this similarity.
Discussion
- Biological Realism
- Functional Capacity
- Comparison to Other Computational Models
- Empirical Validation
- Summary
As the dimensionality of the input stimulus increases, fear generalization curves retain their sigmoidal shape, regardless of the degree of pattern separation in the model. Finally, our learning rules are implemented online based on errors computed elsewhere in the network. The study of fear conditioning in AMY has a long history of theoretical and computational work.
Conclusion
In our simulations, we presented the US simultaneously with the CS; the simultaneous presence of these signals was required for learning in our model. Our model also does not show effects such as secondary conditioning (pairing the US with CS1, then pairing CS2 with CS1, will induce a fear response to CS2), spontaneous recovery (contextual safety associations are slowly forgotten over time), reinstatement (presentation of the US in CTX- eliminates contextual safety associations), or rapid retrieval (replay of previously paired CS-US stimuli leads to faster learning than a new CS-US pair). Our model learns that CTX- is safe when, in CTX-, the CS is present but the US is not.
Introduction
To study the biological foundations of the human brain and design biologically inspired cognitive algorithms, we need models that integrate biophysical details and cognitive abilities. Our goal is to show that osNEF can be used to construct different functional neural networks from different biologically detailed components. Second, to demonstrate a concrete cognitive application, we construct a biologically detailed model of working memory in the PFC that performs an idealized memory task.
Oracle-Supervised NEF
- Target Tuning Curves
- Online Learning Rule for Encoders and Decoders
- Optimizing Synaptic Time Constants
- Neuron Models
In the oracle flow, x(t) passes through filters (rectangles), which convolve x(t) with a filter h(t), and nodes (diamonds), where state space transformations are applied. 3.2, the input signal drives a population of pre-neurons according to the standard NEF encoding: x(t) is converted to a synaptic current that drives the dynamics in the neuron model. When smoothing spikes for the purpose of encoder learning (Eq. 3.2) or pre-pop synapsing, the choice of time constant makes little difference, as long as it sufficiently smooths the spike noise (e.g., tr > 10 ms).
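The filtering step can be sketched as a discrete convolution with an exponential kernel. This is a minimal sketch, assuming a first-order lowpass filter h(t) = exp(−t/τ)/τ; the actual filter shape, the Poisson-like spike train, and all names below are illustrative assumptions.

```python
import numpy as np

def lowpass_filter(signal, tau, dt=0.001):
    """Convolve `signal` with a discretized h(t) = exp(-t / tau) / tau."""
    t = np.arange(0.0, 10 * tau, dt)      # truncate the kernel after 10 time constants
    h = np.exp(-t / tau) / tau
    return np.convolve(signal, h)[: len(signal)] * dt

dt = 0.001
rng = np.random.default_rng(0)
rate = 50.0                                       # hypothetical firing rate in Hz
spikes = (rng.random(5000) < rate * dt) / dt      # each spike has unit area
smoothed = lowpass_filter(spikes, tau=0.05, dt=dt)
```

With a time constant well above 10 ms, the smoothed trace fluctuates around the underlying firing rate rather than jumping between zero and 1/dt, which is the property the text relies on.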
Results
- Representation
- Computation
- Application
The network architecture is shown in Fig. 3.5; the target function is calculated between detailed neuron populations pop1 and pop2. Errors are higher than in Fig. 3.6, as expected given the greater difficulty of the calculation. By doing this, we effectively "unroll" the iteration, but still use osNEF to train network parameters given a dynamic input signal.
Discussion
- Biological Plausibility
- Cognitive Capacity
- Usability
- Comparison to Other Methods
In the full-FORCE method [49], the recurrent activities of a neural network are trained using a parallel target-generating network driven by the desired output of the system. In their paper, the authors show that a recursive least-squares optimization process, which compares the target activities with the activities of the task-executing network, can be used to train recurrent weights in the latter network and reproduce a wide variety of dynamics. These NEF-style techniques can be used to construct models that encode rolling windows of input history, and the resulting neural activities closely resemble the memory responses of temporal cells in the cortex [249].
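For readers unfamiliar with recursive least squares, the update can be sketched for a simple linear readout. This is a generic textbook RLS update, not the full-FORCE implementation from [49]; the function name, the regularization parameter `alpha`, and the readout setting are assumptions made for illustration.

```python
import numpy as np

def rls_fit(activities, targets, alpha=1.0):
    """Recursive least squares for a linear readout: targets ~ activities @ w."""
    n_features = activities.shape[1]
    w = np.zeros(n_features)
    P = np.eye(n_features) / alpha       # running estimate of the inverse correlation matrix
    for a, y in zip(activities, targets):
        Pa = P @ a
        k = Pa / (1.0 + a @ Pa)          # gain vector for this sample
        err = a @ w - y                  # prediction error before the update
        w = w - err * k                  # move weights to reduce the error
        P = P - np.outer(k, Pa)          # rank-one update of the inverse estimate
    return w
```

Each sample is processed once and then discarded, which is what makes the procedure suitable for training weights online while a network is running.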
Conclusion
Author's note: some of the content in this chapter was previously published as a journal article in the Proceedings of the 42nd Annual Conference of the Cognitive Science Society [55].
Introduction
Many natural and artificial DM tasks have some form of SAT, so it is not surprising that the brain's DM systems have evolved to accommodate this trade-off. Urgency affects DM if an agent is rewarded for acting quickly, which can occur if the size of the reward depends on the decision time, or if multiple actions (and associated rewards) are allowed within a fixed time window. It monitors the accumulated evidence and the passage of time, which together control a gate that inhibits action selection until the model's decision criteria are met.
Background
- Cognitive Tasks
- Neuroanatomy
- Theoretical Models
Finally, sequential sampling tasks are largely decoupled from sensory processing, meaning that individual differences in perceptual abilities do not confound the analysis of the SAT. Numerous extensions of the DDM have sought to increase its cognitive and neural realism, for example by capturing the effects of urgency and uncertainty. Other theoretical models provide an alternative account of DM that addresses some of the weaknesses of DDM.
Model
Then the gate receives an input based on the elapsed time t in the current trial; this linearly increasing signal is multiplied by the constant −wtime, which represents the strength of the time urgency. First, the value passed from the accumulator to the action population is based on the signed difference between the estimates of the integrated values: this value must exceed a dynamic threshold imposed by the gate before the action population is activated and makes a selection. Second, the dynamic threshold is affected by the confidence of the choice, which depends on the absolute difference between the estimates of the integrated values: when wdelta > 0, large differences in the accumulated evidence will thus lead to faster decisions.
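The interaction between accumulated evidence, elapsed time, and the gate can be sketched with a non-neural toy model. The functional form below (a baseline threshold lowered linearly by elapsed time and by the absolute evidence difference) is an assumption chosen for illustration; only the roles of wtime and wdelta come from the text, and the names `decide` and `threshold` are hypothetical.

```python
def decide(evidence_a, evidence_b, dt=0.001, threshold=1.0, w_time=0.2, w_delta=0.5):
    """Return (choice, decision_time), or (None, None) if no decision is reached."""
    v_a = v_b = 0.0
    for step, (e_a, e_b) in enumerate(zip(evidence_a, evidence_b), start=1):
        v_a += e_a * dt                  # accumulate evidence for option A
        v_b += e_b * dt                  # accumulate evidence for option B
        t = step * dt
        delta = v_a - v_b                # signed difference sent toward action selection
        # Assumed gate: urgency (w_time) and confidence (w_delta) both lower the threshold.
        gate = threshold - w_time * t - w_delta * abs(delta)
        if abs(delta) > max(gate, 0.0):
            return ("A" if delta > 0 else "B"), t
    return None, None

# Constant evidence favoring option A over 5 simulated seconds.
choice, t_decision = decide([1.0] * 5000, [0.5] * 5000)
```

Setting `w_time=0` in this sketch delays the decision, which mirrors the speed-accuracy trade-off the gate is meant to control.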
Results
- Dynamics
- Individual Behavior
- Speed Accuracy Tradeoff
The accumulator maintains an accurate estimate of the integrated value over time (blue and orange lines, left panel), and the accumulator–action link calculates the signed difference between these estimates (pink line, right panel). Percentages of responses for each histogram bin are shown on the y-axis, and mean accuracies are given in the plot legends. For each row, we created a population of agents with different values of the specified variable, and then plotted their average number of characters sampled against their average accuracy (y-axis).
Discussion
In terms of cognitive realism, our model included various mechanisms that dynamically regulate the speed and accuracy of DM. In this chapter, we repeatedly showed that each of the parameters of our model contributed independently to the SAT, both in the general case and for the optimized agents. However, we did not investigate whether simpler versions of our model could adequately reproduce the trends we identified in the human data.
Conclusion
Finally, there are some behavioral implications from empirical studies that would be worth examining using our model. The structure of our model should be able to accommodate different mechanisms related to cognitive bias: in fact, we have already simulated a value bias mechanism via the S parameter. Overall, using our model to investigate the idiosyncratic properties of human DM and test hypotheses about their neural bases is worthwhile.
Introduction
In many ways, the NEF network we present in this chapter is the culmination of the models we have developed in previous chapters. We begin this chapter by reviewing the mathematics of RL and related functional neuroanatomy. To gain a broader understanding of how RL can be realized in cognitive systems, we design and simulate three classes of agents using three different cognitive frameworks: deep neural networks, ACT-R, and NEF.
Background
- Reinforcement Learning
- Q-learning
- Social Value Orientation and the Trust Game
It then estimates Q(s, a) by combining R(s, a) with an estimate of the value of possible future states, Q(s′, a∗), where a∗ is any available action in the new state s′. Human behavior in TG has been widely studied from several perspectives, including personal and cultural analyses [115], neural and cognitive analyses [47], determinants of prosocial tendencies [236], and more. Behavior in TG has a clear prosocial component distinct from maximizing personal rewards: while investor behavior is most closely correlated with repayment expectations and perceived trustworthiness, trustee behavior is.
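The estimate of Q(s, a) described above corresponds to the familiar tabular Q-learning update. The sketch below is a minimal version, assuming that a∗ is taken to be the highest-valued action in s′ (the standard Q-learning choice) and using hypothetical learning-rate and discount values; it is not the update used by any of the agents in this chapter.

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions_next, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new value of Q(s, a)."""
    # Value of the best available action in the next state (0 if the state is terminal).
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    target = reward + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])   # move Q(s, a) toward the target
    return Q[(s, a)]

Q = defaultdict(float)   # unseen state-action pairs start at zero
```

Passing an empty list for `actions_next` treats the next state as terminal, so the target reduces to the immediate reward.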
Methods
- Social Value Orientation and the Reward Function
- Deep Q-Network
- Instance-Based Learner
- Neural Engineering Framework Agent
- Human Experiment
- Participant SVO
- Simulated Opponents
In TG, the state contains two pieces of information: the current round of the game (1-5) and the number of coins available (0-30). To represent the state of the environment in our network, we translate our external input (round number and available coins) into a special high-dimensional vector called a spatial semantic pointer (SSP). During phase one, the agent perceives the current state of the environment s′ and chooses an action a′.
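One common way to build a spatial semantic pointer is to raise a random unitary base vector to a real-valued power in the Fourier domain and bind the results with circular convolution. The sketch below follows that construction for the round number and coin count; the dimensionality, the phase distribution, and the function names are assumptions, and the encoding used in our network may differ in its details.

```python
import numpy as np

def make_axis(dim, rng):
    """Random Fourier phases defining a unitary base vector for one variable."""
    phases = rng.uniform(-np.pi, np.pi, dim // 2 + 1)
    phases[0] = 0.0            # the DC coefficient must be real
    if dim % 2 == 0:
        phases[-1] = 0.0       # the Nyquist coefficient must be real
    return phases

def encode_state(round_number, coins, round_phases, coin_phases, dim):
    """Bind fractional powers of two base vectors (a product in the Fourier domain)."""
    spectrum = np.exp(1j * (round_phases * round_number + coin_phases * coins))
    return np.fft.irfft(spectrum, n=dim)

dim = 256                                  # hypothetical dimensionality
rng = np.random.default_rng(0)
round_phases = make_axis(dim, rng)
coin_phases = make_axis(dim, rng)
state = encode_state(3, 10, round_phases, coin_phases, dim)   # round 3, 10 coins
```

Vectors produced this way have unit length, and nearby values of a variable map to similar vectors while distant values map to nearly orthogonal ones; in practice a length scale would be applied to each variable to control how quickly similarity falls off.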
Empirical Results
- Learning Trajectories
- Eliminating Non-learners
- Final Strategies
Our first dependent variable is score, or the number of coins participants earned in each round of the game (0–30 coins). Each plot shows the distribution of generosity of the indicated population of participants in the last three matches of the TG experiment. We only see differences in two of the conditions: prosocial participants are significantly more generous when playing against a greedy investor and against a generous trustee.
Simulated Results
- RL Agents Learn Optimal Strategies
- Learning Trajectories Under Cognitive Constraints
- High-level Trends
First, we compared the generosity of proself and prosocial agents in the last three games of the experiment. Using this procedure, we were able to find population parameters that better reproduced the distribution of human behavior in the final games. As expected, proself agents were more likely to keep all available coins in the final round for a larger payoff, while prosocial agents were more likely to continue their generous behavior in the final round.
Discussion
- Summary of Contributions
- Summary of Limitations and Future Work
This agent is simply a neural network that takes the current state of the environment as input and evaluates candidate actions. The complexity and plausibility of the NEF agent introduced many challenges that need to be addressed in future work. First, our agents represented the state of the world by encoding the current round of the TG and the number of coins available.
Conclusion
Although we have argued that our architectures and representations are sufficiently general to apply to other forms of social cognition, these claims need to be tested. Then we need to teach our agents to play other social games, like the Prisoner's Dilemma. Finally, we want to train our agents to play against other learning agents in tournament-style competitions, and apply the techniques of multi-agent reinforcement learning to study how social norms evolve in a society full of independent, cognitively plausible learners.