Étude in algorithmic neuroscience
May 31, 2021
I think every systems neuroscientist should read and think about A distributional code for value in dopamine-based reinforcement learning (Dabney, W., Kurth-Nelson, Z., Uchida, N. et al. Nature 577, 671–675 (2020)). The result itself is interesting – dopamine neurons don’t just compute reward prediction error, they also compute reward quantile “prediction error.” But that’s not the main reason why I think you should read this paper. You should read this paper because it serves as an excellent case study of what I call algorithmic neuroscience, with takeaways that apply to systems neuroscience at large.
Summary of main result
Let’s first briefly discuss the main result. (If you are already familiar with the paper, feel free to skip this section and possibly the next. For a lengthier layperson summary of the paper, here’s a summary I wrote for Dave Freedman’s undergraduate systems neuroscience class taught in Spring 2020. The style of that summary is something you might see in Quanta Magazine, but not as good.) The traditional reward prediction error (RPE) theory of dopamine states that dopamine neurons in the ventral tegmental area (VTA) compute RPE, which supports learning to predict rewards via something like a temporal-difference (TD) algorithm. For simplicity, assume that we want to assign a value to a single state \(x\) (e.g. a bowl of ice cream). Let \(V\) be the value we assign to \(x\) (the predicted reward), \(R\) be the reward we actually receive from \(x\), and \(\delta\) denote the RPE (the difference between the received and predicted reward). For each exposure to the state \(x\), the TD algorithm computes the RPE \(\delta=R-V\) and makes the update \(V \gets V + \alpha \cdot \delta\) with learning rate \(\alpha>0\). The main testable prediction of the RPE theory is that dopamine neurons signal \(\delta\) (the RPE). Indeed, the population-averaged firing rates of VTA dopamine neurons are consistent with this interpretation. The average firing rate is below baseline when \(\delta < 0\) and above baseline when \(\delta > 0\) (Glimcher, P. W. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl Acad. Sci. USA 108, 15647–15654 (2011)). Also notice that \(V\) “stabilises” when \(E(\delta) = 0\), i.e. when \(V=E(R)\). So if the brain implements something like TD learning, then the brain learns the means of reward distributions.
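To make the single-state update concrete, here is a minimal Python sketch of \(V \gets V + \alpha \cdot \delta\). The function name `td_learn_value`, the learning rate, and the two-valued reward distribution are all my own illustrative assumptions, not anything taken from the paper.

```python
import random

def td_learn_value(rewards, alpha=0.01, v0=0.0):
    """Apply V <- V + alpha * (R - V) once per received reward."""
    v = v0
    for r in rewards:
        delta = r - v        # reward prediction error (RPE)
        v += alpha * delta   # single-state TD update
    return v

# Hypothetical reward distribution: 1 or 9 units with equal probability,
# so the true mean reward is 5.
rng = random.Random(0)
rewards = [rng.choice([1.0, 9.0]) for _ in range(20000)]

v_final = td_learn_value(rewards)
print(round(v_final, 2))  # hovers near the mean reward, not near 1 or 9
```

Note that \(V\) ends up near 5 even though a reward of 5 is never actually delivered. That is the sense in which plain TD learning only tracks the mean.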
Inspired by recent progress in distributional reinforcement learning (Bellemare, M. G., Dabney, W., & Munos, R. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 449–458 (2017)), the paper posits a revised theory, which we call the distributional theory of dopamine. This new theory states that dopamine neurons support learning something analogous to quantiles – a much richer description of reward distributions. (The paper states the theory in terms of quantiles for expository purposes, but actually uses expectiles for analysis.) The quantiles are learned via a modified TD algorithm. Let \(\tau_i\) be the estimate of the \(i\)th quantile, \(R\) be the received reward, and \(\delta_i\) be the “error.” For each exposure to \(x\), we compute the “error” \(\delta_i = R - \tau_i\) and make an update depending on the sign of the error. If \(\delta_i > 0\), \(\tau_i \gets \tau_i + \alpha^+_i \cdot \text{sign}(\delta_i)\); if \(\delta_i<0\), \(\tau_i \gets \tau_i + \alpha^-_i \cdot \text{sign}(\delta_i)\). Here \(\alpha^+_i\) and \(\alpha^-_i\) are chosen in accordance with the quantile you want to estimate. For example, you would choose \(\alpha^+_i=3\) and \(\alpha^-_i=1\) to estimate the upper quartile \(\tau_{0.75}\), since three in four rewards should fall below the upper quartile. More generally, you choose \(\alpha^+_i\) and \(\alpha^-_i\) so that the ratio \(\frac{\alpha^+_i}{\alpha^+_i+\alpha^-_i}\) equals the quantile level you want to estimate. The distributional theory claims that an individual dopamine neuron encodes a particular weighted RPE \(\alpha^{+/-}_i\cdot \delta_i\) for a particular quantile \(\tau_i\).
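The sign-based update is easy to try out. Below is a rough Python sketch, assuming rewards drawn uniformly from 0 to 100 and step sizes scaled down by a small factor `step` so the estimate doesn’t jump around too much. Both are my choices for illustration; only the 3:1 ratio comes from the example above. The helper names `quantile_level` and `learn_quantile` are hypothetical.

```python
import random

def quantile_level(alpha_plus, alpha_minus):
    """Quantile level targeted by a pair of step sizes."""
    return alpha_plus / (alpha_plus + alpha_minus)

def learn_quantile(rewards, alpha_plus, alpha_minus, tau0=0.0):
    """Sign-based update: step up by alpha_plus when the reward exceeds
    the estimate, step down by alpha_minus when it falls below it."""
    tau = tau0
    for r in rewards:
        delta = r - tau
        if delta > 0:
            tau += alpha_plus    # sign(delta) = +1
        elif delta < 0:
            tau -= alpha_minus   # sign(delta) = -1
    return tau

# Assumed reward distribution: uniform on [0, 100], so the quantile at
# level q is simply 100 * q.
rng = random.Random(0)
rewards = [rng.uniform(0.0, 100.0) for _ in range(50000)]

step = 0.01  # assumed scale factor; only the ratio of step sizes sets the quantile
tau_lower = learn_quantile(rewards, alpha_plus=1 * step, alpha_minus=3 * step)
tau_median = learn_quantile(rewards, alpha_plus=1 * step, alpha_minus=1 * step)
tau_upper = learn_quantile(rewards, alpha_plus=3 * step, alpha_minus=1 * step)

print(quantile_level(3, 1))  # quantile level for the 3:1 example
print(round(tau_lower), round(tau_median), round(tau_upper))
```

Three “neurons” with different step-size ratios see the same rewards and settle on three different estimates, roughly the lower quartile, the median, and the upper quartile.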
How to test an algorithmic theory
Here’s where things get interesting as a case study of algorithmic neuroscience. We have an algorithmic theory – the brain computes quantiles via (something like) the modified TD learning algorithm. How do we test an algorithmic theory? Like any theory, we need testable predictions. For an algorithmic theory, these predictions take the form of intermediate computations. If an algorithm computes an intermediate quantity \(y\), and we can measure \(y\) in the brain, then this is positive evidence that the brain implements that algorithm (or an algorithm like it).
For the distributional theory of dopamine, the authors infer four main predictions. The first is that dopamine neurons should show a diversity of reversal points, where a neuron’s reversal point is the received reward at which its firing switches between below-baseline and above-baseline rates. This corresponds to different dopamine neurons computing different “errors” \(\delta_i = R - \tau_i\) for different quantiles \(\tau_i\).
The second prediction is that there should exist dopamine neurons with asymmetric tuning curves centered around their reversal point. This is because unless you’re learning the median, you need to choose \(\alpha^+ \neq \alpha^-\).
The third prediction ties together predictions one and two. That is, the slopes of the asymmetric tuning curve should predict the reversal point of the dopamine neuron. This is because we choose \(\alpha^+_i\) and \(\alpha^-_i\) so that the ratio \(\frac{\alpha^+_i}{\alpha^+_i+\alpha^-_i}\) equals the quantile level being estimated.
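To see why the step sizes pin down the quantile, here is a short derivation for the sign-based update from the summary section, assuming \(R\) has a continuous distribution so that \(P(R=\tau_i)=0\). The estimate \(\tau_i\) stops drifting when its expected update is zero:

\[
\alpha^+_i \, P(R > \tau_i) - \alpha^-_i \, P(R < \tau_i) = 0
\quad\Longrightarrow\quad
P(R < \tau_i) = \frac{\alpha^+_i}{\alpha^+_i + \alpha^-_i}.
\]

With \(\alpha^+_i=3\) and \(\alpha^-_i=1\) this gives \(P(R<\tau_i)=3/4\), the upper quartile. So a neuron whose positive slope is steep relative to its negative slope should settle at a higher quantile, and hence have a higher reversal point.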
The fourth and final prediction is that GABA-ergic neurons, thought to represent reward predictions, should have varying degrees of “optimism” corresponding to different quantiles. That is, they compute the different quantiles \(\tau_i\) where anything over the median is an “optimistic” prediction and anything under the median is a “pessimistic” prediction.
To test these four predictions, the authors analyzed a series of experiments with extracellular recordings of VTA dopamine and GABA-ergic neurons in mice given a variable water reward (one experiment varied the probability of reward, while another varied the quantity of reward). (The data was originally collected for the paper Arithmetic and local circuitry underlying dopamine prediction errors by Neir Eshel and Uchida.) The experimental measurements verified the four predictions described above (see the paper for figures and details).
Prediction three – that dopamine neurons have asymmetric tuning curves which predict their reversal points – seems particularly important. Prediction one alone – a diversity of reversal points – could be attributable to measurement noise, or noise in the neural system itself. On the other hand, prediction three suggests that the diversity of reversal points means something. Specifically, it means a diversity of quantiles.
Why did we not see it at first?
The authors ask the same question. I’ll quote their discussion in full:
It is worth emphasizing that none of the effects we have reported are anticipated by the standard RPE theory of dopamine, which implies that all dopamine neurons should transmit essentially the same RPE signal. Why have the present effects not been observed before? In some cases, relevant data have been hiding in plain sight. For example, a number of studies have reported marked variability in the relative magnitude of positive and negative RPEs across dopamine neurons; however, they have treated this as an incidental finding or a reflection of measurement error, or viewed it as a problem for the RPE theory [17]. One of the earliest studies of reward-probability coding in dopaminergic RPEs remarked on apparent diversity across dopamine neurons, but only in a footnote [18]. A more general issue is that the forms of variability we have reported are masked by traditional analysis techniques, which typically focus on average responses across dopamine neurons.
There are two main takeaways from this discussion. The first is that we should always be mindful of how data is processed, and how that takes us away from the “true” underlying phenomenon. We should always ask ourselves – what can we say and what can’t we say based on what data we have and how we process it? In this case, averaging across neurons means that we lose any single-neuron variability. The underlying assumption is that all dopamine neurons are computing the same thing, and that any variance is “noise.” (Similarly, averaging across trials loses any notion of single-trial variability. Trial averaging seems responsible for the persistent activity view of working memory, which is now being reevaluated. One alternative theory of working memory maintenance posits that short-term synaptic plasticity maintains information, while being periodically refreshed by bursty spiking as needed. Of course, if you average many trials of slightly offset bursty spiking, you’ll end up with diffuse persistent activity.)
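The trial-averaging point can be seen in a toy simulation. This is only a cartoon with made-up numbers (burst length, period, and trial count are arbitrary assumptions, and `simulate_trial` is a hypothetical helper), not a model of any real working memory dataset.

```python
import random

def simulate_trial(rng, n_bins=100, burst_len=5, period=25):
    """One trial: brief bursts every `period` bins, with a random
    phase offset that differs from trial to trial."""
    offset = rng.randrange(period)
    return [1.0 if (t - offset) % period < burst_len else 0.0
            for t in range(n_bins)]

def trial_average(trials):
    """Average activity across trials, bin by bin."""
    n = len(trials)
    return [sum(column) / n for column in zip(*trials)]

rng = random.Random(0)
trials = [simulate_trial(rng) for _ in range(200)]
avg = trial_average(trials)

# Fraction of time bins with no activity at all.
silent_single = sum(1 for x in trials[0] if x == 0.0) / len(trials[0])
silent_avg = sum(1 for x in avg if x == 0.0) / len(avg)

print(silent_single)  # a single trial is silent most of the time
print(silent_avg)     # the trial average is almost never silent
print(round(min(avg), 2), round(max(avg), 2))  # low-level activity in every bin
```

Every individual trial is mostly silent with a few sharp bursts, yet the average looks like weak, sustained activity throughout the delay. Looking only at the average, you could not tell the two pictures apart.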
The second takeaway is that unexplained variance looks like noise. This underlines the importance of generating ambitious theories. So where do you find ambitious theories for algorithmic neuroscience? The distributional theory of dopamine was directly inspired by recent work in distributional reinforcement learning from the artificial intelligence community (Bellemare, M. G., Dabney, W., & Munos, R. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 449–458 (2017)). More generally, I think it’s a good idea to first ask WWHD – what would a human do? Think about how humans learn reward distributions. We certainly have the notion of risk, and this alone tells you that we learn more than just the means of reward distributions. After asking WWHD, then ask WWAID – what would an AI do? Asking WWAID forces you to consider concrete algorithmic details, with concrete testable predictions.
Future work
So where do we go from here? On the neuroscience side, we should look for more algorithmic theories from artificial intelligence and test them. Theories at the algorithmic level – theories about what is computed – most readily translate from artificial intelligence to neuroscience. Lower-level theories, say about circuit mechanisms, translate less directly because biological neural systems have building blocks and constraints that differ from those of Python, NumPy, and TensorFlow. Still, it makes sense to use AI modelling to develop lower-level theories. After all, AI models are theories where the proof is in the behavior.
On the artificial intelligence side, we should develop interpretability tools for discovering learned algorithms in artificial neural networks. This would allow us to discover algorithms, not just invent them. These discovered algorithms can then become candidate theories for algorithmic neuroscience. I quote from Zoom In:
Just as the early microscope hinted at a new world of cells and microorganisms, visualizations of artificial neural networks have revealed tantalizing hints and glimpses of a rich inner world within our models. This has led us to wonder: Is it possible that deep learning is at a similar, albeit more modest, transition point?
At the intersection of neuroscience and AI, we should build agents with naturalistic cognitive abilities that act in naturalistic environments. That is, build artificial intelligence agents that model natural intelligence. We should also build agents with “biologically-realistic” building blocks and constraints, in order to develop lower-level theories.
All in all, it’s an exciting time for systems neuroscience. Advances in neural recording mean that we’ll have more and better data. The emergence of artificial intelligence means that we have a well of exciting theories, at least on the algorithmic level. Let us go forth, then, and do great work!
Email me your thoughts! I’d love to discuss this topic with you. And thanks to the authors for a very interesting paper.