Computational Neuroscience - University of Washington

(These are my own notes from Coursera's "Computational Neuroscience", as taught by the University of Washington. I'm really enjoying the lecture contents, challenging assignments, level of mathematical treatment, and the otherwise-rare feeling of superlative transcendence that comes with science. It's almost like half the math I learned in college had been dormant, waiting for a non-math professor who knew how to apply it to their subject. This is also the last properly capitalized paragraph you will see on this page. WIP.)

• overview

goal of computational/theoretical neuroscience: explaining the mind (mostly behaviour at this point) in terms of brain computation.

computational/theoretical neuroscience: tools and methods for:

characterizing what nervous systems do
determining how they achieve it
understanding why they operate like that

kinds of models in theoretical neuroscience by purpose:

descriptive (what): quantitative description of stimuli encoding and decoding
mechanistic (how): detailed simulation of neurons, networks, etc.
interpretive/normative (why): purpose (function) and underlying principles behind function.

• types of models

• descriptive

example: in Hubel and Wiesel's discoveries in the primary visual cortex, the frequency at which action potentials occur in single-cell recordings is a function of the place and orientation of the stimulus:

$\mathit{freq} = f(\mathit{orientation}, \mathit{place})$

receptive field: specific properties of a sensory stimulus that maximize response.

retinal ganglion (efferent) cells and thalamus' Lateral Geniculate Nucleus. center-surround receptive fields only, i.e:

        -    +
       -+-  +-+
        -    +

V1 simple cells. oriented receptive field, e.g:

     --------    +
    -++++++++-  +-+
     --------    +-+
                  +-+
                    +

• mechanistic

V1 simple cell model suggested by Hubel & Wiesel (doesn't take recurrent input into account):

LGN cell 1 ----\
LGN cell 2 -----|--> V1 simple cell
...
LGN cell n ----/

• interpretive

why are receptive fields in V1's simple cells shaped that way? what is their diversity for?

efficient coding hypothesis: evolution favours representing images as faithfully and efficiently as possible.

on V1 simple cells, given image $I$ and receptive fields $RF_i$, we can try to reconstruct it ($\hat{I}$) using neural response strengths (multipliers) $r_1, r_2, \dots$ like this:

$\hat{I} = \sum_i RF_i \, r_i$

question: what are the $RF_i$ that minimize squared pixel-wise errors between $I$ and $\hat{I}$, and are as independent as possible?

start with random $RF_i$
run an efficient coding algorithm (sparse coding, Independent Component Analysis, predictive coding)

• neurobiology

• neuron

Some types of neurons:

pyramidal: most common type, cortical building block
Purkinje: multi-level/branching dendrites
ganglion retinal cells, amacrine cells

neuron doctrine:

functional unit of the nervous system
neurons are (mostly) discrete
signals travel from dendrites to axon

simple idealized descriptive model:

$f = \begin{cases} 1 & \text{if } \sum_i \mathit{synaptic\_strength}_i \, x_i > \mathit{threshold} \\ 0 & \text{otherwise} \end{cases}$

where $\mathit{synaptic\_strength}_i \in \mathbb{R}$.
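
the threshold model above can be sketched in a few lines. this is only an illustration: the weights, inputs and threshold are made-up values, and `threshold_neuron` is a hypothetical helper name.

```python
def threshold_neuron(inputs, synaptic_strengths, threshold):
    """Return 1 if the weighted input sum exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(synaptic_strengths, inputs))
    return 1 if total > threshold else 0

# hypothetical values: two excitatory synapses and one inhibitory one
weights = [0.5, 0.8, -1.0]
fires = threshold_neuron([1, 1, 0], weights, threshold=1.0)   # weighted sum 1.3
silent = threshold_neuron([1, 1, 1], weights, threshold=1.0)  # weighted sum about 0.3
```

a negative synaptic strength acts as inhibition: switching on the third input is enough to keep the neuron below threshold.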

ion channels are variously gated (like transistors):

voltage-gated (membrane, gap junctions)
chemically-gated (neuroreceptors at post-synaptic cleft, smell-taste sensors)
mechanically-gated

• synapse

synapse: connection or junction between two neurons.

electrical:¹
- uses gap junctions (shared channels)
- fast. ideal for synchronization between neurons, sympathetic reactions
chemical:
- uses neurotransmitters
- thought to be more plastic, because post-synaptic cell can independently change its dendritic membrane depending on input, history, etc. hypothetically ideal for learning.

synapse doctrine: synapses are the basis for learning and memory

• plasticity

hebbian: if neuron A repeatedly takes part in firing neuron B, then the synapse from A to B is strengthened. neurons that fire together, wire together.
```
    before: A_{t1} ---||||--- *.01 ---> B_{t1}
    after:  A_{t2} ---------- *.1  ---> B_{t2}
```
LTP/LTD:
- experimentally observed change in synaptic strength
- lasts a few hours or days
- depends on relative timing of input and output spikes
if pre predicts post, post increases the strength (the smaller the delta, the more excitatory the synapse). if pre fires after post, post reduces the strength (the smaller the delta, the more inhibitory the synapse):

$\Delta \mathit{strength} = 1 / \Delta \mathit{timing}$

• nervous system

PNS
- somatic: modal afferent and voluntary efferent
- autonomic
- sympathetic: fight or flight
- parasympathetic: sleep or breed
- enteric: digestion
CNS:
- spinal cord
- encephalon/brain
- rhombo/hindbrain
  - medulla oblongata: breathing, muscle tone, blood pressure
  - pons: connects to cerebellum. sleep and arousal
  - cerebellum: equilibrium, PNS-somatic calibration, language?, attention?
- mesen/mid
  - visual and auditory reflexes, saccadic movement
  - reticular formation: reflexes, pain, breathing, sleep and arousal
- thalamus
  - relay station for everything modal except smell
- prosen/cerebrum
  - basal ganglia
  - hippocampus
  - amygdala
  - cortex: all of higher cognition, subjectivity

• cortex

6 layers of neurons, relatively uniform in structure. e.g.:

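layer	function
1	input from "higher" receptive fields
2	output to "higher" receptive fields
3	output to "higher" receptive fields
4	input from subcortical regions
5	output to subcortical regions
6	output to subcortical regions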

• description: encoding

recording techniques:

fMRI: records aggregate changes in magnetic field resulting from hemodynamics
- accuracy: millions of neurons, varies with voxel resolution
- time scale: seconds
- non-invasive
- in full 3D
- great for differential localisation (dissociation) of function
EEG: records aggregate changes in electric field resulting from neural activity itself
- accuracy: rather noisy. confined to skull surface
- time scale: faster than fMRI
electrode array (single-unit recording(s)): roughly targets single neurons
calcium imaging: single neurons too, based on fluorescence changes resulting from Ca binding.
patch clamp: inserts into cell
- maximum accuracy. requires less amplification
- good for mechanistic models of single cells, ions, pumps, etc.

raster plot: spike plots successively stacked

encoding: from stimulus parameters to pattern of responses

$P(\mathit{response} \mid \mathit{stimulus})$

decoding: working back stimulus parameters from response

$P(\mathit{stimulus} \mid \mathit{response})$

examples:

in V1 oriented (angle = θ) receptive fields: $\mathit{Hz} = \mathrm{Gaussian}(\theta)$
motor cortex: $\mathit{Hz} = \cos(\theta)$

functional maps: the feature selectivity of adjacent tissue can be represented with a spatial field, where each point is associated with the values of a stimulus that maximise its response (i.e. its receptive field)

one map per parameter: position, orientation, grammatical categories, images of both Brad Pitt and Jennifer Aniston, the concept of Dracula: anything related to him (image, audio, words), etc.

even worse: higher-order representations can feed back into the stimuli upstream, shaping the way things are perceived even at very rudimentary levels.

• simple models

$P(r(t) \mid s)$ where $r(t)$ is the response and $s$ is the stimulus.

• linear

most recent stimulus only:

$r(t) = f \, s(t)$

where $f$ is the synaptic strength.

or to account for a short delay from stimulus to response:

$r(t) = f \, s(t - \tau)$

linear with temporal filtering:

a more realistic model will depend on a linear combination of many recent inputs:

$r(t) = \sum_{k=0}^{n} f_k \, s_{t-k}$

note how the constant $f$ turned into a weight function $f_k$.

in infinitesimal form:

$r(t) = \int_{-\infty}^{t} f(\tau) \, s(t - \tau) \, d\tau$

this is a convolution of the filter $f$ with the stimulus $s$, where $f$ acts as the linear filter.

leaky filter/integrator:

$f(\tau)$ decreases the further you look into past inputs, usually exponentially.
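
a minimal sketch of the discrete filter with exponentially decaying weights. the decay constant, filter length and step stimulus are assumptions picked for illustration, and `leaky_filter` / `linear_response` are hypothetical names.

```python
import math

def leaky_filter(n, decay):
    """Weights f_k = exp(-k / decay) for k = 0..n, normalised to sum to 1."""
    raw = [math.exp(-k / decay) for k in range(n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def linear_response(stimulus, f):
    """r(t) = sum_k f_k * s(t - k); the stimulus before t = 0 is treated as zero."""
    response = []
    for t in range(len(stimulus)):
        r = sum(f[k] * stimulus[t - k] for k in range(len(f)) if t - k >= 0)
        response.append(r)
    return response

f = leaky_filter(n=5, decay=2.0)
step = [0.0] * 5 + [1.0] * 20  # stimulus switches on at t = 5
r = linear_response(step, f)
```

the response to the step does not jump at once: it climbs over the length of the filter as more of the recent past becomes non-zero, then settles at 1 because the weights were normalised.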

example: linear filtering applied to center-surround visual fields, where distance in time is replaced by distance in space. the discrete model thus becomes:

$r(x, y) = \sum_{x'=-n}^{n} \sum_{y'=-n}^{n} f(x', y') \, s(x - x', y - y')$

in addition to leaking, we can think of the phenomenon of lateral inhibition in retinal ganglion cells to construct our filter $f$. this is usually approximated as a difference of Gaussians: a narrow positive Gaussian minus a wider, shallower one. like Sobel's filter, this is an edge detector.
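
a rough sketch of such a centre-surround filter, applied to a toy image with a single vertical edge. the kernel size, both sigmas and the image are assumptions; `dog_filter` and `filter_response` are hypothetical names.

```python
import math

def gaussian_2d(x, y, sigma):
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog_filter(n, sigma_center, sigma_surround):
    """(2n+1) x (2n+1) kernel: narrow centre Gaussian minus wide surround Gaussian."""
    return [[gaussian_2d(x, y, sigma_center) - gaussian_2d(x, y, sigma_surround)
             for x in range(-n, n + 1)]
            for y in range(-n, n + 1)]

def filter_response(image, kernel, x, y):
    """r(x, y) = sum over x', y' of f(x', y') * s(x - x', y - y')."""
    n = len(kernel) // 2
    total = 0.0
    for dy in range(-n, n + 1):
        for dx in range(-n, n + 1):
            total += kernel[dy + n][dx + n] * image[y - dy][x - dx]
    return total

kernel = dog_filter(n=3, sigma_center=0.8, sigma_surround=1.6)
# hypothetical 16x16 image: dark left half, bright right half
image = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
dark = filter_response(image, kernel, x=4, y=8)    # uniform dark region
flat = filter_response(image, kernel, x=12, y=8)   # uniform bright region
edge = filter_response(image, kernel, x=8, y=8)    # bright pixel right at the edge
```

in uniform regions the centre and surround nearly cancel, so `flat` stays small; at the edge part of the inhibitory surround falls on dark pixels and the response is larger.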

• non-linear activation function

finally, we can think of an extra activation function that will map the filter convolution to the actual firing rate:

s(t) -> f(τ) -> g() -> r(t)

$r(t) = g\left( \int s(t - \tau) \, f(\tau) \, d\tau \right)$, where $g()$ is the activation function (also called input/output function, or transfer function in ANNs), usually with sigmoid shape.
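
the whole pipeline, sketched in discrete time with a logistic sigmoid standing in for $g()$. the filter weights and the sigmoid parameters are assumptions for illustration only.

```python
import math

def sigmoid(x, gain=1.0, midpoint=0.0, r_max=1.0):
    """Saturating activation g(): maps filtered input to a rate in (0, r_max)."""
    return r_max / (1.0 + math.exp(-gain * (x - midpoint)))

def ln_response(stimulus, f, t, g=sigmoid):
    """r(t) = g( sum_k f_k * s(t - k) ); the stimulus before t = 0 is treated as zero."""
    drive = sum(f[k] * stimulus[t - k] for k in range(len(f)) if t - k >= 0)
    return g(drive)

f = [0.5, 0.3, 0.2]  # hypothetical filter weights
weak = ln_response([0.0, 0.0, 0.0, 0.0], f, t=3)
strong = ln_response([10.0, 10.0, 10.0, 10.0], f, t=3)
```

however large the filtered drive gets, the output saturates below `r_max`, which a purely linear model cannot do.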

• dimensionality reduction

or how to regress a good filter. as seen in the last example, the stimulus could easily contain an intractable number of parameters.

• reverse correlation: spike-triggered average

convert $s(t)$ to a discrete vector: a function of a single variable can be thought of as a single point in space:
each sample (an instant in the stimulus) is a dimension.
the value of that component is the value of that function/stimulus at that point/time:

$s(t) = \begin{pmatrix} s_{t_1} \\ s_{t_2} \\ s_{t_3} \\ \vdots \end{pmatrix} \vdash P(r \mid s_1, s_2, \dots, s_n)$

one method uses Gaussian white noise as $s(t)$ to probe a reaction, and approximates what is common among trials.
a random prior stimulus distribution is constructed from multiple such trials of decomposed random $s(t)$'s:

because every sample is drawn independently from a normal distribution, this distribution of stimuli is gaussian along each dimension, along whatever new dimensions are chosen from a basis change/linear combination, and indeed over the whole space (picture a normal distribution on a plane, or in 3D).

$s_1(t) = \begin{pmatrix} \mathrm{nrand}()_{t_1} & \mathrm{nrand}()_{t_2} & \mathrm{nrand}()_{t_3} & \dots \end{pmatrix}$

$s_2(t) = \begin{pmatrix} \mathrm{nrand}()_{t_1} & \mathrm{nrand}()_{t_2} & \mathrm{nrand}()_{t_3} & \dots \end{pmatrix}$

$s_3(t) = \begin{pmatrix} \mathrm{nrand}()_{t_1} & \mathrm{nrand}()_{t_2} & \mathrm{nrand}()_{t_3} & \dots \end{pmatrix}$

$\dots$

pick the cluster of points that caused $r(t)$ to fire, called the spike-conditional distribution,
and calculate their spike-triggered average point.
then pick the dimension that passes through that point and project the original spike-conditional distribution onto it.
the spike-triggered average vector is then normalised to a unit vector. $\mathrm{avg}(s)$ is our weight filter/feature detector/convolution kernel $f(\tau)$.

geometrically, it is also a projection. so

$r(t) = \int_{-\infty}^{t} f(\tau) \, s(t - \tau) \, d\tau = \hat{f} \cdot \hat{s}$
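
a toy version of the white-noise experiment, to see the spike-triggered average recover a known filter. everything here is simulated: the "neuron" is an assumed leaky filter followed by a hard threshold, and the filter shape, threshold and stimulus length are arbitrary choices.

```python
import math
import random

def spike_triggered_average(stimulus, spike_times, window):
    """Average the `window` stimulus samples up to and including each spike time."""
    sta = [0.0] * window
    count = 0
    for t in spike_times:
        if t - window + 1 < 0:
            continue  # not enough stimulus history before this spike
        for k in range(window):
            sta[k] += stimulus[t - k]  # k = lag into the past
        count += 1
    return [v / count for v in sta]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

rng = random.Random(0)
true_filter = [math.exp(-k / 3.0) for k in range(10)]   # hypothetical leaky filter
stimulus = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # Gaussian white noise

# toy neuron: spikes whenever the filtered stimulus crosses a threshold
spike_times = []
for t in range(len(true_filter) - 1, len(stimulus)):
    drive = sum(true_filter[k] * stimulus[t - k] for k in range(len(true_filter)))
    if drive > 2.0:
        spike_times.append(t)

sta = spike_triggered_average(stimulus, spike_times, window=len(true_filter))
similarity = cosine(sta, true_filter)  # near 1 when the STA points along the filter
```

only the direction of the STA is compared with the true filter (via the cosine), in line with treating it as a unit vector.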

• activation function

or how to regress a good g() for our f().

remember that $g()$ is the last step in producing $r(s)$, and that encoding $r(s)$ in its most general form amounts to calculating $P(\mathit{spike} \mid \mathit{stimulus})$. so we start from the equation:

$r(s) = g() = P(\mathit{spike} \mid \mathit{stimulus})$

$P(\mathit{spike} \mid \mathit{stimulus})$ has become $P(\mathit{spike} \mid \mathrm{avg}(s))$ after simplifying the multi-dimensional stimulus. so what's the new probability under this subset?

$P(\mathit{spike} \mid s') = \frac{P(\mathit{spike} \cap s')}{P(s')}$

expanded into bayesian form:

$\frac{P(\mathit{spike} \cap s')}{P(s')} = \frac{P(s' \mid \mathit{spike}) \, P(\mathit{spike})}{P(s')}$

$P(s')$ simply has a normal probability distribution for the white-noise experiment:

    . .
    . .
  . . . .
. . . . . .

the histogram of $P(s' \mid \mathit{spike})$ is obtained only from those point-values in the stimulus that coincide in time with a spike. let's say those are mostly peaks in the stimulus signal. the distribution would look skewed like this:

       .
      . .
   . . . .
. . . . . .

therefore, $g() = \frac{P(s' \mid \mathit{spike}) \, P(\mathit{spike})}{P(s')}$:

     ....
    .
   .
...

$P(\mathit{spike})$ is just calculated empirically by counting during the random experiment.

if $P(s' \mid \mathit{spike})$ had the same normal distribution as $P(s')$, the ratio of both functions (as used in Bayes' formula) would yield a constant function. i.e., the activation function would look flat. what we actually want is for both distributions to be different.

• maximally informative filters (again)

could we work backwards from $s'$ and $\mathit{spike}$ so that the distributions $P(s' \mid \mathit{spike})$ and $P(s')$ are as dissimilar as possible?

one measure for such a difference is Kullback-Leibler divergence:

$D_{KL}(P(s), Q(s)) = \int P(s) \log\left( \frac{P(s)}{Q(s)} \right) ds$

an advantage of this over STA is that the prior $P(s')$ need not be gaussian random noise. it might as well be natural stimuli.
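
for histograms, the divergence becomes a sum over bins. a small sketch with made-up four-bin histograms; it assumes $Q$ is non-zero wherever $P$ is, and uses base-2 logs so the result is in bits.

```python
import math

def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i p_i * log2(p_i / q_i); bins with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prior = [0.25, 0.25, 0.25, 0.25]              # hypothetical histogram of P(s')
spike_conditional = [0.05, 0.15, 0.30, 0.50]  # hypothetical histogram of P(s'|spike)

same = kl_divergence(prior, prior)
different = kl_divergence(spike_conditional, prior)
reverse = kl_divergence(prior, spike_conditional)
```

identical distributions give zero, and the measure is not symmetric: swapping the arguments generally changes the value.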

• Generalized Linear Models

with a refractory period filter:

(s) -> (f) -> (+) -> (g) -> (Poisson) ----> r
               ^                        |
               |__(refractory filter)___|

with coupling (neighbour neurons) filters, ephaptic transmission, etc.

• description: decoding

• signal detection theory (hypothesis testing)

likelihood ratio test:

$1 < \frac{P(\mathit{neural\_response} \mid \mathit{observed\_behaviour})}{P(\mathit{neural\_response} \mid \mathrm{not}(\mathit{observed\_behaviour}))}$

Neyman-Pearson lemma: the likelihood ratio test is the most powerful test for any given size (false-positive rate).

maximum likelihood with asymmetric costs:

type 1 and type 2 errors may not cost the same. we introduce a weighting factor $L$:

$\mathit{Loss}_{\mathrm{yes}} = L_{\mathrm{yes}} \, P(\mathrm{no} \mid \mathit{response})$

$\mathit{Loss}_{\mathrm{no}} = L_{\mathrm{no}} \, P(\mathrm{yes} \mid \mathit{response})$

we should clearly answer yes when $\mathit{Loss}_{\mathrm{yes}} < \mathit{Loss}_{\mathrm{no}}$, i.e., $L_{\mathrm{yes}} P(\mathrm{no} \mid \mathit{response}) < L_{\mathrm{no}} P(\mathrm{yes} \mid \mathit{response})$. rewriting using Bayes' rule yields:

$L_{\mathrm{yes}} \frac{P(\mathrm{no}, r)}{P(r)} < L_{\mathrm{no}} \frac{P(\mathrm{yes}, r)}{P(r)}$

$L_{\mathrm{yes}} \frac{P(r \mid \mathrm{no}) P(\mathrm{no})}{P(r)} < L_{\mathrm{no}} \frac{P(r \mid \mathrm{yes}) P(\mathrm{yes})}{P(r)}$

$L_{\mathrm{yes}} P(r \mid \mathrm{no}) P(\mathrm{no}) < L_{\mathrm{no}} P(r \mid \mathrm{yes}) P(\mathrm{yes})$

...and arranging for the ratio of conditional distributions to obtain the likelihood ratio:

$\frac{P(r \mid \mathrm{yes})}{P(r \mid \mathrm{no})} > \frac{L_{\mathrm{yes}} P(\mathrm{no})}{L_{\mathrm{no}} P(\mathrm{yes})}$
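
the final inequality is easy to turn into a decision rule. in this sketch the two response distributions are assumed to be Gaussians over firing rate (means 20 and 10 Hz, sd 5 Hz); those numbers, the priors and the losses are all made up.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def answer_yes(r, p_r_given_yes, p_r_given_no, prior_yes, loss_yes, loss_no):
    """Answer yes when P(r|yes) / P(r|no) > (L_yes * P(no)) / (L_no * P(yes))."""
    likelihood_ratio = p_r_given_yes(r) / p_r_given_no(r)
    criterion = (loss_yes * (1.0 - prior_yes)) / (loss_no * prior_yes)
    return likelihood_ratio > criterion

# hypothetical response distributions: firing rate (Hz) with and without the stimulus
def with_stimulus(r):
    return gaussian_pdf(r, mean=20.0, sd=5.0)

def without_stimulus(r):
    return gaussian_pdf(r, mean=10.0, sd=5.0)

balanced = answer_yes(16.0, with_stimulus, without_stimulus, 0.5, loss_yes=1.0, loss_no=1.0)
cautious = answer_yes(16.0, with_stimulus, without_stimulus, 0.5, loss_yes=10.0, loss_no=1.0)
```

the same response of 16 Hz is read as "yes" when both errors cost the same, and as "no" once a wrong "yes" is made ten times more expensive.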

• population coding

methods for decoding from populations of neurons:

population vector
bayesian inference
- maximum likelihood
- maximum a posteriori

• population vector

example: cricket cercal cells in the antenna-like cercus structures in the abdomen are sensitive to wind velocity.

how does the group as a whole communicate wind velocity to the animal?

there are 4 overlapping receptive field distributions, corresponding to the 4 directions relative to the body:

f/r_max
       -    |    -    |
      ---  |||  ---  |||
    -----++|||++---++|||||
      45   135  225  315    degrees

given wind direction (in degrees) $s$ and a neuron's preferred direction $s_a$, response looks like: (after normalising for $r_{max}$, because some neurons have an intrinsically higher firing rate, even for equally maximal stimuli)

$\left( \frac{f(s)}{r_{max}} \right)_a = [\cos(s - s_a)]_+$

or in terms of vectors $v$ (wind vector) and $c_a$ (neuron's preferred vector):

$\left( \frac{f(s)}{r_{max}} \right)_a = [v \cdot c_a]_+$

then, the population vector is calculated as:

$v_{pop} = \sum_{a=1}^{4} \left( \frac{r}{r_{max}} \right)_a c_a$

note how 4 orthogonal vectors are used, even though 2 in linear combinations would suffice to generate the whole plane. this is due to the lack of negative firing rates to represent wind in the opposite direction, or its component along that direction.

this method is prevalent in brain-machine interfaces targeting the M1 area.

for a sufficiently large number of directions:

$v_{pop} = \sum_{a=1}^{N} (v \cdot c_a) \, c_a$
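
a sketch of the four-neuron cricket case: encode a wind direction with rectified cosines, then decode it back with the population vector. the rates are noise-free here, which is an idealisation; function names are hypothetical.

```python
import math

PREFERRED_DEGREES = [45.0, 135.0, 225.0, 315.0]

def unit_vector(degrees):
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))

def normalised_rates(wind_degrees):
    """(f(s) / r_max)_a = [cos(s - s_a)]_+ for each of the four neurons."""
    return [max(0.0, math.cos(math.radians(wind_degrees - s_a)))
            for s_a in PREFERRED_DEGREES]

def population_vector(rates):
    """v_pop = sum_a (r / r_max)_a * c_a."""
    x = sum(r * unit_vector(s_a)[0] for r, s_a in zip(rates, PREFERRED_DEGREES))
    y = sum(r * unit_vector(s_a)[1] for r, s_a in zip(rates, PREFERRED_DEGREES))
    return (x, y)

def decode_direction(rates):
    x, y = population_vector(rates)
    return math.degrees(math.atan2(y, x)) % 360.0

rates = normalised_rates(100.0)
decoded = decode_direction(rates)
```

for any wind direction only the neurons within 90 degrees of it respond, and their weighted preferred vectors add up to the original wind vector.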

• bayesian inference

more general than population vector
takes more information into account
less vulnerable to noise

recall Bayes' rule for stimulus $s$ and response $r$:

$P(s \mid r) = \frac{P(r \mid s) P(s)}{P(r)}$

we could also rewrite the marginal distribution $P(r)$ to yield:

$P(s \mid r) = \frac{P(r \mid s) P(s)}{\int ds \, P(r \mid s) P(s)}$

maximum likelihood distribution: this bayesian decoding strategy tries to find the stimulus which maximizes the likelihood conditional distribution $P(r \mid s)$.
maximum a posteriori distribution: this decoding strategy tries to find the stimulus which maximizes the a posteriori (LHS) conditional distribution $P(s \mid r)$.

MAPD is influenced by the prior $P(s)$, because all factors in Bayes' equation are taken into account.

example: imagine a population of neurons that encode stimulus/parameter $s$, with response $r = f(s)$ a Gaussian probability distribution, and mean/maximum at $s_1$.

meanwhile, we have each individual neuron's Gaussian responses scattered along $s$ and overlapping each other:

r=f(s)
       -    |    -    |    -
      ---  |||  ---  |||  ---
    -----++|||++---++|||++-----
      n1   n2   s_1  n3    nn     stimulus

assume the following:

each neuron fires independently
variability in firing is due to a Poisson process, so the variance equals the mean firing rate $r$
the distribution of distributions is not gaussian, but uniform. i.e., there are plenty of single-neuron distributions across the range of $s$.

spikes are produced randomly and independently in each time bin, with a probability given by the instantaneous rate. from Poisson:

$P_T(k) = \frac{(cT)^k e^{-cT}}{k!}$

where:

$k$ is the number of occurrences for which $P_T()$ is going to be calculated for time interval $T$.
$c$ is the constant mean firing rate per time unit.
$T$ is the time interval.

substituting for the variables in our example setting, the probability of seeing $r_a$ spikes for a single neuron $a$ becomes:

$P(r_a \mid s) = \frac{(f_a(s) T)^{T r_a} e^{-f_a(s) T}}{(T r_a)!}$

notice how we write the specific number of occurrences $k$ during interval $T$ simply as $r_a$ times $T$.

also, the constant $c$ is replaced by our previous probability function $f_a$, and the whole equation is accordingly changed to a conditional distribution of stimulus $s$ (that is, our conditional distribution is a bunch of Poisson formulas, one per slice at this neuron's Gaussian).

so the whole population's ($\hat{r} = (r_1, r_2, \dots, r_n)$) distribution can be written down as the product of the independent distributions:

$P(\hat{r} \mid s) = \prod_{a=1}^{N} \frac{(f_a(s) T)^{T r_a} e^{-f_a(s) T}}{(T r_a)!}$

• maximum likelihood

the previous formula is often written in logarithmic form, so as to avoid rounding products and the exponential, in what is known as log-likelihood:

$\ln(P(\hat{r} \mid s)) = \sum_{a=1}^{N} \left[ T r_a \ln(f_a(s) T) - f_a(s) T - \ln((T r_a)!) \right]$

which we try to maximise (the $\sum_a f_a(s)$ term is roughly constant in $s$ under the uniform-coverage assumption above, so its derivative drops out):

$0 = \frac{\partial}{\partial s} \ln(P(\hat{r} \mid s)) = T \sum_{a=1}^{N} r_a \frac{f_a'(s)}{f_a(s)}$

recall that $f_a(s)$ is the normal distribution:

$0 = \sum_{a=1}^{N} r_a \frac{-\frac{1}{\sigma_a \sqrt{2\pi}} \frac{s - s_a}{\sigma_a^2} e^{-\frac{1}{2} \left( \frac{s - s_a}{\sigma_a} \right)^2}}{\frac{1}{\sigma_a \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{s - s_a}{\sigma_a} \right)^2}}$

$\vdash 0 = \sum_{a=1}^{N} r_a \frac{s - s_a}{\sigma_a^2}$

$\vdash s_{f(s)_{max}} = \frac{\sum_{a=1}^{N} r_a s_a / \sigma_a^2}{\sum_{a=1}^{N} r_a / \sigma_a^2}$

and, if all the $\sigma_a$ are equal:

$\vdash s_{f(s)_{max}} = \frac{\sum_{a=1}^{N} r_a s_a}{\sum_{a=1}^{N} r_a}$

think of this as an improved version of the population vector $v_{pop} = \sum_{a=1}^{N} (v \cdot c_a) \, c_a$: because each neuron's contribution is weighted against its variance $\sigma_a^2$, more informative neurons (less spread-out distribution) will contribute more.
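
the closed-form estimate is just a weighted average of preferred stimuli. a sketch with made-up rates and tuning widths (`ml_estimate` is a hypothetical name):

```python
def ml_estimate(rates, preferred, sigmas):
    """s_ML = (sum_a r_a s_a / sigma_a^2) / (sum_a r_a / sigma_a^2)."""
    num = sum(r * s / sd ** 2 for r, s, sd in zip(rates, preferred, sigmas))
    den = sum(r / sd ** 2 for r, sd in zip(rates, sigmas))
    return num / den

preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical preferred stimuli s_a
rates = [1.0, 4.0, 9.0, 4.0, 1.0]        # hypothetical observed rates r_a

equal_widths = ml_estimate(rates, preferred, [1.0] * 5)
# a sharply tuned neuron (small sigma) pulls the estimate towards its preferred stimulus
sharp_right = ml_estimate(rates, preferred, [1.0, 1.0, 1.0, 0.5, 1.0])
```

with equal widths the symmetric rates decode to 0; halving the width of the neuron that prefers 1.0 quadruples its weight and drags the estimate to the right.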

• maximum a posteriori

in log-a-posteriori form for a single neuron:

$\ln(P(s \mid r)) = \ln(P(r \mid s)) + \ln(P(s)) - \ln(P(r))$

and for the whole population:

$\ln(P(s \mid r)) = T \sum_{a=1}^{N} r_a \ln(f_a(s)) + \ln(P(s)) + \dots$

optimize by setting the derivative to 0:

$0 = T \sum_{a=1}^{N} r_a \frac{f_a'(s)}{f_a(s)} + \frac{P'(s)}{P(s)}$

and after we pour in the gaussian expansions of $f_a(s)$ and of a gaussian prior $P(s)$ with mean $s_{prior}$ and variance $\sigma_{prior}^2$:

$s_{f(s)_{max}} = \frac{T \sum_{a=1}^{N} \frac{r_a s_a}{\sigma_a^2} + \frac{s_{prior}}{\sigma_{prior}^2}}{T \sum_{a=1}^{N} \frac{r_a}{\sigma_a^2} + \frac{1}{\sigma_{prior}^2}}$

if you look closer, this is none other than the maximum likelihood estimate, with the data terms scaled by $T$ and shifted by the biasing effect of the prior. in other words, the more certain the prior (the more pointy its distribution), the more the $s_{prior}$ stimulus takes us away from a simple maximum likelihood estimate.
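
a sketch of the MAP formula with a Gaussian prior, showing how the prior's width decides how much it matters. rates, widths, $T$ and both priors are made-up values.

```python
def map_estimate(rates, preferred, sigmas, T, s_prior, sigma_prior):
    """Closed-form MAP estimate for Gaussian tuning curves and a Gaussian prior."""
    num = T * sum(r * s / sd ** 2 for r, s, sd in zip(rates, preferred, sigmas))
    num += s_prior / sigma_prior ** 2
    den = T * sum(r / sd ** 2 for r, sd in zip(rates, sigmas))
    den += 1.0 / sigma_prior ** 2
    return num / den

preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical preferred stimuli s_a
rates = [1.0, 4.0, 9.0, 4.0, 1.0]        # hypothetical observed rates r_a
sigmas = [1.0] * 5

# the data alone point at s = 0; the prior is centred on s = -2
vague = map_estimate(rates, preferred, sigmas, T=1.0, s_prior=-2.0, sigma_prior=100.0)
sharp = map_estimate(rates, preferred, sigmas, T=1.0, s_prior=-2.0, sigma_prior=0.1)
```

a very wide prior leaves the estimate essentially at the maximum likelihood value, while a narrow one pulls it most of the way to `s_prior`.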

• caveats

these methods don't incorporate rapid time variations, as they are modelled after tuning curves and mean firing rate.
don't take into account correlations in the population; i.e., we assumed independence of events. this is an open research enterprise in neuroscience.

• stimulus reconstruction

or how to extend the bayesian methods to the case where stimuli and response vary continuously in time.

suppose we want the best estimator for stimulus $s_{bayes}$. we introduce an error function, $L(s, s_{bayes})$, and average it over all possible stimulus choices:

$\int ds \, L(s, s_{bayes}) \, P(s \mid r)$

which we will try to minimize. one choice of $L()$ is the least squares error function:

$0 = \frac{\partial}{\partial s_b} \int ds \, (s - s_b)^2 \, P(s \mid r)$

$0 = -2 \int ds \, (s - s_b) \, P(s \mid r)$

$\int ds \, s \, P(s \mid r) = \int ds \, s_b \, P(s \mid r)$

$s_b = \int ds \, P(s \mid r) \, s$

but $P(s \mid r)$ is just another way of saying "all the stimuli that triggered an action potential". in other words, this filter is just the spike triggered average.

example: famous 1999 "mind reading" of a cat's Lateral Geniculate Nucleus.²

here researchers are trying to find $s$ that maximizes the a posteriori distribution $P(s \mid r) \propto P(r \mid s) P(s)$.

the encoding model/likelihood $P(r \mid s)$ needs to be constructed from input movies.

• case study: visual processing in the retina

rods are exquisitely sensitive to the minutest amounts of photons (<0.1% of them contribute signals). averaging would be disastrous under these conditions. how can a noisy neural array reliably detect and relay those signals?

what is the optimal readout of an array of detectors when only a small fraction is active and relevant? this question is pervasive across attention and decision-making research.

we need a selection process/threshold to discard most sources off-the-bat, based on prior knowledge. once again, probabilistic models for noise and signal are due.

rod -->
       \
rod -->--rod bipolar--
       /              |
rod -->               |
                      v
                   amacrine
                      |
                      |
cone --> cone bipolar--------> ganglion ->

anatomically, such a threshold in non-linearity is thought to occur right at the integrating synapse from rods to rod bipolar cells. no other candidate is left after the bipolar consumes all rod output.

for similar sample sizes of noise and signal, recordings show that the threshold is biased against type-1 errors (false positives): even most true positives get rejected, unless the probability of the noise distribution is negligibly small.

P(r)
             --  ||
            ---++|||
           ---++++||||
           noise signal     Amplitude
threshold: .........^

however, under natural conditions the probability of seeing a photon is minuscule compared to the probability of seeing noise. priors matter!

P(r)
              --
             ----
            ------ |
           ------++||
           noise signal     Amplitude
threshold: ........^
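
to see how strongly the prior moves the threshold, here is a sketch under simplifying assumptions that are not from the recordings: noise and signal amplitudes are Gaussians with the same standard deviation, and both kinds of error cost the same. the threshold is then the amplitude where the two prior-weighted densities cross.

```python
import math

def optimal_threshold(mu_noise, mu_signal, sigma, p_signal):
    """Amplitude x where P(signal) p(x|signal) = P(noise) p(x|noise),
    for two Gaussians sharing the standard deviation sigma."""
    midpoint = (mu_noise + mu_signal) / 2.0
    shift = sigma ** 2 * math.log((1.0 - p_signal) / p_signal) / (mu_signal - mu_noise)
    return midpoint + shift

# hypothetical amplitudes: noise centred on 0, single-photon signal centred on 2
equal = optimal_threshold(0.0, 2.0, 1.0, p_signal=0.5)
rare = optimal_threshold(0.0, 2.0, 1.0, p_signal=0.001)
```

with equal priors the threshold sits halfway between the two means; when a photon is a thousand times rarer than noise it moves well beyond the signal's own mean, so most true signals are discarded in exchange for very few false positives.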

• description: codec evaluation

at any given time bin in a spike train the probability of seeing an event $1$ is given by $P(x = 1) = p$, and the probability of its complement is given by $P(x = 0) = 1 - p$. their respective information values are $-\log_2(p)$ and $-\log_2(1 - p)$.

entropy counts the yes/no questions necessary to specify a variable's distribution.

the encoding capacity of a spike train to account for variability in its counterpart (ability to represent) is quantified using entropy.

theorem: uniform probability has the highest entropy of all bernoulli distributions:

$H = -\sum_x B(x) \log(B(x)) = -p \log(p) - (1 - p) \log(1 - p)$

$H$, a function of probability $p$, peaks at $p = 1/2$.

generally $P(x)$ is not uniform, but it would be best if it were.
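
a quick numerical check of the peak at $p = 1/2$, using base-2 logs so entropy is in bits (the sample values of $p$ are arbitrary):

```python
import math

def bernoulli_entropy(p):
    """H(p) = -p log2(p) - (1 - p) log2(1 - p), in bits; 0 log 0 is taken as 0."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

entropies = {p: bernoulli_entropy(p) for p in (0.0, 0.1, 0.5, 0.9, 1.0)}
```

a bin that always or never spikes carries no information, and a fair coin-flip bin carries exactly one bit.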

• mutual information

how much of the response is encoded in the stimulus?

let $\mathit{Response}$ and $\mathit{Stimulus}$ be 2 random variables. noise entropy is defined as the entropy of either of their conditional distributions. for $S$ we have that:

$H(R \mid S) = E[-\log(P(R \mid S))]$

averaged over $S$. and the mutual information is given by:

$I_m(R; S) = H(R) - E[H(R \mid S)]$

note that $I_m(R; S) = I_m(S; R)$. mutual information is commutative. whatever amount of information is reduced from R by knowing S, it must be the same amount that would be reduced from S if we knew R.

example: we have 2 possible stimuli/responses (yes or no), and 2 possible errors of (presumably) equal probability $q$.

Confusion matrix of probabilities:

stimulus	response	no response
yes	1-q	q
no	q	1-q

let's focus on the "yes" stimulus alone. therefore:

Total entropy: H(R) = -P(response)log(P(response)) - P(no response)log(P(no response)).
Noise entropy (hits): H(R|yes) = -q log(q) - (1-q)log(1-q).
Mutual information: $I_{m} (R, y e s) = H (R) - E [H (R | y e s)]$

intuitively, this means that the greater the noise entropy, the less the stimulus encodes the response. but noise entropy is maximum when $q = 1 - q = 1 / 2$ (TeX formula: q = 1-q = 1/2) , thereby telling us that when the response is pure chance, the stimulus has no bearing representing it.

when $I_{m} (R, s) \to 0$ (TeX formula: I_m(R,s) → 0) , $P (R | S) \to P (R)$ (TeX formula: P(R|S) → P(R)) . I.e. they are independent. when $I_{m} (R, s) \to H (R)$ (TeX formula: I_m(R,s) → H(R)) , the noise entropy $H (R | S) \to 0$ (TeX formula: H(R|S) → 0) . I.e., response is perfectly encoded by stimulus.
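
the yes/no example can be worked through numerically. this is a sketch under one extra assumption that the notes don't state: both stimuli are equally likely (`p_yes = 0.5`). the function name `yes_no_information` is made up for the illustration, and entropies are in bits.

```python
from math import log2

def h2(p):
    """binary entropy in bits, with 0*log(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def yes_no_information(q, p_yes=0.5):
    """mutual information for the 2x2 confusion matrix with error rate q."""
    # marginal probability of a response, summed over both stimuli
    p_response = p_yes * (1 - q) + (1 - p_yes) * q
    total_entropy = h2(p_response)
    # each row of the confusion matrix has noise entropy h2(q);
    # the expectation weights the rows by the stimulus probabilities
    noise_entropy = p_yes * h2(q) + (1 - p_yes) * h2(q)
    return total_entropy - noise_entropy

for q in (0.0, 0.1, 0.25, 0.5):
    print(q, round(yes_no_information(q), 4))
```

with no errors (q = 0) the stimulus gives the full bit; at q = 1/2 the information vanishes.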

graphical example with continuous distributions:

High mutual information between S and R. the total response (sum of all possible stimuli, shown as the wrapper) stems from very differentiated stimuli (low noise entropy), each contributing uniquely to explain one part of the response:

R=f(S)
       ~~ ~~~~  ~~~~ ~~~  ~~
      ~  ~    ~     ~   ~   ~
     ~ -    |    -    |    - ~
    ~ ---  |||  ---  |||  --- ~
  ~~ ----++|||++---++|||++---- ~
      s1   s2   s3   s4   s5     stimulus

Low mutual information. Same response, but it is just the sum of very noisy stimuli:

R=f(S)
       ~~ ~~~~  ~~~~ ~~~  ~~
      ~  ~    ~     ~   ~   ~
     ~                       ~
    ~ ---**||***--**||***---  ~
  ~~ --*********************** ~
      s1   s2   s3   s4   s5     stimulus

• relationship to Kullback-Leibler

if mutual information measures variable independence, then it should be equivalent to the Kullback-Leibler divergence between joint probability and the bare product of both distributions:

$I_{m} (S; R) = D_{K L} (P (R, S), P (R) P (S)) =$ $(TeX formula: I_m(S;R) = D_{KL}(P(R,S), P(R)P(S)) = )$

$\int d R \int d S P (R, S) \log (\frac{P (R, S)}{P (R) P (S)})$ $(TeX formula: ∫dR∫dS\; P(R,S) log \left( \frac{P(R,S)}{P(R)P(S)} \right))$

note that the log-ratio is weighted by the joint $P (R, S)$ ; weighting it by $P (R) P (S)$ instead gives the divergence in the opposite direction, which is not the mutual information.

let's rewrite it using conditionals:

$\int d R \int d S P (R, S) l o g (\frac{P (R | S) P (S)}{P (R) P (S)}) =$ $(TeX formula: ∫dR∫dS \; P(R,S)log\left(\frac{P(R|S)P(S)}{P(R)P(S)}\right)=)$

$\int d R \int d S P (R, S) l o g (\frac{P (R | S)}{P (R)}) =$ $(TeX formula: ∫dR∫dS \; P(R,S) log\left( \frac{P(R|S)}{P(R)} \right) = )$

$\int d R \int d S P (R, S) (l o g (P (R | S)) - l o g (P (R))) =$ $(TeX formula: ∫dR∫dS \; P(R,S) ( log(P(R|S)) - log(P(R)) ) = )$

$\int d R \int d S P (R, S) l o g (P (R | S)) - \int d R \int d S P (R, S) l o g (P (R)) =$ $(TeX formula: ∫dR∫dS \; P(R,S)log(P(R|S)) - ∫dR∫dS \; P(R,S)log(P(R)) = )$

$- E [H (P (R | S))] + H (P (R))$ (TeX formula: - E[H(P(R|S))] + H(P(R)) )
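
the same identity can be checked on a small discrete example, replacing integrals with sums. the 2x2 joint table below is invented purely for the check; any valid joint distribution should give the same agreement.

```python
from math import log2

# hypothetical joint distribution P(R, S): rows are responses, columns stimuli
joint = [[0.30, 0.10],
         [0.05, 0.55]]

# marginals P(R) and P(S)
p_r = [sum(row) for row in joint]
p_s = [sum(joint[i][j] for i in range(2)) for j in range(2)]

def entropy(dist):
    """entropy in bits of a discrete distribution given as a list."""
    return -sum(p * log2(p) for p in dist if p > 0)

# D_KL between the joint and the product of marginals
kl = sum(joint[i][j] * log2(joint[i][j] / (p_r[i] * p_s[j]))
         for i in range(2) for j in range(2) if joint[i][j] > 0)

# H(R) - E[H(R|S)], the expectation weighted by P(S)
noise = sum(p_s[j] * entropy([joint[i][j] / p_s[j] for i in range(2)])
            for j in range(2))
mi = entropy(p_r) - noise

print(kl, mi)
```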

• spike trains

• spike patterns

so far we've only dealt with single spikes or firing rates.
what info is carried by entire patterns?
how informative are they?

from spike train to pattern code (macro-symbols):

chop response train into segments (called letters) of size $Δ t$ .
assign each of them one of 2 values based on presence/amount of contained spikes.
group them into words of length L.
plot the distribution of words, P(W).
(all-zero is a common mode, followed by words with a single one-letter spike, etc).
compute total $H (P (W))$
compute $I_{m} (W; S)$ by subtracting the noise entropy $H (W | S)$ , estimated from the words seen in response to repeated presentations of the same stimulus and averaged over stimuli.
rinse and repeat for different Δt and L, until $I_{m}$ is maximum, i.e., the conditional/noise contributes the least entropy because we learned a lot about W by looking at that S.
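
the first steps of the recipe above (letters, words, P(W), total entropy) can be sketched as follows. the spike times are made up, words are taken as non-overlapping, and a letter is 1 when its bin holds at least one spike; all three are my assumptions. the noise entropy is left out because it needs repeated trials of the same stimulus.

```python
from collections import Counter
from math import log2

def words_from_spikes(spike_times, duration, dt, L):
    """binarize a spike train into letters of width dt, then cut the letter
    sequence into non-overlapping words of L letters."""
    n_bins = int(round(duration / dt))
    letters = [0] * n_bins
    for t in spike_times:
        k = int(t / dt)
        if 0 <= k < n_bins:
            letters[k] = 1          # at least one spike in this bin
    return [tuple(letters[i:i + L]) for i in range(0, n_bins - L + 1, L)]

def word_entropy(words):
    """total entropy H(P(W)) in bits, from the empirical word frequencies."""
    counts = Counter(words)
    n = len(words)
    return -sum(c / n * log2(c / n) for c in counts.values())

# made-up spike times in seconds, for illustration only
spikes = [0.012, 0.051, 0.055, 0.123, 0.181, 0.264, 0.268, 0.335]
words = words_from_spikes(spikes, duration=0.4, dt=0.01, L=4)

print(Counter(words).most_common(3))
print(word_entropy(words))
```

as the notes say, the all-zero word comes out as the mode here.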

example:

from Reinagel and Reid's 2000 study on the Lateral Geniculate Nucleus' pattern code:

mutual information was plotted as a function of L and Δt.
information drops off as we get to long words, because they require increasingly larger sample sizes to estimate the distributions.
nonetheless, if a clear pattern emerges from looking at shorter word lengths, one may extrapolate the entropy curve to cases with larger words and predict if entropy would actually increase or not there.

• single spikes

how much does the observation of a single spike tell us about the stimulus? by how much does knowing that a particular stimulus occurred reduce the entropy of the response?

without knowing the stimulus, the probability of seeing a single action potential in the response can be computed from the average spike rate and Δt:

response	no response
P(r=1) = r̅Δt	P(r=0) = 1 - P(r=1)
P(r=1|s) = r̅(t)Δt	P(r=0|s) = 1 - P(r=1|s)

the rates r̅ and r̅(t) are calculated empirically.

note that every time t is a sample of s, so averaging the noise entropy can be done over time, as opposed to averaging over stimuli. this property is known as ergodicity:

$I = H (P (r)) - E [H (P (r | s))] =$ (TeX formula: I = H(P(r)) - E[H(P(r|s))] = )

$- \overline{r} Δ t l o g (\overline{r} Δ t) - (1 - \overline{r} Δ t) l o g (1 - \overline{r} Δ t) +$ $(TeX formula: -\bar{r}Δtlog(\bar{r}Δt)-(1-\bar{r}Δt)log(1-\bar{r}Δt) + )$

$\frac{1}{T} \int_{0}^{T} d t [\overline{r} (t) Δ t l o g (\overline{r} (t) Δ t) + (1 - \overline{r} (t) Δ t) l o g (1 - \overline{r} (t) Δ t)]$ $(TeX formula: \frac{1}{T} ∫_0^T dt \; [ \bar{r}(t)Δt \; log(\bar{r}(t)Δt) + (1-\bar{r}(t)Δt)log(1-\bar{r}(t)Δt) ] )$

(...more unexplained inferences...)

therefore, the per-spike information is approximately:

$I (r, s) = \frac{1}{T} \int_{0}^{T} d t \frac{r (t)}{\overline{r}} l o g (\frac{r (t)}{\overline{r}})$ $(TeX formula: I(r,s) = \frac{1}{T} ∫_0^T dt \frac{r(t)}{\bar{r}} log\left(\frac{r(t)}{\bar{r}}\right) )$

some properties of this calculation:

when r(t) → r̅, information goes to zero. this means that the single spike was barely modulated by that particular stimulus.
when r̅ is small, information is likely to be large, because the receptive field is a very rare/informative one.
stimulus-independent (measuring information is possible in the absence of a coding/decoding model).
the spike is a meta-symbol that could be any other event composed of multiple spikes.
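
a discretized version of the per-spike formula, as a sketch: the time integral becomes an average over equally spaced samples of r(t). the two rate profiles are invented to illustrate the first two properties, and the logarithm is taken in base 2 (bits per spike), which is my choice.

```python
from math import log2

def info_per_spike(rate):
    """(1/T) * integral of (r/rbar) * log2(r/rbar) dt, approximated by a
    mean over uniformly spaced samples of the rate."""
    rbar = sum(rate) / len(rate)
    return sum((r / rbar) * log2(r / rbar) for r in rate if r > 0) / len(rate)

flat = [5.0] * 100                   # rate never modulated by the stimulus
peaky = [0.0] * 90 + [50.0] * 10     # same mean rate, concentrated in 10% of bins

print(info_per_spike(flat))    # r(t) equals the mean everywhere
print(info_per_spike(peaky))   # each spike pins the time down to 1/10 of T
```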

example: hippocampal place fields. imagine the following place field:

------------------
|          +     |
|         +++    |
|        +++++   |
|         +++    |
|          +     |
------------------

the mean rate $\overline{r}$ (TeX formula: \bar{r}) will be rather large (and information about any particular stimulus low), because the receptive field is rather large and imprecise. such a place cell would fire to a considerable range of positions within the room.

• neural code wrap-up: interpretive principles

what challenges do natural stimuli pose?
what does information theory suggest about the purpose of neural systems?
what principles could be at work shaping the neural code?

natural stimuli: usually possess 2 properties, regardless of modality:

huge dynamic range: variations over many orders of magnitude.
power-law scaling: structure at many scales/frequency levels. screams for Fourier analysis?

• efficient coding

noise entropy escapes our hands under natural situations.

histogram equalization: $H_{m a x} (R)$ $(TeX formula: H_{max}(R))$ is obtained by matching a limited number of outputs/symbols to the distribution of inputs $P (S)$ (TeX formula: P(S)) , so that every output symbol is used with the same probability. in other words, our coding filter should be determined by the distribution of natural inputs.

this implies that the response/input-output curve is the cumulative integral of $P (S)$ (TeX formula: P(S)) .

suppose only 20 possible outputs are available. we should divide the area under $P (S)$ (TeX formula: P(S)) in 20 slices of equal area.
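
the 20-slice idea can be sketched with an empirical distribution: sort the samples, place the slice boundaries at equally spaced quantiles, and the resulting staircase is a discrete version of the cumulative of P(S). the gaussian stand-in for "natural stimuli" and the function name `respond` are assumptions made for the example.

```python
import random
from bisect import bisect_right

random.seed(0)
# stand-in for natural stimuli: any non-uniform P(S) works for the sketch
stimuli = [random.gauss(0.0, 1.0) for _ in range(10000)]

n_outputs = 20
ordered = sorted(stimuli)
# edges that cut the empirical P(S) into n_outputs slices of equal area
edges = [ordered[k * len(ordered) // n_outputs] for k in range(1, n_outputs)]

def respond(s):
    """input-output curve: staircase approximation of the cumulative of P(S)."""
    return bisect_right(edges, s)    # output symbol in 0 .. n_outputs - 1

# how often each output symbol gets used
counts = [0] * n_outputs
for s in stimuli:
    counts[respond(s)] += 1
print(counts)
```

every symbol ends up being used about equally often, which is what maximizes the output entropy.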

example: (Laughlin, 1981) measured the distribution of contrast in natural scenes $P (S)$ (TeX formula: P(S)) . he correctly predicted the empirical $P (R)$ (TeX formula: P(R)) of the fly's monopolar cells that integrate information from photoreceptors, by taking the integral of the measured $P (S)$ .

• dynamical efficient coding

$P (S)$ (TeX formula: P(S)) is varying widely over time as the neural system moves from environment to environment.

open question: should a neural system optimize over evolutionary time or locally?

if the amplitude of stimulus fluctuations decreases (less variety of frequencies for the same period of time), the probability distribution of stimulus values will contain fewer possible states. to adjust for this change, the encoding neuron correspondingly squeezes its activation curve to best encode a narrower range of stimuli, in real time! this way the input-output curve will go back to being similar to the cumulative integral of $P (s)$ (TeX formula: P(s)) .

feature adaptation: the encoding filter vector/function under a natural distribution of stimuli need not be the same as the one derived from gaussian noise.

maximally-informative dimensions: as mentioned in week 2, the filter is chosen to maximize $D_{K L}$ $(TeX formula: D_{KL})$ between spike-conditional and prior distributions, except that this is done in real time for changing distributions. this is equivalent to dynamically maximizing mutual information, as mentioned above.

• redundancy reduction

the efficient-coding hypothesis says that since membrane re-polarisation is costly, neural codes should be as efficient as possible.

for a population code, this means that the responses should be independent, so that their joint entropy is as large as possible. that is: $P (R_{1}, R_{2}) \approx P (R_{1}) P (R_{2})$ (TeX formula: P(R_1, R_2) ~ P(R_1)P(R_2)) ; because it is true that $H(R_1, R_2) ≤ H(R_1) + H(R_2)$ (TeX formula: H(R_1, R_2) ≤ H(R_1) + H(R_2)) , with equality only for independent responses.

however, correlations (redundancy) have been shown to be useful at times, contradicting the hypothesis:

error correction and robustness
helping in discrimination

example: neurons in the retina use redundancy.³

sparse-firing hypothesis: as few neurons as possible should be firing at any time.

suppose we have basis functions $f_{i}$ (TeX formula: f_i) and noise/error $e$ (TeX formula: e) with which to reconstruct a natural scene $I (x ⃗)$ (TeX formula: I(x⃗)) :

$I (x ⃗) = \sum_{i} a_{i} f_{i} (x ⃗) + e (x ⃗)$ (TeX formula: I(x⃗) = ∑_i a_i f_i(x⃗) + e(x⃗) )

this means choosing $f_{i}$ (TeX formula: f_i) so as to have as few coefficients $a_{i}$ (TeX formula: a_i) as possible. this is done calculating a new error $E$ (TeX formula: E) consisting of two terms: ordinary mean squares to account for fidelity, but also a cost for $a_{i}$ usage:

$E = \sum_{x ⃗} {[I (x ⃗) - \sum_{i} a_{i} f_{i} (x ⃗)]}^{2} + λ \sum_{i} | a_{i} |$ $(TeX formula: E = ∑_{x⃗} \left[ I(x⃗) - ∑_i a_i f_i(x⃗) \right]^2 + λ∑_i |a_i| )$

The vector x in the equation above represents the coordinates of a point in the image.
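
to make the two terms of E concrete, here is a sketch that evaluates the cost for a toy 4-pixel "image". the basis functions, coefficients and λ are all invented; only the formula comes from the notes. it evaluates E and does not attempt the minimization over the basis.

```python
def sparse_coding_cost(image, basis, coeffs, lam):
    """E = sum_x [I(x) - sum_i a_i f_i(x)]^2 + lam * sum_i |a_i|,
    with the image and each basis function flattened to lists of pixels."""
    recon = [sum(a * f[x] for a, f in zip(coeffs, basis))
             for x in range(len(image))]
    fidelity = sum((i_x - r_x) ** 2 for i_x, r_x in zip(image, recon))
    sparsity = lam * sum(abs(a) for a in coeffs)
    return fidelity + sparsity

# toy image and two hypothetical basis functions
image = [1.0, 1.0, 0.0, 0.0]
basis = [[1.0, 1.0, 0.0, 0.0],
         [0.5, 0.5, 0.5, 0.5]]

cost_sparse = sparse_coding_cost(image, basis, [1.0, 0.0], lam=0.1)
cost_dense = sparse_coding_cost(image, basis, [0.5, 1.0], lam=0.1)
print(cost_sparse, cost_dense)
```

the code with a single active coefficient reconstructs the image exactly and only pays the λ penalty, so it wins.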

• quiz

#!/usr/bin/env python3
from math import *
# Suppose that we have a neuron which, in a given time period, will
# fire with probability 0.1, yielding a Bernoulli distribution for the
# neuron's firing (denoted by the random variable F = 0 or 1) with
# P(F = 1) = 0.1.
# Which of these is closest to the entropy H(F) of this distribution
# (calculated in bits, i.e., using the base 2 logarithm)?
def bernoulli_entropy(p):
    return -(p*log2(p) + (1-p)*log2(1-p))
p1 = 0.1
bernoulli_entropy(p1) # 0.4690
# Continued from Question 1:
# Now let's add a stimulus to the picture. Suppose that we think this
# neuron's activity is related to a light flashing in the eye. Let us
# say that the light is flashing in a given time period with probability
# 0.10. Call this stimulus random variable S.
# If there is a flash, the neuron will fire with probability 1/2. If
# there is not a flash, the neuron will fire with probability 1/18. Call
# this random variable F (whether the neuron fires or not).
# Which of these is closest, in bits (log base 2 units), to the mutual
# information MI(S,F)?
# we will go with H(R) - E[H(R|S)]
r_2 = p1
s_2 = 0.1                     # also happens to be 0.1
r_given_s_2 = 1/2
r_given_no_s_2 = 1/18
def mutual_information(response, noises):
    expected_noise_H = 0
    # this is the expected noise entropy (with weights coming from the
    # MARGINAL stimulus probability distribution), and NOT the average
    # over n noise entropies. not all entropies are born equal :-)
    for p_marginal,p_conditional in noises:
        expected_noise_H += p_marginal * bernoulli_entropy(p_conditional)
    return bernoulli_entropy(response) - expected_noise_H
mutual_information(r_2, [[s_2, r_given_s_2],
                         [1 - s_2, r_given_no_s_2]])
# In the following three questions, we will explore Poisson neuron
# models and population coding.
# This exercise is based on a set of artificial "experiments" that
# we've run on four simulated neurons that emulate the behavior found
# in the cercal organs of a cricket. Please note that all the supplied
# data is synthetic. Any resemblance to a real cricket is purely
# coincidental.
# In the first set of experiments, we probed each neuron with a range
# of air velocity stimuli of uniform intensity and differing
# direction. We recorded the firing rate of each of the neurons in
# response to each of the stimulus values. Each of these recordings
# lasted 10 seconds and we repeated this process 100 times for each
# neuron-stimulus combination.
# We've supplied you with a .mat file for each of the neurons that
# contains the recorded firing rates (in Hz). These are named neuron1,
# neuron2, neuron3, and neuron4. The stimulus, that is, the direction
# of the air velocity, is in the vector named stim.
# This will load everything into a dict called data, and you'll be
# able to access the stim and neuron responses using data['stim'],
# data['neuron1'], etc. (In general, data.keys() will show you all the
# keys available in the dict.)
import pickle
with open('tuning_3.4.pickle', 'rb') as f:
    data = pickle.load(f)
len(data)                       # 5 (four neurons plus stim)
data.keys() # dict_keys(['neuron4', 'stim', 'neuron3', 'neuron2', 'neuron1'])
# The matrices contain the results of running a set of experiments in
# which we probed the synthetic neuron with the stimuli in stim. Each
# column of a neuron matrix contains the firing rate of that neuron (in
# Hz) in response to the corresponding stimulus value in stim. That is,
# nth column of neuron1 contains the 100 trials in which we applied the
# stimulus of value stim(n) to neuron1.
len(data['stim'])               # 24
data['stim']
# array([   0.,   15.,   30.,   45.,   60.,   75.,   90.,  105.,  120.,
#         135.,  150.,  165.,  180.,  195.,  210.,  225.,  240.,  255.,
#         270.,  285.,  300.,  315.,  330.,  345.])
len(data['neuron1'])            # 100
data['neuron1']
# array([[ 19.7,  27.4,  28.1, ...,   0. ,   9.8,  16.1],
#        [ 21.3,  28.7,  32.8, ...,   0. ,   8.3,  16.6],
#        [ 21.5,  27.3,  29.3, ...,   0. ,   8.6,  17.2],
#        ...,
#        [ 21.9,  28.7,  30. , ...,   0. ,   8.7,  14.4],
#        [ 24.1,  28.1,  29.1, ...,   0. ,   7.1,  15.9],
#        [ 24. ,  28.3,  28.4, ...,   0. ,   8.2,  19.1]])
len(data['neuron1'][0])              # 24
data['neuron1'][0]
# array([ 19.7,  27.4,  28.1,  32.6,  31.8,  28. ,  22.6,  15.1,   9. ,
#          0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
#          0. ,   0. ,   0. ,   0. ,   9.8,  16.1])
data['neuron1'][0, 0]           # 19.7
# Plot the tuning curve-- the mean firing rate of the neuron as a
# function of the stimulus-- for each of the neurons.
import numpy as np
import matplotlib.pyplot as plt
tuning_curves = {}
for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']:
    # axis=0 means 'average columns'
    tuning_curves[neuron] = np.mean(data[neuron], axis=0)
    plt.plot(data['stim'], tuning_curves[neuron])
    plt.xlabel('Direction (degrees)')
    plt.ylabel('Mean firing rate')
    plt.title('Tuning curve for ' + neuron)
    plt.show()
# Continued from Question 7:
# We have reason to suspect that one of the neurons is not like the
# others. Three of the neurons are Poisson neurons (they are
# accurately modeled using a Poisson process), but we believe that
# the remaining one might not be.
# Which of the neurons (if any) is NOT Poisson?
# Hint: Think carefully about what it means for a neuron to be
# Poisson. You may find it useful to review the last lecture of week
# 2. Note that we give you the firing rate of each of the neurons, not
# the spike count. You may find it useful to convert the firing rates
# to spike counts in order to test for "Poisson-ness", however this is
# not necessary.
# In order to realize why this might be helpful, consider the fact
# that, for a constant a and a random variable X, E[aX]=aE[X] but
# Var(aX)=a^2 * Var(X). What might this imply about the Poisson statistics
# (like the Fano factor) when we convert the spike counts (the raw
# output of the Poisson spike generator) into a firing rate (what we
# gave you)?
# solution: Poisson distributions have variance equal to mean (Fano
# factor = 1). we could compute the mean spike count for each neuron's
# column/stimulus, then compute its variance. repeat for all
# columns/stimulus and plot. a true Poisson process should regress
# into the identity function:
T = 10 # in seconds
for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']:
    mean_spikes = T * np.mean(data[neuron], axis=0)
    spikes_variance = (T**2) * np.var(data[neuron], axis=0)
    plt.scatter(mean_spikes, spikes_variance)
    plt.xlabel('Mean number of spikes')
    plt.ylabel('Variance of number of spikes')
    plt.title('Fano test for ' + neuron)
    plt.show()
# Finally, we ran an additional set of experiments in which we exposed
# each of the neurons to a single stimulus of unknown direction for 10
# trials of 10 seconds each. We have placed the results of this
# experiment in the following file:
import pickle
with open('pop_coding_3.4.pickle', 'rb') as f:
    data2 = pickle.load(f)
# pop_coding contains four vectors named r1, r2, r3, and r4 that
# contain the responses (firing rate in Hz) of the four neurons to
# this mystery stimulus. It also contains four vectors named c1, c2,
# c3, and c4. These are the basis vectors corresponding to neuron 1,
# neuron 2, neuron 3, and neuron 4.
data2.keys()
# dict_keys(['c1', 'r1', 'r3', 'c2', 'c3', 'r2', 'c4', 'r4'])
# Decode the neural responses and recover the mystery stimulus vector
# by computing the population vector for these neurons. You should use
# the maximum average firing rate (over any of the stimulus values in
# 'tuning.mat') for a neuron as the value of rmax for that
# neuron. That is, rmax should be the maximum value in the tuning
# curve for that neuron.
v = np.array([0,0], dtype='float64')
# recall that v̂_{pop} is the linear combination of vectors:
# ∑_a (r_a / rmax_a) ĉ_a, where rmax_a is the peak of neuron a's tuning curve.
# in our case we are given many r_a's (many responses per neuron), so
# we create a population vector for each response, then average them
# to obtain a final population vector.
for a in ['1', '2', '3', '4']:
    v += (np.average(data2['r'+a]) /
               max(tuning_curves['neuron'+a])) * data2['c'+a]
# # equivalent to:
#
# v = np.zeros((2, 10), dtype='float64')
# for i in range(0, 10):
#     for a in ['1', '2', '3', '4']:
#         v[:,i] += (data2['r'+a][i] /
#                    max(tuning_curves['neuron'+a])) * data2['c'+a]
# v = np.average(v, axis=1)
# What is the direction, in degrees, of the population vector? You
# should round your answer to the nearest degree. Your answer should
# contain the value only (no units!) and should be between 0∘ and
# 360∘. If your calculations give a negative number or a number
# greater than or equal to 360, convert it to a number in the proper
# range (you may use the mod function to do this).
# You may need to convert your resulting vector from Cartesian
# coordinates to polar coordinates to find the angle. You may use the
# atan() function in MATLAB to do this. Note that the convention
# we're using defines 0∘ to point in the direction of the positive
# y-axis and 90∘ to point in the direction of the positive x-axis
# (i.e., 0 degrees is north, 90 degrees is east).
# we start from 90 because that's our reference, and subtract from
# there in order to rotate clockwise. atan2 (unlike atan) keeps track
# of the quadrant and also copes with v[0] == 0:
(90 - round(degrees(atan2(v[1], v[0])))) % 360  # atan2 works in radians

• mechanism: neuron models

neuroelectronics
- membranes
- ion channels
- wiring
simplified neuron models
- basic dynamics of neuronal excitability
neuronal geometry
- dendrites and dendritic computing

• membrane

• bare phospholipid bi-layer

good dielectric (capacitor), but still allows some charge to pass through (paralleled by a resistor), mostly because of embedded channel proteins.

RC circuit:

         ----/\/---
         |        |
inside --|        |--- outside
         |        |
         ----||----

from Kirchhoff's conservation of current:

$I = I_{R} + I_{C}$ (TeX formula: I = I_R + I_C )

from Ohm's law and the definition of capacitance ( $C = Q / V$ (TeX formula: C=Q/V) )

$I = \frac{V_{R}}{R} + C \frac{d V}{d t}$ $(TeX formula: I = \frac{V_R}{R} + C\frac{dV}{dt} )$

• rest potential

because of the active work of pumps, $K^{+}$ is concentrated inside and $N a^{+}, C l^{-}, C a^{2 +}$ $(TeX formula: Na^+, Cl^-, Ca^{2+})$ outside, so each species tends to diffuse down its own concentration gradient (acting as a voltage source) until opposed by electrostatic forces.

this results in Nernst equilibrium:

$E = \frac{k_{B} T}{z q} \ln (\frac{concentration_{outside}}{concentration_{inside}})$ $(TeX formula: E = \frac {k_B T}{zq} ln \left( \frac{concentration_{outside}}{concentration_{inside}} \right) )$

where $k_{B}$ (TeX formula: k_B) is the Boltzmann constant, T is the temperature, q is the elementary charge and z is the ion's valence (its number of charges, with sign). with this orientation $K^{+}$ , which is more concentrated inside, gets a negative equilibrium potential.

this means that the current that manages to flow through the resistor in our RC model must also go through the electric potential created by the ion battery; i.e., not all the voltage drop is due to the resistor:

                 |
         --/\/--||---
         |       |  |
inside --|          |--- outside
         |          |
         ----||------

$I = \frac{V - V_{r e s t}}{R} + C \frac{d V}{d t}$ $(TeX formula: I = \frac{V - V_{rest}}{R} + C\frac{dV}{dt} )$

multiplying by R on both sides and rearranging:

$V_{\infty} = V + R C \frac{d V}{d t}$ $(TeX formula: V_∞ = V + RC\frac{dV}{dt} )$

where τ=RC and $V_{\infty} = V_{r e s t} + I R$ $(TeX formula: V_∞ = V_{rest} + IR)$ is the steady-state potential. note that when $\frac{d V}{d t} = 0$ $(TeX formula: \frac{dV}{dt} = 0)$ (no capacitive current) $V = V_{\infty}$ (TeX formula: V = V_∞) ; and with no injected current, $V_{\infty} = V_{r e s t}$ $(TeX formula: V_∞ = V_{rest})$ .

with these conditions we can solve the differential equation:

$V (t) = V_{\infty} (1 - e^{- t / τ})$ $(TeX formula: V(t) = V_∞(1 - e^{-t/τ}))$ ; during the capacitor's rising phase, starting from V(0) = 0.
$V (t) = V_{\infty} e^{- t / τ}$ $(TeX formula: V(t) = V_∞e^{-t/τ})$ ; during discharge, starting from $V_{\infty}$ and relaxing back to 0.
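
a numerical sanity check of the passive membrane, as a sketch: forward-Euler integration of the current equation with the rest battery, compared against the exponential relaxation towards the steady state. the parameter values (and their implied units: MΩ, nF, nA, mV, ms) are arbitrary choices of mine.

```python
from math import exp

def simulate_rc(i_ext, r=10.0, c=1.0, v_rest=-65.0, t_end=50.0, dt=0.01):
    """forward-Euler integration of C dV/dt = -(V - V_rest)/R + I,
    starting at rest; returns V at t_end."""
    v = v_rest
    for _ in range(int(round(t_end / dt))):
        v += dt * (-(v - v_rest) / r + i_ext) / c
    return v

r, c, v_rest, i_ext, t_end = 10.0, 1.0, -65.0, 1.5, 50.0
tau = r * c
v_inf = v_rest + i_ext * r      # steady state: set dV/dt = 0 in the equation
# exponential relaxation from V_rest towards v_inf with time constant tau
v_exact = v_inf + (v_rest - v_inf) * exp(-t_end / tau)

v_euler = simulate_rc(i_ext)
print(v_euler, v_exact)
```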

• ion channels

recall their diversity:

voltage-gated
chemically-gated:
- transmitter-gated (synaptic):
  - dopamine
  - GABA
  - serotonin
  - glutamatergic:
    - AMPA
    - NMDA
  - etc.
- $C a^{2 +}$ $(TeX formula: Ca^{2+})$ -gated.
mechanically-gated
thermally-gated

we will be focusing on voltage-gated channels.

let g = 1/R be the conductance. Ohm's law becomes:

$I = V g$ (TeX formula: I = Vg )

we can break down the general voltage source from our previous circuit into equilibrium (rest) potentials for different ion species:

ion	equilibrium potential (E_i)
Na	+050 mV
Ca	+150 mV
K	-080 mV
Cl	-060 mV

$I_{i} = g_{i} (V - E_{i})$ (TeX formula: I_i = g_i(V-E_i) )

these will be modelled with parallel paths.

from our ion-specific cable model for K, Na and leaks (L):

                 |
         --/\/--||---
         |  L    |  |
         |          |
         |       |  |
         --/\/--||---
         |  K    |  |
         |          |
         |      |   |
         --/\/--||---
         |  Na  |   |
inside --|          |--- outside
         |          |
         ----||------

• action potential

there's a limit to how much charge can move into the membrane without losing the capacity to return to equilibrium. past that limit (the threshold) the potential runs away from rest; we call this the action potential.

where does the non-linear property that makes irrecoverable excitability (our good old activation function g()) lie?

conductances aren't constant. the resistance imposed by channels changes as voltage changes. we need functions describing them.

functional diagram of a voltage-gated channel:

    :========= =========:
    :========= =========:
    :========= =========: / <-- anchoring subunit
    :========= =========:/
   (___(voltage sensor)___)
                      \
                       \  <-- gate, activated by sensor
    ____________________\_
  -(                      )
    :========= =========:\
    :========= =========: \
    :========= =========:

• K dynamics

we will be using kinetic models for the opening of channels:

P(gate is open) increases with a depolarised neuron ( $K^{+}$ flows away after spike).
gate contains 4 moving subunits. $P_{K} = \prod_{i = 1}^{4} P_{sub_{i}} ≅ P_{sub}^{4}$ $(TeX formula: P_K = ∏_{i=1}^4 P_{sub_i} ≅ P_{sub}^4)$ , for independent subunits.⁴

we go for a Bernoulli probability distribution, n = P(open). transitions between states occur at voltage-dependent rates $α_{n} (V)$ (TeX formula: α_n(V)) (closed → open) and $β_{n} (V)$ (TeX formula: β_n(V)) (open → closed). we don't know the function of voltage describing this probability, but we can write the function describing its change in continuous time:

Hodgkin-Huxley-1: $\frac{d n}{d t} = α_{n} (V) (1 - n) - β_{n} (V) n$ $(TeX formula: \frac{dn}{dt} = α_n(V)(1-n) - β_n(V)n )$

that is, the function that describes the probability is given by how much is added to the open state (proportional to the closed state, times the voltage-varying rate α) minus how much is lost.

rewrite in the form of τ and $n_{\infty}$ (TeX formula: n_∞) :

$\frac{1}{α_{n} (V) + β_{n} (V)} \frac{d n}{d t} = \frac{α_{n} (V)}{α_{n} (V) + β_{n} (V)} - n$ $(TeX formula: \frac{1}{α_n(V) + β_n(V)} \frac{dn}{dt} = \frac{α_n(V)}{α_n(V) + β_n(V)} - n )$

$τ_{n} (V) \frac{d n}{d t} = n_{\infty} (V) - n$ $(TeX formula: τ_n(V) \frac{dn}{dt} = n_∞(V) -n )$
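
at a clamped voltage α and β are just two numbers, so the rewrite into τ and n_∞ can be checked directly: n should relax exponentially to n_∞ with time constant τ. the two rate values below are hypothetical, picked only to get round numbers.

```python
def relax_gate(alpha, beta, n0=0.0, t_end=20.0, dt=0.001):
    """forward-Euler integration of dn/dt = alpha*(1 - n) - beta*n at a
    fixed voltage; returns n at t_end."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        n += dt * (alpha * (1 - n) - beta * n)
    return n

alpha, beta = 0.5, 0.125          # hypothetical rates (per ms) at one fixed V
tau_n = 1 / (alpha + beta)        # time constant of the relaxation
n_inf = alpha / (alpha + beta)    # steady-state open probability

n_end = relax_gate(alpha, beta)
print(tau_n, n_inf, n_end)
```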

• Na dynamics

3-subunit gate similar to K's. P(gate is open) = m increases with depolarisation.
an additional subunit can block the channel even if the gate is open: P(additional is open) = h decreases as the membrane depolarises, terminating the rising phase during an action potential.
$P (c h a n n e l i s o p e n) ≅ h m^{3}$ $(TeX formula: P(channel \; is \; open) ≅ hm^3)$ .⁴

similarly, HH2:

$\frac{d m}{d t} = α_{m} (V) (1 - m) - β_{m} (V) m$ $(TeX formula: \frac{dm}{dt} = α_m(V)(1-m) - β_m(V)m )$

$\frac{1}{α_{m} (V) + β_{m} (V)} \frac{d m}{d t} = \frac{α_{m} (V)}{α_{m} (V) + β_{m} (V)} - m$ $(TeX formula: \frac{1}{α_m(V) + β_m(V)} \frac{dm}{dt} = \frac{α_m(V)}{α_m(V) + β_m(V)} - m )$

$τ_{m} (V) \frac{d m}{d t} = m_{\infty} (V) - m$ $(TeX formula: τ_m(V) \frac{dm}{dt} = m_∞(V) -m )$

And for the additional Na-inactivating subunit, HH3 is:

$\frac{d h}{d t} = α_{h} (V) (1 - h) - β_{h} (V) h$ $(TeX formula: \frac{dh}{dt} = α_h(V)(1-h) - β_h(V)h )$

$\frac{1}{α_{h} (V) + β_{h} (V)} \frac{d h}{d t} = \frac{α_{h} (V)}{α_{h} (V) + β_{h} (V)} - h$ $(TeX formula: \frac{1}{α_h(V) + β_h(V)} \frac{dh}{dt} = \frac{α_h(V)}{α_h(V) + β_h(V)} - h )$

$τ_{h} (V) \frac{d h}{d t} = h_{\infty} (V) - h$ $(TeX formula: τ_h(V) \frac{dh}{dt} = h_∞(V) -h )$

• Hodgkin-Huxley model

and from those probabilities we go back to our voltage-dependent channel conductances:

$g_{K} (V) = g_{K} n^{4}$ (TeX formula: g_K(V) = g_K n^4 )

$g_{N a} (V) = g_{N a} h m^{3}$ $(TeX formula: g_{Na}(V) = g_{Na} hm^3 )$

therefore, the diagram's overall current equation, $I = \frac{V - V_{rest}}{R} + C\frac{dV}{dt}$ $(TeX formula: I = \frac{V - V_{rest}}{R} + C\frac{dV}{dt})$ , we rewrite as:

$I = \sum_{i} g_{i} (V - E_{i}) + C \frac{d V}{d t}$ $(TeX formula: I = ∑_i g_i(V-E_i) + C\frac{dV}{dt} )$

this is expanded with the ion-specific terms into the fourth and last equation in Hodgkin-Huxley's dynamical system:

HH4:

$I = g_{L} (V - E_{L}) + g_{K} n^{4} (V - E_{K}) + g_{N a} h m^{3} (V - E_{N a}) + C \frac{d V}{d t}$ $(TeX formula: I = g_L(V-E_L) + g_Kn^4(V-E_K) + g_{Na}hm^3(V-E_{Na}) + C\frac{dV}{dt} )$
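
the four equations can be integrated together with forward Euler. the notes never give the α/β rate functions or the constants, so everything numeric below is an assumption: these are the squid-axon values as commonly quoted in textbooks (V in mV, time in ms, conductances in mS/cm², current in µA/cm²). treat it as a sketch, not a reference implementation.

```python
from math import exp

# assumed textbook rate functions (1/ms) for the n, m and h gates
def alpha_n(v):
    return 0.01 * (v + 55.0) / (1.0 - exp(-0.1 * (v + 55.0)))
def beta_n(v):
    return 0.125 * exp(-0.0125 * (v + 65.0))
def alpha_m(v):
    return 0.1 * (v + 40.0) / (1.0 - exp(-0.1 * (v + 40.0)))
def beta_m(v):
    return 4.0 * exp(-0.0556 * (v + 65.0))
def alpha_h(v):
    return 0.07 * exp(-0.05 * (v + 65.0))
def beta_h(v):
    return 1.0 / (1.0 + exp(-0.1 * (v + 35.0)))

G_L, G_K, G_NA = 0.3, 36.0, 120.0        # maximal conductances (assumed)
E_L, E_K, E_NA = -54.387, -77.0, 50.0    # reversal potentials (assumed)
C = 1.0                                  # membrane capacitance (assumed)

def simulate_hh(i_ext, t_end=50.0, dt=0.01):
    """forward-Euler integration of HH1-HH4; returns the voltage trace."""
    v = -65.0
    # start every gate at its steady state for the initial voltage
    n = alpha_n(v) / (alpha_n(v) + beta_n(v))
    m = alpha_m(v) / (alpha_m(v) + beta_m(v))
    h = alpha_h(v) / (alpha_h(v) + beta_h(v))
    trace = []
    for _ in range(int(round(t_end / dt))):
        i_ion = (G_L * (v - E_L) + G_K * n**4 * (v - E_K)
                 + G_NA * h * m**3 * (v - E_NA))
        dv = (i_ext - i_ion) / C                      # HH4 solved for dV/dt
        dn = alpha_n(v) * (1.0 - n) - beta_n(v) * n   # HH1
        dm = alpha_m(v) * (1.0 - m) - beta_m(v) * m   # HH2
        dh = alpha_h(v) * (1.0 - h) - beta_h(v) * h   # HH3
        v, n, m, h = v + dt * dv, n + dt * dn, m + dt * dm, h + dt * dh
        trace.append(v)
    return trace

driven = simulate_hh(i_ext=10.0)
resting = simulate_hh(i_ext=0.0)
print(max(driven), max(resting))
```

with a constant injected current the trace overshoots 0 mV (spikes); with no input it stays near rest.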

• simplified models

the model can't be so simple that it will fail to capture the variety of neural codes neurons are capable of (frequency modulation, timing-dependent firing, train patterns, etc.)

• integrate-and-fire

we observed that at rest potential the neuron behaves very linearly. assume conductances stay constant and C = 1. this is none other than our good old passive membrane:

$I (t) = a (V - V_{0}) + 1 \frac{d V}{d t}$ $(TeX formula: I(t) = a(V - V_0) + 1\frac{dV}{dt} )$

$1 \frac{d V}{d t} = - a (V - V_{0}) + I (t)$ $(TeX formula: 1\frac{dV}{dt} = -a(V - V_0) + I(t) )$

this is a linear function, and since the slope is negative $V_{0}$ (TeX formula: V_0) is a stable fixed point in phase-space:

\           |
   \        |
      \     |
         \  |
------------\-------------
            |  \
            |     \
            |        \
            |           \ dV/dt
---> --> -> V0 <- <-- <---

this is good for latent addition, but there's no threshold: the potential can be pushed arbitrarily far from $V_{0}$ (TeX formula: V_0) without ever producing a spike.

to account for spikes we delimit the function with a $V_{t h r e s h o l d}$ $(TeX formula: V_{threshold})$ that will drive potential to depolarisation.

$V_{t h r e s h o l d}$ $(TeX formula: V_{threshold})$ marks a new fixed point, unstable this time:

\           |                                      /
   \        |                                   /
      \     |                                /
         \  |                             /
------------\--------------------------/---------------
            |  \                    /
            |     \              /
            |        \        /
            |           \  /
---> --> -> V0 <- <-- <--- <--- <-- <- V_th -> --> --->

now we need an edge $V_{m a x}$ $(TeX formula: V_{max})$ that will mark the end of an increase in depolarisation and a return to a $V_{r e s e t}$ $(TeX formula: V_{reset})$ at the other edge of the graph.

        \           |                                      /
           \        |                                   /
              \     |                                /
                 \  |                             /
--------------------\--------------------------/--------------------
                    |  \                    /
                    |     \              /
                    |        \        /
                    |           \  /
V_reset ---> --> -> V0 <- <-- <--- <--- <-- <- V_th -> --> ---> V_max

in cortical neurons this has been fitted against quadratic functions, or using an exponential for the second half where acceleration is positive (effectively capturing the faster dynamics of depolarisation) in what is known as the exponential integrate-and-fire neuron:⁵

$\frac{d V}{d t} = - a (V - V_{0}) + e^{(V - V_{t h}) / Δ} + I (t)$ $(TeX formula: \frac{dV}{dt} = -a(V - V_0) + e^{(V - V_{th})/Δ} + I(t) )$

where parameter Δ controls how fast the exponential grows.

        \           |                      |
           \        |                     |
              \     |                    |
                 \  |                   |
--------------------\------------------|--------
                    |  \              /
                    |     \          /
                    |        \      /
                    |           \  /
V_reset ---> --> -> V0 <- <-- <--- <-- V_th ---> V_max
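
a sketch of the exponential integrate-and-fire neuron with the V_max / V_reset rule described above. all parameter values are invented and dimensionless; they are chosen so that V_th sits exactly at the unstable fixed point when there is no input.

```python
from math import exp

def simulate_eif(i_ext, a=1.0, v0=0.0, v_th=1.0, delta=0.2,
                 v_max=3.0, v_reset=0.0, t_end=20.0, dt=0.001):
    """forward-Euler exponential integrate-and-fire; returns spike times."""
    v, spikes = v0, []
    for step in range(int(round(t_end / dt))):
        v += dt * (-a * (v - v0) + exp((v - v_th) / delta) + i_ext)
        if v >= v_max:               # the upswing has run its course
            spikes.append(step * dt)
            v = v_reset              # jump back to the other edge
    return spikes

quiet = simulate_eif(i_ext=0.0)      # stays near the stable fixed point
firing = simulate_eif(i_ext=1.5)     # input lifts dV/dt above zero everywhere
print(len(quiet), len(firing))
```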

• theta neuron

by thinking of potential as a cyclic phenomenon we can simplify $V_{m a x}$ $(TeX formula: V_{max})$ and $V_{r e s e t}$ $(TeX formula: V_{reset})$ into a single state, called $V_{s p i k e}$ $(TeX formula: V_{spike})$ :⁶

$\frac{d θ}{d t} = (1 - c o s (θ)) + (1 + c o s (θ)) I (t)$ $(TeX formula: \frac{dθ}{dt} = (1 - cos(θ)) + (1 + cos(θ))I(t) )$
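
a sketch of the theta neuron under constant input. counting a spike whenever the phase passes π is the usual convention but is my assumption here, as are the initial phase and step size. for I = 1 the right-hand side is constant (equal to 2), so the phase advances linearly and spikes are exactly π time units apart.

```python
from math import cos, pi

def simulate_theta(i_ext, theta0=0.0, t_end=10.0, dt=0.001):
    """forward-Euler theta neuron; a spike is counted each time the phase
    passes pi, after which the phase is wrapped back by 2*pi."""
    theta, n_spikes = theta0, 0
    for _ in range(int(round(t_end / dt))):
        theta += dt * ((1 - cos(theta)) + (1 + cos(theta)) * i_ext)
        if theta >= pi:
            n_spikes += 1
            theta -= 2 * pi
    return n_spikes

n_excited = simulate_theta(1.0)      # positive input: periodic spiking
n_inhibited = simulate_theta(-1.0)   # negative input: phase settles at a fixed point
print(n_excited, n_inhibited)
```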

• two-dimensional models

what if instead of an abrupt reset we had another decrease leading to yet another (stable) fixed point?

        \           |                      |
           \        |                     | |
              \     |                    |   |
                 \  |                   |     |
--------------------\------------------|-------|-
                    |  \              /         \
                    |     \          /           \
                    |        \      /
                    |           \  /
V_reset ---> --> -> V0 <- <-- <--- <- V_th ---> V-K <- <--

we will draw inspiration from the Hodgkin-Huxley model, bringing in a second term, G(u), to take care of inactivation. with G(u) providing negative values for large potentials, $\frac{d V}{d t}$ $(TeX formula: \frac{dV}{dt})$ can be brought to zero again, mimicking the role of the Na gate's second inactivating subunit and the delayed role of the K channel.

$\frac{d V}{d t} = F (V) + G (u) + I (t)$ $(TeX formula: \frac{dV}{dt} = F(V) + G(u) + I(t) )$

this new variable $u$ (TeX formula: u) is in turn a function, whose change is contingent upon electric potential. we want $u$ to go hand-in-hand with potential:

$\frac{d u}{d t} = - u + H (V)$ $(TeX formula: \frac{du}{dt} = -u + H(V) )$

• simple™ model

modelled after a subset of the previous phase plane, near the intersection of nullclines. F(V) is chosen to be quadratic, G(u) is simply -u. on the second equation, -u + H(V) gains some weights:

$\frac{d V}{d t} = F (V) + G (u) + I (t) = (- ɑ V + β V^{2} + γ) - u + I (t)$ $(TeX formula: \frac{dV}{dt} = F(V) + G(u) + I(t) = (-ɑV+βV^2+γ) - u + I(t) )$

$\frac{d u}{d t} = a (b V - u)$ $(TeX formula: \frac{du}{dt} = a(bV - u) )$

given a specific quadratic function, two extra parameters (in addition to a (u's decay rate) and b (u's sensitivity to V)) are used to reset V and u after reaching $V_{m a x}$ $(TeX formula: V_{max})$ . these are called c (the value V is reset to) and d (the amount added to u at reset).

these dynamics are enough to capture a variety of codes:

regular (as in repetitive) spiking: | | | | | |
intrinsically bursting: ||| | | | |
fast spiking: |||||||||||
low-threshold spiking: ||||| | | |||||
- intrinsically bursting with chattering intervals: |||| ||| ||| |||
thalamocortical: | | | | | ... |||| | | |
resonator: mmm||||||
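
a hedged sketch of the simple model as a simulation. the notes give no numbers, so the quadratic (0.04V² + 5V + 140, i.e. β = 0.04, ɑ = -5, γ = 140), the regular-spiking values of a, b, c, d and the 30 mV cut-off are borrowed from Izhikevich's published parameterisation and should be read as assumptions, as should the reset rule (V set to c, u incremented by d):

```python
def simulate_simple_model(I, a=0.02, b=0.2, c=-65.0, d=8.0,
                          dt=0.25, t_end=1000.0, v_max=30.0):
    """Forward-Euler integration of
        dV/dt = 0.04 V^2 + 5 V + 140 - u + I
        du/dt = a (b V - u)
    with reset V <- c, u <- u + d whenever V reaches v_max.
    Units follow the usual convention for this model (mV, ms).
    Returns spike times in ms."""
    v = -70.0          # resting potential for these assumed parameters
    u = b * v
    spikes = []
    for k in range(int(t_end / dt)):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= v_max:
            spikes.append((k + 1) * dt)
            v = c
            u += d
    return spikes

regular = simulate_simple_model(I=10.0)                 # regular spiking
fast = simulate_simple_model(I=10.0, a=0.1, d=2.0)      # fast-spiking-like settings
```

changing only a, b, c and d is what moves the model between the firing patterns listed above; the quadratic itself stays fixed.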

• realistic dendritic geometry

our previous models only really capture computation at and near the soma or axon. dendritic arbors have computationally-significant effects.

nobody really knows what the appropriate level of description is, if any.

for any passive membrane the following properties hold:

effects of traveled distance:
- impulse amplitude decays. that is, spikes shorten.
- frequency decays too. that is, spikes broaden.
effects of neurite radius:
- strength (ΔV) is inversely proportional to thickness.

this affects the contribution of individual inputs coming from different dendrites.

• cable theory

V is now a function of both time and space. off-the-bat, we are dealing with partial differential equations. run cables/resistors on both inside and outside to connect membrane patches, so as to model local currents produced by potential along the cable.

---------------> a = radius
|        /              \
|        \  ----/\/---  /
|        /  |        |  \
| inside \--|        |--/ outside
|        /  |        |  \
|        \  ----||----  /
|        /              \
|        /              \
|        \  ----/\/---  /
|        /  |        |  \
| inside \--|        |--/ outside
|        /  |        |  \
|        \  ----||----  /
|        /              \
v
x = length

generally we take the external resistance to be zero.

our old membrane's Kirchhoff equality $I_m = c_m \frac{∂V_m}{∂t} + \frac{V_m}{r_m}$ $(TeX formula: I_m = c_m \frac{∂V_m}{∂t} + \frac{V_m}{r_m})$ now gets its membrane current from the axial current flowing along the cable, which is proportional to the second length-derivative of voltage:

$\frac{1}{r_{i n s i d e}} \frac{\partial^{2} V_{m} (x, t)}{\partial x^{2}} = c_{m} \frac{\partial V_{m}}{\partial t} + \frac{V_{m}}{r_{m}}$ $(TeX formula: \frac{1}{r_{inside}} \frac{∂^2V_m(x,t)}{∂x^2} = c_m \frac{∂V_m}{∂t} + \frac{V_m}{r_m} )$

which we rearrange to visualize time and space constants:

${(\sqrt{\frac{r_{m}}{r_{i}}})}^{2} \frac{\partial^{2} V_{m}}{\partial x^{2}} = r_{m} c_{m} \frac{\partial V_{m}}{\partial t} + V_{m}$ $(TeX formula: \left( \sqrt{\frac{r_m}{r_i}} \right)^2 \frac{∂^2V_m}{∂x^2} = r_m c_m \frac{∂V_m}{∂t} + V_m )$

$λ^{2} \frac{\partial^{2} V_{m}}{\partial x^{2}} = τ_{m} \frac{\partial V_{m}}{\partial t} + V_{m} .$ $(TeX formula: λ^2 \frac{∂^2V_m}{∂x^2} = τ_m \frac{∂V_m}{∂t} + V_m. )$

in the steady state, potential decays exponentially with distance from the point where current is injected (let's call it x=0), according to $V(x) ∝ e^{\left( - \frac{|x|}{λ} \right)}$ $(TeX formula: V(x) ∝ e^{\left( - \frac{|x|}{λ} \right)})$ . one length constant away from the centre, the voltage signal is attenuated to about 0.37 of its value at x=0 (e^-1). two constants away it's about 0.14 (e^-2); etc.

the arrival of peaks at different points gives us the velocity of propagation, which naturally turns out to be the ratio of the length and time constants, multiplied by two: $Vel_{cable} = 2λ/τ$ $(TeX formula: Vel_{cable} = 2λ/τ)$ .

here's what the general solution to the differential equation looks like:

$V (x, t) \propto \sqrt{\frac{τ}{4 π λ^{2} t}} e^{- \frac{t}{τ} - \frac{τ x^{2}}{4 λ^{2} t}}$ $(TeX formula: V(x,t) ∝ \sqrt{\frac{τ}{4πλ^2t}} \; e^{-\frac{t}{τ}-\frac{τx^2}{4λ^2t}} )$

this is summed over different locations and times to come up with our actual filter on the input, as seen on week 2.
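
one way to check the velocity claim numerically is to evaluate the general solution above at two distant points and time the arrival of the peaks. the sketch below does this by brute force, assuming λ = τ = 1 (arbitrary units) and a fixed time grid; the function names are hypothetical:

```python
import math

def green(x, t, lam=1.0, tau=1.0):
    """General solution V(x, t) of the passive cable, up to a constant factor."""
    return math.sqrt(tau / (4 * math.pi * lam ** 2 * t)) * math.exp(
        -t / tau - tau * x ** 2 / (4 * lam ** 2 * t))

def peak_time(x, lam=1.0, tau=1.0, dt=1e-3, t_end=40.0):
    """Time at which V(x, t) is largest, found by searching a time grid."""
    times = [k * dt for k in range(1, int(t_end / dt) + 1)]
    return max(times, key=lambda t: green(x, t, lam, tau))

def peak_velocity(x1, x2, lam=1.0, tau=1.0):
    """Distance travelled divided by the difference in peak arrival times."""
    return (x2 - x1) / (peak_time(x2, lam, tau) - peak_time(x1, lam, tau))

v_far = peak_velocity(20.0, 30.0)   # many length constants from the source
v_near = peak_velocity(1.0, 2.0)    # close to the source
```

far from the injection site the estimate approaches 2λ/τ; close to it the peaks appear to travel somewhat faster.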

• compartmental models

simple cable theory doesn't get us very far modelling truly realistic neurons anyway, because it isn't taking channels into account, which also exist in dendrites. analytical solutions become intractable.

compartments are a simpler approximation to cable theory which gets rid of our dependence on x, although it's still a passive model.

it's a divide-and-conquer strategy that discretizes dendritic trees into levels.
for a binary tree, when the diameters of the siblings (d_1, d_2) and of the parent (D) obey the relation:

$d_{1}^{3 / 2} + d_{2}^{3 / 2} = D^{3 / 2}$ $(TeX formula: d_1^{3/2} + d_2^{3/2} = D^{3/2} )$
then the subtree can be approximated with a single cable segment the diameter of the parent, extended by the effective electrotonic length of the children. the final result will invariably be a single neurite.
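
a tiny sketch of the 3/2 relation as a check one could run on measured diameters. the tolerance and both function names are assumptions made for illustration:

```python
def matched_parent_diameter(d1, d2):
    """Parent diameter D that satisfies d1^(3/2) + d2^(3/2) = D^(3/2)."""
    return (d1 ** 1.5 + d2 ** 1.5) ** (2.0 / 3.0)

def obeys_three_halves_rule(d1, d2, D, rel_tol=0.05):
    """True when the branch point satisfies the relation within rel_tol."""
    lhs = d1 ** 1.5 + d2 ** 1.5
    rhs = D ** 1.5
    return abs(lhs - rhs) <= rel_tol * rhs

D = matched_parent_diameter(1.0, 1.0)   # two equal children of diameter 1
```

two children of diameter 1 call for a parent of diameter 2^(2/3), roughly 1.59, not 2: the rule is about the 3/2 powers, not the plain sum of diameters.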

• active models

for a given geometry and function of channel density (yes!, channel density variability can have important effects too), compartmentalize into segments of approximately constant properties.
model each segment according to your favorite model (Hodgkin-Huxley, e.g.). input current can come from the parent, a sibling or the children: uphold Kirchhoff's laws when thinking about input potential/current.
couple segments into a tree, putting resistors between segments to account for drops. these need not be direction-invariant (i.e. there will be two diode-guarded resistors per segment connection).
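
as a rough sketch of the coupling step, here is an unbranched chain of compartments joined by resistors. each compartment is a plain leaky membrane standing in for your favorite model, the coupling is symmetric (no diodes), current is injected into the first compartment only, and all values are assumed, dimensionless placeholders:

```python
def simulate_chain(n=5, g_leak=1.0, g_couple=2.0, c=1.0, i_inj=1.0,
                   dt=1e-3, t_end=20.0):
    """Forward-Euler integration of n coupled leaky compartments:
        c dV_i/dt = -g_leak V_i + g_couple (V_{i-1} - V_i)
                    + g_couple (V_{i+1} - V_i) + I_i
    Voltages are measured relative to rest. Returns the final voltages."""
    v = [0.0] * n
    for _ in range(int(t_end / dt)):
        dv = []
        for i in range(n):
            current = -g_leak * v[i]
            if i > 0:                       # current from the previous segment
                current += g_couple * (v[i - 1] - v[i])
            if i < n - 1:                   # current from the next segment
                current += g_couple * (v[i + 1] - v[i])
            if i == 0:                      # injected current
                current += i_inj
            dv.append(dt * current / c)
        v = [vi + dvi for vi, dvi in zip(v, dv)]
    return v

voltages = simulate_chain()
```

at steady state the voltage falls off along the chain, and the leak currents add up to the injected current, which is Kirchhoff's current law for the whole structure.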

• dendritic computation

there are many interesting ideas floating around:

lowpass filter, attenuation.
where the inputs enter the dendrite can have an impact:
- inputs at different places of the same segment could compute logical conjunction, whereas the parent segment computes disjunction on its children.
- analogically, this means the ability to segregate or amplify signals.
plasticity: Ca-mediated back-propagating action potentials could be the driving force behind plasticity, based on coincidence detection (mentioned in the introduction).
hippocampal neurons are known to display synaptic scaling: the ability to make inputs "commutative", or position-independent.

examples:

sound localisation: according to the Jeffress model, coincidence detector neurons could be taking advantage of different cable lengths (time-dependent code) to encode something like left-right position. because sounds coming from the left arrive with a delay at the right ear, a soma closer to the right with dendrites of different lengths coming from each ear could perceive both inputs at the same time, firing only to those sounds. other neurons with different dendritic lengths would detect other sound positions.
direction selectivity on LGN/V1 receptive fields (and elsewhere) could be due to the order in which inputs enter the same dendrite. a timely excitation would add to the signal as it travels down; whereas de-synchronized stimuli would see their individual signals fade away. in other words, the sequence in which pre-synaptic neurons touch the post-synaptic one encodes the sequence the post-synaptic neuron is trained to detect.

• mechanism: networks

• chemical synapse

action potential travels down the axon at the pre-synaptic neuron,
opens $C a^{2 +}$ $(TeX formula: Ca^{2+})$ channels at the terminal,
which in turn causes neurotransmitter vesicles to throw their contents into the synaptic cleft.
after migration via diffusion, neurotransmitters bind to glycoprotein receptors at the post-synaptic neuron.
ligand-gated channels (also known as ionotropic receptors) open, either from the direct action of neurotransmitters or as a consequence of the chain reaction started at metabotropic (aka G-protein-coupled) receptors.
this causes either depolarisation or hyperpolarisation at the post-synaptic neuron, depending on chemical identity.
- excitatory: $N a^{+}$ channels open or some other positive ions enter the cell. common mediating neurotransmitters are:
  - acetylcholine: common in neuromuscular junctions.
  - glutamate: CNS. binds to NMDA, AMPA and kainate receptors.
  - catecholamines: dopamine, epinephrine/adrenaline, norepinephrine/noradrenaline, etc.
  - serotonin: parasympathetic functions
  - histamine
- inhibitory: $K^{+}$ leaves the cell or $C l^{-}$ comes in.
  - GABA: binds to GABA-A and GABA-B receptors. GABA-A receptors are pentamers which exist in many compositions and conformations.
  - glycine: non-exclusively found in the spinal cord and retina
neurotransmitters detach from the post-synaptic dendritic spine and are reabsorbed.

• modelling

given the specific membrane capacitance ( $c_{m} ≅ 10 n F / m m^{2}$ (TeX formula: c_m ≅ 10 nF/mm^2) ), specific membrane resistance ( $r_{m} ≅ 1 M Ω m m^{2}$ (TeX formula: r_m ≅ 1 MΩ mm^2) ) and area A; $C_m = c_mA$ (TeX formula: C_m = c_mA) and $R_{m} = \frac{r_{m}}{A}$ $(TeX formula: R_m = \frac{r_m}{A})$ . Notice how surface area becomes irrelevant when determining τ:

$\frac{c_{m} A r_{m}}{A} \frac{d V}{d t} = - (V - V_{e q u i l i b r i u m}) + I R$ $(TeX formula: \frac{c_m A r_m}{A} \frac{dV}{dt} = -(V - V_{equilibrium}) + IR )$

anyway, as with Na and K channels which are modelled through resistors, our neuron circuit model will need yet another conductance per synapse (with its own equilibrium potential $E_{s y n}$ (TeX formula: E_{syn}) ):

$τ_{m e m} \frac{d V}{d t} = - ((V - E_{l e a k}) + g_{s y n} (V - E_{s y n}) r_{m e m}) + I R$ $(TeX formula: τ_{mem}\frac{dV}{dt} = -((V-E_{leak}) + g_{syn}(V-E_{syn})r_{mem}) + IR )$

for an excitatory synapse, $E_{s y n} > E_{l e a k}$ $(TeX formula: E_{syn} > E_{leak})$ , and vice-versa.
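
a short numerical sketch of the two points above: τ does not depend on area, and a constant synaptic conductance pulls the resting voltage towards E_syn. the resting and reversal potentials (-70 mV and 0 mV) are assumed example values, and the steady state is obtained by setting dV/dt = 0 in the equation above; function names are hypothetical:

```python
def membrane_time_constant(c_m=10e-9, r_m=1e6, area=1.0):
    """tau = R_m * C_m in seconds.
    c_m in F/mm^2, r_m in ohm*mm^2, area in mm^2."""
    C_m = c_m * area
    R_m = r_m / area
    return R_m * C_m

def steady_state_voltage(g_syn_r, E_leak=-70.0, E_syn=0.0, IR=0.0):
    """Voltage (mV) where dV/dt = 0 for a constant, dimensionless
    product g_syn * r_mem:
        V = (E_leak + g_syn_r * E_syn + IR) / (1 + g_syn_r)"""
    return (E_leak + g_syn_r * E_syn + IR) / (1.0 + g_syn_r)

tau_small = membrane_time_constant(area=0.01)
tau_large = membrane_time_constant(area=1.0)
v_rest = steady_state_voltage(0.0)
v_driven = steady_state_voltage(1.0)
```

with the values quoted above τ comes out at about 10 ms for any area, and the steady-state voltage is a conductance-weighted average of E_leak and E_syn.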

similarly to the Hodgkin-Huxley model, the synaptic conductance $g_{s}$ (TeX formula: g_s) (written $g_{s y n}$ above) is a probabilistic function specified by a differential kinetic model:

$g_{s} = g_{s_{m a x}} P_{t r a n s m i t t e r r e l e a s e} P_{o p e n c h a n n e l s}$ $(TeX formula: g_s = g_{s_{max}}P_{transmitter\;release}P_{open\;channels} )$

assume $P_{t r a n s m i t t e r r e l e a s e} = 1$ $(TeX formula: P_{transmitter\;release} = 1)$ , i.e. vesicles are always available for release. as for $P_{o p e n c h a n n e l s}$ $(TeX formula: P_{open\;channels})$ , let $α_{s}$ (TeX formula: α_s) and $β_{s}$ (TeX formula: β_s) be the rate at which closed channels open and open channels close, respectively. therefore:

$\frac{d P_{o p e n c h a n n e l s}}{d t} = α_{s} (1 - P_{o p e n c h a n n e l s}) - β_{s} P_{o p e n c h a n n e l s}$ $(TeX formula: \frac{dP_{open\;channels}}{dt} = α_s(1 - P_{open\;channels}) - β_sP_{open\;channels} )$
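
a minimal sketch of this kinetic equation with constant rates, integrated with forward Euler. holding the rates constant and the particular rate values are assumptions; in that case the open probability relaxes to α_s / (α_s + β_s) with time constant 1 / (α_s + β_s):

```python
def open_probability(alpha, beta, p0=0.0, dt=1e-4, t_end=10.0):
    """Integrate dP/dt = alpha (1 - P) - beta P for constant rates
    and return P at t_end."""
    p = p0
    for _ in range(int(t_end / dt)):
        p += dt * (alpha * (1.0 - p) - beta * p)
    return p

p_final = open_probability(alpha=2.0, beta=1.0)
p_expected = 2.0 / (2.0 + 1.0)
```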

evidence shows that $P_{o p e n c h a n n e l s}$ $(TeX formula: P_{open\;channels})$ is best modelled using dissipating exponential function $K (t) = e^{- t / τ_{s}}$ $(TeX formula: K(t) = e^{-t/τ_s})$ for AMPA synapses, whereas the slightly delayed rise in probability in something like GABA-A is better-suited to a so-called "alpha" function, $α(t) = \frac{t}{τ_{peak}}e^{\left( 1 - \frac{t}{τ_{peak}} \right)}$ $(TeX formula: α(t) = \frac{t}{τ_{peak}}e^{\left( 1 - \frac{t}{τ_{peak}} \right)})$ . $τ_{p e a k}$ $(TeX formula: τ_{peak})$ is an empirical parameter that will vary with chemical.

P: 1.0  |
    .8  | \
    .6  |   \
    .4 |     \_
    .2 |        \ _
   0.0 |            \  _
       0  5 10 15 20 25 :t (ms)
       alpha function
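
the two kernel shapes can be compared directly. a small sketch, with τ_s = τ_peak = 5 ms picked purely for illustration and hypothetical function names:

```python
import math

def exp_kernel(t, tau_s=5.0):
    """K(t) = exp(-t / tau_s) for t >= 0: jumps to 1 at t = 0, then decays."""
    return math.exp(-t / tau_s) if t >= 0 else 0.0

def alpha_kernel(t, tau_peak=5.0):
    """alpha(t) = (t / tau_peak) exp(1 - t / tau_peak) for t >= 0:
    starts at 0, rises to 1 at t = tau_peak, then decays."""
    if t < 0:
        return 0.0
    return (t / tau_peak) * math.exp(1.0 - t / tau_peak)

times = [0.1 * k for k in range(300)]            # 0 to 30 ms
t_at_peak = max(times, key=alpha_kernel)         # close to tau_peak
```

the exponential is at its maximum the moment the spike arrives, whereas the alpha function starts at zero and only reaches its maximum of 1 at t = τ_peak, which is the delayed rise mentioned above.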

in order to model the cumulative effect of a spike train coming from the pre-synaptic neuron we resort to our old linear filter, applied to the train written as the response function, rho ( $ρ(t) = ∑_i δ(t - t_i)$ (TeX formula: ρ(t) = ∑_i δ(t - t_i)) ).

example of a final aggregate synaptic conductance for AMPA:

$g_{s} (t) = g_{s_{m a x}} \sum_{t_{i} < t} K (t - t_{i}) = g_{s_{m a x}} \int_{- \infty}^{t} K (t - τ) ρ (τ) d τ$ $(TeX formula: g_s(t) = g_{s_{max}} ∑_{t_i<t} K(t - t_i) = g_{s_{max}} ∫_{-∞}^t K(t - τ)ρ(τ) dτ )$

$= g_{s_{m a x}} \int_{- \infty}^{t} e^{- (t - τ) / τ_{s}} (\sum_{i} δ (τ - t_{i})) d τ$ $(TeX formula: = g_{s_{max}} ∫_{-∞}^t e^{-(t - τ)/τ_s}(∑_i δ(τ-t_i)) dτ )$

imagine the following spike train arriving at the synapse, and its corresponding filtered conductance composed of alpha or exponential $P_{o p e n c h a n n e l s}$ $(TeX formula: P_{open\;channels})$ functions pieced together:

train:        | |   | | |
              ---------------> time
                        |\
                      |\| \
                |\  |\|    \
              |\|  \|       \
conductance:  |              \
              ---------------> time
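
because the spike train is a sum of deltas, the filter integral collapses into a sum of kernels, one per past spike. a minimal sketch for the exponential (AMPA-like) case, with g_max, τ_s and the spike times chosen arbitrarily for illustration:

```python
import math

def conductance(t, spike_times, g_max=1.0, tau_s=5.0):
    """g_s(t) = g_max * sum over spikes with t_i < t of exp(-(t - t_i) / tau_s)."""
    return g_max * sum(math.exp(-(t - ti) / tau_s)
                       for ti in spike_times if ti < t)

train = [0.0, 2.0, 6.0, 8.0, 10.0]                      # assumed spike times (ms)
trace = [conductance(0.5 * k, train) for k in range(60)]  # sampled every 0.5 ms
```

each incoming spike adds a fresh kernel on top of whatever is left of the earlier ones, which is why closely spaced spikes build the conductance up in steps.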

• networks

now that we know how to feed a neuron's output into another neuron and convert that input into yet more action potentials given by $τ_{m} \frac{d V}{d t} = - ((V - E_{L}) + g_{s} (t) (V - E_{s}) r_{m}) + I R$ $(TeX formula: τ_{m}\frac{dV}{dt} = -((V-E_L) + g_s(t)(V-E_s)r_m) + IR)$ , the next simplest step is to experiment with 2-neuron networks. the following are common properties for a simple feedback loop system with homogeneous synaptic sign:

nature of synapses	behavioural pattern
excitatory	alternating activation (n1, n2, n1, n2, ...)
inhibitory	synchrony⁷ (n1+n2, n1+n2, n1+n2, ...)

Not to be confused with ephaptic transmission, in which local ion currents from one cell's membrane manage to create a significant effect on the membranes of nearby cells. ↩
http://www.jneurosci.org/content/19/18/8036.full ↩
Berry, Chichilnisky. ↩
I'd like to thank Prof. Markus Müller for pointing out to me that the exponents of probability functions were determined using curve fitting and are unrelated to the number of molecule subunits. ↩↩
Fourcaud-Trocmé, Hansel, van Vreeswijk, Brune. ↩
Ermentrout, Kopell. ↩
This demands a deeper explanation. ↩