(These are my own notes from Coursera's "Computational Neuroscience", as taught by the University of Washington. I'm really enjoying the lecture contents, challenging assignments, level of mathematical treatment, and the otherwise-rare feeling of superlative transcendence that comes with science. It's almost like half the math I learned in college had been dormant, waiting for a non-math professor who knew how to apply it to their subject. This is also the last properly capitalized paragraph you will see on this page. WIP.)

## • overview

goal of computational/theoretical neuroscience: explaining the mind (mostly behaviour at this point) in terms of brain computation.

computational/theoretical neuroscience: tools and methods for:

• characterizing what nervous systems do
• determining how they achieve it
• understanding why they operate like that

kinds of models in theoretical neuroscience by purpose:

• descriptive (what): quantitative description of stimuli encoding and decoding
• mechanistic (how): detailed simulation of neurons, networks, etc.
• interpretive/normative (why): purpose (function) and underlying principles behind function.

### • types of models

#### • descriptive

example: in Hubel and Weisel's discoveries in the primary visual field, the frequency at which action potentials occur in single-cell recordings is a function of place and orientation of the stimulus:

receptive field: specific properties of a sensory stimulus that maximize response.

• retinal ganglion (efferent) cells and thalamus' Lateral Geniculate Nucleus. center-surround receptive fields only, i.e:
        -    +
-+-  +-+
-    +

• V1 simple cells. oriented receptive field, e.g:
     --------    +
-++++++++-  +-+
--------    +-+
+-+
+


#### • mechanistic

V1 simple cell model suggested by Hubel & Wiesel (doesn't take recurrent input into account):

LGN cell 1 ----\
LGN cell 2 -----|--> V1 simple cell
...
LGN cell n ----/


#### • interpretive

why are receptive fields in V1's simple cells shaped that way? what is their diversity for?

efficient coding hypothesis: evolution favours representing images as faithfully and efficiently as possible.

on V1 simple cells, given image and receptive fields , we can try to reconstruct it ( ) using neural response strenghts (multipliers) like this:

question: what are the that minimize squared pixel-wise errors between and , and are as independent as possible?

2. run an efficient coding algorithm (sparese coding, Independent Coding Analysis, predictive coding)

### • neurobiology

#### • neuron

Some types of neurons:

• pyramidal: most common type, cortical building block
• Purkinje: multi-level/branching dendrites
• ganglion retinal cells, amacrine cells

neuron doctrine:

• functional unit of the nervous system
• neurons are (mostly) discrete
• signals travel from dendrines to axon

simple idealized descriptive model:

where .

ion channels are variously gated, (like transistors):

• voltage-gated (membrane, gap junctions)
• chemically-gated (neuroreceptors at post-synaptic cleft, smell-taste sensors)
• mechanically-gated

#### • synapse

synapse: connection or junction between two neurons.

• electrical:1

• uses gap junctions (shared channels)
• fast. ideal for synchronization between neurons, sympathetic reactions
• chemical:

• uses neurotransmitters
• thought to be more plastic, because post-synaptic cell can independently change its dendritic membrane depending on input, history, etc. hypothetically ideal for learning.

synapse doctrine: synapses are the basis for learning and memory

#### • plasticity

• hebbian: if neuron A repeatedly fires to neuron B, then their synaptic strength is strengthened. neurons that fire together, wire together.

    before: A_{t1} ---||||--- *.01 ---> B_{t1}
after:  A_{t2} ---------- *.1  ---> B_{t2}

• LTP/LTD:

• experimentally observed change in synaptic strength
• lasts a few hours or days
• depends on relative timing of input and output spikes

if pre predicts post, post increases the strength (the smaller the delta, the more excitatory the synapse). if pre fires after post, post reduces the strength (the smaller the delta, the more inhibitory the synapse):

#### • nervous system

• PNS

• somatic: modal afferent and voluntary efferent
• autonomic

• sympathetic: fight or flight

• parasympathetic: sleep or breed

• enteric: digestion

• CNS:

• spinal chord
• encephalon/brain

• rhombo/hindbrain

• medulla oblongata (bulbo raquídeo): breathing, muscle tone, hemobarycs
• pons: connects to cerebellum. sleep and arousal
• cerebellum: equilibrium, PNS-somatic calibration, language?, attention?
• mesen/mid

• visual and auditory reflexes, saccadic movement
• reticular formation: reflexes, pain, breathing, sleep and arousal
• thalamus

• relay station for everything modal except smell
• procen/cerebrum

• basal ganglia
• hyppocampus
• amygdala
• cortex: all of higher cognition, subjectivity
##### • cortex

6 layers of neurons, relatively uniform in structure. e.g.:

layer function
1 input from "higher" receptive fields
2 output to "
3 "
4 input from subcortical regions
5 output to " "
6 "

## • description: encoding

recording techniques:

• fMRI: records aggregate changes in magnetic field resulting from hemodynamics
• accuracy: millions of neurons, varies with voxel resolution
• time scale: seconds
• non-invasive
• in full 3D
• great for differential localisation (dissociation) of function localisation
• EEG: records aggregate changes in electric field resulting from neural activity itself
• accuracy: rather noisy. confined to skull surface
• time scale: faster than fMRI
• electrode array (single-unit recording(s)): roughly targets single neurons
• calcium imaging: single neurons too, based on fluorescence changes resulting from Ca binding.
• patch clamp: inserts into cell
• maximum accuracy. requires less amplification
• good for mechanistic models of single cells, ions, pumps, etc.

raster plot: spike plots successively stacked

encoding: from stimulus parameters to pattern of responses

decoding: working back stimulus parameters from response

examples:

• in V1 oriented (angle = θ) receptive fields:

• motor cortex:

functional maps: the feature selectivity of adjacent tissue can be represented with an spatial field, where each point is associated with the values of an stimulus that maximise its response (i.e. it's receptive field)

one map per parameter: position, orientation, grammatical categories, images of both Brad Pitt and Jennifer Aniston, the concept of Dracula: anything related to him (image, audio, words), etc.

even worse: higher-order representations can feed back into the stimuli upstream, shaping the way things are perceived even at very rudimentary levels.

### • simple models

where , .

#### • linear

• most recent stimulus only:

where the f is synaptic strength.

or to account for a short delay from stimulus to response:

• linear with temporal filtering:

a more realistic model will depend on a linear combination of many recent inputs:

note how constant turned into weight function .

in infinitesimal form:

this is a convolution of and , where the former acts as the linear filter.

• leaky filter/integrator:

decreases the more you look into past synapses, usually exponentially.

example: linear filtering applied to center-surround visual fields, where distance in time is replaced by distance in space. the discrete model thus becomes:

in addition to leaking, we can think of the phenomenon of lateral inhibition in retinal ganglionar cells to construct our filter . this is usually approximated as a difference of a normal distribution and another negative, shallower gaussian. like Sobel's filter this is an edge detector.

#### • non-linear activation function

finally, we can think of an extra activation function that will map the filter convolution to the actual firing rate:

s(t) -> f(τ) -> g() -> r(t)


where is the activation function (also called input/output function, or transfer function in ANNs), usually with sigmoid shape.

### • dimensionality reduction

or how to regress a good filter. as seen in the last example, the stimulus could easily contain an intractable number of parameters.

#### • reverse correlation: spike-triggered average

1. convert to discrete vector: a function of a single variable can be thought of as a single point in space:

2. each application (an instant in the spike trail) is a dimension.

3. the value of that component is the value of that function/stimulus at that point/time:

1. one method uses Gaussian white randomness as to probe a reaction, and approximate what is common among trials.

2. a random prior stimuli distribution is constructed from multiple such trials of decomposed random 's:

per the central limit theorem, this distribution of stimuli will also be gaussian at each dimension, at whatever new dimensions are chosen from a basis change/linear combination; and indeed at the whole space (picture a normal distribution on a plane, or on 3D).

1. pick the cluster of points that caused to fire, called the spike-conditional distribution

2. and calculate their spike-triggered average point.

3. then pick some dimension that passes through that point and project the original spike-conditional distribution onto them.

4. the spike-triggered average vector is then taken to be a unit vector. is our weight filter/feature detector/convolution .

geometrically, it is also a projection. so

### • activation function

or how to regress a good g() for our f().

remember that is the last step in producing , and that encoding in its most general form amounts to calculating . so we start from the equation:

has become after simplifying the multi-dimensional stimulus. so what's the new probability under this subset?

expanded into bayesian form:

simply has a normal probability distribution for the white-noise experiment:

    . .
. .
. . . .
. . . . . .


the histogram of is obtained only from those point-values in the stimulus that coincide in time with a spike. let's say those are mostly peaks in the stimulus signal. the distribution would look skewed like this:

       .
. .
. . . .
. . . . . .


therefore, :

     ....
.
.
...


is just calculated empirically by counting during the random experiment.

if had also a normal distribution (like ), the ratio of both functions (as used in Bayes' formula) would yield a constant function. i.e., the activation function would look flat. what we actually want is both distributions to be different.

### • maximally informative filters (again)

could we work backwards from

and so that the distributions and are as dissimilar as posible?

one measure for such a difference is Kullback-Leibler divergence:

an advantage of this over STA is that the prior needs not be gaussian random noise. it might as well be natural stimuli.

### • Generalized Linear Models

• with a refractory period filter:
(s) -> (f) -> (+) -> (g) -> (Poisson) ----> r
^                        |
|_(refractionary filter)_|

• with coupling (neighbour neurons) filters, ephaptic transmisison, etc.

## • description: decoding

### • signal detection theory (hypothesis testing)

likelihood ratio test:

Neyman-Pearson lemma: the likelihood ratio test is the most efficient statistic for any given size.

maximum lakelihood with asymmetric costs:

type 1 and type 2 errors may not cost the same. we introduce a weighting factor :

we should clearly answer yes when . I.e, . rewriting using Bayes' rule yields:

...and arranging for the ratio of conditional distributions to obtain the likelihood ratio:

### • population coding

methods for decoding from populations of neurons:

• population vector
• bayesian inference

• maximum likelihood
• maximum a posteriori

#### • population vector

example: cricket cercal cells in the antenna-like surcus structures in the abdomen are sensitive to wind velocity.

how does the group as a whole communicate wind velocity to the animal?

there are 4 overlapping receptive field distributions, corresponding to the 4 directions relative to the body:

f/r_max
-    |    -    |
---  |||  ---  |||
-----++|||++---++|||||
45   135  225  315    degrees


given wind direction (in degrees) and a neuron's preferred direction , response looks like: (after normalising for , because some neurons have an intrinsically higher firing rate, even for equally maximal stimuli)

or in terms of vectors (wind vector) and (neuron's preferred vector):

then, the population vector is calculated as:

note how 4 orthogonal vectors are used, even though 2 in linear combinations would suffice to generate the whole plane. this is due to the lack of negative firing rates to represent wind in the opposite direction, or its component along that direction.

this method is prevalent in brain-machine interfaces targeting the M1 area.

for a sufficiently large number of directions:

#### • bayesian inference

• more general than population vector
• less vulnerable to noise

recall Bayes' rule for stimulus and response :

we could also rewrite the marginal distribution to yield:

• maximum likelihood distribution: this bayesian decoding strategy tries to find the stimulus which maximizes the likelihood conditional distribution .

• maximum a posteriori distribution: this decoding strategy tries to find the stimulus which maximizes the a posteriori (LHS) conditional distribution .

MAPD is influenced by the prior , because all factors in Bayes equation are taken into accout.

example: imagine a population of neurons that encode stimulus/parameter , with response a Gaussian probability distribution, and mean/maximum at .

meanwhile, we have each individual neuron's Gaussian responses scattered along and overlapping each other:

r=f(s)
-    |    -    |    -
---  |||  ---  |||  ---
-----++|||++---++|||++-----
n1   n2   s_1  n3    nn     stimulus


assume the following:

• each neuron fires independently
• variability in firing is due to a Poisson process, so

• the distribution of distributions is not gaussian, but uniform. i.e., there's plenty of single-neuron distributions across the range of .

spikes are produced randomly and independently in each time bin, with a probability given by the instantaneous rate. from Poisson:

where:

• is the number of occurrences for which is going to be calculated for time interval .
• is the constant mean firing rate per time unit.
• is the time interval.

substituting for the variables in our example setting, the probability of seeing spikes for a single neuron becomes:

notice how we write the specific number of occurrences during interval simply as times .

also, the constant is replaced by our previous probability function , and the whole equation is accordingly changed to a conditional distribution of stimulus (that is, our conditional distribution is a bunch of Poisson formulas, one per slice at this neuron's Gaussian).

so the whole population's ( ) distribution can be written down as the product of the independent distributions:

##### • maximum likelihood

the previous formula is often written in logarithmic form, so as to avoid rounding products and the exponential, in what is known as log-likelihood:

which we try to maximise:

recall that is the normal distribution:

think of this as an improved version of the population vector , because each neuron's contribution is weighted against its variance , more informative neurons (less spread-out distribution) will contribute more.

##### • maximum a posteriori

in log-a-posteriori form for a single neuron:

and for the whole population:

optimize setting the derivative to 0:

and after we pour in the gaussian expansions of :

if you look closer, this is no otherthan the likelihood, scaled by and shifted by the biasing effect of the prior's deviation. in other words, the more certain the prior (the more pointy its distribution), the more the stimulus takes us away from a simple likelihood distribution.

#### • caveats

• these methods don't incorporate rapid time variations, as they are modelled after tuning curves and mean firing rate.
• don't take into account correlations in the population; i.e., we assumed independence of events. this is an open research enterprise in neuroscience.

### • stimulus reconstruction

or how to extend the bayesian methods to the case where stimuli and response vary continuously in time.

suppose we want the best estimator for stimulus . we introduce an error function, , and average it over all possible stimulus choices:

which we will try to minimize. one choice of is the least squares error function:

but is just another way of saying "all the stimuli that triggered an action potential". in other words, this filter is just the spike triggered average.

example: famous 1999 "mind reading" of a cat's Lateral Geniculate Nucleus.2

here researchers are trying to find that maximizes the a posteriori distribution .

the encoding model/likelihood needs to be constructed from input movies.

### • case study: visual processing in the retina

rods are exquisitely sensitive to the minutest amounts of photons (<0.1% of them contribute signals). averaging would be disastrous under these conditions. how can a noisy neural array reliably detect and relay those signals?

what is the optimal readout of an array of detectors when only a small fraction is active and relevant? this question is pervasive accross attention and decision-making research.

we need a selection process/threshold to discard most sources off-the-bat, based on prior knowledge. once again, probabilistic models for noise and signal are due.

rod -->
\
rod -->--rod bipolar--
/              |
rod -->               |
v
amacrine
|
|
cone --> cone bipolar--------> ganglion ->


anatomically, such a threshold in non-linearity is thought to occur right at the integrating synapse from rods to rod bipolar cells. no other candidate is left after the bipolar consumes all rod output.

for similar sample sizes of noise and signal, recordings show that the threshold is biased for type-1 errors: even most true positives get rejected, unless the probability of the noise distribution is negligibly small.

P(r)
--  ||
---++|||
---++++||||
noise signal     Amplitude
threshold: .........^


however, under natural conditions the probability of seeing a photon is minuscule compared to the probability of seeing noise. priors matter!

P(r)
--
----
------ |
------++||
noise signal     Amplitude
threshold: ........^


## • description: codec evaluation

at any given time bin in a spike train the probability of seeing an event is given by , and the probability of its complement is given by . their respective information values are and .

entropy counts the yes/no questions necessary to specify a variable's distribution.

the encoding capacity of a spike train to account for variability in its counterpart (ability to represent) is quantified using entropy.

theorem: uniform probability has the highest entropy of all bernoulli distributions:

, a function of probability , peaks at .

generally is not uniform, but it would be best if it were.

### • mutual information

how much of the response is encoded in the stimulus?

let and be 2 random variables. noise entropy is defined as the entropy of either of their conditional distributions. for we have that:

averaged over . and the mutual information is given by:

note that . mutual information is commutative. whatever amount of information is reduced from R by knowing S, it must be the same amount that would be reduced from S if we knew R.

example: we have 2 posible stimuli/responses (yes or no), and 2 possible errors of (presumably) equal probability .

Confusion matrix of probabilities:

stimulus response no response
yes 1-q q
no q 1-q

let's focus on the "yes" stimulus alone. therefore:

• Total entropy: H(R) = -P(response)log(P(response)) - P(no response)log(P(no response)).
• Noise entropy (hits): H(R|yes) = -q log(q) - (1-q)log(1-q).
• Mutual information:

intuitively, this means that the greater the noise entropy, the less the stimulus encodes the response. but noise entropy is maximum when , thereby telling us that when the response is pure chance, the stimulus has no bearing representing it.

when , . I.e. they are independent. when , . I.e., response is perfectly encoded by stimulus.

graphical example with continuous distributions:

High mutual information between S and R. the total response (sum of all possible stimuli, shown as the wrapper) stems from very differentiated stimuli (low noise entropy), each contributing uniquely to explain one part of the response:

R=f(S)
~~ ~~~~  ~~~~ ~~~  ~~
~  ~    ~     ~   ~   ~
~ -    |    -    |    - ~
~ ---  |||  ---  |||  --- ~
~~ ----++|||++---++|||++---- ~
s1   s2   s3   s4   s5     stimulus


Low mutual information. Same response, but it is just the sum of very noisy stimuli:

R=f(S)
~~ ~~~~  ~~~~ ~~~  ~~
~  ~    ~     ~   ~   ~
~                       ~
~ ---**||***--**||***---  ~
~~ --*********************** ~
s1   s2   s3   s4   s5     stimulus


#### • relationship to Kullback-Leibler

if mutual information measures variable independence, then it should be equivalent to the Kullback-Leibler divergence between joint probability and the bare product of both distributions:

or

let's rewrite the second version using conditionals:

### • spike trains

#### • spike patterns

• so far we've only dealt with single spikes or firing rates.
• what info is carried by entire patterns?
• how informative are they?

from spike train to pattern code (macro-symbols):

1. chop response train into segments (called letters) of size .
2. assign each of them one of 2 values based on presence/amount of contained spikes.
3. group them into words of length L.
4. plot the distribution of words, P(W).
5. (all-zero is a common mode, followed by words with a single one-letter spike, etc).
6. compute total

7. compute from the responses given random Words, averaged.

8. rinse and repeat for different Δt and T, until is maximum, i.e., the conditional/noise contributes the least entropy because we learned a lot about W by looking at that S.

example:

from Reinagel and Reid's 2000 study on the Lateral Geniculate Nucleus' pattern code:

• mutual information was ploted as a function of L and Δt.
• information drops off as we get to long words, because they require increasingly larger sample sizes to estimate the distributions.
• nonetheless, if a clear pattern emerges from looking at shorter word lengths, one may extrapolate the entropy curve to cases with larger words and predict if entropy would actually increase or not there.

#### • single spikes

how much does the observation of a single spike tell us about the stimulus? by how much does knowing that a particular stimulus occurred reduce the entropy of the response?

without knowing the stimulus, the probability of seeing a single action potential in the response can be computed from the average spike rate and Δt:

response no response
P(r=1) = r̅Δt P(r=0) = 1 - P(r=0)
P(r=1|s) = r̅(t)Δt P(r=0|s) = 1 - P(r=1|s)

the rates and r̅(t) are calculated empirically.

note that every time t is a sample of s, so avereraging the noise entropy might as well be done online, as opposed to averaging over stimuli. this property is known as ergodicity:

(...more unexplained inferences...)

therefore, the per-spike information is approximately:

some properties of this calculation:

• when r(t) → r̅, information goes to zero. this means that the single spike was barely modulated by that particular stimulus.
• when is small, information is likely to be large, because the receptive field is a very rare/informative one.
• stimulus-independent (measuring information is possible in the absence of a coding/decoding model).
• the spike is a meta-symbol that could be any other event composed of multiple spikes.

example: hippocampal place fields. imagine the following place field:

------------------
|          +     |
|         +++    |
|        +++++   |
|         +++    |
|          +     |
------------------


will be rather large (and information about any particular stimulus low), because the receptive field is rather large and imprecise. such place cell would fire to a considerable range of positions within the room.

### • neural code wrap-up: intepretive principles

• what challenges do natural stimuli pose?
• what does information theory suggest about the purpose of neural systems?
• what principles could be at work shaping the neural code?

natural stimuli: usually posses 2 properties, regardless of modality:

1. huge dynamic range: variations over many orders of magnitude.
2. power-law scaling: structure at many scales/frequency levels. screams for Fourier analysis?

#### • efficient coding

noise entropy escapes our hands under natural situations.

histogram equalization: is obtained by matching limited amount of outputs/symbols to the distribution of inputs , so that the probability of using certain output symbol is uniform. in other words, our coding filter should be determined by the distribution of natural inputs.

this implies that the response/input-output curve is the cumulative integral of .

suppose only 20 possible outputs are available. we should divide the area under in 20 slices of equal area.

example: (Laughlin, 1981) measured the distribution of contrast in natural scenes . he correctly predicted the empirical of the fly's monopolar cells that integrate information from photoreceptors, taking the integral of the measure .

#### • dynamical efficient coding

is varying widely over time as the neural system moves from environment to environment.

open question: should a neural system optimize over evolutionary time or locally?

if the amplitude of stimulus fluctuations decreases (less variety of frequencies for the same period of time), the probability distribution of stimulus values will contain less possible states. to adjust for this change, the encoding neuron correspondingly squeezes its activation curve to best encode a poorer range of stimuli, in real time! this way the input-output curve will go back to being similar to the cumulative integral of .

feature adaptation: the encoding filter vector/function under a natural distribution of stimuli needs not be the same as the one derived from gaussian noise.

maximally-informative dimensions: as mentioned in week 2, the filter is chosen to maximize between spike-conditional and prior distributions, except that this is done on real time for changing distributions. this is equivalent to dynamically maximizing mutual information, as mentioned above.

#### • redundancy reduction

the efficient-coding hypothesis says that since membrane re-polarisation is costly, neural codes should be as efficient as possible.

for a population code, this means that their joint entropy should be independent. that is: ; because it is true that .

however, correlations have been shown to contradict the hypothesis at times:

• error correction and robustness
• helping in discrimination

example: neurons in the retina use redundancy.3

sparse-firing hypothesis: as few neurons as possible should be firing at any time.

suppose we have basis functions and noise/error with which to reconstruct a natural scene :

this means choosing so as to have as few coefficients as possible. this is done calculating a new error consisting of two terms: ordinary mean squares to account for fidelity, but also a cost for usage:

The vector x in the equation above represents the coordinates of a point in the image.

### • quiz

#!/usr/bin/env python3
from math import *
# Suppose that we have a neuron which, in a given time period, will
# fire with probability 0.1, yielding a Bernoulli distribution for the
# neuron's firing (denoted by the random variable F = 0 or 1) with
# P(F = 1) = 0.1.
# Which of these is closest to the entropy H(F) of this distribution
# (calculated in bits, i.e., using the base 2 logarithm)?
def bernoulli_entropy(p):
return -(p*log2(p) + (1-p)*log2(1-p))
p1 = 0.1
bernoulli_entropy(p1) # 0.4690
# Continued from Question 1:
# Now lets add a stimulus to the picture. Suppose that we think this
# neuron's activity is related to a light flashing in the eye. Let us
# say that the light is flashing in a given time period with probability
# 0.10. Call this stimulus random variable S.
# If there is a flash, the neuron will fire with probability 1/2. If
# there is not a flash, the neuron will fire with probability 1/18. Call
# this random variable F (whether the neuron fires or not).
# Which of these is closest, in bits (log base 2 units), to the mutual
# information MI(S,F)?
# we will go with H(R) - E[H(R|S)]
r_2 = p1
s_2 = 0.1                     # also happens to be 0.1
r_given_s_2 = 1/2
r_given_no_s_2 = 1/18
def mutual_information(response, noises):
expected_noise_H = 0
# this is the expected noise entropy (with weights coming from the
# MARGINAL stimulus probability distribution), and NOT the average
# over n noise entropies. not all entropies are born equal :-)
for p_marginal,p_conditional in noises:
expected_noise_H += p_marginal * bernoulli_entropy(p_conditional)
return bernoulli_entropy(response) - expected_noise_H
mutual_information(r_2, [[s_2, r_given_s_2],
[1 - s_2, r_given_no_s_2]])
# In the following three questions, we will explore Poisson neuron
# models and population coding.
# This exercise is based on a set of artificial "experiments" that
# we've run on four simulated neurons that emulate the behavior found
# in the cercal organs of a cricket. Please note that all the supplied
# data is synthetic. Any resemblance to a real cricket is purely
# coincidental.
# In the first set of experiments, we probed each neuron with a range
# of air velocity stimuli of uniform intensity and differing
# direction. We recorded the firing rate of each of the neurons in
# response to each of the stimulus values. Each of these recordings
# lasted 10 seconds and we repeated this process 100 times for each
# neuron-stimulus combination.
# We've supplied you with a .mat file for each of the neurons that
# contains the recorded firing rates (in Hz). These are named neuron1,
# neuron2, neuron3, and neuron4. The stimulus, that is, the direction
# of the air velocity, is in the vector named stim.
# This will load everything into a dict called data, and you'll be
# able to access the stim and neuron responses using data['stim'],
# data['neuron1'], etc. (In general, data.keys() will show you all the
# keys available in the dict.)
import pickle
with open('tuning_3.4.pickle', 'rb') as f:
len(data['neuron1'])            # 5
data.keys() # dict_keys(['neuron4', 'stim', 'neuron3', 'neuron2', 'neuron1'])
# The matrices contain the results of running a set of experiments in
# which we probed the synthetic neuron with the stimuli in stim. Each
# column of a neuron matrix contains the firing rate of that neuron (in
# Hz) in response to the corresponding stimulus value in stim. That is,
# nth column of neuron1 contains the 100 trials in which we applied the
# stimulus of value stim(n) to neuron1.
len(data['stim'])               # 24
data['stim']
# array([   0.,   15.,   30.,   45.,   60.,   75.,   90.,  105.,  120.,
#         135.,  150.,  165.,  180.,  195.,  210.,  225.,  240.,  255.,
#         270.,  285.,  300.,  315.,  330.,  345.])
len(data['neuron1'])            # 100
data['neuron1']
# array([[ 19.7,  27.4,  28.1, ...,   0. ,   9.8,  16.1],
#        [ 21.3,  28.7,  32.8, ...,   0. ,   8.3,  16.6],
#        [ 21.5,  27.3,  29.3, ...,   0. ,   8.6,  17.2],
#        ...,
#        [ 21.9,  28.7,  30. , ...,   0. ,   8.7,  14.4],
#        [ 24.1,  28.1,  29.1, ...,   0. ,   7.1,  15.9],
#        [ 24. ,  28.3,  28.4, ...,   0. ,   8.2,  19.1]])
len(data['neuron1'][0])              # 24
data['neuron1'][0]
# array([ 19.7,  27.4,  28.1,  32.6,  31.8,  28. ,  22.6,  15.1,   9. ,
#          0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
#          0. ,   0. ,   0. ,   0. ,   9.8,  16.1])
data['neuron1'][0, 0]           # 19.7
# Plot the tuning curve-- the mean firing rate of the neuron as a
# function of the stimulus-- for each of the neurons.
import numpy as np
import matplotlib.pyplot as plt
tuning_curves = {}
for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']:
# axis=0 means 'average columns'
tuning_curves[neuron] = np.mean(data[neuron], axis=0)
plt.plot(data['stim'], tuning_curves[neuron])
plt.xlabel('Direction (degrees)')
plt.ylabel('Mean firing rate')
plt.title('Tuning curve for ' + neuron)
plt.show()
# Continued from Question 7:
# We have reason to suspect that one of the neurons is not like the
# others. Three of the neurons are Poisson neurons (they are
# accurately modeling using a Poisson process), but we believe that
# the remaining one might not be.
# Which of the neurons (if any) is NOT Poisson?
# Hint: Think carefully about what it means for a neuron to be
# Poisson. You may find it useful to review the last lecture of week
# 2. Note that we give you the firing rate of each of the neurons, not
# the spike count. You may find it useful to convert the firing rates
# to spike counts in order to test for "Poisson-ness", however this is
# not necessary.
# In order to realize why this might be helpful, consider the fact
# that, for a constant a and a random variable X, E[aX]=aE[X] but
# Var(aX)=a^2 * Var(X). What might this imply about the Poisson statistics
# (like the Fano factor) when we convert the spike counts (the raw
# output of the Poisson spike generator) into a firing rate (what we
# gave you)?
# solution: Poisson distributions have variance equal to mean (Fano
# factor = 1). we could compute the mean spike count for each neuron's
# column/stimulus, then compute its variance. repeat for all
# columns/stimulus and plot. a true Poisson process should regress
# into the identity function:
T = 10 # in seconds
for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']:
mean_spikes = T * np.mean(data[neuron], axis=0)
spikes_variance = (T**2) * np.var(data[neuron], axis=0)
plt.scatter(mean_spikes, spikes_variance)
plt.xlabel('Mean number of spikes')
plt.ylabel('Variance of number of spikes')
plt.title('Fano test for ' + neuron)
plt.show()
# Finally, we ran an additional set of experiments in which we exposed
# each of the neurons to a single stimulus of unknown direction for 10
# trials of 10 seconds each. We have placed the results of this
# experiment in the following file:
import pickle
with open('pop_coding_3.4.pickle', 'rb') as f:
# pop_coding contains four vectors named r1, r2, r3, and r4 that
# contain the responses (firing rate in Hz) of the four neurons to
# this mystery stimulus. It also contains four vectors named c1, c2,
# c3, and c4. These are the basis vectors corresponding to neuron 1,
# neuron 2, neuron 3, and neuron 4.
data2.keys()
# dict_keys(['c1', 'r1', 'r3', 'c2', 'c3', 'r2', 'c4', 'r4'])
# Decode the neural responses and recover the mystery stimulus vector
# by computing the population vector for these neurons. You should use
# the maximum average firing rate (over any of the stimulus values in
# 'tuning.mat') for a neuron as the value of rmax for that
# neuron. That is, rmax should be the maximum value in the tuning
# curve for that neuron.
v = np.array([0,0], dtype='float64')
# recall that a v̂_{pop} is the linear combination of vectors:
# ∑_a (r_a/max(max(r_a))ĉ_a
# in our case we are given many r_a's (many responses per neuron), so
# we create a population vector for each response, then average them
# to obtain a final population vector.
for a in ['1', '2', '3', '4']:
v += (np.average(data2['r'+a]) /
max(tuning_curves['neuron'+a])) * data2['c'+a]
# # equivalent to:
#
# v = np.zeros((2, 10), dtype='float64')
# for i in range(0, 10):
#     for a in ['1', '2', '3', '4']:
#         v[:,i] += (data2['r'+a][i] /
#                    max(tuning_curves['neuron'+a])) * data2['c'+a]
# v = np.average(v, axis=1)
# What is the direction, in degrees, of the population vector? You
# contain the value only (no units!) and should be between 0∘ and
# 360∘. If your calculations give a negative number or a number
# greater than or equal to 360, convert it to a number in the proper
# range (you may use the mod function to do this).
# You may need to convert your resulting vector from Cartesian
# coordinates to polar coordinates to find the angle. You may use the
# atan() function in MATLAB to do this. Note that the the convention
# we're using defines 0∘ to point in the direction of the positive
# y-axis and 90∘ to point in the direction of the positive x-axis
# (i.e., 0 degrees is north, 90 degrees is east).
# we start from 90 because that's our reference, and substract from
# there in order to rotate clock-wisely:
(90 - round(atan(v[1]/v[0]) * (180/pi))) % 360  # atan works in radians


## • mechanism: neuron models

• neuroelectronics

• membranes
• ion channels
• wiring
• simplified neuron models

• basic dynamics of neuronal excitability
• neuronal geometry

• dendrites and dendritic computing

### • membrane

#### • bare phospholipid bi-layer

good dielectric (capacitor), but still allows some charge to pass through (paralleled by a resistor), mostly because of embedded channel proteins.

RC circuit:

         ----/\/---
|        |
inside --|        |--- outside
|        |
----||----


from Kirchhoff's conservation of current:

from Ohm's law and the definition of capacitance ( )

#### • rest potential

• because of the active work of pumps, in the inside and in the outside are under osmotic force to homogenize in the direction of their respective concentration gradients (voltage source).
• until opposed by electrostatic forces.

this results in Nernst equilibrium:

where is the Boltzmann constant, T is the temperature, q is the ionic charge and z number of charges.

this means that the current that manages to flow through the resistor in our RC model must also go through the electric potential created by the ion battery; i.e., not all the voltage drop is due to the resistor:

                 |
--/\/--||---
|       |  |
inside --|          |--- outside
|          |
----||------


multiplying by R on both sides:

and let τ=RC and be the whole steady potential. note that when (no current) ; and when constant, .

with these conditions we can solve the differential equation:

• ; during the capacitors rising phase.
• ; during discharge.

#### • ion channels

recall their diversity:

• voltage-gated
• chemically-gated:
• transmitter-gated (synaptic):
• dopamine
• GABA
• serotonin
• glutamatergic:
• AMPA
• NMDA
• etc.
• -gated.
• mechanically-gated
• thermally-gated

we will be focusing on voltage-gated channels.

let g = 1/R be the conductance. Ohm's law becomes:

we can break down the general voltage source from our previous circuit into equilibrium (rest) potentials for different ion species:

ion equilibrium potential (E_i)
Na +050 mV
Ca +150 mV
K -080 mV
Cl -060 mV

these will be modelled with parallel paths.

from our ion-specific cable model for K, Na and leaks (L):

                 |
--/\/--||---
|  L    |  |
|          |
|       |  |
--/\/--||---
|  K    |  |
|          |
|      |   |
--/\/--||---
|  Na  |   |
inside --|          |--- outside
|          |
----||------


### • action potential

there's a limit to how much charge can move into the membrane without losing the capacity to stay in equilibrium. we call it the action potential.

where does the non-linear property that makes irrecoverable excitability (our good old activation function g()) lie?

conductances aren't constant. the resistance imposed by channels changes as voltage changes. we need functions describing them.

functional diagram of a voltage-gated channel:

    :========= =========:
:========= =========:
:========= =========: / <-- anchoring subunit
:========= =========:/
(___(voltage sensor)___)
\
\  <-- gate, activated by sensor
____________________\_
-(                      )
:========= =========:\
:========= =========: \
:========= =========:


#### • K dynamics

we will be using kinetic models for the opening of channels:

• P(gate is open) increases with a depolarised neuron ( flows away after spike).
• gate contains 4 moving subunits. , for independent subunits.4

we go for a Bernoulli probability distribution, n = P(open). transitions between states occur at voltage-dependent rates (closed → open) and (open → closed). we don't know the function of voltage describing this probability, but we can write the function describing its change in continuous time:

Hudgkin-Huxley-1:

that is, the function that describes the probability is given by how much is added to the open state (proportional to the closed state, times the voltage-varying rate α) minus how much is lost.

rewrite in the form of τ and :

#### • Na dynamics

• 3-subunit gate similar to Ka. P(gate is open) = m increases with potential drop.
• additional subunit can block channel back even if the gate is open, after P(additional is open) = h decreases if potential is too weak, terminating the rising phase during an action potential.
• .4

similarly, HH2:

And for the additional Na-inactivating subunit, HH3 is:

#### • Hudgkin-Huxley model

and from those probabilities we go back to our voltage-dependent channel conductances:

therefore, the diagram's overall current equation, , we rewrite as:

this is expanded with the ion-specific terms into the fourth and last equation in Hudgkin-Huxley's dynamical system:

HH4:

### • simplified models

the model can't be so simple that it will fail to capture the variety of neural codes neurons are capable of (frequency modulation, timing-dependent firing, train patterns, etc.)

#### • integrate-and-fire

we observed that at rest potential the neuron behaves very linearly. assume conductances stay constant and C = 1. this is no other than our good old passive membrane:

or

this is a linear function, and since the slope is negative is a stable fixed point in phase-space:

\           |
\        |
\     |
\  |
------------\-------------
|  \
|     \
|        \
|           \ dV/dt
---> --> -> V0 <- <-- <---


this is good for latent addition, but there's no end or threshold to how much electric potential can drop before we see excitation.

to account for spikes we delimit the function with a that will drive potential to depolarisation.

marks a new fixed point, unstable this time:

\           |                                      /
\        |                                   /
\     |                                /
\  |                             /
------------\--------------------------/---------------
|  \                    /
|     \              /
|        \        /
|           \  /
---> --> -> V0 <- <-- <--- <--- <-- <- V_th -> --> --->


now we need an edge that will mark the end of an increase in depolarisation and a return to a at the other edge of the graph.

        \           |                                      /
\        |                                   /
\     |                                /
\  |                             /
--------------------\--------------------------/--------------------
|  \                    /
|     \              /
|        \        /
|           \  /
V_reset ---> --> -> V0 <- <-- <--- <--- <-- <- V_th -> --> ---> V_max


in cortical neurons this has been fitted against quadratic functions, or using an exponential for the second half where acceleration is positive (effectively capturing the faster dynamics of depolarisation) in what is known as the exponential integrate-and-fire neuron:5

where parameter Δ controls how fast the exponential grows.

        \           |                      |
\        |                     |
\     |                    |
\  |                   |
--------------------\------------------|--------
|  \              /
|     \          /
|        \      /
|           \  /
V_reset ---> --> -> V0 <- <-- <--- <-- V_th ---> V_max


#### • theta neuron

by thinking of potential as a cyclic phenomenon we can simplify and into a single state, called :6

#### • two-dimensional models

what if instead of an abrupt reset we had another decrease leading to yet another (stable) fixed point?

        \           |                      |
\        |                     | |
\     |                    |   |
\  |                   |     |
--------------------\------------------|-------|-
|  \              /         \
|     \          /           \
|        \      /
|           \  /
V_reset ---> --> -> V0 <- <-- <--- <- V_th ---> V-K <- <--


we will draw inspiration from the Hudgkin-Huxley model bringing in a second variable, G(u), to take care for inactivation. with G(u) providing negative values for large potentials can be brought to zero again; mimicking the role of the Na gate's second inactivating subunit and the delayed role of the K channel.

this new variable is in turn a function, whose change is contingent upon electric potential. we want to go hand-in-hand with potential:

##### • simple™ model

modelled after a subset of the previous phase plane, near the intersection of nullclines. F(V) is chosen to be quadratic, G(u) is simply -u. on the second equation, -u + H(V) gains some weights:

given a specific quadratic function, two extra parameters (in addition to a (u's decay) and b (u's sensitivity)) are used to reset v and u after reaching . these are called c (V's original value) and d (u's original value).

these dynamics are enough to capture a variety of codes:

• regular (as in repetitive) spiking: | | | | | |
• intrinsically bursting: ||| | | | |
• fast spiking: |||||||||||
• low-threshold spiking: ||||| | | |||||

• intrinsically bursting with chattering intervals: |||| ||| ||| |||
• thalamocortical: | | | | | ... |||| | | |

• resonator: mmm||||||

### • realistic dendritic geometry

our previous models only really capture computation at and near the soma or axon. dendritic arbors have computationally-significant effects.

nobody really knows what the appropriate level of description is, if any.

for any passive membrane the following stochastic properties hold:

• effects of traveled distance:

• impulse amplitude decays. that is, spikes shorten.
• frequency decays too. that is, spikes broaden.

• strength (ΔV) is inversely proportional to thickness.

this affects the contribution of individual inputs coming from different dendrites.

#### • cable theory

V is now a function of both time and space. off-the-bat, we are dealing with partial differential equations. run cables/resistors on both inside and outside to connect membrane patches, so as to model local currents produced by potential along the cable.

---------------> a = radius
|        /              \
|        \  ----/\/---  /
|        /  |        |  \
| inside \--|        |--/ outside
|        /  |        |  \
|        \  ----||----  /
|        /              \
|        /              \
|        \  ----/\/---  /
|        /  |        |  \
| inside \--|        |--/ outside
|        /  |        |  \
|        \  ----||----  /
|        /              \
v
x = length


generally we take the external resistance to be zero.

our old membrane's Kirchhoff equality is symmetric to the second length-derivative of voltage:

which we rearrange to visualize time and space constants:

potential diffuses exponentially, starting from the point where current is injected (let's call it x=0), according to . half length constants away from the centre, the voltage signal is attenuated by a factor of 0.4. one constant away it's 0.2; etc.

the arrival of peaks at different points gives us the velocity of propagation, which naturally turns out to be the ratio of the length and time constants, multiplied by two: .

here's what the general solution to the differential equation looks like:

this is summed over different locations and times to come up with our actual filter on the input, as seen on week 2.

#### • compartmental models

simple cable theory doesn't get us very far modelling truly realistic neurons anyway, because the it isn't taking channels into account, which also exist in dendrites. analytical solutions become intractable.

compartments are a simpler approximation to cable theory which gets rid of our dependence on x, although it's still a passive model.

1. it's a divide-and-conquer strategy that discretizes dendritic trees into levels.
2. for a binary tree, when the sum of the diameters of siblings and parent obey the relation:

3. then the subtree can be approximated with a single cable segment the diameter of the parent, extended by the effective electronic length of the children. the final result will invariably be a single neurite.

#### • active models

1. for a given geometry and function of channel density (yes!, channel density variability can have important effects too), compartmentalize into segments of approximately constant properties.
2. model each segment according to your favorite model (Hudgkin-Huxley, e.g.). Input current can come from either: parent, sibling or children: uphold Kirchhoff's when thinking about input potential/current.
3. couple segments into a tree, putting resistors between segments to account for drops. these need not be direction-invariant (i.e. there will be two diode-guarded resistors per segment connection).

#### • dendritic computation

there are many interesting ideas floating around:

• lowpass filter, attenuation.
• where the inputs enter the dendrite can have an impact:

• inputs at different places of the same segment could compute logical conjunction, whereas the parent segment computes disjunction on its children.
• analogically, this means the ability to segregate or amplify signals.
• plasticity: Ca-mediated back-propagating action potentials could be the driving force behind plasticity, based on coincidence detection (mentioned in the introduction).

• hippocampal neurons are known to display synaptic scaling: the ability to make inputs "commutative", or position-independent.

examples:

• sound localisation: according to the Jeffress model, coincidence detector neurons could be taking advantage of different cable lengths (time-dependent code) to encode something like left-right position. because sounds coming from the left arrives with a delay to the right ear, a soma closer to the right with dendrites of different length coming from each ear could perceive both inputs at the same time, firing only to those sounds. other neurons with different dendritic lengths would detect other sound positions.
• direction selectivity on LGN/V1 receptive fields (and elsewhere) could be due to the order in which inputs enter the same dendrite. a timely excitation would add to the signal as it travels down; whereas de-synchronized stimuli would see their individual signals fade away. in other words, the sequence in which pre-synaptic neurons touch the post-synaptic one encodes the sequence the post-synaptic neuron is trained to detect.

## • mechanism: networks

### • chemical synapse

1. action potential travels down the axon at the pre-synaptic neuron,
2. opens channels at the terminal,
3. which in turn causes neurotransmitter vesicles to throw their contents into the synaptic cleft.
4. after migration via diffusion, neurotransmitters bind to glycoprotein receptors at the post-synaptic neuron.
5. ligand-gated channels (also known as ionotropic receptors) open, either from the direct action of neurotransmitters or as a consequence of the chain reaction started at metabotropic (aka G-protein-coupled) receptors.
6. this causes either further depolarisation or polarisation at the post-synaptic neuron, depending on chemical identity.

• excitatory: channels open or some other positive ions enter the cell. common mediating neurotransmitters are:

• acetylcholine: common in neuromuscular junctions.
• glutamate: CNS. binds to NMDA, AMPA and kainate receptors.
• serotonin: parasympathetic functions
• histamine
• inhibitory: leaves out or comes in.

• GABA-a, GABA-B. receptors are pentamers which exist in many compositions and conformations.
• glycine: non-exclusively found in the spinal cord and retina
7. neurotransmitters detach from the post-synaptic dendritic spine and are reabsorbed.

#### • modelling

given the specific membrane capacitance ( ), specific membrane resistance ( ) and area A; and . Notice how surface area becomes irrelevant when determining τ:

anyway, as with Na and K channels which are modelled through resistors, our neuron circuit model will need yet another conductance per synapse (with its own equilibrium potential ):

for an excitatory synapse, , and vice-versa.

similarly to the Hudking-Huxley model, the synaptic conductance is a probabilistic function specified by a differential kinetic model:

assume , i.e. vesicles are always available for release. as for , let and be the rate at which closed channels open and open channels close, respectively. therefore:

evidence shows that is best modelled using dissipating exponential function for AMPA synapses, whereas the slightly delayed rise in probability in something like GABA-A is better-suited to a so-called "alpha" function, . is an empirical parameter that will vary with chemical.

P: 1.0  |
.8  | \
.6  |   \
.4 |     \_
.2 |        \ _
0.0 |            \  _
0  5 10 15 20 25 :t (ms)
alpha function


in order to model the cumulative effect of a synapse train coming from the pre-synaptic neuron we recur to our old linear filter. this we call the response function, rho ( ).

example of a final aggregate synaptic conductance for AMPA:

imagine the following spike train arriving at the synapse, and its corresponding filtered conductance composed of alpha or exponential functions pieced together:

train:        | |   | | |
---------------> time
|\
|\| \
|\  |\|    \
|\|  \|       \
conductance:  |              \
---------------> time


### • networks

now that we know how to feed a neuron's output into another neuron and convert that input into yet more action potentials given by , the next simplest step is to experiment with 2-neuron networks. the following are common properties for a simple feedback loop system with homogeneous synaptic sign:

nature of synapses behavioural pattern
excitatory alternating activation (n1, n2, n1, n2, ...)
inhibitory synchrony7 (n1+n2, n1+n2, n1+n2, ...)

1. Not to be confused with ephaptic transmission, in which local ion currents from one cell's membrane manage to create a significant effect on the membranes of nearby cells.

2. http://www.jneurosci.org/content/19/18/8036.full

3. Berry, Chichilnisky.

4. I'd like to thank Prof. Markus Müller for pointing out to me that the exponents of probability functions were determined using curve fitting and are unrelated to the number of molecule subunits.

5. Fourcaud-Trocmé, Hansel, van Vreeswijk, Brune.

6. Ermentrout, Kopell.

7. This demands a deeper explanation.