Computational Neuroscience  University of Washington
(These are my own notes from Coursera's "Computational Neuroscience", as taught by the University of Washington. I'm really enjoying the lecture contents, challenging assignments, level of mathematical treatment, and the otherwiserare feeling of superlative transcendence that comes with science. It's almost like half the math I learned in college had been dormant, waiting for a nonmath professor who knew how to apply it to their subject. This is also the last properly capitalized paragraph you will see on this page. WIP.)
• overview
goal of computational/theoretical neuroscience: explaining the mind (mostly behaviour at this point) in terms of brain computation.
computational/theoretical neuroscience: tools and methods for:
 characterizing what nervous systems do
 determining how they achieve it
 understanding why they operate like that
kinds of models in theoretical neuroscience by purpose:
 descriptive (what): quantitative description of stimuli encoding and decoding
 mechanistic (how): detailed simulation of neurons, networks, etc.
 interpretive/normative (why): purpose (function) and underlying principles behind function.
• types of models
• descriptive
example: in Hubel and Weisel's discoveries in the primary visual field, the frequency at which action potentials occur in singlecell recordings is a function of place and orientation of the stimulus:
$$freq=f(orientation,place)$$
receptive field: specific properties of a sensory stimulus that maximize response.
 retinal ganglion (efferent) cells and thalamus' Lateral Geniculate Nucleus. centersurround receptive fields only, i.e:
 + + ++  +
 V1 simple cells. oriented receptive field, e.g:
 + ++++++++ ++  ++ ++ +
• mechanistic
V1 simple cell model suggested by Hubel & Wiesel (doesn't take recurrent input into account):
LGN cell 1 \ LGN cell 2 > V1 simple cell ... LGN cell n /
• interpretive
why are receptive fields in V1's simple cells shaped that way? what is their diversity for?
efficient coding hypothesis: evolution favours representing images as faithfully and efficiently as possible.
on V1 simple cells, given image $I$ and receptive fields $R{F}_{i}$ , we can try to reconstruct it ( $\hat{I}$ ) using neural response strenghts (multipliers) ${r}_{1},{r}_{2},...$ like this:
$$\hat{I}=\sum _{i}R{F}_{i}*{r}_{i}$$
question: what are the $R{F}_{i}$ that minimize squared pixelwise errors between $I$ and $\hat{I}$ , and are as independent as possible?

start with random $R{F}_{i}$

run an efficient coding algorithm (sparese coding, Independent Coding Analysis, predictive coding)
• neurobiology
• neuron
Some types of neurons:
 pyramidal: most common type, cortical building block
 Purkinje: multilevel/branching dendrites
 ganglion retinal cells, amacrine cells
neuron doctrine:
 functional unit of the nervous system
 neurons are (mostly) discrete
 signals travel from dendrines to axon
simple idealized descriptive model:
$$f=\{\begin{array}{cc}\hfill 1\hfill & \hfill \iff \sum _{i}synaptic\_strengt{h}_{i}*{x}_{i}>threshold\hfill \\ \hfill 0\hfill & \hfill otherwise\hfill \end{array}$$
where $synaptic\_strength\in \mathbb{R}$ .
ion channels are variously gated, (like transistors):
 voltagegated (membrane, gap junctions)
 chemicallygated (neuroreceptors at postsynaptic cleft, smelltaste sensors)
 mechanicallygated
• synapse
synapse: connection or junction between two neurons.

electrical:^{1}
 uses gap junctions (shared channels)
 fast. ideal for synchronization between neurons, sympathetic reactions

chemical:
 uses neurotransmitters
 thought to be more plastic, because postsynaptic cell can independently change its dendritic membrane depending on input, history, etc. hypothetically ideal for learning.
synapse doctrine: synapses are the basis for learning and memory
• plasticity

hebbian: if neuron A repeatedly fires to neuron B, then their synaptic strength is strengthened. neurons that fire together, wire together.
before: A_{t1}  *.01 > B_{t1} after: A_{t2}  *.1 > B_{t2}

LTP/LTD:
 experimentally observed change in synaptic strength
 lasts a few hours or days
 depends on relative timing of input and output spikes
if pre predicts post, post increases the strength (the smaller the delta, the more excitatory the synapse). if pre fires after post, post reduces the strength (the smaller the delta, the more inhibitory the synapse):
$$\Delta strength=1/\Delta timing$$
• nervous system

PNS
 somatic: modal afferent and voluntary efferent

autonomic

sympathetic: fight or flight

parasympathetic: sleep or breed

enteric: digestion

CNS:
 spinal chord

encephalon/brain

rhombo/hindbrain
 medulla oblongata (bulbo raquídeo): breathing, muscle tone, hemobarycs
 pons: connects to cerebellum. sleep and arousal
 cerebellum: equilibrium, PNSsomatic calibration, language?, attention?

mesen/mid
 visual and auditory reflexes, saccadic movement
 reticular formation: reflexes, pain, breathing, sleep and arousal

thalamus
 relay station for everything modal except smell

procen/cerebrum
 basal ganglia
 hyppocampus
 amygdala
 cortex: all of higher cognition, subjectivity
• cortex
6 layers of neurons, relatively uniform in structure. e.g.:
layer  function 

1  input from "higher" receptive fields 
2  output to " 
3  " 
4  input from subcortical regions 
5  output to " " 
6  " 
• description: encoding
recording techniques:
 fMRI: records aggregate changes in magnetic field resulting from
hemodynamics
 accuracy: millions of neurons, varies with voxel resolution
 time scale: seconds
 noninvasive
 in full 3D
 great for differential localisation (dissociation) of function localisation
 EEG: records aggregate changes in electric field resulting from
neural activity itself
 accuracy: rather noisy. confined to skull surface
 time scale: faster than fMRI
 electrode array (singleunit recording(s)): roughly targets single neurons
 calcium imaging: single neurons too, based on fluorescence changes resulting from Ca binding.
 patch clamp: inserts into cell
 maximum accuracy. requires less amplification
 good for mechanistic models of single cells, ions, pumps, etc.
raster plot: spike plots successively stacked
encoding: from stimulus parameters to pattern of responses
$$P(responsestimulus)$$
decoding: working back stimulus parameters from response
$$P(stimulusresponse)$$
examples:

in V1 oriented (angle = θ) receptive fields: $Hz=Gaussian(\theta )$

motor cortex: $Hz=cos(\theta )$
functional maps: the feature selectivity of adjacent tissue can be represented with an spatial field, where each point is associated with the values of an stimulus that maximise its response (i.e. it's receptive field)
one map per parameter: position, orientation, grammatical categories, images of both Brad Pitt and Jennifer Aniston, the concept of Dracula: anything related to him (image, audio, words), etc.
even worse: higherorder representations can feed back into the stimuli upstream, shaping the way things are perceived even at very rudimentary levels.
• simple models
$$P(r(t)s)$$ where $r(t)=response$ , $s=stimulus$ .
• linear
 most recent stimulus only:
$$r(t)=\u0278s(t)$$
where the ɸ is synaptic strength.
or to account for a short delay from stimulus to response:
$$r(t)=\u0278s(t\tau )$$
 linear with temporal filtering:
a more realistic model will depend on a linear combination of many recent inputs:
$$r(t)=\sum _{k=0}^{n}{f}_{k}{s}_{tk}$$
note how constant $\u0278$ turned into weight function $f$ .
in infinitesimal form:
$$r(t)={\int}_{\infty}^{t}f(\tau )s(t\tau )\phantom{\rule{0.278em}{0ex}}d\tau $$
this is a convolution of $f(\tau )$ and $s(t\tau )$ , where the former acts as the linear filter.

leaky filter/integrator:
$f(\tau )$ decreases the more you look into past synapses, usually exponentially.
example: linear filtering applied to centersurround visual fields, where distance in time is replaced by distance in space. the discrete model thus becomes:
$$r(x,y)=\sum _{x\prime =n,y\prime =n}^{n}f(x\prime ,y\prime )s(xx\prime ,yy\prime )$$
in addition to leaking, we can think of the phenomenon of lateral inhibition in retinal ganglionar cells to construct our filter $f$ . this is usually approximated as a difference of a normal distribution and another negative, shallower gaussian. like Sobel's filter this is an edge detector.
• nonlinear activation function
finally, we can think of an extra activation function that will map the filter convolution to the actual firing rate:
s(t) > f(τ) > g() > r(t)
$$r(t)=g(\int f(t\tau )f(\tau )\phantom{\rule{0.278em}{0ex}}d\tau );$$ where $g()$ is the activation function (also called input/output function, or transfer function in ANNs), usually with sigmoid shape.
• dimensionality reduction
or how to regress a good filter. as seen in the last example, the stimulus could easily contain an intractable number of parameters.
• reverse correlation: spiketriggered average

convert $s(t)$ to discrete vector: a function of a single variable can be thought of as a single point in space:

each application (an instant in the spike trail) is a dimension.
 the value of that component is the value of that function/stimulus at that point/time:
$$s(t)=\left(\begin{array}{c}\hfill {s}_{t1}\hfill \\ \hfill {s}_{t2}\hfill \\ \hfill {s}_{t3}\hfill \\ \hfill ...\hfill \end{array}\right)\u22a2P(r{s}_{1},{s}_{2},...,{s}_{n})$$

one method uses Gaussian white randomness as $s(t)$ to probe a reaction, and approximate what is common among trials.

a random prior stimuli distribution is constructed from multiple such trials of decomposed random $s(t)$ 's:
per the central limit theorem, this distribution of stimuli will also be gaussian at each dimension, at whatever new dimensions are chosen from a basis change/linear combination; and indeed at the whole space (picture a normal distribution on a plane, or on 3D).
$$s1(t)=\left(\begin{array}{cccc}\hfill nrand({)}_{t1}\hfill & \hfill nrand({)}_{t2}\hfill & \hfill nrand({)}_{t3}\hfill & \hfill ...\hfill \end{array}\right)$$
$$s2(t)=\left(\begin{array}{cccc}\hfill nrand({)}_{t1}\hfill & \hfill nrand({)}_{t2}\hfill & \hfill nrand({)}_{t3}\hfill & \hfill ...\hfill \end{array}\right)$$
$$s3(t)=\left(\begin{array}{cccc}\hfill nrand({)}_{t1}\hfill & \hfill nrand({)}_{t2}\hfill & \hfill nrand({)}_{t3}\hfill & \hfill ...\hfill \end{array}\right)$$
$$...$$

pick the cluster of points that caused $r(t)$ to fire, called the spikeconditional distribution

and calculate their spiketriggered average point.

then pick some dimension that passes through that point and project the original spikeconditional distribution onto them.

the spiketriggered average vector is then taken to be a unit vector. $avg(s)$ is our weight filter/feature detector/convolution $f(\tau )$ .
geometrically, it is also a projection. so
$$r(t)={\int}_{\infty}^{t}f(\tau )s(t\tau )d\tau =\hat{f}\xb7\hat{s}$$
• activation function
or how to regress a good g() for our f().
remember that $g()$ is the last step in producing $r(s)$ , and that encoding $r(s)$ in its most general form amounts to calculating $P(spikestimulus)$ . so we start from the equation:
$$r(s)=g()=P(spikestimulus)$$
$P(spikestimulus)$ has become $P(spikeavg(s))$ after simplifying the multidimensional stimulus. so what's the new probability under this subset?
$$P(spikes\prime )=\frac{P(spike\cap s\prime )}{P(s\prime )}$$
expanded into bayesian form:
$$\frac{P(spike\cap s\prime )}{P(s\prime )}=\frac{P(s\prime spike)P(spike)}{P(s\prime )}$$
$P(s\prime )$ simply has a normal probability distribution for the whitenoise experiment:
. . . . . . . . . . . . . .
the histogram of $P(s\prime spike)$ is obtained only from those pointvalues in the stimulus that coincide in time with a spike. let's say those are mostly peaks in the stimulus signal. the distribution would look skewed like this:
. . . . . . . . . . . . .
therefore, $g()=\frac{P(s\prime spike)}{P(s\prime )}$ :
.... . . ...
$P(spike)$ is just calculated empirically by counting during the random experiment.
if $P(s\prime spike)$ had also a normal distribution (like $P(s\prime )$ ), the ratio of both functions (as used in Bayes' formula) would yield a constant function. i.e., the activation function would look flat. what we actually want is both distributions to be different.
• maximally informative filters (again)
could we work backwards from $s\prime $
and $spike$ so that the distributions $P(s\prime spike)$ and $P(s\prime )$ are as dissimilar as posible?
one measure for such a difference is KullbackLeibler divergence:
$${D}_{KL}(P(s),Q(s))=\int \frac{P(s)lo{g}_{2}(P(s))}{Q(s)}\phantom{\rule{0.278em}{0ex}}ds$$
an advantage of this over STA is that the prior $P(s\prime )$ needs not be gaussian random noise. it might as well be natural stimuli.
• Generalized Linear Models
 with a refractory period filter:
(s) > (f) > (+) > (g) > (Poisson) > r ^  _(refractionary filter)_
 with coupling (neighbour neurons) filters, ephaptic transmisison, etc.
• description: decoding
• signal detection theory (hypothesis testing)
likelihood ratio test:
$$1<\frac{P(neural\mathrm{responseobserved}behaviour)}{P(neural\mathrm{responsenot(observed}behaviour))}$$
NeymanPearson lemma: the likelihood ratio test is the most efficient statistic for any given size.
maximum lakelihood with asymmetric costs:
type 1 and type 2 errors may not cost the same. we introduce a weighting factor $L$ :
$$Los{s}_{yes}={L}_{yes}P(noresponse)$$
$$Los{s}_{no}={L}_{no}P(yesresponse)$$
we should clearly answer yes when $Los{s}_{yes}<Los{s}_{no}$ . I.e, ${L}_{yes}P(noresponse)<{L}_{no}P(yesresponse)$ . rewriting using Bayes' rule yields:
$${L}_{yes}\frac{P(no,r)}{P(r)}<{L}_{no}\frac{P(yes,r)}{P(r)}$$
$${L}_{yes}\frac{P(rno)P(no)}{P(r)}<{L}_{no}\frac{P(ryes)P(yes)}{P(r)}$$
$${L}_{yes}P(rno)P(no)<{L}_{no}P(ryes)P(yes)$$
...and arranging for the ratio of conditional distributions to obtain the likelihood ratio:
$$\frac{P(ryes)}{P(rno)}>\frac{{L}_{yes}P(no)}{{L}_{no}P(yes)}$$
• population coding
methods for decoding from populations of neurons:
 population vector

bayesian inference
 maximum likelihood
 maximum a posteriori
• population vector
example: cricket cercal cells in the antennalike surcus structures in the abdomen are sensitive to wind velocity.
how does the group as a whole communicate wind velocity to the animal?
there are 4 overlapping receptive field distributions, corresponding to the 4 directions relative to the body:
f/r_max         ++++++ 45 135 225 315 degrees
given wind direction (in degrees) $s$ and a neuron's preferred direction ${s}_{a}$ , response looks like: (after normalising for ${r}_{max}$ , because some neurons have an intrinsically higher firing rate, even for equally maximal stimuli)
$${\left(\frac{f(s)}{{r}_{max}}\right)}_{a}=[cos(s{s}_{a}){]}_{+}$$
or in terms of vectors $v$ (wind vector) and ${c}_{a}$ (neuron's preferred vector):
$${\left(\frac{f(s)}{{r}_{max}}\right)}_{a}=[v\xb7{c}_{a}{]}_{+}$$
then, the population vector is calculated as:
$${v}_{pop}=\sum _{a=1}^{4}{\left(\frac{r}{{r}_{max}}\right)}_{a}{c}_{a}$$
note how 4 orthogonal vectors are used, even though 2 in linear combinations would suffice to generate the whole plane. this is due to the lack of negative firing rates to represent wind in the opposite direction, or its component along that direction.
this method is prevalent in brainmachine interfaces targeting the M1 area.
for a sufficiently large number of directions:
$${v}_{pop}=\sum _{a=1}^{N}(v\xb7{c}_{a}){c}_{a}$$
• bayesian inference
 more general than population vector
 takes more information into account
 less vulnerable to noise
recall Bayes' rule for stimulus $s$ and response $r$ :
$$P(sr)=\frac{P(rs)P(s)}{P(r)}$$
we could also rewrite the marginal distribution $P(r)$ to yield:
$$P(sr)=\frac{P(rs)P(s)}{\int \phantom{\rule{0.278em}{0ex}}dsP(rs)P(s)}$$

maximum likelihood distribution: this bayesian decoding strategy tries to find the stimulus which maximizes the likelihood conditional distribution $P(rs)$ .

maximum a posteriori distribution: this decoding strategy tries to find the stimulus which maximizes the a posteriori (LHS) conditional distribution $P(sr)$ .
MAPD is influenced by the prior $P(s)$ , because all factors in Bayes equation are taken into accout.
example: imagine a population of neurons that encode stimulus/parameter $s$ , with response $r=f(s)$ a Gaussian probability distribution, and mean/maximum at ${s}_{1}$ .
meanwhile, we have each individual neuron's Gaussian responses scattered along $s$ and overlapping each other:
r=f(s)           ++++++++ n1 n2 s_1 n3 nn stimulus
assume the following:
 each neuron fires independently

variability in firing is due to a Poisson process, so $variance=mean\mathrm{firing}rater$

the distribution of distributions is not gaussian, but uniform. i.e., there's plenty of singleneuron distributions across the range of $s$ .
spikes are produced randomly and independently in each time bin, with a probability given by the instantaneous rate. from Poisson:
$${P}_{T}(k)=\frac{(cT{)}^{k}{e}^{cT}}{k!}$$
where:
 $k$ is the number of occurrences for which ${P}_{T}()$ is going to be calculated for time interval $T$ .
 $c$ is the constant mean firing rate per time unit.
 $T$ is the time interval.
substituting for the variables in our example setting, the probability of seeing ${r}_{a}$ spikes for a single neuron $a$ becomes:
$$P({r}_{a}s)=\frac{({f}_{a}(s)T{)}^{T{r}_{a}}{e}^{{f}_{a}(s)T}}{(T{r}_{a})!}$$
notice how we write the specific number of occurrences $k$ during interval $T$ simply as ${r}_{a}$ times $T$ .
also, the constant $c$ is replaced by our previous probability function ${f}_{a}$ , and the whole equation is accordingly changed to a conditional distribution of stimulus $s$ (that is, our conditional distribution is a bunch of Poisson formulas, one per slice at this neuron's Gaussian).
so the whole population's ( $\hat{r}=({r}_{1},{r}_{2},...,{r}_{n})$ ) distribution can be written down as the product of the independent distributions:
$$P(\hat{r}s)=\prod _{a=1}^{N}\frac{({f}_{a}(s)T{)}^{T{r}_{a}}{e}^{{f}_{a}(s)T}}{(T{r}_{a})!}$$
• maximum likelihood
the previous formula is often written in logarithmic form, so as to avoid rounding products and the exponential, in what is known as loglikelihood:
$$ln(P(\hat{r}s))=\sum _{a=1}^{N}T{r}_{a}ln({f}_{a}(s)T){f}_{a}(s)Tln((T{r}_{a})!)$$
which we try to maximise:
$$0=\frac{\partial}{\partial s}ln(P(\hat{r}s))=T\sum _{a=1}^{N}{r}_{a}\frac{f\prime (s)}{{f}_{a}(s)}$$
recall that $f(s)$ is the normal distribution:
$$0=\sum _{a=1}^{N}{r}_{a}\frac{\frac{1}{{\sigma}_{a}\sqrt{2\pi}}\frac{s{s}_{a}}{{\sigma}_{a}^{2}}{e}^{1/2{\left(\frac{s{s}_{a}}{{\sigma}_{a}}\right)}^{2}}}{\frac{1}{{\sigma}_{a}\sqrt{2\pi}}{e}^{1/2{\left(\frac{s{s}_{a}}{{\sigma}_{a}}\right)}^{2}}}$$
$$\u22a20=\sum _{a=1}^{N}{r}_{a}\frac{s{s}_{a}}{{\sigma}_{a}^{2}}$$
$$\u22a2{s}_{f(s{)}_{max}}=\frac{\sum _{a=1}^{N}{r}_{a}{s}_{a}/{\sigma}_{a}^{2}}{\sum _{a=1}^{N}{r}_{a}/{\sigma}_{a}^{2}}$$
$$\u22a2{s}_{f(s{)}_{max}}=\frac{\sum _{a=1}^{N}{r}_{a}{s}_{a}}{\sum _{a=1}^{N}{r}_{a}}$$
think of this as an improved version of the population vector ${v}_{pop}={\sum}_{a=1}^{N}(v\xb7{c}_{a}){c}_{a}$ , because each neuron's contribution is weighted against its variance ${\sigma}_{a}^{2}$ , more informative neurons (less spreadout distribution) will contribute more.
• maximum a posteriori
in logaposteriori form for a single neuron:
$$ln(P(sr))=ln(P(rs))+ln(P(s))ln(P(r))$$
and for the whole population:
$$ln(P(sr))=T\sum _{a=1}^{N}{r}_{a}ln({f}_{a}(s))+ln(P(s))+...$$
optimize setting the derivative to 0:
$$0=\sum _{a=1}^{N}{r}_{a}\frac{f\prime (s)}{f(s)}\frac{P\prime (s)}{P(s)}$$
and after we pour in the gaussian expansions of $f(s)$ :
$${s}_{f(s{)}_{max}}=\frac{T\sum _{a=1}^{N}\frac{{r}_{a}{s}_{a}}{{\sigma}_{a}^{2}}+\frac{{s}_{prior}}{{\sigma}_{prior}^{2}}}{T\sum _{a=1}^{N}\frac{{r}_{a}}{{\sigma}_{a}^{2}}+\frac{1}{{\sigma}_{prior}^{2}}}$$
if you look closer, this is no otherthan the likelihood, scaled by $T$ and shifted by the biasing effect of the prior's deviation. in other words, the more certain the prior (the more pointy its distribution), the more the ${s}_{prior}$ stimulus takes us away from a simple likelihood distribution.
• caveats
 these methods don't incorporate rapid time variations, as they are modelled after tuning curves and mean firing rate.
 don't take into account correlations in the population; i.e., we assumed independence of events. this is an open research enterprise in neuroscience.
• stimulus reconstruction
or how to extend the bayesian methods to the case where stimuli and response vary continuously in time.
suppose we want the best estimator for stimulus ${s}_{bayes}$ . we introduce an error function, $L(s,{s}_{bayes})$ , and average it over all possible stimulus choices:
$$\int ds\phantom{\rule{0.278em}{0ex}}L(s,{s}_{bayes})P(sr)$$
which we will try to minimize. one choice of $L()$ is the least squares error function:
$$0=\frac{\partial}{\partial {s}_{b}}\int ds(s{s}_{b}{)}^{2}P(sr)$$
$$0=2\int ds(s{s}_{b})P(sr)$$
$$\int ds\phantom{\rule{0.278em}{0ex}}sP(sr)=\int ds\phantom{\rule{0.278em}{0ex}}{s}_{b}P(sr)$$
$${s}_{b}=\int ds\phantom{\rule{0.278em}{0ex}}P(sr)s$$
but $P(sr)$ is just another way of saying "all the stimuli that triggered an action potential". in other words, this filter is just the spike triggered average.
example: famous 1999 "mind reading" of a cat's Lateral Geniculate Nucleus.^{2}
here researchers are trying to find $s$ that maximizes the a posteriori distribution $P(sr)\phantom{\rule{0.222em}{0ex}}P(rs)P(s)$ .
the encoding model/likelihood $P(rs)$ needs to be constructed from input movies.
• case study: visual processing in the retina
rods are exquisitely sensitive to the minutest amounts of photons (<0.1% of them contribute signals). averaging would be disastrous under these conditions. how can a noisy neural array reliably detect and relay those signals?
what is the optimal readout of an array of detectors when only a small fraction is active and relevant? this question is pervasive accross attention and decisionmaking research.
we need a selection process/threshold to discard most sources offthebat, based on prior knowledge. once again, probabilistic models for noise and signal are due.
rod > \ rod >rod bipolar /  rod >  v amacrine   cone > cone bipolar> ganglion >
anatomically, such a threshold in nonlinearity is thought to occur right at the integrating synapse from rods to rod bipolar cells. no other candidate is left after the bipolar consumes all rod output.
for similar sample sizes of noise and signal, recordings show that the threshold is biased for type1 errors: even most true positives get rejected, unless the probability of the noise distribution is negligibly small.
P(r)   ++ ++++ noise signal Amplitude threshold: .........^
however, under natural conditions the probability of seeing a photon is minuscule compared to the probability of seeing noise. priors matter!
P(r)     ++ noise signal Amplitude threshold: ........^
• description: codec evaluation
at any given time bin in a spike train the probability of seeing an event $1$ is given by $P(x=1)=p$ , and the probability of its complement is given by $P(0)=1p$ . their respective information values are $lo{g}_{2}(p)$ and $lo{g}_{2}(1p)$ .
entropy counts the yes/no questions necessary to specify a variable's distribution.
the encoding capacity of a spike train to account for variability in its counterpart (ability to represent) is quantified using entropy.
theorem: uniform probability has the highest entropy of all bernoulli distributions:
$$H=\sum B(X)log(B(x))=p\phantom{\rule{0.278em}{0ex}}log(p)(1p)log(1p)$$
$H$ , a function of probability $p$ , peaks at $p=1/2$ .
generally $P(x)$ is not uniform, but it would be best if it were.
• mutual information
how much of the response is encoded in the stimulus?
let $Response$ and $Stimulus$ be 2 random variables. noise entropy is defined as the entropy of either of their conditional distributions. for $S$ we have that:
$$H(RS)=E[log(P(RS))];$$
averaged over $S$ . and the mutual information is given by:
$${I}_{m}(R;S)=H(R)E[H(RS)].$$
note that ${I}_{m}(R;S)={I}_{m}(S;R)$ . mutual information is commutative. whatever amount of information is reduced from R by knowing S, it must be the same amount that would be reduced from S if we knew R.
example: we have 2 posible stimuli/responses (yes or no), and 2 possible errors of (presumably) equal probability $q$ .
Confusion matrix of probabilities:
stimulus  response  no response 

yes  1q  q 
no  q  1q 
let's focus on the "yes" stimulus alone. therefore:
 Total entropy: H(R) = P(response)log(P(response))  P(no response)log(P(no response)).
 Noise entropy (hits): H(Ryes) = q log(q)  (1q)log(1q).
 Mutual information: ${I}_{m}(R,yes)=H(R)E[H(Ryes)]$
intuitively, this means that the greater the noise entropy, the less the stimulus encodes the response. but noise entropy is maximum when $q=1q=1/2$ , thereby telling us that when the response is pure chance, the stimulus has no bearing representing it.
when ${I}_{m}(R,s)\to 0$ , $P(RS)\to P(R)$ . I.e. they are independent. when ${I}_{m}(R,s)\to H(R)$ , $P(RS)\to 0$ . I.e., response is perfectly encoded by stimulus.
graphical example with continuous distributions:
High mutual information between S and R. the total response (sum of all possible stimuli, shown as the wrapper) stems from very differentiated stimuli (low noise entropy), each contributing uniquely to explain one part of the response:
R=f(S) ~~ ~~~~ ~~~~ ~~~ ~~ ~ ~ ~ ~ ~ ~ ~      ~ ~      ~ ~~ ++++++++ ~ s1 s2 s3 s4 s5 stimulus
Low mutual information. Same response, but it is just the sum of very noisy stimuli:
R=f(S) ~~ ~~~~ ~~~~ ~~~ ~~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ********** ~ ~~ *********************** ~ s1 s2 s3 s4 s5 stimulus
• relationship to KullbackLeibler
if mutual information measures variable independence, then it should be equivalent to the KullbackLeibler divergence between joint probability and the bare product of both distributions:
$${I}_{m}(S;R)={D}_{KL}(P(R,S),P(R)P(S))=$$
$$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R)P(S)log\left(\frac{P(R,S)}{P(R)P(S)}\right)=$$
or $$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)log\left(\frac{P(R,S)}{P(R)P(S)}\right)$$
let's rewrite the second version using conditionals:
$$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)log\left(\frac{P(RS)P(S)}{P(R)P(S)}\right)=$$
$$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)log\left(\frac{P(RS)}{P(R)}\right)=$$
$$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)(log(P(RS))log(P(R)))=$$
$$\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)log(P(RS))\int dR\int dS\phantom{\rule{0.278em}{0ex}}P(R,S)log(P(R))=$$
$$E[H(P(RS))]+H(P(R))$$
• spike trains
• spike patterns
 so far we've only dealt with single spikes or firing rates.
 what info is carried by entire patterns?
 how informative are they?
from spike train to pattern code (macrosymbols):
 chop response train into segments (called letters) of size $\Delta t$ .
 assign each of them one of 2 values based on presence/amount of contained spikes.
 group them into words of length L.
 plot the distribution of words, P(W).
 (allzero is a common mode, followed by words with a single oneletter spike, etc).

compute total $H(P(W))$

compute ${I}_{m}(W;S)$ from the responses given random Words, averaged.
 rinse and repeat for different Δt and T, until ${I}_{m}$ is maximum, i.e., the conditional/noise contributes the least entropy because we learned a lot about W by looking at that S.
example:
from Reinagel and Reid's 2000 study on the Lateral Geniculate Nucleus' pattern code:
 mutual information was ploted as a function of L and Δt.
 information drops off as we get to long words, because they require increasingly larger sample sizes to estimate the distributions.
 nonetheless, if a clear pattern emerges from looking at shorter word lengths, one may extrapolate the entropy curve to cases with larger words and predict if entropy would actually increase or not there.
• single spikes
how much does the observation of a single spike tell us about the stimulus? by how much does knowing that a particular stimulus occurred reduce the entropy of the response?
without knowing the stimulus, the probability of seeing a single action potential in the response can be computed from the average spike rate and Δt:
response  no response 

P(r=1) = r̅Δt  P(r=0) = 1  P(r=0) 
P(r=1s) = r̅(t)Δt  P(r=0s) = 1  P(r=1s) 
the rates r̅ and r̅(t) are calculated empirically.
note that every time t is a sample of s, so avereraging the noise entropy might as well be done online, as opposed to averaging over stimuli. this property is known as ergodicity:
$$I=H(P(r))E[P(rs)]=$$
$$\stackrel{\u203e}{r}\Delta tlog(\stackrel{\u203e}{r}\Delta t)(1\stackrel{\u203e}{r}\Delta t)log(1\stackrel{\u203e}{r}\Delta t)+$$
$$\frac{1}{T}{\int}_{0}^{T}dt\phantom{\rule{0.278em}{0ex}}[\stackrel{\u203e}{r}(t)\Delta t\phantom{\rule{0.278em}{0ex}}log(\stackrel{\u203e}{r}(t)\Delta t)+(1\stackrel{\u203e}{r}(t)\Delta t)log(1\stackrel{\u203e}{r}(t)\Delta t)]$$
(...more unexplained inferences...)
therefore, the perspike information is approximately:
$$I(r,s)=\frac{1}{T}{\int}_{0}^{T}dt\frac{r(t)}{\stackrel{\u203e}{r}}log\left(\frac{r(t)}{\stackrel{\u203e}{r}}\right)$$
some properties of this calculation:
 when r(t) → r̅, information goes to zero. this means that the single spike was barely modulated by that particular stimulus.
 when r̅ is small, information is likely to be large, because the receptive field is a very rare/informative one.
 stimulusindependent (measuring information is possible in the absence of a coding/decoding model).
 the spike is a metasymbol that could be any other event composed of multiple spikes.
example: hippocampal place fields. imagine the following place field:
  +   +++   +++++   +++   +  
$r(t)$ will be rather large (and information about any particular stimulus low), because the receptive field is rather large and imprecise. such place cell would fire to a considerable range of positions within the room.
• neural code wrapup: intepretive principles
 what challenges do natural stimuli pose?
 what does information theory suggest about the purpose of neural systems?
 what principles could be at work shaping the neural code?
natural stimuli: usually posses 2 properties, regardless of modality:
 huge dynamic range: variations over many orders of magnitude.
 powerlaw scaling: structure at many scales/frequency levels. screams for Fourier analysis?
• efficient coding
noise entropy escapes our hands under natural situations.
histogram equalization: ${H}_{max}(R)$ is obtained by matching limited amount of outputs/symbols to the distribution of inputs $P(S)$ , so that the probability of using certain output symbol is uniform. in other words, our coding filter should be determined by the distribution of natural inputs.
this implies that the response/inputoutput curve is the cumulative integral of $P(S)$ .
suppose only 20 possible outputs are available. we should divide the area under $P(S)$ in 20 slices of equal area.
example: (Laughlin, 1981) measured the distribution of contrast in natural scenes $P(S)$ . he correctly predicted the empirical $P(R)$ of the fly's monopolar cells that integrate information from photoreceptors, taking the integral of the measure $P(S)$ .
• dynamical efficient coding
$P(S)$ is varying widely over time as the neural system moves from environment to environment.
open question: should a neural system optimize over evolutionary time or locally?
if the amplitude of stimulus fluctuations decreases (less variety of frequencies for the same period of time), the probability distribution of stimulus values will contain less possible states. to adjust for this change, the encoding neuron correspondingly squeezes its activation curve to best encode a poorer range of stimuli, in real time! this way the inputoutput curve will go back to being similar to the cumulative integral of $P(s)$ .
feature adaptation: the encoding filter vector/function under a natural distribution of stimuli needs not be the same as the one derived from gaussian noise.
maximallyinformative dimensions: as mentioned in week 2, the filter is chosen to maximize ${D}_{KL}$ between spikeconditional and prior distributions, except that this is done on real time for changing distributions. this is equivalent to dynamically maximizing mutual information, as mentioned above.
• redundancy reduction
the efficientcoding hypothesis says that since membrane repolarisation is costly, neural codes should be as efficient as possible.
for a population code, this means that their joint entropy should be independent. that is: $P({R}_{1},{R}_{2})\phantom{\rule{0.222em}{0ex}}P({R}_{1})P({R}_{2})$ ; because it is true that $H({R}_{1},{R}_{2})\le H({R}_{1})+H({R}_{2})$ .
however, correlations have been shown to contradict the hypothesis at times:
 error correction and robustness
 helping in discrimination
example: neurons in the retina use redundancy.^{3}
sparsefiring hypothesis: as few neurons as possible should be firing at any time.
suppose we have basis functions ${\u0278}_{i}$ and noise/error $e$ with which to reconstruct a natural scene $I(x\u20d7)$ :
$$I(x\u20d7)=\sum _{i}{a}_{i}{\u0278}_{i}(x\u20d7)+e(x\u20d7)$$
this means choosing ${\u0278}_{i}$ so as to have as few coefficients ${a}_{i}$ as possible. this is done calculating a new error $E$ consisting of two terms: ordinary mean squares to account for fidelity, but also a cost for ${a}_{i}$ usage:
$$E=\sum _{x\u20d7}{[I(x\u20d7)\sum _{i}{a}_{i}{\u0278}_{i}(x\u20d7)]}^{2}+\lambda \sum _{i}{a}_{i}$$
The vector x in the equation above represents the coordinates of a point in the image.
• quiz
#!/usr/bin/env python3 from math import * # Suppose that we have a neuron which, in a given time period, will # fire with probability 0.1, yielding a Bernoulli distribution for the # neuron's firing (denoted by the random variable F = 0 or 1) with # P(F = 1) = 0.1. # Which of these is closest to the entropy H(F) of this distribution # (calculated in bits, i.e., using the base 2 logarithm)? def bernoulli_entropy(p): return (p*log2(p) + (1p)*log2(1p)) p1 = 0.1 bernoulli_entropy(p1) # 0.4690 # Continued from Question 1: # Now lets add a stimulus to the picture. Suppose that we think this # neuron's activity is related to a light flashing in the eye. Let us # say that the light is flashing in a given time period with probability # 0.10. Call this stimulus random variable S. # If there is a flash, the neuron will fire with probability 1/2. If # there is not a flash, the neuron will fire with probability 1/18. Call # this random variable F (whether the neuron fires or not). # Which of these is closest, in bits (log base 2 units), to the mutual # information MI(S,F)? # we will go with H(R)  E[H(RS)] r_2 = p1 s_2 = 0.1 # also happens to be 0.1 r_given_s_2 = 1/2 r_given_no_s_2 = 1/18 def mutual_information(response, noises): expected_noise_H = 0 # this is the expected noise entropy (with weights coming from the # MARGINAL stimulus probability distribution), and NOT the average # over n noise entropies. not all entropies are born equal :) for p_marginal,p_conditional in noises: expected_noise_H += p_marginal * bernoulli_entropy(p_conditional) return bernoulli_entropy(response)  expected_noise_H mutual_information(r_2, [[s_2, r_given_s_2], [1  s_2, r_given_no_s_2]]) # In the following three questions, we will explore Poisson neuron # models and population coding. # This exercise is based on a set of artificial "experiments" that # we've run on four simulated neurons that emulate the behavior found # in the cercal organs of a cricket. Please note that all the supplied # data is synthetic. Any resemblance to a real cricket is purely # coincidental. # In the first set of experiments, we probed each neuron with a range # of air velocity stimuli of uniform intensity and differing # direction. We recorded the firing rate of each of the neurons in # response to each of the stimulus values. Each of these recordings # lasted 10 seconds and we repeated this process 100 times for each # neuronstimulus combination. # We've supplied you with a .mat file for each of the neurons that # contains the recorded firing rates (in Hz). These are named neuron1, # neuron2, neuron3, and neuron4. The stimulus, that is, the direction # of the air velocity, is in the vector named stim. # This will load everything into a dict called data, and you'll be # able to access the stim and neuron responses using data['stim'], # data['neuron1'], etc. (In general, data.keys() will show you all the # keys available in the dict.) import pickle with open('tuning_3.4.pickle', 'rb') as f: data = pickle.load(f) len(data['neuron1']) # 5 data.keys() # dict_keys(['neuron4', 'stim', 'neuron3', 'neuron2', 'neuron1']) # The matrices contain the results of running a set of experiments in # which we probed the synthetic neuron with the stimuli in stim. Each # column of a neuron matrix contains the firing rate of that neuron (in # Hz) in response to the corresponding stimulus value in stim. That is, # nth column of neuron1 contains the 100 trials in which we applied the # stimulus of value stim(n) to neuron1. len(data['stim']) # 24 data['stim'] # array([ 0., 15., 30., 45., 60., 75., 90., 105., 120., # 135., 150., 165., 180., 195., 210., 225., 240., 255., # 270., 285., 300., 315., 330., 345.]) len(data['neuron1']) # 100 data['neuron1'] # array([[ 19.7, 27.4, 28.1, ..., 0. , 9.8, 16.1], # [ 21.3, 28.7, 32.8, ..., 0. , 8.3, 16.6], # [ 21.5, 27.3, 29.3, ..., 0. , 8.6, 17.2], # ..., # [ 21.9, 28.7, 30. , ..., 0. , 8.7, 14.4], # [ 24.1, 28.1, 29.1, ..., 0. , 7.1, 15.9], # [ 24. , 28.3, 28.4, ..., 0. , 8.2, 19.1]]) len(data['neuron1'][0]) # 24 data['neuron1'][0] # array([ 19.7, 27.4, 28.1, 32.6, 31.8, 28. , 22.6, 15.1, 9. , # 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , # 0. , 0. , 0. , 0. , 9.8, 16.1]) data['neuron1'][0, 0] # 19.7 # Plot the tuning curve the mean firing rate of the neuron as a # function of the stimulus for each of the neurons. import numpy as np import matplotlib.pyplot as plt tuning_curves = {} for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']: # axis=0 means 'average columns' tuning_curves[neuron] = np.mean(data[neuron], axis=0) plt.plot(data['stim'], tuning_curves[neuron]) plt.xlabel('Direction (degrees)') plt.ylabel('Mean firing rate') plt.title('Tuning curve for ' + neuron) plt.show() # Continued from Question 7: # We have reason to suspect that one of the neurons is not like the # others. Three of the neurons are Poisson neurons (they are # accurately modeling using a Poisson process), but we believe that # the remaining one might not be. # Which of the neurons (if any) is NOT Poisson? # Hint: Think carefully about what it means for a neuron to be # Poisson. You may find it useful to review the last lecture of week # 2. Note that we give you the firing rate of each of the neurons, not # the spike count. You may find it useful to convert the firing rates # to spike counts in order to test for "Poissonness", however this is # not necessary. # In order to realize why this might be helpful, consider the fact # that, for a constant a and a random variable X, E[aX]=aE[X] but # Var(aX)=a^2 * Var(X). What might this imply about the Poisson statistics # (like the Fano factor) when we convert the spike counts (the raw # output of the Poisson spike generator) into a firing rate (what we # gave you)? # solution: Poisson distributions have variance equal to mean (Fano # factor = 1). we could compute the mean spike count for each neuron's # column/stimulus, then compute its variance. repeat for all # columns/stimulus and plot. a true Poisson process should regress # into the identity function: T = 10 # in seconds for neuron in ['neuron1', 'neuron2', 'neuron3', 'neuron4']: mean_spikes = T * np.mean(data[neuron], axis=0) spikes_variance = (T**2) * np.var(data[neuron], axis=0) plt.scatter(mean_spikes, spikes_variance) plt.xlabel('Mean number of spikes') plt.ylabel('Variance of number of spikes') plt.title('Fano test for ' + neuron) plt.show() # Finally, we ran an additional set of experiments in which we exposed # each of the neurons to a single stimulus of unknown direction for 10 # trials of 10 seconds each. We have placed the results of this # experiment in the following file: import pickle with open('pop_coding_3.4.pickle', 'rb') as f: data2 = pickle.load(f) # pop_coding contains four vectors named r1, r2, r3, and r4 that # contain the responses (firing rate in Hz) of the four neurons to # this mystery stimulus. It also contains four vectors named c1, c2, # c3, and c4. These are the basis vectors corresponding to neuron 1, # neuron 2, neuron 3, and neuron 4. data2.keys() # dict_keys(['c1', 'r1', 'r3', 'c2', 'c3', 'r2', 'c4', 'r4']) # Decode the neural responses and recover the mystery stimulus vector # by computing the population vector for these neurons. You should use # the maximum average firing rate (over any of the stimulus values in # 'tuning.mat') for a neuron as the value of rmax for that # neuron. That is, rmax should be the maximum value in the tuning # curve for that neuron. v = np.array([0,0], dtype='float64') # recall that a v̂_{pop} is the linear combination of vectors: # ∑_a (r_a/max(max(r_a))ĉ_a # in our case we are given many r_a's (many responses per neuron), so # we create a population vector for each response, then average them # to obtain a final population vector. for a in ['1', '2', '3', '4']: v += (np.average(data2['r'+a]) / max(tuning_curves['neuron'+a])) * data2['c'+a] # # equivalent to: # # v = np.zeros((2, 10), dtype='float64') # for i in range(0, 10): # for a in ['1', '2', '3', '4']: # v[:,i] += (data2['r'+a][i] / # max(tuning_curves['neuron'+a])) * data2['c'+a] # v = np.average(v, axis=1) # What is the direction, in degrees, of the population vector? You # should round your answer to the nearest degree. Your answer should # contain the value only (no units!) and should be between 0∘ and # 360∘. If your calculations give a negative number or a number # greater than or equal to 360, convert it to a number in the proper # range (you may use the mod function to do this). # You may need to convert your resulting vector from Cartesian # coordinates to polar coordinates to find the angle. You may use the # atan() function in MATLAB to do this. Note that the the convention # we're using defines 0∘ to point in the direction of the positive # yaxis and 90∘ to point in the direction of the positive xaxis # (i.e., 0 degrees is north, 90 degrees is east). # we start from 90 because that's our reference, and substract from # there in order to rotate clockwisely: (90  round(atan(v[1]/v[0]) * (180/pi))) % 360 # atan works in radians
• mechanism: neuron models

neuroelectronics
 membranes
 ion channels
 wiring

simplified neuron models
 basic dynamics of neuronal excitability

neuronal geometry
 dendrites and dendritic computing
• membrane
• bare phospholipid bilayer
good dielectric (capacitor), but still allows some charge to pass through (paralleled by a resistor), mostly because of embedded channel proteins.
RC circuit:
/\/   inside   outside   
from Kirchhoff's conservation of current:
$$I={I}_{R}+{I}_{C}$$
from Ohm's law and the definition of capacitance ( $C=Q/V$ )
$$I=\frac{{V}_{R}}{R}+C\frac{dV}{dt}$$
• rest potential
 because of the active work of pumps, ${K}^{+}$ in the inside and $N{a}^{+},C{l}^{},C{a}^{2+}$ in the outside are under osmotic force to homogenize in the direction of their respective concentration gradients (voltage source).
 until opposed by electrostatic forces.
this results in Nernst equilibrium:
$$E=\frac{{k}_{B}T}{zq}ln\left(\frac{concentratio{n}_{inside}}{concentratio{n}_{outside}}\right)$$
where ${k}_{B}$ is the Boltzmann constant, T is the temperature, q is the ionic charge and z number of charges.
this means that the current that manages to flow through the resistor in our RC model must also go through the electric potential created by the ion battery; i.e., not all the voltage drop is due to the resistor:
 /\/    inside   outside   
$$I=\frac{V{V}_{rest}}{R}+C\frac{dV}{dt}$$
multiplying by R on both sides:
$${V}_{\infty}=V+RC\frac{dV}{dt}$$
and let τ=RC and ${V}_{\infty}$ be the whole steady potential. note that when $\frac{dV}{dt}=0$ (no current) ${V}_{\infty}=V$ ; and when constant, ${V}_{\infty}=V{V}_{rest}=IR{V}_{rest}$ .
with these conditions we can solve the differential equation:
 $V(t)={V}_{\infty}(1{e}^{t/\tau})$ ; during the capacitors rising phase.
 $V(t)={V}_{\infty}{e}^{t/\tau}$ ; during discharge.
• ion channels
recall their diversity:
 voltagegated
 chemicallygated:
 transmittergated (synaptic):
 dopamine
 GABA
 serotonin
 glutamatergic:
 AMPA
 NMDA
 etc.
 $C{a}^{2+}$ gated.
 transmittergated (synaptic):
 mechanicallygated
 thermallygated
we will be focusing on voltagegated channels.
let g = 1/R be the conductance. Ohm's law becomes:
$$I=Vg$$
we can break down the general voltage source from our previous circuit into equilibrium (rest) potentials for different ion species:
ion  equilibrium potential (E_i) 

Na  +050 mV 
Ca  +150 mV 
K  080 mV 
Cl  060 mV 
$${I}_{i}={g}_{i}(V{E}_{i})$$
these will be modelled with parallel paths.
from our ionspecific cable model for K, Na and leaks (L):
 /\/  L        /\/  K        /\/  Na   inside   outside   
• action potential
there's a limit to how much charge can move into the membrane without losing the capacity to stay in equilibrium. we call it the action potential.
where does the nonlinear property that makes irrecoverable excitability (our good old activation function g()) lie?
conductances aren't constant. the resistance imposed by channels changes as voltage changes. we need functions describing them.
functional diagram of a voltagegated channel:
:========= =========: :========= =========: :========= =========: / < anchoring subunit :========= =========:/ (___(voltage sensor)___) \ \ < gate, activated by sensor ____________________\_ ( ) :========= =========:\ :========= =========: \ :========= =========:
• K dynamics
we will be using kinetic models for the opening of channels:
 P(gate is open) increases with a depolarised neuron ( ${K}^{+}$ flows away after spike).
 gate contains 4 moving subunits. ${P}_{K}={\prod}_{i=1}^{4}{P}_{su{b}_{i}}\cong {P}_{sub}^{4}$ , for independent subunits.^{4}
we go for a Bernoulli probability distribution, n = P(open). transitions between states occur at voltagedependent rates ${\alpha}_{n}(V)$ (closed → open) and ${\beta}_{n}(V)$ (open → closed). we don't know the function of voltage describing this probability, but we can write the function describing its change in continuous time:
HudgkinHuxley1: $$\frac{dn}{dt}={\alpha}_{n}(V)(1n){\beta}_{n}(V)n$$
that is, the function that describes the probability is given by how much is added to the open state (proportional to the closed state, times the voltagevarying rate α) minus how much is lost.
rewrite in the form of τ and ${n}_{\infty}$ :
$$\frac{1}{{\alpha}_{n}(V)+{\beta}_{n}(V)}\frac{dn}{dt}=\frac{{\alpha}_{n}(V)}{{\alpha}_{n}(V)+{\beta}_{n}(V)}n$$
$${\tau}_{n}(V)\frac{dn}{dt}={n}_{\infty}(V)n$$
• Na dynamics
 3subunit gate similar to Ka. P(gate is open) = m increases with potential drop.
 additional subunit can block channel back even if the gate is open, after P(additional is open) = h decreases if potential is too weak, terminating the rising phase during an action potential.
 $P(channel\phantom{\rule{0.278em}{0ex}}is\phantom{\rule{0.278em}{0ex}}open)\cong h{m}^{3}$ .^{4}
similarly, HH2:
$$\frac{dm}{dt}={\alpha}_{m}(V)(1m){\beta}_{n}(V)m$$
$$\frac{1}{{\alpha}_{m}(V)+{\beta}_{m}(V)}\frac{dm}{dt}=\frac{{\alpha}_{m}(V)}{{\alpha}_{m}(V)+{\beta}_{m}(V)}m$$
$${\tau}_{m}(V)\frac{dm}{dt}={m}_{\infty}(V)m$$
And for the additional Nainactivating subunit, HH3 is:
$$\frac{dh}{dt}={\alpha}_{h}(V)(1h){\beta}_{n}(V)h$$
$$\frac{1}{{\alpha}_{h}(V)+{\beta}_{h}(V)}\frac{dh}{dt}=\frac{{\alpha}_{h}(V)}{{\alpha}_{h}(V)+{\beta}_{h}(V)}h$$
$${\tau}_{h}(V)\frac{dh}{dt}={h}_{\infty}(V)h$$
• HudgkinHuxley model
and from those probabilities we go back to our voltagedependent channel conductances:
$${g}_{K}(V)={g}_{K}{n}^{4}$$
$${g}_{Na}(V)={g}_{Na}h{m}^{3}$$
therefore, the diagram's overall current equation, $I=\frac{V{V}_{rest}}{R}+C\frac{dV}{dt}$ , we rewrite as:
$$I=\sum _{i}{g}_{i}(V{E}_{i})+C\frac{dV}{dt}$$
this is expanded with the ionspecific terms into the fourth and last equation in HudgkinHuxley's dynamical system:
HH4:
$$I={g}_{L}(V{E}_{L})+{g}_{K}{n}^{4}(V{E}_{K})+{g}_{Na}h{m}^{3}(V{E}_{Na})+C\frac{dV}{dt}$$
• simplified models
the model can't be so simple that it will fail to capture the variety of neural codes neurons are capable of (frequency modulation, timingdependent firing, train patterns, etc.)
• integrateandfire
we observed that at rest potential the neuron behaves very linearly. assume conductances stay constant and C = 1. this is no other than our good old passive membrane:
$$I(t)=a(V{V}_{0})+1\frac{dV}{dt}$$
or
$$1\frac{dV}{dt}=a(V{V}_{0})+I(t)$$
this is a linear function, and since the slope is negative ${V}_{0}$ is a stable fixed point in phasespace:
\  \  \  \  \  \  \  \  \ dV/dt > > > V0 < < <
this is good for latent addition, but there's no end or threshold to how much electric potential can drop before we see excitation.
to account for spikes we delimit the function with a ${V}_{threshold}$ that will drive potential to depolarisation.
${V}_{threshold}$ marks a new fixed point, unstable this time:
\  / \  / \  / \  / \/  \ /  \ /  \ /  \ / > > > V0 < < < < < < V_th > > >
now we need an edge ${V}_{max}$ that will mark the end of an increase in depolarisation and a return to a ${V}_{reset}$ at the other edge of the graph.
\  / \  / \  / \  / \/  \ /  \ /  \ /  \ / V_reset > > > V0 < < < < < < V_th > > > V_max
in cortical neurons this has been fitted against quadratic functions, or using an exponential for the second half where acceleration is positive (effectively capturing the faster dynamics of depolarisation) in what is known as the exponential integrateandfire neuron:^{5}
$$\frac{dV}{dt}=a(V{V}_{0})+{e}^{(V{V}_{th})/\Delta}+I(t)$$
where parameter Δ controls how fast the exponential grows.
\   \   \   \   \  \ /  \ /  \ /  \ / V_reset > > > V0 < < < < V_th > V_max
• theta neuron
by thinking of potential as a cyclic phenomenon we can simplify ${V}_{max}$ and ${V}_{reset}$ into a single state, called ${V}_{spike}$ :^{6}
$$\frac{d\theta}{dt}=(1cos(\theta ))+(1+cos(\theta ))I(t)$$
• twodimensional models
what if instead of an abrupt reset we had another decrease leading to yet another (stable) fixed point?
\   \    \    \    \  \ / \  \ / \  \ /  \ / V_reset > > > V0 < < < < V_th > VK < <
we will draw inspiration from the HudgkinHuxley model bringing in a second variable, G(u), to take care for inactivation. with G(u) providing negative values for large potentials $\frac{dV}{dt}$ can be brought to zero again; mimicking the role of the Na gate's second inactivating subunit and the delayed role of the K channel.
$$\frac{dV}{dt}=F(V)+G(u)+I(t)$$
this new variable $u$ is in turn a function, whose change is contingent upon electric potential. we want $u$ to go handinhand with potential:
$$\frac{du}{dt}=u+H(V)$$
• simple™ model
modelled after a subset of the previous phase plane, near the intersection of nullclines. F(V) is chosen to be quadratic, G(u) is simply u. on the second equation, u + H(V) gains some weights:
$$\frac{dV}{dt}=F(V)+G(u)+I(t)=(\u0251V+\beta {V}^{2}+\gamma )u+I(t)$$
$$\frac{du}{dt}=a(bVu)$$
given a specific quadratic function, two extra parameters (in addition to a (u's decay) and b (u's sensitivity)) are used to reset v and u after reaching ${V}_{max}$ . these are called c (V's original value) and d (u's original value).
these dynamics are enough to capture a variety of codes:
 regular (as in repetitive) spiking:      
 intrinsically bursting:     
 fast spiking: 

lowthreshold spiking:    
 intrinsically bursting with chattering intervals:    

thalamocortical:      ...    
 resonator: mmm
• realistic dendritic geometry
our previous models only really capture computation at and near the soma or axon. dendritic arbors have computationallysignificant effects.
nobody really knows what the appropriate level of description is, if any.
for any passive membrane the following stochastic properties hold:

effects of traveled distance:
 impulse amplitude decays. that is, spikes shorten.
 frequency decays too. that is, spikes broaden.

effects of neurite radius:
 strength (ΔV) is inversely proportional to thickness.
this affects the contribution of individual inputs coming from different dendrites.
• cable theory
V is now a function of both time and space. offthebat, we are dealing with partial differential equations. run cables/resistors on both inside and outside to connect membrane patches, so as to model local currents produced by potential along the cable.
> a = radius  / \  \ /\/ /  /   \  inside \ / outside  /   \  \  /  / \  / \  \ /\/ /  /   \  inside \ / outside  /   \  \  /  / \ v x = length
generally we take the external resistance to be zero.
our old membrane's Kirchhoff equality ${I}_{m}={c}_{m}\frac{\partial {V}_{m}}{\partial t}+\frac{{V}_{m}}{{r}_{m}}$ is symmetric to the second lengthderivative of voltage:
$$\frac{1}{{r}_{inside}}\frac{{\partial}^{2}{V}_{m}(x,t)}{\partial {x}^{2}}={c}_{m}\frac{\partial {V}_{m}}{\partial t}+\frac{{V}_{m}}{{r}_{m}}$$
which we rearrange to visualize time and space constants:
$${\left(\sqrt{\frac{{r}_{m}}{{r}_{i}}}\right)}^{2}\frac{{\partial}^{2}{V}_{m}}{\partial {x}^{2}}={r}_{m}{c}_{m}\frac{\partial {V}_{m}}{\partial t}+{V}_{m}$$
$${\lambda}^{2}\frac{{\partial}^{2}{V}_{m}}{\partial {x}^{2}}={\tau}_{m}\frac{\partial {V}_{m}}{\partial t}+{V}_{m}.$$
potential diffuses exponentially, starting from the point where current is injected (let's call it x=0), according to $V(x)\propto {e}^{(\frac{x}{\lambda})}$ . half length constants away from the centre, the voltage signal is attenuated by a factor of 0.4. one constant away it's 0.2; etc.
the arrival of peaks at different points gives us the velocity of propagation, which naturally turns out to be the ratio of the length and time constants, multiplied by two: $Ve{l}_{cable}=2\lambda /\tau $ .
here's what the general solution to the differential equation looks like:
$$V(x,t)\propto \sqrt{\frac{\tau}{4\pi {\lambda}^{2}t}}\phantom{\rule{0.278em}{0ex}}{e}^{\frac{t}{\tau}\frac{\tau {x}^{2}}{4{\lambda}^{2}t}}$$
this is summed over different locations and times to come up with our actual filter on the input, as seen on week 2.
• compartmental models
simple cable theory doesn't get us very far modelling truly realistic neurons anyway, because the it isn't taking channels into account, which also exist in dendrites. analytical solutions become intractable.
compartments are a simpler approximation to cable theory which gets rid of our dependence on x, although it's still a passive model.
 it's a divideandconquer strategy that discretizes dendritic trees into levels.

for a binary tree, when the sum of the diameters of siblings and parent obey the relation:
$${d}_{1}^{3/2}+{d}_{2}^{3/2}={D}^{3/2}$$

then the subtree can be approximated with a single cable segment the diameter of the parent, extended by the effective electronic length of the children. the final result will invariably be a single neurite.
• active models
 for a given geometry and function of channel density (yes!, channel density variability can have important effects too), compartmentalize into segments of approximately constant properties.
 model each segment according to your favorite model (HudgkinHuxley, e.g.). Input current can come from either: parent, sibling or children: uphold Kirchhoff's when thinking about input potential/current.
 couple segments into a tree, putting resistors between segments to account for drops. these need not be directioninvariant (i.e. there will be two diodeguarded resistors per segment connection).
• dendritic computation
there are many interesting ideas floating around:
 lowpass filter, attenuation.

where the inputs enter the dendrite can have an impact:
 inputs at different places of the same segment could compute logical conjunction, whereas the parent segment computes disjunction on its children.
 analogically, this means the ability to segregate or amplify signals.

plasticity: Camediated backpropagating action potentials could be the driving force behind plasticity, based on coincidence detection (mentioned in the introduction).

hippocampal neurons are known to display synaptic scaling: the ability to make inputs "commutative", or positionindependent.
examples:
 sound localisation: according to the Jeffress model, coincidence detector neurons could be taking advantage of different cable lengths (timedependent code) to encode something like leftright position. because sounds coming from the left arrives with a delay to the right ear, a soma closer to the right with dendrites of different length coming from each ear could perceive both inputs at the same time, firing only to those sounds. other neurons with different dendritic lengths would detect other sound positions.
 direction selectivity on LGN/V1 receptive fields (and elsewhere) could be due to the order in which inputs enter the same dendrite. a timely excitation would add to the signal as it travels down; whereas desynchronized stimuli would see their individual signals fade away. in other words, the sequence in which presynaptic neurons touch the postsynaptic one encodes the sequence the postsynaptic neuron is trained to detect.
• mechanism: networks
• chemical synapse
 action potential travels down the axon at the presynaptic neuron,
 opens $C{a}^{2+}$ channels at the terminal,
 which in turn causes neurotransmitter vesicles to throw their contents into the synaptic cleft.
 after migration via diffusion, neurotransmitters bind to glycoprotein receptors at the postsynaptic neuron.
 ligandgated channels (also known as ionotropic receptors) open, either from the direct action of neurotransmitters or as a consequence of the chain reaction started at metabotropic (aka Gproteincoupled) receptors.

this causes either further depolarisation or polarisation at the postsynaptic neuron, depending on chemical identity.

excitatory: $N{a}^{+}$ channels open or some other positive ions enter the cell. common mediating neurotransmitters are:
 acetylcholine: common in neuromuscular junctions.
 glutamate: CNS. binds to NMDA, AMPA and kainate receptors.
 catecholamines: dopamine, epinephrine/adrenaline, norepinephrine/noradrenaline, etc.
 serotonin: parasympathetic functions
 histamine

inhibitory: ${K}^{+}$ leaves out or $C{l}^{}$ comes in.
 GABAa, GABAB. receptors are pentamers which exist in many compositions and conformations.
 glycine: nonexclusively found in the spinal cord and retina


neurotransmitters detach from the postsynaptic dendritic spine and are reabsorbed.
• modelling
given the specific membrane capacitance ( ${c}_{m}\cong 10nF/m{m}^{2}$ ), specific membrane resistance ( ${r}_{m}\cong 1M\Omega m{m}^{2}$ ) and area A; ${C}_{m}={c}_{m}A$ and ${R}_{m}=\frac{{r}_{m}}{A}$ . Notice how surface area becomes irrelevant when determining τ:
$$\frac{crA}{A}\frac{dV}{dt}=(V{V}_{equilibrium})+IR$$
anyway, as with Na and K channels which are modelled through resistors, our neuron circuit model will need yet another conductance per synapse (with its own equilibrium potential ${E}_{s}$ ):
$${\tau}_{mem}\frac{dV}{dt}=((V{E}_{leak})+{g}_{syn}(V{E}_{syn}){r}_{mem})+IR$$
for an excitatory synapse, ${E}_{syn}>{E}_{leak}$ , and viceversa.
similarly to the HudkingHuxley model, the synaptic conductance ${g}_{s}$ is a probabilistic function specified by a differential kinetic model:
$${g}_{s}={g}_{{s}_{max}}{P}_{transmitter\phantom{\rule{0.278em}{0ex}}release}{P}_{open\phantom{\rule{0.278em}{0ex}}channels}$$
assume ${P}_{transmitter\phantom{\rule{0.278em}{0ex}}release}=1$ , i.e. vesicles are always available for release. as for ${P}_{open\phantom{\rule{0.278em}{0ex}}channels}$ , let ${\alpha}_{s}$ and ${\beta}_{s}$ be the rate at which closed channels open and open channels close, respectively. therefore:
$$\frac{d{P}_{open\phantom{\rule{0.278em}{0ex}}channels}}{dt}={\alpha}_{s}(1{P}_{open\phantom{\rule{0.278em}{0ex}}channels}){\beta}_{s}{P}_{open\phantom{\rule{0.278em}{0ex}}channels}$$
evidence shows that ${P}_{open\phantom{\rule{0.278em}{0ex}}channels}$ is best modelled using dissipating exponential function $K(t)={e}^{t/{\tau}_{s}}$ for AMPA synapses, whereas the slightly delayed rise in probability in something like GABAA is bettersuited to a socalled "alpha" function, $\alpha (t)=\frac{t}{{\tau}_{peak}}{e}^{(1\frac{t}{{\tau}_{peak}})}$ . ${\tau}_{peak}$ is an empirical parameter that will vary with chemical.
P: 1.0  .8  \ .6  \ .4  \_ .2  \ _ 0.0  \ _ 0 5 10 15 20 25 :t (ms) alpha function
in order to model the cumulative effect of a synapse train coming from the presynaptic neuron we recur to our old linear filter. this we call the response function, rho ( $\rho (t)={\sum}_{i}\delta (t{t}_{i})$ ).
example of a final aggregate synaptic conductance for AMPA:
$${g}_{s}(t)={g}_{{s}_{max}}\sum _{{t}_{i}<t}K(t{t}_{i})={g}_{{s}_{max}}{\int}_{\infty}^{t}K(t\tau )\rho (\tau )d\tau $$
$$={g}_{{s}_{max}}{\int}_{\infty}^{t}{e}^{(t\tau )/\tau}(\sum _{i}\delta (t{t}_{i}))d\tau $$
imagine the following spike train arriving at the synapse, and its corresponding filtered conductance composed of alpha or exponential ${P}_{open\phantom{\rule{0.278em}{0ex}}channels}$ functions pieced together:
train:      > time \ \ \ \ \ \ \ \ \ conductance:  \ > time
• networks
now that we know how to feed a neuron's output into another neuron and convert that input into yet more action potentials given by ${\tau}_{m}\frac{dV}{dt}=((V{E}_{L})+{g}_{s}(t)(V{E}_{s}){r}_{m})+IR$ , the next simplest step is to experiment with 2neuron networks. the following are common properties for a simple feedback loop system with homogeneous synaptic sign:
nature of synapses  behavioural pattern 

excitatory  alternating activation (n1, n2, n1, n2, ...) 
inhibitory  synchrony^{7} (n1+n2, n1+n2, n1+n2, ...) 

Not to be confused with ephaptic transmission, in which local ion currents from one cell's membrane manage to create a significant effect on the membranes of nearby cells. ↩

http://www.jneurosci.org/content/19/18/8036.full ↩

Berry, Chichilnisky. ↩

I'd like to thank Prof. Markus Müller for pointing out to me that the exponents of probability functions were determined using curve fitting and are unrelated to the number of molecule subunits. ↩↩

FourcaudTrocmé, Hansel, van Vreeswijk, Brune. ↩

Ermentrout, Kopell. ↩

This demands a deeper explanation. ↩