work in progress...

This marvelous course by Prof. Markus Müller revealed like a hidden treasure in the most unexpected of places: a social-sciences and humanities-oriented postgrad program, in random underfunded university (the cognitive science masters degree from the Autonomous State University of Morelos, Mexico). The aim of self-publishing notes and assignments, and assembling them into a coherent whole, goes beyond giving everyone brave enough1 a shot to enjoy it. It also sits as a personal reminder of my interest at the intersection of mathematical modeling, biophysics and the mind.

the brain meets the operational definition of what a complex2 system is:

  • hierarchical organisation, scale-free structure, many-to-many relations among components, interconnectedness (recurrence)
  • non-linear behaviour, emergent properties (hard to determine from the observation of components alone, which number in the millions)
  • chaotic to initial conditions (slight differences lead to widely different states)
  • non-stationary measurements (dynamical system whose model constants/parameters actually vary; system doesn't stay in an attractor)
  • phase transitions, equilibrium depends on critical states
  • noisy measurements, plus some rather stochastic processes (spiking), whose measurable fluctuations may be difficult to distinguish from chaotic determinism

dynamical systems basics

we'll distinguish between two kinds of systems:

  • stochastic
  • dynamical (deterministic). e.g. the harmonic oscillator:

harmonic oscillator

consider the mechanics of a mass attached to a spring (no other forces involved, e.g. friction, gravity). let's say this is a very simple brain model (or of one of its components thereof) whose activity behaves like an oscillating variable:

(Image "Figure 29")
Figure 29 - source

from Newton's second law and Hooke's law:

F=ma=kx(TeX formula:  F = ma = -kx )


md2xdt2=kx(TeX formula:   m \frac{d^2x}{dt^2} = -kx )

x(t)̈=kmx(TeX formula:   \ddot{x(t)} = - \frac{k}{m}x )

x(t)̈=ω02x(TeX formula:   \ddot{x(t)} = - ω_0^2 x )

in order to solve the second-order linear ordinary homogeneous differential equation, we partition it into a system of two first-order equations. let v=ẋ(TeX formula: v = \dot{x}) , therefore:

{v̇=ω02xẋ=v(TeX formula:  \left\{ \begin{array}{cc} \dot{v} = -ω_0^2 x \\ \dot{x} = v \\ \end{array} \right. )

{v̇=ω02vẋ=ω02x=ω02x(TeX formula:  \left\{ \begin{array}{cc} \dot{v} = -ω_0^2 ∫v \\ \dot{x} = ∫-ω_0^2 x = -ω_0^2∫x \\ \end{array} \right. )

notice that with the last substitutions both equations are the same, except in terms of different variables. both sin(ω0t) and cos(ω0t) satisfy it, because:

ddtsin(ω0t)=ω0cos(ω0t)=ω02sin(ω0t)dt(TeX formula:  \frac{d}{dt} sin(ω_0 t) =  ω_0 cos(ω_0 t) = -ω_0^2 ∫sin(ω_0t)dt )

and ddtcos(ω0t)=ω0sin(ω0t)=ω02cos(ω0t)dt(TeX formula:  \frac{d}{dt} cos(ω_0 t) = -ω_0 sin(ω_0 t) = -ω_0^2 ∫cos(ω_0t)dt )

more generally, solutions will conform to the linear superposition principle:

x(t)=asen(ω0t)+bcos(ω0t);a,b(TeX formula:  x(t) = a sen(ω_0t) + b cos(ω_0t); \; a,b ∈ ℝ)

any possible function of position on time will look sinusoidal, of varying frequency (depending on k/m(TeX formula: \sqrt{k/m}) ), amplitude and phase (a and b together). we rewrite it as:

x(t)=a2+b2[aa2+b2sen(ω0t)+ba2+b2cos(ω0t)]=A[c1sen(ω0t)+c2sen(ω0t)](TeX formula:  x(t) = \sqrt{a^2 + b^2} \left[ \frac{a}{\sqrt{a^2 + b^2}} sen(ω_0t) + \frac{b}{\sqrt{a^2 + b^2}} cos(ω_0t) \right] = A[c_1 sen(ω_0t) + c_2 sen(ω_0t)] )

this has the property that 1<=c1,c2<=1(TeX formula: -1 <= c_1, c_2<= 1) , and c12+c22=1(TeX formula: c_1^2 + c_2^2 = 1) .

let φ0(TeX formula: φ_0) be the initial phase. then φ0(aa2+b2=cos(φ0)ba2+b2=cos(φ0))(TeX formula: ∃ φ_0 \; \left( \frac{a}{\sqrt{a^2 + b^2}} = cos(φ_0) ∧ \frac{b}{\sqrt{a^2 + b^2}} = cos(φ_0)\right))


x(t)=A[cos(φ0)sen(ω0t)+sen(φ0)sen(ω0t)]=(TeX formula:  x(t) = A[cos(φ_0) sen(ω_0t) + sen(φ_0) sen(ω_0t)] = )

A[sen(ω0t+φ0)](TeX formula:   A[sen(ω_0t + φ_0)] )


(Image "Figure 22")
Figure 22 - simple harmonic oscillating movement and its parameters (source)

initial conditions

finally, let's say the spring is totally relaxed at t=0, so x(0)=A=A[sen(φ0)](TeX formula: x(0) = A = A[sen(φ_0)]) . this implies that φ0=π/2(TeX formula: φ_0 = π/2) :

x(t)=A[sen(ω0t+π/2)](TeX formula:  x(t) = A[sen(ω_0t + π/2)] )

complex polar notation

prove that z=eiφ(TeX formula: z = e^{iφ}) is the unit circle.

following Euler's formula, eiφ=cos(φ)+isin(φ)()(TeX formula: e^{iφ} = cos(φ) + isin(φ) ∈ (ℝ→ℂ)) . therefore |z|=eiφeiφ=eiφiφ=e0=1.(TeX formula: |\vec{z}| = \sqrt{e^{iφ}e^{-iφ}} =
\sqrt{e^{iφ-iφ}} = \sqrt{e^0} = 1.) 4

(Image "Figure 23")
Figure 23 - simple harmonic oscillating movement (complex polar notation) and its parameters (source)

solutions of the form Aeλt(TeX formula: Ae^{λt}) include the aforementioned trigonometric solutions when λ, A ∈ ℂ. so the #original differential equation can be rewritten as:

λ2Aeλt+ω02Aeλt=0=(λ2+ω02)Aeλt(TeX formula:  λ^2 A e^{λt} + ω_0^2 Ae^{λt} = 0 = (λ^2 + ω_0^2)Ae^{λt} )

λ2+ω02=0(TeX formula:  ⊢ λ^2 + ω_0^2 = 0 )

λ=ω02=±iω0(TeX formula:  ⊢ λ = \sqrt{-ω_0^2} = ±iω_0 )

the general solution becomes:

x(t)=Aeiω0t+Beiω0t(TeX formula:  x(t) = A e^{iω_0t} + B e^{-iω_0t} )

if we restrict x(t) to the type ℝ→ℝ (position is real), it follows that x(t) equals its complex conjugate.

moreover, x(t)¯=Aeiω0t+Beiω0t¯=Aeiω0t¯+Beiω0t¯=A¯eiω0t+B¯eiω0t(TeX formula: \overline{x(t)} = \overline{A e^{iω_0t} + B
e^{-iω_0t}} = \overline{A e^{iω_0t}} + \overline{B e^{-iω_0t}} =
\overline{A} e^{-iω_0t} + \overline{B} e^{iω_0t}) . this implies that A=B¯(TeX formula: A = \overline{B}) and B=A¯(TeX formula: B = \overline{A}) . so:

x(t)=(a+ib)[cos(ω0t)+isin(ω0t)]+(aib)[cos(ω0t)isin(ω0t)](TeX formula:  x(t) = (a+ib)\left[cos(ω_0t)+isin(ω_0t)\right] + (a-ib)\left[cos(ω_0t)-isin(ω_0t)\right] )

=2acos(ωt)2bsin(ωt)()(TeX formula:  = 2a cos(ωt) - 2bsin(ωt) ∈ (ℝ→ℝ) )

compare this expression against the one using trigonometric functions.

energy loss

(also see #dissipative systems and attractors)

(Image "Figure 28")
Figure 28 - source

now we will damp the oscillator by adding a term for friction, whose force is proportional to velocity (but in opposite direction):

FT=F(TeX formula:  F_T = ∑F )

x(t)̈=ω02x(t)2βx(t)̇(TeX formula:  \ddot{x(t)} = - ω_0^2 x(t) - 2β\dot{x(t)} )

assume a family of solutions of the form x(t)=Aeλt(TeX formula: x(t) = Ae^{λt}) . therefore:

λ2Aeλt+2βλAeλt+ω02Aeλt=0(TeX formula:  λ^2 Ae^{λt} + 2βλAe^{λt} + ω_0^2 Ae^{λt} = 0 )

Aeλt(λ2+2βλ+ω02)=0(TeX formula:  Ae^{λt} (λ^2 + 2βλ + ω_0^2) = 0 )

λ2+2βλ=ω02(TeX formula:  λ^2 + 2βλ = -ω_0^2 )

λ2+2βλ+β2=ω02+β2=(λ+β)2(TeX formula:  λ^2 + 2βλ + β^2 = -ω_0^2 + β^2 = (λ+β)^2 )

λ=β±iω02β2(TeX formula:  λ = -β ± i\sqrt{ω_0^2 - β^2} )

plug that back into the general solution:

x(t)=Ae(β+iω02β2)t+Be(βiω02β2)t(TeX formula:  x(t) = Ae^{(-β+i\sqrt{ω_0^2 - β^2})t} + Be^{(-β-i\sqrt{ω_0^2 - β^2})t} )

=Aeβteiω02β2t+Beβteiω02β2t(TeX formula:       = Ae^{-βt}e^{i\sqrt{ω_0^2 - β^2}t} + Be^{-βt}e^{-i\sqrt{ω_0^2 - β^2}t} )

=eβt(Aeiω02β2t+Beiω02β2t)(TeX formula:       = e^{-βt} \left( Ae^{i\sqrt{ω_0^2 - β^2}t} + Be^{-i\sqrt{ω_0^2 - β^2}t} \right) )

asymptotically, the system will inevitably evolve to the most entropic dynamic state. let's show this on a case-by-case basis, taking note on the different (non)monotonic dynamics:

underdamped: ω0 > β

let ω=ω02β2(TeX formula: ω = \sqrt{ω_0^2 - β^2}) :

x(t)=eβt(Aeiωt+Beiωt)(TeX formula:  x(t) = e^{-βt} \left( Ae^{iωt} + Be^{-iωt} \right) )

by restricting x(t) to ℝ→ℝ, and from our previous knowledge of the simple harmonic oscillator solution:

x(t)¯=eβtDsin(ωt+φ0)(TeX formula:  \overline{x(t)} = e^{-βt} Dsin(ωt+φ_0) )

(Image "Figure 24")
Figure 24 - underdamped harmonic oscillating movement

ω0 = β

λ1=β±i0=β=λ2(TeX formula:  λ_1 = -β ± i\sqrt{0} = -β = λ_2 )

we leverage the fact that amplitude isn't constant anymore:

x(t)=A(t)eβt(TeX formula:  x(t) = A(t)e^{-βt} )

ẋ(t)=Ȧ(t)eβtA(t)βeβt(TeX formula:  \dot{x}(t) = \dot{A}(t)e^{-βt} - A(t)βe^{-βt} )

ẍ(t)=Ä(t)eβtȦ(t)βeβtȦ(t)βeβt+A(t)β2eβt(TeX formula:  \ddot{x}(t) = \ddot{A}(t)e^{-βt} - \dot{A}(t)βe^{-βt} - \dot{A}(t)βe^{-βt} + A(t)β^2e^{-βt})

and the damped harmonic oscillator equation becomes:

eβt[Ä2Ȧβ+Aβ2+2βȦ2β2A+β2A]=0(TeX formula:  e^{-βt}\left[ \ddot{A}-2\dot{A}β+Aβ^2+2β\dot{A}-2β^2A+β^2A \right] = 0 )

eβt[Ä]=0(TeX formula:  e^{-βt}\left[ \ddot{A} \right] = 0 )

Ä(t)=0(TeX formula:  ⊢ \ddot{A}(t) = 0 )

Ä(t)=0A(t)=a1+a2t(TeX formula:  ⊢ \ddot{A}(t) = 0 ↔ A(t) = a_1 + a_2t )

i.e., amplitude is a linear function. so the solution must be of the form:

x(t)=a1eβt+a2teβt(TeX formula:  x(t) = a_1 e^{-βt} + a_2 te^{-βt} )

graphically, the spring oscillates only once:

(Image "Figure 25")
Figure 25 - damped harmonic oscillating movement, single overshoot.

overdamped: ω0 < β

λ=β±iω02β2=β±i2β2ω02(TeX formula:  λ = -β ± i\sqrt{ω_0^2 - β^2} = -β ± i^2\sqrt{β^2 - ω_0^2} ∈ ℝ )

x(t)=Aeβt[Aeωt+Beωt](TeX formula:  x(t) = Ae^{-βt} \left[ Ae^{ωt} + Be^{-ωt} \right] )

because β>ω(TeX formula: β>ω) , Aeωt(TeX formula: Ae^{ωt}) is overshadowed by Beωt(TeX formula: Be^{-ωt}) . the plot never gets to complete an oscillation:

(Image "Figure 26")
Figure 26 - overdamped harmonic oscillating movement

energy input

the damped model will get augmented with the influence of external stimuli, driving our cognitive-unit model to activity. this is known as a driven harmonic oscillator. let's go with an engine rotating at frequency ω̃(TeX formula: \tilde{ω}) and amplitude F, which in turn pushes and pulls the rest of the system:

x(t)̈+2βx(t)̇+ω02x(t)=Fcos(ω̃t)(TeX formula:  \ddot{x(t)} + 2β\dot{x(t)} + ω_0^2 x(t) = Fcos(\tilde{ω}t) )

the equation isn't homogeneous anymore. the solution is the sum of the general and particular solutions: the homogeneous one is of the type eβt(TeX formula: e^{-βt}) and eventually dissipates. the particular one oscillates forever:

xp(t)=Aeiω̃t(TeX formula:  x_p(t) = Ae^{i\tilde{ω}t} )


(Image "Figure 30")
Figure 30 - source

starting from an exploration of the steady state ("final conditions"), we will show that amplitude is a function of the engine's frequency, and that there are privileged frequencies for which steady-state amplitude forms maxima or singularities: resonance frequencies.

xp(t)̇=iω̃Aeiω̃t(TeX formula:  \dot{x_p(t)} = i\tilde{ω}Ae^{i\tilde{ω}t} )

xp(t)̈=ω̃2Aeiω̃t(TeX formula:  \ddot{x_p(t)} = -\tilde{ω}^2 Ae^{i\tilde{ω}t} )

so we rewrite the full differential equation:

[ω̃2A+2βiAω̃+ω02A]eiω̃t=Feiω̃t(TeX formula:  \left[ -\tilde{ω}^2A + 2βiA\tilde{ω} + ω_0^2A \right] e^{i\tilde{ω}t} = Fe^{i\tilde{ω}t} )

[ω02ω̃2+2βiω̃]A=F(TeX formula:  \left[ ω_0^2 - \tilde{ω}^2 + 2βi\tilde{ω} \right] A = F )

A(ω̃)=F[ω02ω̃2+i2βω̃](TeX formula:  A(\tilde{ω}) = \frac{F}{\left[ ω_0^2 - \tilde{ω}^2 + i2β\tilde{ω} \right]} )

=F[(ω02ω̃2)+i2βω̃][(ω02ω̃2)i2βω̃][(ω02ω̃2)i2βω̃](TeX formula:                = \frac{F}{\left[ (ω_0^2 - \tilde{ω}^2) + i2β\tilde{ω} \right]} \frac{\left[ (ω_0^2 - \tilde{ω}^2) - i2β\tilde{ω} \right]}{\left[ (ω_0^2 - \tilde{ω}^2) - i2β\tilde{ω} \right]} )

=F[(ω02ω̃2)2βiω̃][(ω02ω̃2)2+(2βω̃)2](TeX formula:                = \frac{F \left[ (ω_0^2 - \tilde{ω}^2) - 2βi\tilde{ω} \right]}{\left[ (ω_0^2 - \tilde{ω}^2)^2 + (2β\tilde{ω})^2 \right]} )

(...magic happens here...)

|A(ω̃)|=F(ω02ω̃2)2+(2βω̃)2(TeX formula:  |A(\tilde{ω})| = \frac{F}{\sqrt{(ω_0^2 - \tilde{ω}^2)^2 + (2β\tilde{ω})^2}} )

(Image "Figure 27")
Figure 27 - steady-state amplitude as a function of energy-source frequency

phase space

phase space contains all information of a dynamical system, with a different axis for each variable of interest. each point in it represents a state of the whole system, and a trajectory in this virtual space corresponds to the evolution of the system.

signal: disregarding noise, taking a series of measurements is equivalent to observing variation in a single dimension of phase space. graphically, this is a projection of the system upon some vector in phase space.

(Image "Figure 31")
Figure 31 - Simple harmonic oscillator (in real space) and associated quantities (source)

(Image "Figure 5")
Figure 5 - Black dot: state trajectory in a phase space with two variables (position and momentum, for instance). Red and blue: measurements of such variables, or of two mutually independent linear combinations of them thereof. (source)

Fourier analysis

linear independence of Fourier terms

let vn(x)=12aeinπax(TeX formula: v_n(x) = \frac{1}{\sqrt{2a}} e^{i \frac{nπ}{a}x}) with n=0,±1,±2,...(TeX formula: n = 0, ±1, ±2, ...) be functions on the interval [a,a](TeX formula: \left[−a, a\right]) . prove that vn·vm=δnm(TeX formula: v_n · v_m = δ_{nm}) , given the proper definition of the dot product. δnm={1n=m0nm(TeX formula: δ_{nm} = \left\{ \begin{array}{cc} 1 & ⇔ n=m \\ 0 & ⇔ n≠m \\ \end{array} \right.) is Dirac's delta distribution.

let Vn·Vm=aadxVn(x)Vm(x)(TeX formula: V_n·V_m = ∫_{-a}^{a} dx\; V_n^⋆(x) V_m(x)) be the scalar product for functions of the vector basis defined by Vn(x)(TeX formula: V_n(x)) .

proof by modus ponens follows:

  • restatement of conditional premise

given the above definitions; if Vn·Vm=δnm(TeX formula: V_n·V_m = δ_{nm}) , then Vn(x)(TeX formula: V_n(x)) is an orthogonal basis. that is:

nm((Vn·Vn=1)(Vn·Vm=0;nm)Vn(x) is orthogonal)(TeX formula:  ∀n∀m \; ((V_n·V_n = 1) ∧ (V_n·V_m = 0; n≠m) ⇒ V_n(x) \text{ is orthogonal}) )

  • proof of the antecedent premise

indeed: nm((Vn·Vn=1)(Vn·Vm=0))(TeX formula: ∀n∀m \; ((V_n·V_n = 1) ∧ (V_n·V_m = 0)))


(aadxVn(x)Vn(x)=1)(aadxVn(x)Vm(x)=0)(TeX formula:  ⊢ \left( ∫_{-a}^{a} dx\; V_n^⋆(x) V_n(x) = 1 \right) ∧ \left( ∫_{-a}^{a} dx\; V_n^⋆(x) V_m(x) = 0 \right) )

(aadx12aeinπx/a12aeinπx/a=1)(aadx12aeinπx/a12aeimπx/a=0)(TeX formula:  ⊢ \left( ∫_{-a}^{a} dx\; \frac{1}{\sqrt{2a}}e^{inπx/a} \frac{1}{\sqrt{2a}}e^{-inπx/a} = 1 \right) ∧ \left( ∫_{-a}^{a} dx\; \frac{1}{\sqrt{2a}}e^{inπx/a} \frac{1}{\sqrt{2a}}e^{-imπx/a} = 0 \right) )

(12aaadxeinπx/aeinπx/a=1)(12aaadxeinπx/aeimπx/a=0)(TeX formula:  ⊢ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^{inπx/a} e^{-inπx/a} = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^{inπx/a} e^{-imπx/a} = 0 \right) )

(12aaadxe(inπx/a)(inπx/a)=1)(12aaadxe(inπx/a)(imπx/a)=0)(TeX formula:  ⊢ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^{(inπx/a)-(inπx/a)} = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^{(inπx/a)-(imπx/a)} = 0 \right) )

(12aaadxe0=1)(12aaadxeiπx(nm)/a=0)(TeX formula:  ⊢ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^0 = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; e^{iπx(n-m)/a} = 0 \right) )

from Euler's formula:

(12aaadx=1)(12aaadx(cos(πx(nm)a)+isen(πx(nm)a))=0)(TeX formula:  ⊢ \left( \frac{1}{2a} ∫_{-a}^{a} dx = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; (cos(\frac{πx(n-m)}{a}) + i\; sen(\frac{πx(n-m)}{a})) = 0 \right) )

(x2a|aa=1)(12aaadx(cos(πx(nm)a)+isen(πx(nm)a))=0)(TeX formula:  ⊢ \left( \left. \frac{x}{2a} \right|_{-a}^{a} = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; (cos(\frac{πx(n-m)}{a}) + i\; sen(\frac{πx(n-m)}{a})) = 0 \right) )

from the integration of the odd function sen(πx(nm)a)(TeX formula: sen(\frac{πx(n-m)}{a})) over a zero-centered interval, [a,a](TeX formula: [-a, a]) :

(a2aa2a=1)(12aaadx(cos(πx(nm)a)+0)=0)(TeX formula:  ⊢ \left( \frac{a}{2a} - \frac{-a}{2a} = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; (cos(\frac{πx(n-m)}{a}) + 0) = 0 \right) )

(1=1)(12aaadxcos(πx(nm)a)=0)(TeX formula:  ⊢ \left( 1 = 1 \right) ∧ \left( \frac{1}{2a} ∫_{-a}^{a} dx\; cos(\frac{πx(n-m)}{a}) = 0 \right) )

let u=πx(nm)a(TeX formula: u = \frac{πx(n-m)}{a}) ; then dudx=π(mn)a(TeX formula: \frac{du}{dx} = \frac{π(m-n)}{a}) .

(12aaacos(u)π(nm)adxaπ(nm)=0)(TeX formula:  ⊢ ⊤ ∧ \left( \frac{1}{2a} ∫_{-a}^{a} cos(u)\frac{π(n-m)}{a} \; dx\; \frac{a}{π(n-m)} = 0 \right) )

(a2aπ(nm)aacos(u)du=0)(TeX formula:  ⊢ ⊤ ∧ \left( \frac{a}{2aπ(n-m)} ∫_{-a}^{a} cos(u) \; du = 0 \right) )

(a2aπ(nm)[sen(u)]aa=0)(TeX formula:  ⊢ ⊤ ∧ \left( \frac{a}{2aπ(n-m)} \left[ sen(u) \right]_{-a}^{a} = 0 \right) )

(a2aπ(nm)[sen(π(a)(nm)a)sen(π(a)(nm)a)]=0)(TeX formula:  ⊢ ⊤ ∧ \left( \frac{a}{2aπ(n-m)} \left[ sen(\frac{π(a)(n-m)}{a}) - sen(\frac{π(-a)(n-m)}{a}) \right] = 0 \right) )

note that (nm)(TeX formula: (n-m) ∈ ℤ) . nm(sen(π(nm))=0)(TeX formula: ⊢ ∀n∀m \; \left( sen(π(n-m)) = 0 \right))

(a2aπ(nm)[00]=0)(TeX formula:  ⊢ ⊤ ∧ \left( \frac{a}{2aπ(n-m)} \left[0 - 0 \right] = 0 \right) )

(TeX formula:  ⊢ ⊤ ∧ ⊤ )

  • conclusion

Vn(x)(TeX formula: ⊢ V_n(x)) is an orthogonal basis.

Fourier transform

the Fourier series represents a periodic function as a discrete linear combination of sines and cosines . on the other hand, the Fourier transform is capable of representing a more general set of functions, not restricted to being periodic. how is that achieved?

(for convenience, we will start from the series in complex exponential notation)

f(x)f0=n=1[ancos(nπxa)+bnsen(nπxa)](TeX formula:  f(x) -f_0  = ∑_{n=1}^∞ \left[ a_n cos\left( \frac{nπx}{a} \right) + b_n sen\left( \frac{nπx}{a} \right) \right] )

=(12aaaf(x)einπx/adx)αneinπx/a(TeX formula:  = ∑_{-∞}^∞ \underbrace{ \left( \frac{1}{2a} ∫_{-a}^{a} f(x)e^{-inπx/a} \;dx \right) }_{α_n} e^{inπx/a} )

if f(x)(TeX formula: f(x)) weren't periodic, the series would be representing it only at [a,a](TeX formula: [-a, a]) . nonetheless, we can make the interval arbitrarily large:

=lima(12aaaf(x)einπx/adx)αneinπx/a(TeX formula:  = \lim_{a → ∞} ∑_{-∞}^∞ \underbrace{ \left( \frac{1}{2a} ∫_{-a}^{a} f(x)e^{-inπx/a} \;dx \right) }_{α_n} e^{inπx/a} )

let kn=nπa(TeX formula: k_n = \frac{nπ}{a}) , then Δk=πa(TeX formula: Δk = \frac{π}{a}) . substituting in the last expression yields:

=limaΔk(Δk)1(12a)(aaf(x)eiknxdx)(eiknx)(TeX formula:   = \lim_{a → ∞} ∑_{-∞}^∞ Δk \left( Δk \right)^{-1} \left( \frac{1}{2a} \right) \left( ∫_{-a}^{a} f(x)e^{-ik_nx} \;dx \right) \left( e^{ik_nx} \right) )

=lima(πa)(a2aπ)(aaf(x)eiknxdx)(eiknx)(TeX formula:   = \lim_{a → ∞} ∑_{-∞}^∞ \left( \frac{π}{a} \right) \left( \frac{a}{2aπ} \right) \left( ∫_{-a}^{a} f(x)e^{-ik_nx} \;dx \right) \left( e^{ik_nx} \right) )

=limΔk012πΔk(f(x)eiknxdx)(eiknx)(TeX formula:   = \lim_{Δk → 0} \;\frac{1}{2π} ∑_{-∞}^∞ Δk \left( ∫_{-∞}^{∞} f(x)e^{-ik_nx} \;dx \right) \left( e^{ik_nx} \right) )

which is a Riemman sum for variable kn(TeX formula: k_n) :

=12πdkn(f(x)eiknxdx)(eiknx)(TeX formula:   = \frac{1}{2π} ∫_{-∞}^∞ dk_n\; \left( ∫_{-∞}^{∞} f(x)e^{-ik_nx} \;dx \right) \left( e^{ik_nx} \right) )

=(12π)2(f(x)eiknxdx)eiknxdkn(TeX formula:   = \left(\sqrt{\frac{1}{2π}}\right)^2 ∫_{-∞}^∞ \left( ∫_{-∞}^{∞} f(x)e^{-ik_nx} \;dx \right) e^{ik_nx} \;dk_n )

=12π(12πf(x)eiknxdx)Fourier transformeiknxdknInverse Fourier transform(TeX formula:   = \underbrace{ \frac{1}{\sqrt{2π}} ∫_{-∞}^∞ \underbrace{ \left( \frac{1}{\sqrt{2π}} ∫_{-∞}^{∞} f(x)e^{-ik_nx} \;dx \right) }_{\text{Fourier transform}} e^{ik_nx} \;dk_n }_{\text{Inverse Fourier transform}} )


which symmetry properties does H(f) has if h(t) is:

  1. odd real
  2. odd imaginary
  3. odd complex

and which symmetry properties does h(t) has if H(f ) is:

  1. odd real
  2. even complex
  3. even imaginary

recall that:

H(f)=[r(t)+ij(t)][cos(2πft)+isen(2πft)]dt(TeX formula:  H(f) = ∫_{-∞}^∞ [r(t)+ij(t)][cos(2πft) + isen(2πft)]dt )

=[r(t)cos(2πft)+j(t)sen(2πft)]dt+i[j(t)cos(2πft)r(t)sen(2πft)]dt;(TeX formula:       =  ∫_{-∞}^∞ [r(t)cos(2πft)+j(t)sen(2πft)]dt + i∫_{-∞}^∞ [j(t)cos(2πft)-r(t)sen(2πft)]dt; )

h(t)=[R(f)+iJ(f)][cos(2πft)+isen(2πft)]dt(TeX formula:  h(t) = ∫_{-∞}^∞ [R(f)+iJ(f)][cos(2πft) + isen(2πft)]dt )

=[R(f)cos(2πft)+J(f)sen(2πft)]dt+i[J(f)cos(2πft)R(f)sen(2πft)]dt.(TeX formula:       =  ∫_{-∞}^∞ [R(f)cos(2πft)+J(f)sen(2πft)]dt + i∫_{-∞}^∞ [J(f)cos(2πft)-R(f)sen(2πft)]dt. )

  • odd real h(t)

we get rid of the imaginary part of H(f):

H(f)=[r(t)cos(2πft)+0)]dt+i[0r(t)sen(2πft)]dt(TeX formula:  H(f)  =  ∫_{-∞}^∞ [r(t)cos(2πft)+{0})]dt + i∫_{-∞}^∞ [{0}-r(t)sen(2πft)]dt )

from integration of an odd function (r(t)cos(2πft)) at a symmetric interval:

H(f)=0+i[r(t)sen(2πft)]dt(TeX formula:  H(f)  = {0} + i∫_{-∞}^∞ [r(t)sen(2πft)]dt )

sen(2πft)(TeX formula: sen(2πft)) is odd. H(f)(TeX formula: ∴ { H(f)}) is odd and imaginary.

  • odd imaginary h(t)

same reasoning:

H(f)=[0+j(t)sen(2πft)]dt+i[j(t)cos(2πft)0]dt(TeX formula:  H(f)  =  ∫_{-∞}^∞ [{0}+j(t)sen(2πft)]dt + i∫_{-∞}^∞ [j(t)cos(2πft)-{0}]dt )

H(f)=[j(t)sen(2πft)]dt+0(TeX formula:  H(f)  =  ∫_{-∞}^∞ [j(t)sen(2πft)]dt + {0} )

sen(2πft)(TeX formula: sen(2πft)) is odd. H(f)(TeX formula: ∴ {H(f)}) is odd and real.

  • odd h(t)

H(f)=[0+j(t)sen(2πft)]dt+i[0r(t)sen(2πft)]dt(TeX formula:  H(f) =  ∫_{-∞}^∞ [{0}+j(t)sen(2πft)]dt + i∫_{-∞}^∞ [{0}-r(t)sen(2πft)]dt )

sen(2πft)(TeX formula: sen(2πft)) is odd. H(f)(TeX formula: ∴ {H(f)}) is odd and complex.

  • odd real H(f)

h(t)=[R(f)cos(2πft)+0]dt+i[0R(f)sen(2πft)]dt(TeX formula:  h(t) =  ∫_{-∞}^∞ [R(f)cos(2πft)+{0}]dt + i∫_{-∞}^∞ [{0}-R(f)sen(2πft)]dt )

h(t)=0+iR(f)sen(2πft)dt(TeX formula:  h(t) =  {0} + i∫_{-∞}^∞ R(f)sen(2πft)dt )

sen(2πft)(TeX formula: sen(2πft)) is odd. h(t)(TeX formula: ∴ {h(t)}) is imaginary and odd.

  • complex even H(f)

h(t)=[R(f)cos(2πft)+0]dt+i[J(f)cos(2πft)0]dt(TeX formula:  h(t) =  ∫_{-∞}^∞ [R(f)cos(2πft)+{0}]dt + i∫_{-∞}^∞ [J(f)cos(2πft)-{0}]dt )

cos(2πft)(TeX formula: cos(2πft)) is even. h(t)(TeX formula: ∴ {h(t)}) is even and complex.

  • imaginary even H(f)

h(t)=[0+J(f)sen(2πft)]dt+i[J(f)cos(2πft)0]dt(TeX formula:  h(t) =  ∫_{-∞}^∞ [{0}+J(f)sen(2πft)]dt + i∫_{-∞}^∞ [J(f)cos(2πft)-{0}]dt )

h(t)=0+iJ(f)cos(2πft)dt(TeX formula:  h(t) =  {0} + i∫_{-∞}^∞ J(f)cos(2πft)dt )

cos(2πft)(TeX formula: cos(2πft)) is even. h(t)(TeX formula: ∴ {h(t)}) is even and imaginary.

uncertainty principle

the more short-lived the wave in the time domain, the more widespread its power spectrum (a.k.a. frequency spectrum). vice versa.

convolution theorem

y(t)=x(t)h(t)=x(τ)h(tτ)dτ(TeX formula: y(t) = x(t)∗h(t) = ∫_{-∞}^∞ x(τ)h(t-τ)dτ) is the convolution of functions x(t) y h(t). F[y(t)] denotes its Fourier transform:

F[y(t)]=y(t)e2πiftdt(TeX formula:  F[y(t)] = ∫_{-∞}^∞ y(t)e^{-2πift}dt )

=[x(τ)h(tτ)dτ]e2πiftdt(TeX formula:          = ∫_{-∞}^∞ \left[ ∫_{-∞}^∞ x(τ)h(t-τ)dτ \right] e^{-2πift}dt )

=x(τ)[h(tτ)e2πiftdt]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(t-τ) e^{-2πift}dt \right]dτ )

let u=t-τ.

=x(τ)[h(u)e2πif(u+τ)du]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(u)e^{-2πif(u+τ)}du \right]dτ )

=x(τ)[h(u)e2πifue2πifτdu]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(u)e^{-2πifu}e^{-2πifτ}du \right]dτ )

=[x(τ)e2πifτdτ][h(u)e2πifudu](TeX formula:          = \left[ ∫_{-∞}^∞ x(τ) e^{-2πifτ}dτ \right] \left[ ∫_{-∞}^∞ h(u)e^{-2πifu}du \right] )

=X(f)H(f)(TeX formula:  = X(f)H(f))


x(t)h(t)=F1[X(f)H(f)](TeX formula:  x(t)∗h(t) = F^{-1} \left[ X(f)H(f) \right] )

time-domain convolution is equivalent to frequency-domain product!

cross-correlation theorem

zx,h(t)=x(τ)h(t+τ)dτ(TeX formula: z_{x,h}(t) = ∫_{-∞}^∞ x(τ)h(t+τ)dτ) is the correlation (covariance, actually) of functions x(t) y h(t) as a function of a time lag between them. F[z(t)] denotes its Fourier transform:

F[z(t)]=z(t)e2πiftdt(TeX formula:  F[z(t)] = ∫_{-∞}^∞ z(t)e^{-2πift}dt )

=[x(τ)h(t+τ)dτ]e2πiftdt(TeX formula:          = ∫_{-∞}^∞ \left[ ∫_{-∞}^∞ x(τ)h(t+τ)dτ \right] e^{-2πift}dt )

=x(τ)[h(t+τ)e2πiftdt]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(t+τ) e^{-2πift}dt \right]dτ )

let u=t+τ.

=x(τ)[h(u)e2πif(uτ)du]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(u)e^{-2πif(u-τ)}du \right]dτ )

=x(τ)[h(u)e2πifue2πifτdu]dτ(TeX formula:          = ∫_{-∞}^∞ x(τ) \left[ ∫_{-∞}^∞ h(u)e^{-2πifu}e^{2πifτ}du \right]dτ )

=[x(τ)e2πifτdτ][h(u)e2πifudu](TeX formula:          = \left[ ∫_{-∞}^∞ x(τ) e^{2πifτ}dτ \right] \left[ ∫_{-∞}^∞ h(u)e^{-2πifu}du \right] )

=X*(f)H(f)(TeX formula:  = X^*(f)H(f))


zx,h=F1[X*(f)H(f)](TeX formula:  z_{x,h} = F^{-1} \left[ X^*(f)H(f) \right] )

cross-correlation is equivalent to frequency domain product (with conjugate X(f)).

discrete Fourier transform

the computation of a DFT is done segment by segment. the signal is multiplied by some "window" function (e.g. a square pulse) which is zero outside the window. this is known as windowing.

the effect of using a finite window is that unexisting frequencies appear at the power spectrum (around the original frequencies). this phenomenon is known as leakage. the explanation is evident from the #convolution theorem: one can use similar reasoning to show that h(t)x(t)=F1[H(f)X(f)](TeX formula: h(t)x(t) = F^{-1} \left[ H(f)∗X(f) \right]) . therefore, multiplication by a window will induce a convolution in the frequency domain.

low frequencies (and therefore #non-stationary systems) are the most problematic to compute a DFT for, since the window may not be long enough to capture them. there's no single #test for stationarity. it can be tested for nonetheless, measuring the effect of varying window sizes on different parameters (centrality and spread statistics, Lyapunov exponent, etc.).

Whittaker–Shannon (sinc) interpolation

what function should we convolve the signal's Fourier transform with in order to obtain the same result as windowing it with a rectangular function? we will analyze the closely-related case of a square pulse window:

calculate the first three terms of the Fourier series for function f(x)={12nπx(2n+1)π1(2n+1)πx(2n+2)π(TeX formula: f(x) = \left\{ \begin{array}{cc} 1 & ⇔ 2nπ ≤ x ≤ (2n+1)π \\ -1 & ⇔ (2n+1)π ≤ x ≤ (2n+2)π \\ \end{array} \right.) , with n=0,±1,±2,...(TeX formula: n = 0, ±1, ±2, ...)

in terms of the Fourier series:

f(x)=12aaaf(x)dxf0+n=1[(1aaadxf(x)cos(nπxa))ancos(nπxa)+(1aaadxf(x)sen(nπxa))bnsen(nπxa)](TeX formula:  f(x) = \underbrace{ \frac{1}{2a}∫_{-a}^{a}f(x)dx}_{f_0} + ∑_{n=1}^∞ \left[ \underbrace{ \left( \frac{1}{a} ∫_{-a}^{a}dx\; f(x)cos\left( \frac{nπx}{a} \right) \right)}_{a_n} cos\left( \frac{nπx}{a} \right) + \underbrace{ \left( \frac{1}{a} ∫_{-a}^{a}dx\; f(x)sen\left( \frac{nπx}{a} \right) \right)}_{b_n} sen\left( \frac{nπx}{a} \right) \right] )

from the integration of odd functions ( f(x)cos(nπxa)(TeX formula: f(x)cos\left(\frac{nπx}{a}\right)) and f(x)(TeX formula: f(x)) ) at symmetrical interval ( [a,a](TeX formula: [-a, a]) ):

f(x)=0f0+n=1[(0)an(cos(nπxa))+(1aaadxf(x)sen(nπxa))bnsen(nπxa)](TeX formula:  ⊢ f(x) = \underbrace{0}_{f_0} + ∑_{n=1}^∞ \left[ \underbrace{\left(0\right)}_{a_n} \left( cos\left( \frac{nπx}{a} \right)\right) + \underbrace{ \left( \frac{1}{a} ∫_{-a}^{a}dx\; f(x)sen\left( \frac{nπx}{a} \right) \right)}_{b_n} sen\left( \frac{nπx}{a} \right) \right] )

f(x)=n=1(1aaadxf(x)sen(nπxa))bnsen(nπxa)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \left( \frac{1}{a} ∫_{-a}^{a}dx\; f(x)sen\left( \frac{nπx}{a} \right) \right)}_{b_n} sen\left( \frac{nπx}{a} \right) )

we will consider the interval [a,a]=[π,π](TeX formula: [-a, a] = [-π, π]) , which includes a full period of f(x)sen(nπx/π)(TeX formula: f(x)sen(nπx/π)) for any n. therefore:

f(x)=n=1(1πππdxf(x)sen(nπxπ))bnsen(nπxπ)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \left( \frac{1}{π} ∫_{-π}^{π}dx\; f(x)sen\left( \frac{nπx}{π} \right) \right)}_{b_n} sen\left( \frac{nπx}{π} \right) )

f(x)=n=11π(π0dx(1)sen(nx)+0πdx(+1)sen(nx))bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{π} \left( ∫_{-π}^{0} dx\; (-1)sen(nx) + ∫_{0}^{π}  dx\; (+1)sen(nx) \right)}_{b_n} sen(nx) )

f(x)=n=11π(1n[cos(nx)]π01n[cos(nx)]0π)bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{π} \left( \frac{1}{n} \left[ cos(nx) \right]_{-π}^{0} - \frac{1}{n} \left[ cos(nx) \right]_{0}^{π} \right)}_{b_n} sen(nx) )

f(x)=n=11nπ(cos(0)cos(nπ)cos(nπ)+cos(0))bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{nπ} \left( cos(0) - cos(-nπ) -cos(nπ) + cos(0) \right)}_{b_n} sen(nx) )

f(x)=n=11nπ(2cos(0)2cos(nπ))bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{nπ} \left( 2cos(0) - 2cos(nπ) \right)}_{b_n} sen(nx) )

f(x)=n=11nπ(22cos(nπ))bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{nπ} \left( 2 - 2cos(nπ) \right)}_{b_n} sen(nx) )

f(x)=n=11nπ({22n even2+2n odd)bnsen(nx)(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \underbrace{ \frac{1}{nπ} \left( \left\{ \begin{array}{cc} 2-2 & ⇔ n \text{ even} \\ 2+2 & ⇔ n \text{ odd} \\ \end{array} \right. \right)}_{b_n} sen(nx) )

f(x)=n=1{0n even4nπsen(nx)n odd(TeX formula:  ⊢ f(x) = ∑_{n=1}^∞ \left\{ \begin{array}{cc} 0 & ⇔ n \text{ even} \\ \frac{4}{nπ}sen(nx) & ⇔ n \text{ odd} \\ \end{array} \right. )

f(x)=n=04π(2n+1)sen(x(2n+1))(TeX formula:  ⊢ f(x) = ∑_{n=0}^∞ \frac{4}{π(2n+1)}sen(x(2n+1)) )

=4πn=0sen(x(2n+1))2n+1(TeX formula:         = \frac{4}{π} ∑_{n=0}^∞ \frac{sen(x(2n+1))}{2n+1} )

f(x)=0f0+4πsen(x)+0+43πsen(3x)+0+45πsen(5x)+...(TeX formula:  ⊢ f(x) = \underbrace{0}_{f_0} + \frac{4}{π}sen(x) + 0 + \frac{4}{3π}sen(3x) + 0 + \frac{4}{5π}sen(5x) + ...)

f(x) could be just a discrete sample of a continuous signal, but because convolution with sinc(x) produces a continuous function (a sum of sine waves), the operation will fill in all the missing time points.

Nyquist-Shannon sampling theorem

continuous functions of bounded frequency bandwidth ("band-limited") contain a bounded amount of information. they can be perfectly represented using a function of countable (i.e. discrete) sets and an interpolation operation.

the Nyquist frequency is the maximum sampling frequency at which the digitization of a continuous signal still losses fidelity. that is, aliasing occurs:

fNyquist=2fmax(TeX formula:  f_{Nyquist} = 2 \; f_{max} )

sampling frequency should be greater than twice the maximum frequency in the power spectrum of the original signal.

undersampling results in the unconsidered high frequencies aliasing (adding up) over fNyquist(ffNyquist)(TeX formula: f_{Nyquist} - (f - f_{Nyquist})) at the frequency domain. oversampling is harmless, beyond being a waste of disk space.

testing for undersampling

what if the maximum frequency isn't known in advance? it is possible to test that the maximum frequency has been considered, because aliasing will make two power spectra of the same signal look dissimilar under two different sampling frequencies:

  1. register using two sampling frequencies, one greater than the one to test for the Nyquist criterion.
  2. compute both power spectra ( |F[x(t)]|2(TeX formula: \left|F[x(t)]\right|^2) ).
  3. if they are simply scaled versions of one another, aliasing didn't occur. i.e. the ratio between two features (e.g. maxima) should stay constant.

other time-series topics

stochastic correlations

suppose the signal is generated by an stochastic process. if low frequencies dominate the power spectrum, then the process is non-stationary (time-series mean value will be trending over long periods of time) and autocorrelation will be large.

corollary: even two independent sources of noise can display high statistical dependence.

(Image "Figure 6")

averaging the time series would be no good. prediction should take autocovariance times into account, detrending methods are also available.


  • strict/particular stationarity: system's parameters are truly constant. too hard a requirement, since we must know the system's differential equations in advance.

  • weak/general stationarity: empirical transition probabilities in the F map stay constant. affected by low-band frequencies. alternatively, 1st moment and autocovariance stay constant and finite. this requires greater data availability, because short acquisition times may not be enough to establish statistical relevance.

linear prediction

suppose there's a time series s=(s1,s2,...,sN)(TeX formula: \vec{s} = (s_1, s_2, ..., s_N)) whose value sN+1(TeX formula: s_{N+1}) we want to predict. in a linear prediction algorithm we define ŝN+1(TeX formula: \hat{s}_{N+1}) as a linear combination of the last m points:

ŝN+1=j=1majsnm+j(TeX formula:  \hat{s}_{N+1} = ∑_{j=1}^m a_j s_{n-m+j} )

(Image "Figure 7")

it's clear that there are as many linear combinations of the last m points as there are permutations (with substitution) of coefficients aj(TeX formula: a_j) . in order to find the optimal coefficients we try to minimize the mean square error predicting the previous m - 1 values:

n=mN1(ŝn+1sn+1)2(TeX formula:  ∑_{n=m}^{N-1} (\hat{s}_{n+1} - s_{n+1})^2 )


  • in-sample accuracy: estimated from predictions made for the same data segment used during training/optimisation phase. it doesn't reflect the algorithm's true predictive power (overestimate), since tests aren't predictions but "post-dictions".
  • out-sample accuracy: error estimated from true predictions, using the values of data never "seen" before by our predictive model.

phase-space methods

dissipative systems and attractors

dissipative system: on average, the volume of the manifold describing the state trajectory contracts under dynamics, after certain initial conditions. alternatively, it can be said that #energy loss creates an attractor for that system, reducing the set of possible states that it can take with time. nonetheless, the system still operates far from thermodynamic equilibrium.

in order to verify the presence of an attractor, the L1 metric of the determinant of the system's Jacobian (i.e., of the state transition function's (F) partial derivatives) must be smaller than 1, on average:

|det((f1x1f1xnfnx1fnxn))|<1(TeX formula:  \left| det \left( \begin{pmatrix} \frac{∂f_1}{∂x_1} & ⋯ & \frac{∂f_1}{∂x_n} \\ ⋮                &   & ⋮                \\ \frac{∂f_n}{∂x_1} & ⋯ & \frac{∂f_n}{∂x_n} \\ \end{pmatrix} \right) \right| < 1 )

this means that the transformation's scaling factor produced by change on the different directions is a contracting one. visually, if phase space contains an attractor then the system is dissipative.

visual inspection

  • Poincaré section: allows to tell whether the trajectory behavior is classically deterministic, chaotic deterministic or random. it amounts to finding the intersection between trajectories and a secant plane.

(Image "Figure 8")

  • stroboscopic map: graphical representation of Poincaré's section technique for attractors. section is adjusted until the scatter plot is maximally decorrelated. also used for qualitative detection of strange/fractal attractors.

(Image "Figure 9")

  • "phase portrait": also known as delay representation. it is a bidimensional representation of any m-dimensional phase space. each point is given by a pair of subsequent states in phase space (see also #Takens-Whitney embedding theorem). it eases the visualisation of multidimensional systems and helps identify determinism, periodicity, linearity and chaos.

(Image "Figure 10")

Takens-Whitney embedding theorem

it is possible to reconstruct a phase space manifold that is isomorphic to the original one, even when all information we have is a time series, through the method known as delay reconstruction.


  • stationary data
  • low-dimensional system
  • negligible noise

the basic idea is that for each subset of m equidistant points in the signal, there's an associated unique point in m-dimensional phase space. each value in this subset of the time series becomes a component of the state vector:

(Image "Figure 11")

note that the closer s1(TeX formula: s_1) and s2(TeX formula: s_2) are, the more correlated the reconstructed attractor; rendering it less useful:

(Image "Figure 12")

the spacing parameter among points in the time series is called the time lag τ. to set an adequate value for τ, we want τ > L; with L being the autocovariance length. that is, we want the time between two measurements to be long enough so that they aren't correlated anymore.

finding the right number m of dimensions involves increasing the number of embedding components per point, until the manifold can't unfold anymore. we say there are false neighbors (and therefore that the embedding isn't high-dimensional enough) whenever an increase in m separates seemingly close points considerably in reconstructed phase space.

(Image "Figure 13")

non-linear prediction

unlike the #linear prediction algorithm for time series; a non-linear algorithm doesn't use the last m points combined. rather, it builds upon the intuition that the most relevant states for making a prediction some time Δn after the present are the ones that most resembled the current state. it then averages those similar states after Δn:

ŝN+Δn=1|𝒰(sN)|sn𝒰(sN)sn+Δn(TeX formula:  \hat{s}_{N+Δn} = \frac{1}{\left| 𝒰(\vec{s_N})\right|} ∑_{\vec{s_n} ∈ 𝒰(\vec{s_N})} s_{n+Δn} )

where |𝒰(sN)|(TeX formula: \left| 𝒰(\vec{s_N})\right|) denotes the cardinality of the current state vector's neighborhood (in phase space).

(Image "Figure 14")

as stationarity test

  1. divide the series in intervals so short so as to have lineal segments.

  2. predict and cross-validate using all possible segment pairs.

  3. if some training segments display unusually different performance, then they don't generalise to the whole series. therefore we can conclude that the attractor changed as we swapped training segments; and the time series isn't stationary.

(Image "Figure 15")

as a noise smoother

suppose the measured time series is the sum of its true value and noise:

sn=xn+zn.(TeX formula:  s_n = x_n + z_n.)

where zn(TeX formula: z_n) is the random component. also suppose xn(TeX formula: x_n) and zn(TeX formula: z_n) are independent, and that temporal correlation is short.

we want to substitute sn(TeX formula: s_n) with ŝn(TeX formula: \hat{s}_n) . starting from the #non-linear prediction algorithm:

  1. in phase space, use trajectories which are close to the current one (both in the past and future portions), since the points inside that neighborhood belong to an n-dimensional sphere with center at the vector state sn(TeX formula: s_n) under consideration.

  2. apply the nonlinear prediction algorithm afterwards: average the vectors in the neighborhood. thus the stochastic component will be largely cancelled out.

ŝn0m/2=1|𝒰(sn0)|sn𝒰(sn0)snm/2(TeX formula: \hat{s}_{n_0-m/2}=\frac{1}{\left|𝒰(\vec{s_{n_0}})\right|} ∑_{\vec{s_n}∈𝒰(\vec{s_{n_0}})}s_{n-m/2})

(Image "Figure 16")

chaos theory

Lyapunov coefficient

quantifies chaos levels in a dynamical system. it is the exponent in the exponential divergence measurement of phase space. the system has as many Lyapunov exponents as dimensions, positive ones are chaotic dimensions:

Lyapunov coefficient kind of attractor
λ < 0 stable fixed point
λ = 0 stable limit cycle
λ > 0 chaos
λ → ∞ noise

estimation from time series

it is advisable to perform a #phase portrait beforehand, so as to rule out that the system is nondeterministic.

λ̂(Δt)=1Nn0=1Nln(1|𝒰(sn0)|sn𝒰(sn0)|sn0+Δnsn+Δn|)(TeX formula:  \hat{λ}(Δt) = \frac{1}{N} ∑_{n_0=1}^{N} ln \left( \frac{1}{\left|𝒰(\vec{s_{n_0}})\right|} \; ∑_{\vec{s_n}∈𝒰(\vec{s_{n_0}})} \left| s_{n_0+Δ_n} - s_{n+Δ_n} \right| \right) )


  1. embedding: if the #embedding dimension is too low divergence will be underestimated because of false neighbors.
  2. noise level: overestimated because of noise
  3. #stochastic correlations: if autocorrelation is big, the attractor will look compressed along a diagonal, underestimating true divergence

fractal dimension

strange attractors are the flagship of chaos: trajectories are still deterministic (no crossroads) yet non-cyclic (exact knowledge is required to tell accurately which trajectory the system is in). since some chaotic systems are also #dissipative, all of this implies that a subset of infinitely-dense phase space is being relentlessly filled by a line of infinite detail.

box counting

box counting is one of many methods to assess scaling property of fractals.

let ε be the side of each box (n-dimensional squares), which are in turn used to segment the figure. then let N(ε) be the number of boxes. note that:

limN(ε)N(ε)=(1ε)D(TeX formula:  \lim_{N(ε)→∞} \; N(ε) = \left(\frac{1}{ε}\right)^D )

D being the box-counting fractal dimension.

limN(ε)ln(N(ε))=Dln(1/ε)(TeX formula:  ⊢ \lim_{N(ε)→∞} \; ln(N(ε)) = D \; ln(1/ε) )

D=limN(ε)ln(N(ε))ln(1/ε).(TeX formula:  ⊢ D = \lim_{N(ε)→∞} \; \frac{ln(N(ε))}{ln(1/ε)}. )


(Image "Figure 17")
Figure 17 - Sierpinski's triangle

N(ε) ε
30 20
31 21
32 22
... ...
3n 2n

D=limnln(3n)ln(2n)=limnnln(3)nln(2)=ln(3)ln(2)=1.5849...(TeX formula:  D = \lim_{n→∞} \; \frac{ln(3^n)}{ln(2^n)} = \lim_{n→∞} \; \frac{n\;ln(3)}{n\;ln(2)} = \frac{ln(3)}{ln(2)} = 1.5849... )

correlation dimension

the correlation dimension can be used as an estimator of an attractor's dimension.(under certain circumstances), including fractal dimensions.

the idea now isn't to take the ratio of the number of boxes and their length ε in the limit to infinity. instead, we take the ratio of points (among all possible valid pairs) which fall within the neighborhood (defined by a radius of length ε) of some other point:

D=limε0limNlnC(ε,N)lnε;(TeX formula:  D = \lim_{ε→0}\;\lim_{N→∞} \; \frac{∂ln\;C(ε,N)}{∂lnε};)

where C(N, ϵ) is the correlation sum:

C(N,ϵ)=2(N)(N1)i=1Nj=i+1+nminN𝛩(ϵ||sisj||).(TeX formula:  C(N,ϵ) = \frac{2}{(N)(N-1)} ∑_{i=1}^{N}\;∑_{j=i+1+n_{min}}^N 𝛩(ϵ - ||\vec{s_i}-\vec{s_j}||). )

𝛩 is Heaviside's step function

estimation from time series

given the lack of a phase space, it is necessary to #reconstruct it first. the right choice of delay τ during reconstruction of embedded vectors provides a suitable delay to estimate correlation dimension.

the aforementioned estimator of C(m, ϵ) is biased to underestimate the dimension, because data that are close in a trajectory of phase space tend to retain some temporal correlation.

(Image "Figure 18")
Figure 18 - this figure shows how for point B the attractor seems to have a dimension of 1, however, B's neighbors aren't statistically independent.

the corrected estimator of C(m, ϵ) excludes sufficiently-close vector pairs (in phase space); which usually correspond to subsequent vectors in the time series:

C(m,ϵ)=2(Nnmin)(Nnmin1)i=1Nj=i+1+nminN𝛩(ϵ||sisj||)(TeX formula:  C(m, ϵ) = \frac{2}{(N-n_{min})(N-n_{min}-1)} ∑_{i=1}^{N}\;∑_{j=i+1+n_{min}}^N 𝛩(ϵ - ||\vec{s_i}-\vec{s_j}||) )

with nmin=tmin/Δt(TeX formula: n_{min} = t_{min}/Δt) . tmin(TeX formula: t_{min}) doesn't necessarily have to be the average correlation time.

the following plots from Kantz & Schreiber's book5 come from a map-like system. for a tmin=500steps(TeX formula: t_{min} = 500\;steps) only 2.5% of all pairs is lost in the corrected correlation dimension estimator. this is statistically negligible.

(Image "Figure 19")
Figure 19 - correlation sum (C(m,ϵ)) as a function of the neighborhood threshold (ϵ) for various embeddings. for low-dimensional attractors (small m, upper plots) state vectors are denser, and estimated dimension is consequently greater. nonetheless, this only happens up to a certain point, when m isn't big enough to capture all dimensions of the system. Log scale allows scaling to be identified as straight lines. for small ϵ's point neighborhoods are indistinguishable from noise, so dimension slopes look exaggerated, after which a scaling region arises.

(Image "Figure 20")
Figure 20 - correlation dimension (D(m,ϵ)). as in the previous figure, scaling is observed for ϵ > 100, which makes those values a good choice (we don't want the estimation to depend on a parameter). the scaling region reveals that the hypothetical strange attractor is of dimension less than 2. if ϵ → ∞ then the whole of phase space reduces to a single point, as far as the correlation sum estimation is concerned. that's why dimension eventually drops to 0 (macroscopic regime).

possible error sources:

  • the correlation is a well-defined construction (for finite data sets), whereas the correlation dimension is just an extrapolation that must be handled with care.
  • it isn't enough to observe fractal scaling for a single choice of m. saturation should also occur for many values.
  • if measurements are too noisy, a scaling regime may never be observed. the dimension estimate can be as big as m whenever ϵ is small enough, so that only noise is captured by it.
  • noise affects scales 3 orders of magnitude as big as its own level. in order to observe a scaling region, noise should be at most 1/8 of the whole attractor.
  • it is therefore advisable to use a #noise reduction algorithm in advance.
  • certain stochastic systems may pass for low-dimension deterministic systems.
  • low-resolution digitization will make the estimate look smaller
  • self-similarity may never be observed, even for good-quality deterministic data, depending on the manifold's large-scale structure.
minimum delay

a space-time separation plot shows the proportion of points inside the neighborhood as a function of Δt and ϵ. we hope to find some value of Δt after which increments in ϵ don't add extra points. such delay marks the ideal choice of Δt when computing the correlation sum.

this is more rigorous a technique than using the autocorrelation distance.

(Image "Figure 21")

as a noise meter

figures 19 and 20 introduced the value ε that marks the limit between noise and scaling regimes. we found a resolution limit in the data, when neighborhoods were so tiny that they only capture randomly distributed points in all directions.

in a nutshell, given that ε is a distance in phase space units, we can define the noise level as the ratio between ε and the attractor's "size".

neuroscience applications

  • how not to use it:

Lehnertz and Elger (1998)6 analysed localized intracranial EEG data from 16 epilepsy patients. they claimed that a loss of complexity in brain activity should result from the synchronization of pathological cells before and during seizures.

linear techniques can only predict seizures a few seconds in advance. here the intention is to use the correlation dimension estimate as a complexity detector, even though it cannot actually estimate the fractal dimension for these data.

a state called state 2 coincides with pre-ictal synchronization some 10 to 30 minutes in advance to the seizure. nothing is mentioned with regard to confounding sources of complexity reduction, though.

quasi-scaling regions were observed for state 2 (D2eff = 3.7) and the seizure state (D2eff = 5.3), but not the control state. in other words, complexity apparently increased, counter to theory, and there's no base level to compare it with.

plots don't really show a clear power law either that would justify the reliability of their estimates. nor was any statistical hypothesis test performed using surrogate data.

  • a more credible one?

Martinerie et al (1998)7 attempted to predict seizures by looking at phase transitions between two subsections in reconstructed phase space, also built from intracranial EEG series. it is claimed that for most cases (17 out of 19) it is possible to foretell seizure onset 2 to 6 seconds in advance, and that all of them follow a similar trajectory in phase space.

the alternative hypothesis is actually explored with surrogate data, and authors made sure to smooth noise out previous to the embedding. the number of dimensions (16) may be too unwieldy, though.

hypothesis testing

the maximal Lyapunov coefficient and the correlation dimension come handy in detecting non-linear dynamics, however they cannot establish such a fact by themselves. it is necessary to observe a clear scaling region.

a stricter analysis would test the hypothesis that observed data are due to a linear process. this implies that certain parameter's conditional distribution (given H0) has to be empirically estimated, so that it can be contrasted with the real data.

surrogate data

surrogate data is a akin to resampling methods (permutation, Monte Carlo, bootstrapping) for time series. surrogate data share many similarities with true data for some respects, while consistent with H0 for others. when testing for non-linearity, Fourier phases are generated randomly while the power spectrum is retained, since the imaginary part is where non-linearity could emerge from.

the general procedure for generating surrogate data is as follows:

  1. decompose the signal in amplitudes and phases using the (fast, discrete) Fourier transform:

    F[s0(t)]{A(f),φ(f)}(TeX formula:  F[s_0(t)] \; → \; \{A(f), \; φ(f)\} )

  2. substitute the alleged non-linear properties with random numbers drawn from the uniform distribution ranging from 0 to 2π radians (trying to make ζNk=ζk(TeX formula: ζ_{N-k} = -ζ_{k}) , so that the imaginary part being odd will result in a substitute series belonging to ℝ. see the #Fourier transform properties section):

    {A(f),ζ(f)}(TeX formula:  \{A(f), \; ζ(f)\} )

  3. wrap the surrogate series back with the inverse Fourier transform:

    s1(t)F1[{A(f),ζ(f)}](TeX formula:  s_1(t) \; ← \; F^{-1}[\{A(f), \; ζ(f)\}] )

  4. test the series out (dimension estimation, Lyapunov coefficient, autocorrelation, non-linear prediction error, etc.).

  5. rinse and repeat till the parameter's distribution given H0 is dense enough to yield good type-I error estimates.

statistical significance

as in standard hypothesis testing, for a significance level of α (type I error: probability that H0 is erroneously rejected) we would like at most α% of the area under the surrogate data curve to overlap the original measurements, or be equal or more extreme than our point measurement. such a rank-based test is a non-parametric way of deriving a p-value.

for instance, in order to reject H0 with 95% of success, it would be enough to perform 1.051=19(TeX formula: \frac{1}{.05} - 1 = 19) tests (for a one-side test. 2.051=39(TeX formula: \frac{2}{.05} - 1 = 39) for the two-tail case).

as a determinism detector

does passing the test also mean that the system is deterministic (since we tested for non-linear determinism)?

partially yes, with some probability. recall that all statistical inference is based on limited evidence and inductive reasoning:

accept H0 reject H0
H0 is true 1-α (confidence) α (type I error)
H0 is false β (type II error) 1-β (power)

more caveats:

  • we have assumed that the available power spectrum captures at least one whole period in the time domain (i.e. no systemic sampling bias due to long-term process). otherwise, random events in a non-stationary process could be misinterpreted as deviations from H0.
  • the process may not follow the conjectured probability distribution (this problem is mitigated by non-parametric tests).
  • multiple comparisons problem: performing many independent tests overblows type I error. for those cases, significance levels should be made more stringent according to some correction method (Bonferroni e.g.)

as a phase transition detector

the same non-linearity tests have been used to distinguish states of a system.

compute the same statistic for different time segments and compare with one another. the reasoning follows what we did in the #stationarity test.

  1. complexity doesn't imply advanced math, for it can emerge from very simple descriptions (see rule 110 for instance). that said, this post relies on calculus (including some differential equations), inferential statistics (also the fundamentals of information theory), bits of graph theory and linear algebra; most (or all) of which should be familiar to STEM undergraduates. topics like Fourier analysis and chaos theory are introduced from there. 

  2. not to be confused with "complicated" or "difficult". 

  3. we could have arrived at the same harmonic oscillator equation using the mechanics of a pendulum, or an LC circuit. See system equivalence

  4. z(|z|=z·z¯)(TeX formula: ∀\vec{z}∈ℂ \; \left(|\vec{z}| = \sqrt{\vec{z}·\overline{\vec{z}}} \right)) , because z·z¯=z12+iz1z2iz1z2+z22(TeX formula: \sqrt{\vec{z}·\overline{\vec{z}}} = \sqrt{z_1^2 + iz_1z_2 - iz_1z_2 + z_2^2}) 

  5. Holger Kantz, Thomas Schreiber (2003). Nonlinear Time Series Analysis. Cambridge University Press. 

  6. Klaus Lehnertz, Christian E. Elger. Can Epileptic Seizures be Predicted? Evidence from Nonlinear Time Series Analysis of Brain Electrical Activity (1998). Phys. Rev. Lett. 80, 5019. 

  7. Martinerie, J., Adam, C., Le Van Quyen, M., Baulac, M., Clemenceau, S., Renault, B., & Varela, F. J. (1998). Epileptic seizures can be anticipated by non-linear analysis. Nature medicine, 4(10), 1173.