Charles Himmelberg, Goldman Sachs, Aug 2014: “…largest HY outflow on record – a 6-sigma event!”

David Viniar, CFO/Goldman Sachs, Aug 2007: “…seeing things that are 25-sigma moves”

Here’s a six sigma event:

# Plot the standard normal density, highlighting the sliver of tail between 6 and 7
x = seq(6,7,0.01)
y = dnorm(x)
x2 = seq(-3,7,0.01)
y2 = dnorm(x2)
par(mfrow=c(1,2))
# Left panel: the full density; the red segment beyond 6 is invisible at this scale
plot(x2,y2,xlim=c(-3,8),ylim=c(1e-11,0.5),col='black', type='l', xlab="Look at the red line between [6,7]")
lines(x2[x2>6],y2[x2>6],col='red')
# Right panel: zoom in on that same red segment
plot(x,y,xlim=c(6,7),ylim=c(1e-11,1e-8),col='red', type='l', xlab = "This is the same red line!")
polygon(c(x[x>=6], max(x), 6), c(y[x>=6], 0, 0), col="red")

How does your average programmer find the probability of a 6-sigma event?

Naive Algo 1. “Toss a Gaussian die a million times. Count how often it shows six or more. That fraction is your probability.”

million = 1000*1000
dice = rnorm(million)
count = sum(dice > 6)   # count the draws that exceed 6
cat(sprintf("Probability of 1-tailed 6 sigma event: %f", count/million))
## Probability of 1-tailed 6 sigma event: 0.000000

Heh heh! That got us nowhere!!!

Now, sampling per se is not an unsound strategy.

Suppose you’d like to estimate a 1-sigma event:

count = sum(dice > 1)   # count the draws that exceed 1
cat(sprintf("Probability of 1-tailed 1 sigma event: %f", count/million))
## Probability of 1-tailed 1 sigma event: 0.158692

About 16% is the correct probability of a 1-tailed 1-sigma event. (Or roughly 32%, if you care about both tails.)
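As a quick aside (my own sanity check, not part of the original recipe), the usual binomial standard error shows why a million tosses is plenty at 1 sigma:

# Standard error of a Monte Carlo proportion: sqrt(p*(1-p)/n)
p_hat = count / million
sqrt(p_hat * (1 - p_hat) / million)   # ~0.0004, so the estimate is reliable to about three decimal places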

The reason it doesn’t work for rare events is precisely because the event is rare!

So you’d have to toss your die billions of times just to see a single 6-sigma event, and a hundred billion or so to pin down its probability with any precision.

So how else could you estimate the probability of a rare event if sampling fails you?

A probability is an integral of the density function over the appropriate range.

Algo 2. Numerical Integration. Integrate the Gaussian density function from 6 to infinity.

# Use Numeric Integration
integrate(function(x) { exp(-x^2/2)/sqrt(2*pi)} , 6, Inf)
## 9.865877e-10 with absolute error < 6.5e-13

Of course, the definite integral of the density from -infinity to some x is called the CDF (Cumulative Distribution Function) at x, so our tail probability is just 1 minus the CDF at 6.

Algo 3. Use the CDF.

# Use the CDF
1 - pnorm(6) 
## [1] 9.865877e-10

The desired probability of the 6-sigma event, as we’ve now confirmed twice, is about 9.9e-10… very, very small.
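Which, incidentally, quantifies why Naive Algo 1 got nowhere: the expected number of tosses before a single hit is 1/p, and you need a couple of orders of magnitude more than that to estimate p with any precision.

# Expected tosses before the first 6-sigma draw: 1/p
1 / (1 - pnorm(6))     # ~1e9, about a billion
# Tosses needed for ~10% relative error with naive counting: ~100/p
100 / (1 - pnorm(6))   # ~1e11, about a hundred billion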

But say you don’t have access to CDFs & integrals.

All you have are samples.

What then?

Well, sample from some other distribution!

Consider the proposal distribution q = N(6,1), i.e. a Gaussian centered at 6.

Integral(p) = Integral((p/q) * q) = Expectation(p/q) under q, with everything restricted to the tail we care about.
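Written out as a formula (the same identity as the line above, with the tail restriction made explicit via an indicator):

$$
P(X > 6) \;=\; \int_{6}^{\infty} p(x)\,dx \;=\; \int_{6}^{\infty} \frac{p(x)}{q(x)}\, q(x)\,dx \;=\; \mathbb{E}_{q}\!\left[ \frac{p(X)}{q(X)}\, \mathbf{1}\{X > 6\} \right]
$$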

Simply sample from q and compute the ratio p/q whenever a sample exceeds 6 (record 0 otherwise).

The mean of these values is the desired probability.
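Here is that recipe as a minimal, generic sketch. The function importance_sample and its argument names are mine, purely for illustration; they are not from any package.

# A generic sketch of the recipe above (illustrative names, not a library API)
importance_sample = function(n, rproposal, dproposal, dtarget, threshold) {
  x = rproposal(n)                                          # draw n samples from the proposal q
  w = ifelse(x > threshold, dtarget(x) / dproposal(x), 0)   # p/q past the threshold, 0 otherwise
  mean(w)                                                   # average weight = estimated tail probability
}

# Example: P(X > 6) for X ~ N(0,1), with N(6,1) as the proposal
importance_sample(100000,
                  rproposal = function(n) rnorm(n, 6, 1),
                  dproposal = function(x) dnorm(x, 6, 1),
                  dtarget   = function(x) dnorm(x, 0, 1),
                  threshold = 6)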

So what is the ratio p/q?

p = density of N(0,1) = exp(-x^2/2)/sqrt(2*pi)

q = density of N(6,1) = exp(-(x-6)^2/2)/sqrt(2*pi)

=> p/q = exp(-x^2/2 + (x-6)^2/2) = exp((36-12x)/2)
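If you don’t trust the algebra, a two-line numeric spot check (at an arbitrary point, say x = 6.5) confirms it:

# Spot check: the closed-form ratio matches the ratio of the two densities
x0 = 6.5
exp((36 - 12*x0)/2)                  # ~7.58e-10
dnorm(x0, 0, 1) / dnorm(x0, 6, 1)    # same value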

Algo 4. Importance Sampling from a Gaussian proposal, some algebra

# 1 line Importance Sampling, with Gaussian proposal
mean(sapply(rnorm(100000,6,1), function(x) { ifelse(x>6, exp((36-12*x)/2),0) }))
## [1] 9.885549e-10

Once again, we obtain 9.9e-10.

This technique - computing an expectation by switching to a different, more convenient distribution - is what mathematicians term a change of measure.

Statisticians must get fancy, so it also goes by the name Importance Sampling.

(Notice we don’t need millions of samples. We’ve only used 100k above.)
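Why do 100k proposal samples succeed where a million direct tosses failed? Because the importance weights are small and well behaved, so the estimator’s standard error is tiny. A quick sketch of that, reusing the same N(6,1) proposal:

# Standard error of the importance-sampling estimate with 100k samples
x = rnorm(100000, 6, 1)
w = ifelse(x > 6, exp((36 - 12*x)/2), 0)   # the importance weights
mean(w)                                    # the estimate, ~9.9e-10
sd(w) / sqrt(length(w))                    # its standard error, ~1e-11, i.e. roughly 1% relative error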

Now, you’d have to know the densities of p & q to do the p/q algebra.

Most times, you don’t know (or don’t want to derive) these closed forms.

No worries!

Densities are built-in as well (called dnorm in R)

Algo 5. Importance Sampling from a Gaussian proposal, no algebra

# Since densities are built into the dnorm function, p/q = dnorm(...)/dnorm(...)
# This is super-useful if you are uncomfortable with algebra. 
mean(sapply(rnorm(100000,6,1), function(x) { ifelse(x>6, dnorm(x,0,1)/dnorm(x,6,1),0) }))
## [1] 9.757876e-10

You might wonder… this proposal distribution, how did we know we had to pick N(6,1)?

Why not N(4,1)? Or N(5,2)? Or some other distribution altogether, say a uniform distribution?

After all, every programming language has a uniform distribution (it’s the rand() function).
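In fact, almost any proposal that puts decent mass past 6 will do; the choice only affects how noisy the estimate is for a fixed number of samples. A quick comparison of a few Gaussian proposals (estimate_with is a helper I’m defining just for this check):

# All of these give roughly the same answer; they differ only in variance
estimate_with = function(mu, sigma, n = 100000) {
  x = rnorm(n, mu, sigma)
  mean(ifelse(x > 6, dnorm(x, 0, 1) / dnorm(x, mu, sigma), 0))
}
estimate_with(6, 1)   # ~9.9e-10
estimate_with(4, 1)   # also ~1e-9, but noisier: only ~2% of its samples land past 6
estimate_with(5, 2)   # also ~1e-9

Centering the proposal at the threshold is simply a sensible default: most samples land where the action is, and the weights stay well behaved.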

Consider a uniform proposal distribution centered at 6 with width 20, i.e. U[-4,16].

Its density is 1/20, so p/q = 20 * exp(-x^2/2)/sqrt(2*pi)

Algo 6. I hate Gaussians! Importance Sampling from a Uniform proposal, U[-4,16]

# 1 line Importance Sampling, with Uniform proposal
mean(sapply(runif(100000,-4,16), function(x) { ifelse(x>6, 20*exp(-x^2/2)/sqrt(2*pi),0) }))
## [1] 9.650514e-10

So now you start optimizing.

Note that we don’t use samples with x < 6.

So why produce them in the first place?

Simply start the distribution at 6 & explore the right tail.

Say the proposal is q = U[6,16]. Its density is 1/10, so p/q = 10*p = 10*dnorm(x).

Algo 7. Importance Sampling from a Uniform proposal over the right tail, U[6,16]

# Importance Sampling with Uniform proposal over the right tail: p/q = 10*dnorm(x)
10*mean(sapply(runif(100000,6,16), dnorm))
## [1] 1.005291e-09

So far, we’ve been playing with toy distributions like the standard normal.

Now, let’s get real.

When the quants at Goldman talk about the 6-sigma & 25-sigma events, they refer to fat-tailed distributions.

Let’s pick a typical fat-tailed candidate that still has a finite variance, say a central Student t with 3 degrees of freedom.

A central t with 3 degrees of freedom has variance 3/(3-2) = 3, so 1 sigma is sqrt(3) ≈ 1.73 and 6 sigma is about 10.4.
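For the record, here is that arithmetic in R:

# Sigma for a central t with 3 degrees of freedom
nu = 3
sigma = sqrt(nu / (nu - 2))   # sqrt(3), about 1.73
6 * sigma                     # ~10.39, the 6-sigma threshold
25 * sigma                    # ~43.3, the 25-sigma threshold used further below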

Algo 8. Six Sigma on Student T(3): Importance Sampling from U[10.4,20.4]

# 6 sigma via CDF
cat(sprintf("Probability of 6-sigma event on Student-T(df=3) via CDF: %f\n", 1-pt(10.4, df=3)))
## Probability of 6-sigma event on Student-T(df=3) via CDF: 0.000949
# 6 sigma via Importance Sampling from U[10.4,20.4]
cat(sprintf("Probability of 6-sigma event on Student-T(df=3) via Importance Sampling: %f\n", 
            10*mean(sapply(runif(100000,10.4,20.4), dt, df =3))))
## Probability of 6-sigma event on Student-T(df=3) via Importance Sampling: 0.000819

Notice how importance sampling underestimates the true probability here (0.000819 vs 0.000949).

We used a proposal of U[10.4, 20.4], but the t-distribution has really fat tails, with significant mass well beyond 20.4!

So let’s extend the range ten times, i.e. try U[10.4, 110.4].

Algo 9. Six Sigma on Student T(3): Importance Sampling from U[10.4,110.4]

# 6 sigma via Importance Sampling from U[10.4,110.4]
cat(sprintf("Probability of 6-sigma event on Student-T(df=3) via Importance Sampling: %f\n", 
            100*mean(sapply(runif(100000,10.4,110.4), dt, df =3))))
## Probability of 6-sigma event on Student-T(df=3) via Importance Sampling: 0.000957

So the likelihood of a 6-sigma event on a real-world, fat-tailed distribution is not that small: 0.000949, roughly 1 in 1,050. Quite high, in fact!
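To put that in perspective, compare it with the Gaussian 6-sigma probability we computed earlier (a quick CDF check of my own):

# Same "6 sigma", very different meanings
p_t3    = 1 - pt(6 * sqrt(3), df = 3)   # ~9.5e-4
p_gauss = 1 - pnorm(6)                  # ~9.9e-10
p_t3 / p_gauss                          # roughly a million times more likely on the fat-tailed t(3)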

How about a 25-sigma event?

Algo 10. 25 Sigma on Student T(3): Importance Sampling from U[43.3, 143.3]

# 25 sigma via CDF
cat(sprintf("Probability of 25-sigma event on Student-T(df=3) via CDF: %f\n", 1-pt(43.3, df=3)))
## Probability of 25-sigma event on Student-T(df=3) via CDF: 0.000014
# 25 sigma via Importance Sampling from U[43.3, 143.3]
cat(sprintf("Probability of 25-sigma event on Student-T(df=3) via Importance Sampling: %f\n", 
            100*mean(sapply(runif(100000,43.3,143.3), dt, df =3))))
## Probability of 25-sigma event on Student-T(df=3) via Importance Sampling: 0.000013

A 25-sigma event on a typical fat-tailed distribution has about a 1 in 70,000 chance.
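And for Viniar’s 25-sigma moves, the contrast with the Gaussian is even starker (again, just a CDF check):

# 25 sigma: merely unlikely on a fat-tailed t(3), essentially impossible on a Gaussian
pt(25 * sqrt(3), df = 3, lower.tail = FALSE)   # ~1.4e-5, about 1 in 70,000
pnorm(25, lower.tail = FALSE)                  # ~3e-138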

Takeaway: on fat-tailed distributions, “rare” events are nowhere near as rare as Gaussian intuition suggests, and Importance Sampling lets you estimate just how likely they are using nothing but samples. Importance Sampling ftw!