5.3 Math with Gaussians
Let’s say we believe that our dog is at 23m, and the variance is 5, or $pos_{dog} = \mathcal{N}(23, 5)$. We can represent that in a plot:
In [10]: import stats
stats.plot_gaussian(mean=23, variance=5)
This corresponds to a fairly inexact belief. While we believe that the dog is at 23, note that roughly 21 to 25 are quite likely as well. Let’s assume for the moment our dog is standing still, and we query the sensor again. This time it returns 23.2 as the position. Can we use this additional information to improve our estimate of the dog’s position?
Intuition suggests ‘yes’. Consider: if we read the sensor 100 times and each time it returned a value between 21 and 25, all centered around 23, we should be very confident that the dog is somewhere very near 23. Of course, a different physical interpretation is possible. Perhaps our dog was randomly wandering back and forth in a way that exactly emulated a normal distribution. But that seems extremely unlikely - I certainly have never seen a dog do that. So the only reasonable assumption is that the dog was mostly standing still at 23.0.
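This intuition can be checked with a quick simulation. The sketch below does not use the book's DogSensor class; it assumes a stationary dog at 23m and draws readings from Python's random.gauss with a variance of 5. The helper name simulate_readings and the seed are illustrative choices, not part of the original code.

```python
import math
import random

def simulate_readings(true_pos, variance, count, seed=3):
    """Return `count` noisy readings of a stationary object (hypothetical helper)."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    std = math.sqrt(variance)      # random.gauss expects a standard deviation
    return [rng.gauss(true_pos, std) for _ in range(count)]

readings = simulate_readings(true_pos=23.0, variance=5.0, count=100)
average = sum(readings) / len(readings)

# Individual readings scatter by a few meters, but their average
# should land much closer to the true position.
print('single reading: %.2f' % readings[0])
print('average of 100: %.2f' % average)
```

The average of 100 independent readings has a variance of 5/100, which is why it sits so much closer to 23 than any single reading is likely to.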
Let’s look at 100 sensor readings in a plot:
In [11]: dog = DogSensor(x0=23, velocity=0,
                         measurement_variance=5, process_variance=0.0)
xs = range(100)
ys = []
for i in xs:
    ys.append(dog.sense_position())
bp.plot_track(xs, ys, label='Dog position')
plt.legend(loc='best')
Eyeballing this confirms our intuition - no dog moves like this. However, noisy sensor data certainly looks like this. So let’s proceed and try to solve this mathematically. But how?
Recall the histogram code for adding a measurement to a preexisting belief:
def update(pos, measure, p_hit, p_miss):
    q = array(pos, dtype=float)
    for i in range(len(hallway)):
        if hallway[i] == measure:
            q[i] = pos[i] * p_hit
        else:
            q[i] = pos[i] * p_miss
    normalize(q)
    return q
Note that the algorithm is essentially computing:
new_belief = old_belief * measurement * sensor_error
The measurement term might not be obvious, but recall that measurement in this case was always 1 or 0, and so it was left out for convenience.
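To make the multiplication concrete, here is a self-contained sketch of the same update. It is a variant of the histogram code, not the original: hallway is passed as a parameter instead of being a global, normalization is done inline with NumPy, and the hallway layout and the p_hit and p_miss values are made up for illustration.

```python
import numpy as np

def update(pos, measure, hallway, p_hit, p_miss):
    """Scale the prior by the sensor likelihood, then normalize (sketch)."""
    q = np.array(pos, dtype=float)
    for i in range(len(hallway)):
        if hallway[i] == measure:
            q[i] = pos[i] * p_hit      # position agrees with the reading
        else:
            q[i] = pos[i] * p_miss     # position disagrees with the reading
    return q / q.sum()                 # make the belief sum to 1

# Assumed layout: 1 marks a door, 0 marks a wall.
hallway = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
prior = [0.1] * 10                     # uniform prior over 10 positions
posterior = update(prior, 1, hallway, p_hit=0.6, p_miss=0.2)
print(posterior)
```

Every entry of the posterior is the old belief times a sensor-dependent factor, followed by normalization, which is the pattern we want to carry over to Gaussians.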
If we are implementing this with Gaussians, we might expect it to be implemented as:
new_gaussian = measurement * old_gaussian
where measurement is a Gaussian returned from the sensor. But does that make sense? Can we multiply Gaussians? If we multiply a Gaussian with a Gaussian, is the result another Gaussian, or something else?
It is not particularly difficult to perform the algebra to derive the equation for multiplying two Gaussians, but I will just present the result:
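$$\mathcal{N}(\mu_1, \sigma_1^2) \cdot \mathcal{N}(\mu_2, \sigma_2^2) = \mathcal{N}\left(\frac{\sigma_1^2\mu_2 + \sigma_2^2\mu_1}{\sigma_1^2 + \sigma_2^2},\ \frac{1}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}\right)$$

$$\mu = \frac{\sigma_1^2\mu_2 + \sigma_2^2\mu_1}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma^2 = \frac{1}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}$$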
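For readers who want to see where the result comes from, the algebra can be sketched by multiplying the two exponentials and completing the square in $x$. This is an outline rather than a full derivation: $c$ collects every term that does not depend on $x$, and the normalizing constants are dropped.

```latex
\begin{aligned}
\mathcal{N}(\mu_1, \sigma_1^2)\,\mathcal{N}(\mu_2, \sigma_2^2)
  &\propto \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2} - \frac{(x-\mu_2)^2}{2\sigma_2^2}\right) \\
  &= \exp\left(-\frac{1}{2}\left[\left(\frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2}\right)x^2
     - 2\left(\frac{\mu_1}{\sigma_1^2}+\frac{\mu_2}{\sigma_2^2}\right)x\right] + c\right) \\
  &\propto \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\end{aligned}

\frac{1}{\sigma^2} = \frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2},
\qquad
\mu = \sigma^2\left(\frac{\mu_1}{\sigma_1^2}+\frac{\mu_2}{\sigma_2^2}\right)
    = \frac{\sigma_1^2\mu_2+\sigma_2^2\mu_1}{\sigma_1^2+\sigma_2^2}
```

Matching the coefficient of $x^2$ gives the variance, and matching the coefficient of $x$ gives the mean.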
Without doing a deep analysis we can immediately infer some things. First and most importantly, the result of multiplying two Gaussians is another Gaussian. The expression for the mean is not particularly illuminating, except that it is a combination of the means and variances of the inputs. But the variance of the result depends only on the variances of the inputs. We conclude from this that the resulting variance is completely unaffected by the values of the means!
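The claim that the product is another Gaussian can be checked numerically. The sketch below multiplies two Gaussian curves point by point, rescales the product so that it integrates to 1 (the raw product of two densities is a Gaussian shape but not a normalized one), and compares it with the Gaussian given by the formulas. The helper names gaussian_pdf and product_parameters and the example values are my own choices, not part of the book's stats module.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean)**2 / var) / np.sqrt(2 * np.pi * var)

def product_parameters(mu1, var1, mu2, var2):
    """Mean and variance predicted by the product formulas."""
    mean = (var1 * mu2 + var2 * mu1) / (var1 + var2)
    variance = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mean, variance

xs = np.linspace(0.0, 50.0, 5001)
dx = xs[1] - xs[0]

# Pointwise product of the two curves, rescaled to integrate to 1.
product = gaussian_pdf(xs, 23.0, 5.0) * gaussian_pdf(xs, 25.0, 3.0)
product /= product.sum() * dx

mean, var = product_parameters(23.0, 5.0, 25.0, 3.0)
expected = gaussian_pdf(xs, mean, var)

max_error = np.max(np.abs(product - expected))
print('predicted mean and variance:', mean, var)
print('largest difference between the curves:', max_error)
```

The two curves agree to within numerical error, so the formulas describe the normalized product.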
Let’s immediately look at some plots of this. First, let’s look at the result of multiplying $\mathcal{N}(23, 5)$ by itself. This corresponds to getting 23.0 as the sensor value twice in a row. But before you look at the result, what do you think the result will look like? What should the new mean be? Will the variance be wider, narrower, or the same?
In [12]: from __future__ import division
import numpy as np

def multiply(mu1, var1, mu2, var2):
    if var1 == 0.0:
        var1 = 1.e-80
    if var2 == 0:
        var2 = 1e-80
    mean = (var1*mu2 + var2*mu1) / (var1+var2)
    variance = 1 / (1/var1 + 1/var2)
    return (mean, variance)

xs = np.arange(16, 30, 0.1)
mean1, var1 = 23, 5
mean, var = multiply(mean1, var1, mean1, var1)
ys = [stats.gaussian(x, mean1, var1) for x in xs]
plt.plot(xs, ys, label='original')
ys = [stats.gaussian(x, mean, var) for x in xs]
plt.plot(xs, ys, label='multiply')
plt.legend(loc='best')
plt.show()
The result is either amazing or what you would expect, depending on your state of mind. I must admit I vacillate freely between the two! Note that the result of the multiplication is taller and narrower than the original Gaussian, but the mean is the same. Does this match your intuition of what the result should have been?
If we think of the Gaussians as two measurements, this makes sense. If I measure twice and get the same value, I should be more confident in my answer than if I just measured once. If I measure twice and get 23 meters each time, I should conclude that the length is close to 23 meters. So the mean should be 23. I am more confident with two measurements than with one, so the variance of the result should be smaller.
“Measure twice, cut once” is a useful saying and practice due to this fact! The Gaussian is just a mathematical model of this physical fact, so we should expect the math to follow our physical process.
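The effect compounds with more measurements. As a sketch, the loop below folds the same reading, $\mathcal{N}(23, 5)$, into the belief several times. The multiply function is repeated from the listing above, without the zero-variance guard, so that the block runs on its own; the number of repetitions is arbitrary.

```python
def multiply(mu1, var1, mu2, var2):
    """Mean and variance of the product of two Gaussians."""
    mean = (var1 * mu2 + var2 * mu1) / (var1 + var2)
    variance = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mean, variance

mean, var = 23.0, 5.0             # belief after the first measurement
history = [(mean, var)]
for _ in range(3):                # fold in three more identical measurements
    mean, var = multiply(mean, var, 23.0, 5.0)
    history.append((mean, var))

for n, (m, v) in enumerate(history, start=1):
    print('after %d measurement(s): mean=%.2f variance=%.3f' % (n, m, v))
```

Because the reciprocals of the variances add, n identical measurements leave a variance of 5/n while the mean stays at 23.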
Now let’s multiply two Gaussians (or equivalently, two measurements) that are partially separated. In other words, their means will be different, but their variances will be the same. What do you think the result will be? Think about it, and then look at the graph.
In [13]: xs = np.arange(16, 30, 0.1)
mean1, var1 = 23, 5
mean2, var2 = 25, 5
mean, var = multiply(mean1, var1, mean2, var2)
ys = [stats.gaussian(x, mean1, var1) for x in xs]
plt.plot(xs, ys, label='measure 1')
ys = [stats.gaussian(x, mean2, var2) for x in xs]
plt.plot(xs, ys, label='measure 2')
ys = [stats.gaussian(x, mean, var) for x in xs]
plt.plot(xs, ys, label='multiply')
plt.legend()
plt.show()
Another beautiful result! If I handed you a measuring tape and asked you to measure the distance from a table to a wall, and you got 23m, and then a friend made the same measurement and got 25m, your best guess must be 24m.
That is fairly counter-intuitive, so let’s consider it further. Perhaps a more reasonable assumption would be that either you or your friend just made a mistake, and the true distance is either 23 or 25, but certainly not 24. Surely that is possible. However, suppose the two measurements were reported as 24.01 and 23.99. In that case you would agree that the best guess for the correct value is 24. Which interpretation we choose depends on the properties of the sensors we are using. Humans make galling mistakes; physical sensors do not.
This topic is fairly deep, and I will explore it once we have completed our Kalman filter. For now I will merely say that the Kalman filter requires the interpretation that measurements are accurate, with Gaussian noise, and that a large error caused by misreading a measuring tape is not Gaussian noise.
For now I ask that you trust me. The math is correct, so we have no choice but to accept it and use it. We will see how the Kalman filter deals with movements vs error very soon. In the meantime, accept that 24 is the correct answer to this problem.
One final test of your intuition. What if the two measurements are widely separated?
In [14]: xs = np.arange(0, 60, 0.1)
mean1, var1 = 10, 5
mean2, var2 = 50, 5
mean, var = multiply(mean1, var1, mean2, var2)
ys = [stats.gaussian(x, mean1, var1) for x in xs]
plt.plot(xs, ys, label='measure 1')
ys = [stats.gaussian(x, mean2, var2) for x in xs]
plt.plot(xs, ys, label='measure 2')
ys = [stats.gaussian(x, mean, var) for x in xs]
plt.plot(xs, ys, label='multiply')
plt.legend()
plt.show()
This result bothered me quite a bit when I first learned it. If my first measurement was 10, and the next one was 50, why would I choose 30 as a result? And why would I be more confident? Doesn’t it make sense that either one of the measurements is wrong, or that I am measuring a moving object? Shouldn’t the result be nearer 50? And, shouldn’t the variance be larger, not smaller?
Well, no. Recall the g-h filter chapter. In that chapter we agreed that if I weighed myself on two scales, and the first read 160lbs while the second read 170lbs, and both were equally accurate, the best estimate was 165lbs. Furthermore I should be a bit more confident about 165lbs vs 160lbs or 170lbs because I now have two readings, both near this estimate, increasing my confidence that neither is wildly wrong.
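A short sketch puts the three cases side by side. The multiply function is repeated, without the zero-variance guard, so the block is self-contained. The variance of 5 for the two scales is an assumed value, since the text only says they are equally accurate.

```python
def multiply(mu1, var1, mu2, var2):
    """Mean and variance of the product of two Gaussians."""
    mean = (var1 * mu2 + var2 * mu1) / (var1 + var2)
    variance = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mean, variance

close = multiply(23.0, 5.0, 25.0, 5.0)      # partially separated measurements
far = multiply(10.0, 5.0, 50.0, 5.0)        # widely separated measurements
scales = multiply(160.0, 5.0, 170.0, 5.0)   # two scales, assumed variance of 5

for name, (mean, var) in [('close', close), ('far', far), ('scales', scales)]:
    print('%-6s mean=%.1f variance=%.2f' % (name, mean, var))
```

In each case the mean is the midpoint of the two readings, and the variance is halved no matter how far apart the readings are.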
Let’s look at the math again to convince ourselves that the physical interpretation of the Gaussian equations makes sense.

$$\mu = \frac{\sigma_1^2\mu_2 + \sigma_2^2\mu_1}{\sigma_1^2 + \sigma_2^2}$$
If both scales have the same accuracy, then $\sigma_1^2 = \sigma_2^2$, and the resulting equation is

$$\mu = \frac{\mu_1 + \mu_2}{2}$$
which is just the average of the two weighings. If we look at the extreme cases, assume the first scale is very much more accurate than the second one. At the limit, we can set $\sigma_1^2 = 0$, yielding

$$\mu = \frac{0 \cdot \mu_2 + \sigma_2^2\mu_1}{\sigma_2^2}$$

or just $\mu = \mu_1$. Finally, if we set $\sigma_1^2 = 9\sigma_2^2$, then the resulting equation is

$$\mu = \frac{9\sigma_2^2\mu_2 + \sigma_2^2\mu_1}{9\sigma_2^2 + \sigma_2^2}$$

or just $\mu = \frac{1}{10}\mu_1 + \frac{9}{10}\mu_2$.
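The three cases can be confirmed by plugging numbers into the equation for the mean. The sketch below reuses the 160lbs and 170lbs scale readings. The function name fused_mean and the specific variance values are illustrative assumptions; only the ratio of the variances matters.

```python
def fused_mean(mu1, var1, mu2, var2):
    """Mean of the product of two Gaussians."""
    return (var1 * mu2 + var2 * mu1) / (var1 + var2)

mu1, mu2 = 160.0, 170.0

equal = fused_mean(mu1, 1.0, mu2, 1.0)        # same accuracy: plain average
exact_first = fused_mean(mu1, 0.0, mu2, 1.0)  # first scale has zero variance
noisy_first = fused_mean(mu1, 9.0, mu2, 1.0)  # first variance is 9x the second

print('equal variances:', equal)
print('var1 = 0:       ', exact_first)
print('var1 = 9*var2:  ', noisy_first)
```

The less accurate a measurement is, the less weight it receives: with the first variance nine times the second, the estimate sits one tenth of the way from the second reading toward the first.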