Best underestimate cubic fitting
I was recently confronted with a problem: Given some curve or collection of data points, determine the polynomial that fits closest to the data but lies completely below it.
The solution I came up with: Use quadratic programming!
Quadratic programming is the mathematical process of solving optimization problems involving quadratic functions and linear constraints. In particular, they come in the form:
$$ \textrm{minimize } \frac{1}{2} x^T Q x + c^T x \ \textrm{ subject to } Ax \preceq b $$
If you think about least squares, you can convert least squares to a quadratic programming problem. This allows us to add constraints, namely the constraint that our solution must lie below the curve.
After we formalize the problem and write some code to use an off-the-shelf quadratic program solver, in this case quadprog
, we have a solution right away.
# use different powers for a different degree polynomial
=
= @
= @ -
=
= -
=
return -
It's beautifully short because we rely on quadprog
to work the magic.
To test, we can simulate a curve using the function below:
=
=
=
=
= / # ensuring they sum to 1
return
It's just building a Gaussian mixture model manually given num_hills
Gaussians with widths drawn from the width_distribution
over the input domain x
.
The width_distribution = (width_mean, width_standard_deviation)
.
Then, our script to solve this is:
# replace with whatever curve you have
=
= *10 +
# then fit!
=
We can quickly visualize this with Matplotlib.
=
,
Then, we get something like:
In my particular case, I'm working with noisy data. So there will sometimes be major outliers stemming from bad pixels in images. My quick solution is just to throw out outliers less than three standard deviations by filtering the input $x$ and $y$. This results in a fit like the following:
We're using this to build better F-corona models for PUNCH.
More to come on that!
All together, the script is:
# use different powers for a different degree polynomial
=
=
= @
= @ -
=
= -
=
return -
=
=
=
=
= / # ensuring they sum to 1
return
=
= * 10 + # or whatever curve you want
=
=
,