## Mixture of Experts (MOE)

!!! note
    This surrogate requires the `SurrogatesMOE` module, which can be added by typing `]add SurrogatesMOE` in the Julia REPL.

The Mixture of Experts (MOE) surrogate model represents the interpolating function as a combination of other surrogate models. SurrogatesMOE is a Julia implementation of the [Python version from SMT](https://smt.readthedocs.io/en/latest/_src_docs/applications/moe.html).

MOE is most useful when we have a discontinuous function. For example, let's say we want to build a surrogate for the following function:

### 1D Example

```@example MOE_1D
function discont_1D(x)
    if x < 0.0
        return -5.0
    else
        return 5.0
    end
end

nothing # hide
```

Let's choose the MOE surrogate for 1D. Note that we have to import the `SurrogatesMOE` package in addition to `Surrogates` and `Plots`.

```@example MOE_1D
using Surrogates
using SurrogatesMOE
using Plots
default()

lb = -1.0
ub = 1.0
x = sample(50, lb, ub, SobolSample())
y = discont_1D.(x)
scatter(x, y, label = "Sampled points", xlims = (lb, ub), ylims = (-6.0, 7.0), legend = :top)
```

How does a regular surrogate perform on such a dataset?

```@example MOE_1D
RAD_1D = RadialBasis(x, y, lb, ub, rad = linearRadial(), scale_factor = 1.0, sparse = false)
RAD_at0 = RAD_1D(0.0) # true value should be 5.0
```

As we can see, the prediction is far from the ground truth. Now, how does the MOE perform?

```@example MOE_1D
expert_types = [
    RadialBasisStructure(radial_function = linearRadial(), scale_factor = 1.0,
        sparse = false),
    RadialBasisStructure(radial_function = linearRadial(), scale_factor = 1.0,
        sparse = false)
]

MOE_1D_RAD_RAD = MOE(x, y, expert_types)
MOE_at0 = MOE_1D_RAD_RAD(0.0)
```

As we can see, the accuracy is significantly better.
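
To see the difference across the whole domain, we can evaluate both surrogates on a dense grid. This comparison plot is an extra sanity check rather than part of the original example; it only assumes that both surrogates accept scalar inputs, as demonstrated above.

```@example MOE_1D
# Evaluate both surrogates on a dense grid and overlay the training data.
xs = range(lb, ub, length = 200)
plot(xs, RAD_1D.(xs), label = "RadialBasis", legend = :top)
plot!(xs, MOE_1D_RAD_RAD.(xs), label = "MOE")
scatter!(x, y, label = "Sampled points")
```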

### Under the Hood - How SurrogatesMOE Works

First, a Gaussian Mixture Model is fit to the joint (x, y) values, with one cluster per expert type provided. In the example above, this gives two clusters. Then, using a small test dataset held out from the input data, the best surrogate model is chosen for each cluster. At prediction time, the surrogate of the cluster to which the new point belongs is used.
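
As a rough illustration of this pipeline, here is a minimal sketch in plain Julia. It is not the actual `SurrogatesMOE` implementation: a hypothetical hard split on the sign of `x` stands in for the Gaussian Mixture Model clustering, and the per-cluster model selection is skipped by fixing one `RadialBasis` expert per cluster.

```@example MOE_1D
# Hypothetical two-cluster split; SurrogatesMOE fits a GMM on (x, y) instead.
left_idx = findall(xi -> xi < 0.0, x)
right_idx = findall(xi -> xi >= 0.0, x)

# Fit one expert surrogate per cluster.
expert_left = RadialBasis(x[left_idx], y[left_idx], lb, ub)
expert_right = RadialBasis(x[right_idx], y[right_idx], lb, ub)

# Route each query to the expert of its cluster
# (SurrogatesMOE uses the GMM posterior for this routing step).
sketch_moe(x_new) = x_new < 0.0 ? expert_left(x_new) : expert_right(x_new)
sketch_moe(0.0) # recovers the true value, 5.0
```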

### N-Dimensional Example

```@example MOE_ND
using Surrogates
using SurrogatesMOE

# helper to test the accuracy of the predictors
function rmse(a, b)
    a = vec(a)
    b = vec(b)
    size(a) == size(b) || error("rmse: inputs must have the same length")
    n = length(a)
    return sqrt(sum((a .- b) .^ 2) / n)
end

# multidimensional input function with a discontinuity
function discont_NDIM(x)
    if x[1] >= 0.0 && x[2] >= 0.0
        return sum(x .^ 2) + 5
    else
        return sum(x .^ 2) - 5
    end
end

lb = [-1.0, -1.0]
ub = [1.0, 1.0]
n = 150
x = sample(n, lb, ub, SobolSample())
y = discont_NDIM.(x)
x_test = sample(10, lb, ub, GoldenSample())

expert_types = [
    RadialBasisStructure(radial_function = linearRadial(), scale_factor = 1.0,
        sparse = false),
    RadialBasisStructure(radial_function = linearRadial(), scale_factor = 1.0,
        sparse = false)
]
moe_nd_rad_rad = MOE(x, y, expert_types, ndim = 2)
moe_pred_vals = moe_nd_rad_rad.(x_test)
true_vals = discont_NDIM.(x_test)
moe_rmse = rmse(true_vals, moe_pred_vals)
rbf = RadialBasis(x, y, lb, ub)
rbf_pred_vals = rbf.(x_test)
rbf_rmse = rmse(true_vals, rbf_pred_vals)
println(rbf_rmse > moe_rmse)
```
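
If the MOE has captured the discontinuity, the final line prints `true`: the mixture's test RMSE is lower than that of a single radial basis surrogate fit to the whole dataset.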

### Usage Notes - Example With Other Surrogates

Starting from the example above, simply change or add to the expert types:

```@example SurrogateExamples
using Surrogates
using SurrogatesMOE

# to use Kriging and Inverse Distance surrogates
expert_types = [
    KrigingStructure(p = [1.0, 1.0], theta = [1.0, 1.0]),
    InverseDistanceStructure(p = 1.0)
]

# with 3 surrogates
expert_types = [
    RadialBasisStructure(radial_function = linearRadial(), scale_factor = 1.0,
        sparse = false),
    LinearStructure(),
    InverseDistanceStructure(p = 1.0)
]
nothing # hide
```
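
Either list can then be passed to `MOE` exactly as before. As a quick sketch of that usage (reusing the `discont_NDIM` setup from the N-dimensional example, and assuming `MOE` accepts the expert list unchanged):

```@example SurrogateExamples
# same discontinuous 2-D test function as in the N-dimensional example
function discont_NDIM(x)
    if x[1] >= 0.0 && x[2] >= 0.0
        return sum(x .^ 2) + 5
    else
        return sum(x .^ 2) - 5
    end
end

x = sample(150, [-1.0, -1.0], [1.0, 1.0], SobolSample())
y = discont_NDIM.(x)
moe = MOE(x, y, expert_types, ndim = 2)
moe((0.5, 0.5))
```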