```@meta
CurrentModule = MathOptInterface
DocTestSetup = quote
    import MathOptInterface as MOI
end
DocTestFilters = [r"MathOptInterface|MOI"]
```

# SymbolicAD

The `Nonlinear.SymbolicAD` submodule contains data structures and functions for
working with the symbolic derivatives of a nonlinear optimization problem. This
page explains the API and describes the rationale behind its design.

## Background

The code in `SymbolicAD` is inspired by Hassan Hijazi's work on
[coin-or/gravity](https://github.com/coin-or/Gravity), a high-performance
algebraic modeling language in C++.

Hassan made the following observations:

 * For large-scale models, symbolic differentiation is slower than other
   automatic differentiation techniques, such as the reverse-mode algorithm
   implemented in `MOI.Nonlinear.ReverseAD`.
 * However, most large-scale nonlinear programs have a lot of structure.
 * Gravity asks the user to provide structure in the form of
   _template constraints_, where the user gives the symbolic form of the
   constraint as well as a set of data to convert from the symbolic form to
   the numerical form.
 * Instead of differentiating each constraint in its numerical form, we can
   compute one symbolic derivative of the constraint in symbolic form, and then
   plug in the data to get the numerical derivative of each function.
 * As a final step, if users don't provide the structure, we can still infer it
   (perhaps with less accuracy) by comparing the expression tree of each
   constraint.

The symbolic differentiation approach of Gravity works well when the problem is
large with few unique constraints. For example, a model like:
```julia
model = Model()
@variable(model, 0 <= x[1:10_000] <= 1)
@constraint(model, [i=1:10_000], sin(x[i]) <= 1)
@objective(model, Max, sum(x))
```
is ideal, because although the Jacobian matrix has 10,000 rows, we can compute
the derivative of `sin(x[i])` as `cos(x[i])`, and then fill in the Jacobian by
evaluating the derivative function instead of having to differentiate 10,000
expressions.
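
To make the idea concrete, the following is a minimal sketch in plain Julia,
with no symbolic machinery; the names `df` and `x_data` are illustrative, not
part of any API:

```julia
# Differentiate the template constraint sin(x) once, by hand or
# symbolically, to obtain the derivative function cos(x):
df = x -> cos(x)
# Then fill in the Jacobian by evaluating that one derivative function at
# each data point, instead of differentiating 10,000 expressions:
x_data = rand(10_000)
jacobian_diagonal = df.(x_data)
```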

The symbolic differentiation approach of Gravity works poorly if there are a
large number of unique constraints in the model (which would require a lot of
expressions to be symbolically differentiated), or if the nonlinear functions
contain a large number of nonlinear terms (which would make the symbolic
derivative expensive to compute).

`SymbolicAD` started life as [MathOptSymbolicAD.jl](https://github.com/lanl-ansi/MathOptSymbolicAD.jl),
development of which began in early 2022. This initial version of `SymbolicAD`
used the `Symbolics.jl` package to compute the symbolic derivatives. In 2025, we
rewrote MathOptSymbolicAD.jl to remove the dependence on `Symbolics.jl`, and,
since the rewrite depended only on MathOptInterface, we added it to
`MOI.Nonlinear` as a new submodule.

For more details on `MathOptSymbolicAD.jl`, see Oscar's [JuMP-dev 2022 talk](https://www.youtube.com/watch?v=d_X3gj3Iz-k),
although note that the syntax has changed since the original recording.

## Use SymbolicAD with JuMP

To use `SymbolicAD` with JuMP, set the [`AutomaticDifferentiationBackend`](@ref)
attribute to [`Nonlinear.SymbolicMode`](@ref):

```julia
using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
set_attribute(
    model,
    MOI.AutomaticDifferentiationBackend(),
    MOI.Nonlinear.SymbolicMode(),
)
@variable(model, x[1:2])
@objective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)
```

To revert to the default sparse reverse-mode algorithm, set the
[`AutomaticDifferentiationBackend`](@ref) attribute to
[`Nonlinear.SparseReverseMode`](@ref).
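
For example, reusing `model` from the snippet above:

```julia
set_attribute(
    model,
    MOI.AutomaticDifferentiationBackend(),
    MOI.Nonlinear.SparseReverseMode(),
)
```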

## `simplify`

Use [`Nonlinear.SymbolicAD.simplify`](@ref) to simplify nonlinear expressions.
The simplification algorithm performs simple rewrites such as lifting nested
summations:

```jldoctest
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(
           :+,
           Any[MOI.ScalarNonlinearFunction(:+, Any[1.0, x]), 2.0 * x + 3.0],
       )
+(+(1.0, MOI.VariableIndex(1)), 3.0 + 2.0 MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
4.0 + 3.0 MOI.VariableIndex(1)
```

and trivial identities such as ``x^1 = x``:

```jldoctest
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:^, Any[x, 1])
^(MOI.VariableIndex(1), (1))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
MOI.VariableIndex(1)
```

The list of rewrites that will be made is intentionally limited to keep the
codebase simple. `Nonlinear.SymbolicAD` is not a substitute for a Computer
Algebra System (CAS). For example, we do not detect the relationship
``sin(x)^2 + cos(x)^2 = 1``:

```jldoctest
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> sin_x = MOI.ScalarNonlinearFunction(:sin, Any[x])
sin(MOI.VariableIndex(1))

julia> cos_x = MOI.ScalarNonlinearFunction(:cos, Any[x])
cos(MOI.VariableIndex(1))

julia> f = MOI.ScalarNonlinearFunction(
           :+,
           Any[
               MOI.ScalarNonlinearFunction(:^, Any[sin_x, 2]),
               MOI.ScalarNonlinearFunction(:^, Any[cos_x, 2]),
           ],
       )
+(^(sin(MOI.VariableIndex(1)), (2)), ^(cos(MOI.VariableIndex(1)), (2)))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
+(^(sin(MOI.VariableIndex(1)), (2)), ^(cos(MOI.VariableIndex(1)), (2)))
```

In addition to [`Nonlinear.SymbolicAD.simplify`](@ref), there is an in-place
version [`Nonlinear.SymbolicAD.simplify!`](@ref) that may make changes to the
existing function.
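
For example, because `simplify!` may mutate its argument, use the returned
value rather than the original expression. This sketch is based on the
`x^1 = x` rewrite shown above:

```julia
x = MOI.VariableIndex(1)
f = MOI.ScalarNonlinearFunction(:^, Any[x, 1])
# `simplify!` returns the simplified expression, and may modify `f`
# in-place as it does so. Here `g` should be `MOI.VariableIndex(1)`.
g = MOI.Nonlinear.SymbolicAD.simplify!(f)
```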

## `variables`

Use [`Nonlinear.SymbolicAD.variables`](@ref) to return a sorted list of the
variables that appear in the function:

```jldoctest
julia> x = MOI.VariableIndex.(1:3)
3-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)
 MOI.VariableIndex(3)

julia> f = MOI.ScalarNonlinearFunction(:atan, Any[x[3], 2.0 * x[1]])
atan(MOI.VariableIndex(3), 0.0 + 2.0 MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.variables(f)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(3)
```

## `derivative`

Use [`Nonlinear.SymbolicAD.derivative`](@ref) to compute the symbolic derivative
of a function with respect to a decision variable:

```jldoctest
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[x])
sin(MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.derivative(f, x)
cos(MOI.VariableIndex(1))
```

Note that the resultant expression can often be simplified. Thus, in most cases
you should call [`Nonlinear.SymbolicAD.simplify`](@ref) on the expression before
using it elsewhere:

```jldoctest
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[x + 1.0])
sin(1.0 + 1.0 MOI.VariableIndex(1))

julia> df_dx = MOI.Nonlinear.SymbolicAD.derivative(f, x)
*(cos(1.0 + 1.0 MOI.VariableIndex(1)), 1.0)

julia> MOI.Nonlinear.SymbolicAD.simplify!(df_dx)
cos(1.0 + 1.0 MOI.VariableIndex(1))
```

## `gradient_and_hessian`

In some cases, you may want to compute the gradient and (sparse) Hessian matrix
of a function. One way to achieve this is by recursively calling
[`Nonlinear.SymbolicAD.derivative`](@ref) on the result of
[`Nonlinear.SymbolicAD.derivative`](@ref) for each variable in the list of
[`Nonlinear.SymbolicAD.variables`](@ref); a manual sketch of this approach is
shown at the end of this section. But, to simplify the process, you should use
[`Nonlinear.SymbolicAD.gradient_and_hessian`](@ref):

```jldoctest
julia> x = MOI.VariableIndex.(1:2);

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[1.0 * x[1] + 2.0 * x[2]])
sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))

julia> y, ∇f, H, ∇²f = MOI.Nonlinear.SymbolicAD.gradient_and_hessian(f);

julia> y
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> ∇f
2-element Vector{Any}:
 cos(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))
 *(cos(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2)), 2.0)

julia> H
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 2)
 (2, 2)

julia> ∇²f
3-element Vector{Any}:
 -(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2)))
 *(-(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))), 2.0)
 *(-(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))), 4.0)
```

where:

 * `y` is the list of variables that appear in `f`
 * `∇f` is the vector of first partial derivatives of `f` with respect to each
   variable in `y`
 * `H` and `∇²f` form a sparse Hessian matrix, where `H` is the `(row, column)`
   index of each element, and the corresponding element in `∇²f` is the second
   partial derivative of `f` with respect to `y[row]` and `y[column]`, as
   demonstrated in the sketch below.
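
As a sketch of how to interpret this data structure (the name `dense_H` and the
loop are illustrative, not part of the MOI API), you can expand the sparse
upper-triangular data into a dense symmetric matrix of expressions:

```julia
# `H` stores the `(row, column)` index of each structural nonzero in the
# upper triangle, and `∇²f` stores the matching expression. Mirror each
# entry to the lower triangle because the Hessian is symmetric.
n = length(y)
dense_H = Matrix{Any}(nothing, n, n)
for ((row, col), h) in zip(H, ∇²f)
    dense_H[row, col] = h
    dense_H[col, row] = h
end
```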

Unlike [`Nonlinear.SymbolicAD.derivative`](@ref), the gradient and Hessian
expressions have already been simplified; you do not need to call
[`Nonlinear.SymbolicAD.simplify`](@ref).
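
For completeness, here is a sketch of the manual, recursive approach mentioned
at the start of this section. Unlike `gradient_and_hessian`, it does not
simplify the results, and the exact expression trees it produces may differ
from the doctest output above:

```julia
import MathOptInterface as MOI

x = MOI.VariableIndex.(1:2)
f = MOI.ScalarNonlinearFunction(:sin, Any[1.0 * x[1] + 2.0 * x[2]])
y = MOI.Nonlinear.SymbolicAD.variables(f)
# First partial derivatives with respect to each variable in `y`:
∇f = Any[MOI.Nonlinear.SymbolicAD.derivative(f, yi) for yi in y]
# Second partial derivatives: differentiate each entry of `∇f` again,
# keeping only the upper triangle (j >= i) of the symmetric Hessian.
H, ∇²f = Tuple{Int,Int}[], Any[]
for i in eachindex(y), j in i:length(y)
    push!(H, (i, j))
    push!(∇²f, MOI.Nonlinear.SymbolicAD.derivative(∇f[i], y[j]))
end
```

Calling [`Nonlinear.SymbolicAD.simplify!`](@ref) on each entry of `∇f` and
`∇²f` should then produce expressions equivalent to the output of
[`Nonlinear.SymbolicAD.gradient_and_hessian`](@ref).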