Since Rust-AD is still in early development, crashes are not unlikely.

If you see a proper Rust stacktrace after a compilation failure, our frontend (and thus rustc) has likely crashed.
It is often trivial to create a minimal reproducer, either by deleting most of the body of the
function being differentiated, or by replacing the function body with a `loop {}` statement.
Please create an issue with such a reproducer; it will likely be easy to fix!
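As a rough illustration, a minimized reproducer might look like the sketch below. The feature gate, import, and the `#[autodiff]` attribute arguments are placeholders and should simply mirror whatever your original code used.

```rust
#![feature(autodiff)]
// Depending on your toolchain you may also need an import such as
// `use std::autodiff::autodiff;` -- keep whatever your original crate used.

#[autodiff(d_square, Reverse, Active, Active)] // placeholder attribute arguments
fn square(x: f32) -> f32 {
    // Original body deleted; a bare `loop {}` keeps the function type-correct
    // while removing all dependencies.
    loop {}
}

fn main() {
    // The function usually does not need to be called: the ICE happens while
    // the compiler processes the annotated function.
    let _ = square as fn(f32) -> f32;
}
```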
In the unexpected case that you produce an ICE in our frontend that
is harder to minimize, please consider using icemelter.

If you see LLVM-IR (a language which might remind you of assembly), then our backend crashed.
You can find instructions on how to create an issue and help us fix it on the next page.
Rust-AD supports passing an `autodiff` flag via `RUSTFLAGS`, which allows changing the behaviour of Enzyme in various ways.
Documentation is available here.
If after a compilation failure you are greeted by a large amount of LLVM-IR code, then our Enzyme backend likely failed to compile your code.
These cases are harder to debug, so your help is highly appreciated.
Please also keep in mind that release builds are currently much more likely to work.
The final goal here is to reproduce your bug in the Enzyme compiler explorer,
in order to create a bug report in the Enzyme core repository.
We have an `autodiff` flag which you can pass via `RUSTFLAGS` to help with this. It will print the whole LLVM-IR module,
along with dummy functions called `enzyme_opt_dbg_helper_<i>`. A potential workflow on Linux could look like:

```sh
RUSTFLAGS="-Z autodiff=OPT" cargo +enzyme build --release &> out.ll
```
This also captures a few warnings and info messages above and below your module.
Open out.ll and remove every line above `; ModuleID = <SomeHash>`.
Now look at the end of the file and remove everything that's not part of the LLVM-IR, i.e. remove errors and warnings.
The last line of your LLVM-IR should now start with `!<someNumber> =`, e.g.
`!40831 = !{i32 0, i32 1037508, i32 1037538, i32 1037559}` or `!43760 = !DILocation(line: 297, column: 5, scope: !43746)`.
The actual numbers will depend on your code.
To confirm that your previous step worked, we will use LLVM's `opt` tool.
Find the path to your `opt` binary; it will look similar to
`<some_dir>/rust/build/<x86/arm/...-target-triple>/build/bin/opt`.
Also find the `LLVMEnzyme-19.<so/dll/dylib>` path, which will look similar to
`/rust/build/<target-triple>/enzyme/build/Enzyme/LLVMEnzyme-19`.
Once you have both, run the following command:

```sh
<path/to/opt> out.ll -load-pass-plugin=/path/to/LLVMEnzyme-19.so -passes="enzyme" -S
```
If the previous step succeeded, you are going to see the same error that you saw when compiling your Rust code with Cargo.
If you fail to get the same error, please open an issue in the Rust repository. If you succeed, congrats!
The file is still huge, so let's automatically minimize it.

First find your `llvm-extract` binary; it's in the same folder as your `opt` binary. Then run:

```sh
<path/to/llvm-extract> -S --func=<name> --recursive --rfunc="enzyme_opt_helper_*" out.ll -o mwe.ll
```
Please adjust the name passed with the `--func` flag.
If you applied the `#[no_mangle]` attribute to the function you differentiate,
you can simply use its Rust name. Otherwise you will need to look up the mangled function name.
To do that, open out.ll and search for `__enzyme_fwddiff` or `__enzyme_autodiff`.
The first string in that function call is the name of your function. Example:
```llvm
define double @enzyme_opt_helper_0(ptr %0, i64 %1, double %2) {
  %4 = call double (...) @__enzyme_fwddiff(ptr @_ZN2ad3_f217h3b3b1800bd39fde3E, metadata !"enzyme_const", ptr %0, metadata !"enzyme_const", i64 %1, metadata !"enzyme_dup", double %2, double %2)
  ret double %4
}
```
Here, `_ZN2ad3_f217h3b3b1800bd39fde3E` is the correct name. Make sure not to copy the leading `@`.
Redo step 2), but now pass mwe.ll instead of out.ll to `opt`, to see if your minimized example still reproduces the crash.
After the previous step you should have an `mwe.ll` file with ~5k LoC. Let's try to get it down to 50.
Find your `llvm-reduce` binary next to `opt` and `llvm-extract`.
Copy the first line of your error message; an example could be:

```
opt: /home/manuel/prog/rust/src/llvm-project/llvm/lib/IR/Instructions.cpp:686: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(Args.size() == FTy->getNumParams() || (FTy->isVarArg() && Args.size() > FTy->getNumParams())) && "Calling a function with bad signature!"' failed.
```

If you just get a segfault, there is no sensible error message and not much to do automatically, so continue with step 5).
Otherwise, create a script.sh file containing:
```bash
#!/bin/bash
<path/to/your/opt> $1 -load-pass-plugin=/path/to/LLVMEnzyme-19.so -passes="enzyme" \
  |& grep "/some/path.cpp:686: void llvm::CallInst::init"
```
Experiment a bit with which error message you pass to grep. It should be long enough to make sure that the error is unique.
However, for longer errors including `(` or `)` you will need to escape them correctly, which can become annoying. Then run:

```sh
<path/to/llvm-reduce> --test=script.sh mwe.ll
```

If you see `Input isn't interesting! Verify interesting-ness test`, you got the error message in script.sh wrong;
you need to make sure that grep matches your actual error.
If all works out, you will see a lot of iterations, ending with a new `reduced.ll` file.
Verify with `opt` that you still get the same error.
Afterwards, you should be able to copy and paste your `mwe.ll` (and `reduced.ll`) example into our compiler explorer.
Select `LLVM IR` as language and `opt 20` as compiler. Replace the field to the right of your compiler with `-passes="enzyme"`, if it is not already set.
Hopefully, you will once again see your now familiar error. Please use the share button to copy links to both.
Please create an issue on https://github.com/EnzymeAD/Enzyme/issues and share `mwe.ll` and (if you have it) `reduced.ll`, as well as the links to the compiler explorer. Please feel free to also add your Rust code or a link to it. With that, hopefully someone from the Enzyme core repository will be able to fix your bug. Once that has happened, I will update the Enzyme submodule inside the Rust compiler, which should allow you to differentiate your Rust code. Thanks for helping us improve Rust-AD!
Beyond having a minimal LLVM-IR reproducer, it is also helpful to have a minimal Rust reproducer without dependencies.
This allows us to add it as a test case to CI once we fix it, which avoids regressions in the future.

There are a few tools that can help you minimize the Rust reproducer.
This is probably the simplest automated approach:
cargo-minimize.

Otherwise we have various alternatives, including
treereduce,
halfempty, or
picireny.
Potentially also
creduce.
To support you while debugging, we have added support for an experimental `-Z autodiff` flag which you can pass via `RUSTFLAGS`,
allowing you to change the behaviour of Enzyme without recompiling rustc.
We currently support the following values for `autodiff`:
```
PrintTA             // Print TypeAnalysis information
PrintAA             // Print ActivityAnalysis information
Print               // Print differentiated functions while they are being generated and optimized
PrintPerf           // Print AD-related performance warnings
PrintModBefore      // Print the whole LLVM-IR module before running opts
PrintModAfterOpts   // Print the whole LLVM-IR module after running opts, before AD
PrintModAfterEnzyme // Print the whole LLVM-IR module after running opts and AD
LooseTypes          // Risk incorrect derivatives instead of aborting when type info is missing
OPT                 // Most important debug helper: print a module that can be run with llvm-opt + Enzyme
```
`LooseTypes` is often helpful to get rid of Enzyme errors stating
`Can not deduce type of <X>` and to be able to run some code. But please
keep in mind that this flag can absolutely cause incorrect gradients.
Even worse, the gradients might be correct for certain input values, but not for others.
So please create issues about such bugs and only use this flag temporarily while you wait for your
bug to be fixed.
For performance experiments and benchmarking we also support:

```
NoModOptAfter   // We won't optimize the whole LLVM-IR module after AD
EnableFncOpt    // We will optimize each generated derivative function individually
NoVecUnroll     // Disables vectorization and loop unrolling
NoSafetyChecks  // Disables Enzyme-specific safety checks
RuntimeActivity // Enables the runtime activity feature from Enzyme
Inline          // Instructs Enzyme to apply additional inlining beyond LLVM's default
AltPipeline     // Don't optimize IR before AD, but optimize the whole module twice after AD
```
You can combine multiple `autodiff` values using a comma as separator:

```sh
RUSTFLAGS="-Z autodiff=LooseTypes,NoVecUnroll" cargo +enzyme build
```
The normal compilation pipeline of Rust-Enzyme is to optimize the LLVM-IR module, then run AD, and then optimize the whole module once more after AD.
The alt pipeline will not run opts before AD, but will run them twice after AD: the first time without vectorization or loop unrolling, the second time with them.
The two flags above allow you to adjust this default behaviour.
Enzyme started as a project created by William Moses and Valentin Churavy to differentiate LLVM-IR, covering languages with an LLVM frontend like C, Julia, Swift, Fortran, etc. Operating within the compiler enables Enzyme to interoperate with optimizations, allowing for higher performance than conventional methods, while simultaneously not needing special handling for each language and construct. Enzyme is an LLVM Incubator project and intends to ask for upstreaming later in 2024.

In 2020, initial investigations on using Enzyme with Rust were led by Tiberius Ferreria and William Moses through the use of foreign function calls (https://internals.rust-lang.org/t/automatic-differentiation-differential-programming-via-llvm/13188/7).

In 2021, Manuel Drehwald and Lorenz Schmidt worked on Oxide-Enzyme, which aimed to directly integrate Enzyme as a compiler-aware cargo plugin.

The current Rust-Enzyme project directly embeds Enzyme into Rust and makes autodiff macros available for easy usage. The project is led by Manuel Drehwald, in collaboration with Jed Brown, William Moses, Lorenz Schmidt, Ningning Xie, and Rodrigo Vargas-Hernandez.
We hope that, as part of the nightly releases, Rust-Enzyme can mature relatively fast; the underlying Enzyme core already handles tricky constructs in other languages (down to details such as a C++ `std::map` decrement).

The key aspect for the performance of our solution is that AD is performed after compiler optimizations have been applied
(and it is able to run additional optimizations afterwards). This observation is mostly language independent and is motivated in the
2020 Enzyme NeurIPS paper, and also mentioned towards the end of this non-Enzyme Java autodiff case study.
We can use Enzyme without modifying rustc, as demonstrated in oxide-enzyme.
This PoC required the use of `build-std`, in order to see the LLVM-IR of functions from the std lib.
An alternative would have been to provide rules for Enzyme on how to differentiate every function from the Rust std, which seems undesirable. It would however not be impossible; C++-Enzyme has various rules for the C++ std lib.

This approach also assumes that linking LLVM-IR generated by two different cargo invocations, and passing Rust objects between them, works fine.

This approach is further limited in compile times and reliability. See the example at the bottom left of this poster. LLVM types are often too limited to determine the correct derivative (e.g. opaque ptr),
and as such Enzyme has to run a usage analysis to determine the relevant type of a variable. This can be time consuming
(we encountered multiple cases with > 1000x longer compile times) and it can be unreliable if Enzyme fails to deduce the correct type
of a variable due to insufficient usages. When calling Enzyme from within rustc, we are able to provide high-level type information to Enzyme.
For oxide-enzyme, we tried to mitigate this by using a DWARF debug parser (requiring debug information even in release builds), but even with these helpers we were completely unable to support enums, due to their ability to represent different types. This approach was also limited since rustc (at the time we wrote it) did not emit DWARF information for all Rust types with unstable layout.
Various Rust libraries for the training of neural networks exist (burn/candle/dfdx/rai/autograph).

We talked with developers from burn, rai, and autograph to compare autodiff performance under the Microsoft ADBench benchmark suite. After some investigation, all three decided that supporting such cases would require significant redesigns of their projects, which they can't afford in the foreseeable future.

When training neural networks, we often look at a few large variables (tensors) and a small set of functions (layers) which dominate the runtime. Using these properties, it's possible to amortize some inefficiencies by making the most expensive operations efficient. Such optimizations stop working once we look at the larger set of applications in scientific computing or HPC.
Enzyme supports the ability to efficiently differentiate parallel code. Enzyme's unique ability to combine optimization (including parallel optimization) with AD enables orders-of-magnitude improvements in performance and in scaling parallel code. Each parallel framework only needs to provide Enzyme with lightweight markers describing where the parallelism is created (e.g. this is a parallel for, or a spawn/sync). Such markers have been added for various parallel paradigms, including CUDA, ROCm, OpenMP, MPI, Julia tasks, and RAJA.

Such markers have not been added for Rust parallel libraries (e.g. rayon) yet. Enzyme only needs to support the lowest level of parallelism for each language,
so adding support for rayon should cover most cases. We assume 20-200 lines of code in
Enzyme core should be sufficient, making it a nice task to get started.
rsmpi (the Rust wrapper for MPI) should already work, but it would be good to test it.
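For context, the kind of rayon code this is about could look like the sketch below (assuming the `rayon` crate as a dependency). Whether and how Enzyme will recognize rayon's parallelism is exactly the open work described above, so this only illustrates the parallel pattern, not a working autodiff setup.

```rust
use rayon::prelude::*;

// A data-parallel reduction one would eventually like to differentiate
// with respect to the entries of `x`: f(x) = sum_i x_i^2.
fn sum_of_squares(x: &[f64]) -> f64 {
    x.par_iter().map(|xi| xi * xi).sum()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    println!("f(x) = {}", sum_of_squares(&x)); // 14
}
```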
Batching allows computing multiple derivatives at once. This can help amortize the cost
of the forward pass. It can also be used to enable future vectorization. This feature
is quite popular for ML projects; the JAX documentation gives an example here.
Batching is supported by Enzyme core and could be implemented for Rust-Enzyme in a few hours;
the main blocker is bikeshedding around the frontend. Do we want to accept N individual shadow arguments?
Do we want to accept a tuple of N elements? An array `[T; N]`? The sketch below illustrates the last option.
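The following hand-written sketch only shows what a batched forward-mode call with an `[T; N]` shadow array could compute; it is not an existing Rust-Enzyme API, just an illustration of the design question.

```rust
// Hand-written illustration of batched forward mode for f(x) = x * x.
// A real frontend would generate something like this; the signature is hypothetical.
fn f(x: f32) -> f32 {
    x * x
}

// One primal evaluation, N tangents computed in the same pass.
fn df_batched<const N: usize>(x: f32, dx: [f32; N]) -> (f32, [f32; N]) {
    let primal = f(x);
    let tangents = dx.map(|d| 2.0 * x * d);
    (primal, tangents)
}

fn main() {
    let (y, dys) = df_batched(3.0, [1.0, 0.5]);
    println!("f(3) = {y}, tangents = {dys:?}"); // f(3) = 9, tangents = [6.0, 3.0]
}
```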
Let's assume that you want to use differentiable rendering,
but someone added a "fast" version of the inverse square root function to your render engine,
breaking your Rust-Enzyme tool, which can't figure out how `i = 0x5f3759df - ( i >> 1 );` would affect your gradient.

For exactly this reason, autodiff packages allow declaring a custom derivative `f'` for a function `f`.
In such a case the AD tool will not look at the implementation of `f` and will directly use the user-provided `f'`.
The JAX documentation also has a large list of other reasons why you might want to use custom derivatives: link.
Julia has a whole ecosystem called ChainRules.jl around custom derivatives.
Enzyme does support custom derivatives, but we do not expose this feature on the Rust side yet.
Together with the batching feature, this is one of the highest-reward / lowest-effort improvements planned for Rust-Enzyme.
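Since no registration syntax exists on the Rust side yet, the sketch below only shows the pair of functions a user would want to connect: the bit-trick implementation that defeats the AD tool, and the analytical derivative one would register for it.

```rust
// The classic "fast inverse square root" bit trick: AD tools cannot derive
// a meaningful gradient from the integer reinterpretation of the float.
fn fast_inv_sqrt(x: f32) -> f32 {
    let i = 0x5f3759df - (x.to_bits() >> 1);
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y) // one Newton iteration to refine the guess
}

// The analytical derivative of x^(-1/2) that a user would want to register:
// d/dx x^(-1/2) = -0.5 * x^(-3/2)
fn d_fast_inv_sqrt(x: f32) -> f32 {
    -0.5 * x.powf(-1.5)
}

fn main() {
    println!("1/sqrt(4) ~= {}", fast_inv_sqrt(4.0));
    println!("d/dx at 4 = {}", d_fast_inv_sqrt(4.0));
}
```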
Enzyme does support custom allocators, but Rust-Enzyme does not expose support for them yet.
Please let us know if you have an application that can benefit from a custom allocator and autodiff;
otherwise this likely won't be implemented in the foreseeable future.
Enzyme is very fast, partly because it runs optimizations before AD and applies various partial checkpointing algorithms, such as a min-cut algorithm. However, the ability to control checkpointing (e.g. whether to recompute or store a value) has not yet been exposed to Rust. Finding the optimal checkpointing balance for a given program is in general a hard (NP) problem, but good approximations exist. You can think of it in terms of custom allocators: replacing the algorithm might affect your runtime performance, but it does not affect the result of your function calls. In the future it might be interesting to let the user interact with checkpointing.
Enzyme consists of ~50k LoC. Most of the rules for generating derivatives of instructions are written as LLVM TableGen (.td) declarations, and as such it should be relatively easy to port them. Enzyme also includes various experimental features which we don't need on the Rust side, so an implementation for another codegen backend could end up a bit smaller.

The cranelift backend would also benefit from ABI compatibility, which makes it very easy to test the correctness of a new autodiff tool against Enzyme. Our modifications to `rustc_codegen_ssa` and earlier layers of rustc are written in a generic way, such that no changes would be needed there to enable support for additional backends.
Enzyme supports differentiating CUDA/ROCm kernels.
There are various ways towards exposing these capabilities to Rust.
Manuel and Jed will be experimenting with two different approaches in 2024,
and there is also a lot of simultaneous research. Please reach out if
you are also working on GPU programming in Rust.
Enzyme partly supports multiple MLIR dialects. MLIR can offer great runtime
performance benefits for certain workloads. It would be nice to have a
`rustc_codegen_mlir`, but there is a very large number of open questions around its design.
When using forward mode, we only have three choices of activity values: `Dual`, `DualOnly`, and `Const`.

Dual arguments get a second "shadow" variable.
Usually we will only seed the shadow variable of one Dual input to one and all others to zero,
and then read the shadow values of our output arguments.
We can also seed more than one input shadow, in which case the shadow of the output variables will
be a linear combination based on the seed values.
If we use a `&mut` reference as both input and output argument and mark it as Dual,
the corresponding shadow seed might get overwritten. Otherwise, the seed value will remain unchanged.
| Activity | Dual | DualOnly | Const |
|---|---|---|---|
| Non-integer input `T` | Accept `T`, `T` | Accept `byVal(T)`, `T` | Unchanged |
| Integer scalar input | N/A | N/A | Unchanged |
| `f32` or `f64` output `T` | Return `(T, T)` | Return `T` | Unchanged |
| Other output types | N/A | N/A | Unchanged |
`DualOnly` is a potentially faster version of `Dual`.
When applied to a return type, it will cause the primal return value to not be computed.
So in the case of `fn f(x: f32) -> f32 { x * x }`,
we would now only return `2.0 * x`, instead of
`(x * x, 2.0 * x)`, which we would get with `Dual`.
In the case of an input variable, `DualOnly` will cause the primal value to be
passed by value, even when it is passed by reference in the original function.
So `fn f(x: &f32, out: &mut f32) {..}` would become
`fn df(x: f32, dx: &mut f32, out: f32, dout: &mut f32) {..}`.
This makes `x` and `out` inaccessible for the user, so we can use them as buffers
and potentially skip certain computations. This is mostly valuable for larger types, or for more complex functions.
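To make the difference concrete, here is a hand-written sketch of what forward mode computes for `fn f(x: f32) -> f32 { x * x }` under the two return activities. These functions are written by hand to mirror the shape of the generated code; they are not produced by the macro.

```rust
fn f(x: f32) -> f32 {
    x * x
}

// `Dual` on the return value: primal and derivative are both returned.
fn df_dual(x: f32, dx: f32) -> (f32, f32) {
    (x * x, 2.0 * x * dx)
}

// `DualOnly` on the return value: the primal computation can be skipped.
fn df_dual_only(x: f32, dx: f32) -> f32 {
    2.0 * x * dx
}

fn main() {
    let (y, dy) = df_dual(3.0, 1.0);
    assert_eq!((y, dy), (f(3.0), df_dual_only(3.0, 1.0)));
    println!("f(3) = {y}, df/dx(3) = {dy}");
}
```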
We propose to add automatic differentiation to Rust. This would allow Rust users to compute derivatives of arbitrary functions, which is the essential enabling technology for differentiable programming. This feature would open new opportunities for Rust in scientific computing, machine learning, robotics, computer vision, probabilistic analysis, and other fields.
> Automatic differentiation (AD, also known as autodiff or back-propagation) has been used at Argonne and other national laboratories, at least, since the 1980s. For example, we have used AD to obtain gradients of computational fluid dynamics applications for shape optimization, which allows the automated design of aircraft wings or turbine blades to minimize drag or fuel consumption. AD is used extensively in many other applications including seismic imaging, climate modeling, quantum computing, or software verification.
>
> Besides the aforementioned “conventional” uses of AD, it is also a cornerstone for the development of ML methods that incorporate physical models. The 2022 Department of Energy report on Advanced Research Directions on AI for Science, Energy, and Security states that “End-to-end differentiability for composing simulation and inference in a virtuous loop is required to integrate first-principles calculations and advanced AI training and inference”. It is therefore conceivable that AD usage and development will become even more important in the near future. [1]
> My primary applications are in computational mechanics (constitutive modeling and calibration), where it'll enable us to give a far better user experience than commercial packages, but differentiable programming is a key enabler for a lot of scientific computing and ML research and production.
Autodiff is widely used to evaluate gradients for numerical optimization, which is otherwise intractable for a large number of parameters.
Indeed, suppose we have a scalar-valued loss function \(f(\theta)\) where the parameter vector \(\theta\) has length \(n\).
If the cost to evaluate \(f(\theta)\) once is \(c\) (which will often be \(O(n)\)), then evaluating the gradient \(\partial f/\partial \theta\)
costs less than \(3c\) with autodiff or with a tedious and brittle by-hand implementation, but about \(cn\) otherwise.
Optimization of systems with \(n\) in the hundreds to billions is common in applications such as calibration, data assimilation, and design optimization of physical models, in perception and control systems for robotics, and in machine learning.
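As a concrete illustration (the numbers are chosen only for this example): for a model with \(n = 10^6\) parameters, finite differences require on the order of \(n\) evaluations of \(f\), i.e. a cost of roughly \(10^6 c\), whereas reverse-mode autodiff returns the full gradient for less than \(3c\), independent of \(n\).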
Derivatives are also instrumental to thermodynamically admissible physical models, in which models are developed using non-observable free energy functionals and dissipation potentials, with the observable dynamics represented by their derivatives. Commercial engineering software requires users to implement these derivatives by hand (e.g., Abaqus `UHYPER` and `UMAT`), and constitutive modeling papers routinely spend many pages detailing how to efficiently compute the necessary derivatives, since these are among the most computationally intensive parts of simulation-based workflows and numerical stability is necessary.
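A standard textbook instance of this pattern (illustrative, not tied to any particular package above) is hyperelasticity, where the free energy density \(\psi\) is a function of the strain \(\varepsilon\) and the observable stress is its derivative,

\[ \sigma = \frac{\partial \psi(\varepsilon)}{\partial \varepsilon}, \]

so an autodiff tool that computes \(\partial \psi / \partial \varepsilon\) removes the need to implement this derivative by hand.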