WIP: Design proposal for dynamic component loading. #47

Closed
111 changes: 111 additions & 0 deletions proposed/dynamic-component-loading.md
@@ -0,0 +1,111 @@
# Dynamic component loading

## Problem statement
The .NET Core runtime has only limited support for dynamically loading assemblies which have additional dependencies or which might collide with the app in any way. Out of the box, only these scenarios really work:
* Assemblies which have no additional dependencies other than those found in the app itself (`Assembly.Load`)
* Assemblies which have additional dependencies in the same folder and which don't collide with anything in the app itself (`Assembly.LoadFrom`)
* Loading a single assembly in full isolation (from the app), but all its dependencies must come from the app (`Assembly.LoadFile`)
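
For reference, the three built-in entry points listed above look roughly as follows in code. This is a minimal sketch: the `ExistingLoadApis` class and its method names are hypothetical and only illustrate which call matches which bullet.

```csharp
using System.Reflection;

// Hypothetical helper; only the Assembly.* calls are real APIs.
static class ExistingLoadApis
{
    // Resolves by assembly name, so only assemblies the app
    // already knows about can be found this way.
    public static Assembly ByName(AssemblyName name) => Assembly.Load(name);

    // Loads from a full path; additional dependencies may sit in the same
    // folder, provided they don't collide with anything in the app.
    public static Assembly FromFolder(string fullPath) => Assembly.LoadFrom(fullPath);

    // Loads exactly one file in isolation from the app; the file's own
    // dependencies must still come from the app.
    public static Assembly SingleFileIsolated(string fullPath) => Assembly.LoadFile(fullPath);
}
```

None of the three reads a component's `.deps.json`, which is the gap this proposal targets.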

Other scenarios are technically supported by implementing a custom [`AssemblyLoadContext`](https://github.com/dotnet/coreclr/blob/master/Documentation/design-docs/assemblyloadcontext.md) but doing so is complex.
Additionally, there's no inherent synergy with the .NET Core SDK tooling. Components produced by the SDK can't be easily loaded at runtime.

The goal of this feature is to provide an easy-to-use way to dynamically load a component with its dependencies.

## Scenarios
A few scenarios where dynamic loading of full components is required:
* MSBuild tasks - all tasks in MSBuild are dynamically loaded. Some tasks come with additional dependencies which can collide with each other or MSBuild itself as well.
* Roslyn analyzers - similar to MSBuild tasks, the Roslyn compiler dynamically loads analyzers which are separate components with potentially conflicting dependencies.
* XUnit loading tests - the test runner acts as an app and the test is loaded dynamically. The test can have any number of dependencies. Finding and resolving those dependencies is challenging.
* ASP.NET's `dotnet watch` - ability to dynamically reload an app without restarting the process. Each version of the app is inherently in collision with any previous version. The old version should be unloaded.

Please include ASP.NET Core plugins like z-pages (see performance-profiling-controller.md) and monitoring tools like Application Insights.

Member Author

@SergeyKanzhelev In which way do you think the performance profiling controller is related to the dynamic component loading? I can potentially see some use cases, but I don't see that as a good example of the scenarios we want to support (not saying it's not applicable).

Monitoring tools and other profiler-like capabilities are probably more related to the [startup-hook](https://github.com/dotnet/core-setup/blob/master/Documentation/design-docs/host-startup-hook.md) functionality. Also, this new feature will explicitly not customize startup loading in any way. It will be on-demand, that is, called by user code. It is possible for startup hooks to use this feature to load their own dependencies, though.

I guess I'm missing the point here though - can you please describe in a bit more detail how you think these are related?

The discussion was how one can load ASP.NET components to implement a profiling controller in an application that has no ASP.NET stack loaded or has its own version of the ASP.NET stack. Also, those profiling controllers may implement some pluggable model to load data collection plug-ins on demand at runtime. Versioning and isolation were among the topics discussed.

Member Author

For isolation, AssemblyLoadContext is definitely the solution. For the on demand runtime loading - that would be this feature (it's possible, but rather hard right now). Thanks for pointing that out - I'll include it in the list.

Another scenario is the startup hook. Perhaps the spec of the profiler hook should be updated to suggest using isolated loading for those hooks. @sbomer

Member

Are you referring to ICorProfiler when you say profiler? That is native and is not impacted by this work. The startup hook is interesting though.

No, I misspoke. I meant the startup hook I referenced.

Member Author

(Sorry about the previous comment, didn't see this one yet). I agree that using isolation for startup hooks makes sense in some cases. But that functionality is available already. This feature is actually not about providing isolation per se. It's about the ability to load components with dependencies (which has been hard so far). As seen in the discussion in this proposal, it's likely we will provide a way to do this loading either in isolation or not.

* Possibly profiling controller - see [performance-profiling-controller](https://github.com/dotnet/designs/blob/master/accepted/performance-profiling-controller.md) for the proposal. There are plans to load "plugin" like components at runtime to provide the controller functionality.

In most of these cases the component which is to be loaded dynamically has a non-trivial set of dependencies which are unknown to the app itself. So the loading mechanism has to be able to resolve them.

## Declaring dependencies
The .NET Core SDK tooling produces `.deps.json` which is a dependency manifest for dotnet apps and components. It enables them to load dependencies from locations other than the base directory (ex: from packages, platform-specific publish directories, etc.).
At application start .NET Core first builds a set of directories to look for binaries (called ProbingPaths) based on application/component base directory, framework directory, `.runtimeconfig.dev.json`, command line, servicing locations, etc. For more details see [host-probing](https://github.com/dotnet/core-setup/blob/master/Documentation/design-docs/host-probing.md). It then uses the `.deps.json` to locate dependencies in these paths.

For an application or component, the `.deps.json` file specifies:
* A set of dependencies and their assets
* Relative paths to locate them - relative paths in `.deps.json` file can be used to locate architecture-specific dependencies within the probing locations.

If the app depends on any frameworks, the `.deps.json` files of those frameworks are similarly processed.
Further details about the algorithm used for processing dependencies can be found in [assembly-conflict-resolution](https://github.com/dotnet/core-setup/blob/master/Documentation/design-docs/assembly-conflict-resolution.md).

## Dynamic loading with dependencies
We propose to add a new public API which would dynamically load a component with these properties:
* Component is loaded in isolation from the app (and other components) so that potential collisions are not an issue
* Component can use `.deps.json` to describe its dependencies. This includes the ability to describe additional NuGet packages, RID-specific and/or native dependencies
* Component can choose to rely on the app for certain dependencies by not including them in its `.deps.json`
* Optionally such component can be enabled for unloading

Public API (early thinking):
```csharp
class Assembly
{
    public static Assembly LoadFileWithDependencies(string path);
}
```

At its core this is similar to `Assembly.LoadFile` but it supports resolving dependencies through `.deps.json`. Just like `Assembly.LoadFile` it provides isolation, but also for the dependencies.

```csharp
class AssemblyLoadContext
{
    public static AssemblyLoadContext CreateForAssemblyWithDependencies(
        string assemblyPath,
        AssemblyLoadContext fallbackContext,
        bool enableUnloading);
}
```

Member

I feel we are going to have scenarios where we will want multiple plug-ins loaded into the same assembly load context. I prefer this factory method to be an instance method so that developers can choose.

"Advanced" version which would return an ALC instance and not just the main assembly. Allows for additional changes to the `AssemblyLoadContext`, like registering an event handler to the `Resolving` event for example.
Also adds the ability to specify:
* `fallbackContext` which is the `AssemblyLoadContext` to defer to when the assembly resolution is not possible in this current context. By default this is the `AssemblyLoadContext.Default` (the app's context). This allows for creating effectively parent-child relationships between the load contexts.
Contributor

Given the parent-child relationship and the various possible strategies mentioned below, it seems this would be better named parentContext

* `enableUnloading` which will mark the newly created load context to track references and to trigger unload for the load context and all the assemblies in it when possible.
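
The two extra parameters can be approximated with APIs that shipped in later runtimes. The following is only a sketch, assuming .NET Core 3.0 or newer (where `AssemblyLoadContext` accepts an `isCollectible` flag); the `FallbackLoadContext` class is hypothetical and is not the API proposed here.

```csharp
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical context mimicking `fallbackContext` and `enableUnloading`.
class FallbackLoadContext : AssemblyLoadContext
{
    private readonly AssemblyLoadContext _fallback;

    public FallbackLoadContext(AssemblyLoadContext fallbackContext, bool enableUnloading)
        : base(name: "component", isCollectible: enableUnloading)
    {
        // Default to the app's context when no fallback is given.
        _fallback = fallbackContext ?? Default;
    }

    protected override Assembly Load(AssemblyName assemblyName)
    {
        // This sketch resolves nothing itself, so every bind is deferred.
        // Returning null makes the runtime try the Default context.
        if (ReferenceEquals(_fallback, Default))
            return null;

        // A non-default parent has to be asked explicitly.
        return _fallback.LoadFromAssemblyName(assemblyName);
    }
}
```

A caller would create the context with `enableUnloading: true`, load the component into it, and later call `Unload()` once nothing references the component's assemblies anymore.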

## High-level description of the implementation
* Implement a new `AssemblyLoadContext` which will provide the isolation boundary and act as the "root" for the component. It can be enabled for unloading.
* The new load context is initialized by specifying the full path to the main assembly of the component to load.
* It will look for the `.deps.json` next to that assembly to determine its dependencies. Lack of `.deps.json` will be treated in the same way it is today for executable apps - that is all the assemblies in the same folder will be used as dependencies.
* Parsing and understanding of the `.deps.json` will be performed by the same host components which do this for executable apps (so same behavior/quirks/bugs, very little code duplication). Specifically `hostpolicy.dll` is the component which parses `.deps.json` during app startup. See [host-components](https://github.com/dotnet/core-setup/blob/master/Documentation/design-docs/host-components.md) for more details. If this functionality is required when running with a custom host, the said host would need to provide this functionality to the runtime.
* If the component has `.runtimeconfig.json` and/or `.runtimeconfig.dev.json` it will only be used to verify runtime version and provide probing paths.
Contributor

Given @natemcmaster 's post here
https://natemcmaster.com/blog/2017/12/21/netcore-primitives/ and the related cited documentation in CLI...

It seems the .runtimeconfig.json is intended for the human edited configuration. While this is in its infancy, I wonder if it is the only config file we should support. Once we work through exactly what we want we can adjust the tooling to write the .deps.json file

Member Author

I'm sorry - probably missing the point of your question.
.runtimeconfig.json serves different purpose than .deps.json.
The former is used to locate the runtime and the runtime options to start the runtime with. It also stores framework dependencies - this is somewhat weird since the "dependencies" part of that should be in .deps.json logically. On the other hand, the way to find the runtime is to locate the lowest-level framework, so that part needs to be in .runtimeconfig.json.
In a way the runtime-config part is meant to be human edited. The framework dependencies are generally machine generated as those fall out of the compilation process.
The latter is used to locate specific assembly dependencies. This is almost entirely a result of the compilation process and thus machine generated.

Please note that the vast majority of dynamically loaded components will not have .runtimeconfig.json. The proposed new functionality is about finding assembly dependencies, and as such it's not really tied to .runtimeconfig.json (other than the probing paths). On the other hand, it is very much tied to .deps.json.

* The load context will determine a list of assemblies similar to the TPA and list of resources and native search paths and remember them. These will be used to resolve binding events in the load context.
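
The steps above can be sketched with the `AssemblyDependencyResolver` type that later shipped in .NET Core 3.0. It did not exist when this proposal was written, so treat this as an illustration of the described behavior under that assumption, not as the proposed implementation; `ComponentLoadContext` is a hypothetical name.

```csharp
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical "root" load context for one component.
class ComponentLoadContext : AssemblyLoadContext
{
    private readonly AssemblyDependencyResolver _resolver;

    public ComponentLoadContext(string mainAssemblyPath, bool enableUnloading = false)
        : base(name: mainAssemblyPath, isCollectible: enableUnloading)
    {
        // Uses the .deps.json next to the main assembly when present;
        // parsing is done by the host components.
        _resolver = new AssemblyDependencyResolver(mainAssemblyPath);
    }

    protected override Assembly Load(AssemblyName assemblyName)
    {
        // Resolve the bind against the component's own dependency list.
        string path = _resolver.ResolveAssemblyToPath(assemblyName);

        // null hands the bind over to the Default (app) context.
        return path != null ? LoadFromAssemblyPath(path) : null;
    }
}
```

The main assembly itself would then be loaded with `LoadFromAssemblyPath(mainAssemblyPath)` on the new context.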

## Handling of various asset types
What happens with various asset types once the ALC decides to load them in isolation:
* Normal managed assembly (code) - The `.deps.json` parsing code will return list of full file paths. The ALC will just find it there and load it.
  Note that R2R images are handled by this as well since they are basically just a slightly different managed assembly. Also note that .NET Core can only load a given R2R image once as R2R. Any subsequent load of the same file will work, but it will be only used as a pure IL assembly (and thus require JITing). Loading the same file multiple times can occur if two load contexts decide to load the assembly in isolation.
* Satellite assemblies (resources) - Two possibilities:
  * Imitate app start behavior exactly - `.deps.json` only provides list of resource probing paths. ALC would then try to find the `<culture>/AssemblyName.dll` in each probing path and resolve the first match.
  * Use full paths - `.deps.json` resolution actually internally produces a list of full file paths and then trims it to just probing paths. The full file paths could be used by the ALC in a very similar manner to code assemblies.
* Native libraries - To integrate well the ALC would only get list of native probing paths from the `.deps.json`. It would then use new API to load native library given a probing path and a simple name. Internally this would call into the existing runtime behavior (which tries various prefix/suffix combinations and so on).
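
For the native library case, a rough sketch of how a load context could hook native resolution is shown below. It assumes the `AssemblyDependencyResolver.ResolveUnmanagedDllToPath` method from .NET Core 3.0 and later, which is not necessarily the "new API" this bullet has in mind; `NativeAwareLoadContext` is a hypothetical name.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical context that resolves native libraries for a component.
class NativeAwareLoadContext : AssemblyLoadContext
{
    private readonly AssemblyDependencyResolver _resolver;

    public NativeAwareLoadContext(string mainAssemblyPath)
    {
        _resolver = new AssemblyDependencyResolver(mainAssemblyPath);
    }

    // Managed binds are not the point of this sketch; defer them all.
    protected override Assembly Load(AssemblyName assemblyName) => null;

    protected override IntPtr LoadUnmanagedDll(string unmanagedDllName)
    {
        // Maps a simple name to a full path using the component's
        // native search paths.
        string path = _resolver.ResolveUnmanagedDllToPath(unmanagedDllName);

        // IntPtr.Zero lets the runtime continue with its default probing.
        return path != null ? LoadUnmanagedDllFromPath(path) : IntPtr.Zero;
    }
}
```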

## Isolation strategies
There are several ways the new dynamic component loading can handle isolation:
* **Full isolation** - in this case a new load context is created and it always tries to resolve the bind operation first. If it can do so, it will load the dependency from the resolved location into the new load context. Only the dependencies which can't be resolved (typically framework assemblies) will be handed to the parent (Default) load context.
  * This provides full isolation to the component. Every dependency the component carries with it will be used from the component and loaded in isolation from the rest of the app. Basically avoids any potential collisions.
  * On the downside, this doesn't provide implicit sharing. Typically only framework assemblies would be shared in this scenario. Component would have to explicitly choose which assemblies to share, by not carrying them with it (this can be set up in the project file by using `CopyLocal=false` for assembly references, similar option exists for project and NuGet references as well). This means that types used for communication between the app and the component would have to be explicitly shared by the component (via the exclusion of the assembly in the component). This is not done by the SDK by default, so it's easy to get this wrong and even with improved diagnostics will be relatively hard to debug.
* **Always load into default** - in this case all loads are done to the parent (Default) load context. In the extreme case a new load context is not really needed. Alternative could be that the load is attempted to the default load context (first by name, then by resolved file path). If that fails (should really only happen in case of a collision), the dependency would be loaded into the new load context in isolation.
  * Inherently shares as much as possible - avoids problems of sharing types used for communication.
  * Downside is that loading a component will "pollute" the default load context with assemblies from the component.
  * Loading multiple components with similar dependencies can easily lead to unpredictable results as what gets loaded where would depend on ordering.
  * Auto-upgrades - if the app uses a higher version of a given dependency, component which uses a lower version of the dependency will get the app's version - this can lead to incompatibilities.
* **Prefer default** - in this case a new load context is created. When it tries to resolve a dependency, it will first try to resolve it against the parent (Default) load context. If the parent can satisfy the dependency, it will be used. Otherwise if the dependency can't be resolved by the parent, the dependency will be resolved to a file path and loaded in isolation into the new load context.
  * Inherently shares all assemblies from the parent (Default) - avoids problems with sharing types used for communication.
  * There's no pollution of the parent context - no new files will be loaded into the parent context.
  * Auto-upgrade - components will auto-upgrade to the version of a dependency they share with the app (downgrade will not occur, in that case the dependency would be loaded in isolation). This can lead to incompatibilities.
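
The difference between "full isolation" and "prefer default" comes down to the order in which a load context tries its two sources. The sketch below is one possible reading, assuming .NET Core 3.0 or newer for `AssemblyDependencyResolver`; the `IsolationStrategy` enum and `StrategyLoadContext` class are hypothetical. What the Default context accepts as a match (exact version or higher) is left to the runtime here, which is exactly the open question about what "satisfy" means. "Always load into default" is omitted because it needs no new context.

```csharp
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical names; only two of the three strategies need a context.
enum IsolationStrategy { FullIsolation, PreferDefault }

class StrategyLoadContext : AssemblyLoadContext
{
    private readonly AssemblyDependencyResolver _resolver;
    private readonly IsolationStrategy _strategy;

    public StrategyLoadContext(string mainAssemblyPath, IsolationStrategy strategy)
    {
        _resolver = new AssemblyDependencyResolver(mainAssemblyPath);
        _strategy = strategy;
    }

    protected override Assembly Load(AssemblyName assemblyName)
    {
        if (_strategy == IsolationStrategy.PreferDefault)
        {
            try
            {
                // Ask the parent (Default) context first.
                return Default.LoadFromAssemblyName(assemblyName);
            }
            catch (FileNotFoundException) { /* parent can't satisfy it */ }
            catch (FileLoadException) { /* parent can't satisfy it */ }
        }

        // Full isolation, or the parent failed: use the component's own copy.
        string path = _resolver.ResolveAssemblyToPath(assemblyName);
        return path != null ? LoadFromAssemblyPath(path) : null;
    }
}
```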

Open questions:

Important consideration for me is versioning of EventSource names and manifests when multiple versions are loaded into the process. Same for diagnostics source.

It is my understanding that all isolation models will allow subscribing to event and diagnostics sources from the loaded assembly, as those sources are defined in system assemblies. If so, some naming issues may occur.

CC: @vancem

Member Author

Please note that isolation is not a new feature. It has been available for a long time now. It's done through the AssemblyLoadContext class which is not new. Also some methods on Assembly provide isolation (Assembly.LoadFile loads the specified file in isolation). So it's already possible to load even the same assembly twice (or two similar assemblies and so on). I don't know if there's been any thinking on the impact of this capability on the EventSource mechanisms. I'll let @vancem or @brianrob comment on that.

@vancem, @brianrob what do you think? Can you confirm that loading in separate AssemblyLoadContext will still make everybody use the "global" Event and Diagnostics Source?

Member Author

I can confirm that from the assembly binding perspective, System.Private.CoreLib, which is where most of the code for EventSource and the like lives, is only ever loaded into the app once. In fact corelib is special and the runtime will not allow loading a second copy even if custom code specifically asks for it. But that doesn't mean that the event source code doesn't have some internal data structures which might have issues with multiple copies of the same assembly.

@SergeyKanzhelev - I view all this loader stuff as pretty orthogonal to event/source naming issues in EventSource/DiagnosticSource. (Yes there are issues but this feature does not make the problem any better or worse).

The one place I can think of that is an issue is that TODAY we make it illegal to create two EventSources with the same name in the same AppDomain. We did this for relatively weak reasons (it is a perf issue). This behavior WILL cause grief if an assembly is loaded twice (because of isolation). It will call the SHARED System.Private.Corelib version of EventSource which will fail because the name is already used. I have always been ambivalent about this rule, so I don't mind if we stop checking for it.

As far as the rest goes, I think it is roughly orthogonal. The names WILL overlap but that is good (you think of them as the 'same' events really). For DiagnosticSource, there will be a separate publication point for each System.Diagnostics.DiagnosticSource assembly that is loaded (since there is a static in there that you subscribe to in order to get events).

So we may wish to relax the uniqueness check in EventSource, but otherwise I think it will be OK. If you have specific concerns, we can walk through specific cases in more detail.

> This behavior WILL cause grief if an assembly is loaded twice (because of isolation). It will call the SHARED System.Private.Corelib version of EventSource which will fail because the name is already used.

So custom EventSource creation will crash? Potentially an entire process? Or will this be handled? In any case, this may become a common case with plugins using libraries like Application Insights for telemetry reporting. The Application Insights SDK instantiates an EventSource for troubleshooting.

> For DiagnosticSource, there will be a separate publication point for each System.Diagnostics.DiagnosticSource assembly that is loaded

Will different DiagnosticSource-s share the AsyncLocal (Activity.Current)? If not, distributed tracing scenarios, like an Activity started in the ASP.NET Core host and then a child activity created inside the plugin, would not work. Is there anything that can be done to special-case the DiagnosticSource?

* Which is the default behavior?
Contributor

My initial preference was for the Full Isolation case, but the more I think about it the more I prefer the Prefer default implementation.

I would rename it Prefer parent and use the hierarchy. This would help allow for plugin families. It could easily be mimicked with a natural hierarchical directory structure.

Like class loaders in Java? Perhaps then assembly loading can have an API as well, so it can decide whether to give the parent a chance to load the assembly first, the same way class loaders work.

Member Author

@SergeyKanzhelev I have only very limited knowledge of Java in general. So far what I've read about class loaders they are very similar to .NET Core's AssemblyLoadContext. This class already provides ways for a custom implementation to defer to the parent first if the custom code wants to do that.

This proposal heavily relies on AssemblyLoadContext as the way to achieve isolation. I guess one way of thinking about this proposal is that we're trying to add an easier-to-use class loader to dynamically load components with their dependencies. But again, I know very little about the Java world, so maybe I'm completely wrong.

I need to read about AssemblyLoadContext more.

Yes, I was mostly referring to the fact that class loaders can be custom coded to decide on the loading order themselves. Also, there are a few well-known class loaders which the user is aware of and can decide to load some assemblies into, so they will share statics.

If the simplified version is enough for most scenarios, it should be fine.

Member Author

There's a description of AssemblyLoadContext. It's not entirely up-to-date, but the basics are there. Also added this link to the doc. Let me know if there are unclear parts, I'll go fix the other doc as well.

I know my comment won't be of any help in terms of how you guys would proceed, but I just wanted to say that, with 20 or so years of software development background, I find the AssemblyLoadContext very hard to understand. Whether that is due to the API design or due to the lack of documentation (samples rather than API description), I don't know. Just wanted to share how I feel about it.

Contributor

@CoskunSunali Thanks. We are hearing the same thing from lots of customers.

* Does the framework implement more than one behavior and lets users choose?
Contributor

@jkotas suggested in another thread that we defer the convenience API until we get a feel for which is the right approach.

When we were prototyping hardware intrinsics for 2.1, late in the cycle we pushed the work into experimental nuget packages. I am wondering whether that might be the right approach here.

I don't see a reason why we couldn't create a NuGet package to expose these APIs experimentally so that we can get real-world experience and hands-on feedback.

The experimental package(s) could be used to provide sample code for the usage of the existing 2.1 AssemblyLoadContext and help us develop proper documentation.

Member

To help move forward, providing an instance method on AssemblyLoadContext to enable the LoadWithDependencies seems like a reasonable way to achieve the balance we have been discussing. It will lead people towards going for isolation, but enables them to choose which ALC to load into. We can then choose which version is most appropriate for Assembly.Load*WithDependencies (e.g. the default).

* In the "prefer default" behavior - what does it mean for the parent context to "satisfy" the dependency? Does it mean exact version match, or does it allow auto-upgrade (patch, minor or even major)? Do we even allow downgrades?

Contributor

It seems the default should be auto upgrade.

However, it seems this should also be configurable. I would suggest a new config property in the .runtime.config

Member Author

.runtimeconfig.json already has settings which control auto-upgrade - also called roll-forward (I should use that term instead). See this doc.

Those apply to frameworks though, not assemblies. But in the context of components (since those rely on the framework from the app) it would make sense to use them for individual assemblies as well.
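
To make the candidate meanings of "satisfy" from the open question concrete, here is a small sketch of per-assembly roll-forward checks. The `RollForward` enum and `VersionPolicy` class are hypothetical and do not correspond to existing `.runtimeconfig.json` settings; the sketch only assumes that a downgrade never satisfies a request.

```csharp
using System;

// Hypothetical policy levels, from strictest to most permissive.
enum RollForward { Exact, Patch, Minor, Major }

static class VersionPolicy
{
    // True when the parent's `available` version may stand in for the
    // `requested` version under the given policy.
    public static bool Satisfies(Version requested, Version available, RollForward policy)
    {
        // Downgrades are never accepted in this sketch.
        if (available < requested)
            return false;

        switch (policy)
        {
            case RollForward.Exact:
                return available == requested;
            case RollForward.Patch:
                return available.Major == requested.Major
                    && available.Minor == requested.Minor;
            case RollForward.Minor:
                return available.Major == requested.Major;
            default:
                // Major: any equal or higher version is accepted.
                return true;
        }
    }
}
```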

## Important implications and limitations
* Only framework dependent components will be supported. Self-contained components will not be supported even if there was a way to produce them.
* The host (the app) can be any configuration (framework dependent or self-contained). The notion of frameworks is completely ignored by this new functionality.
* All framework dependencies of the component must be resolvable by the app - simply put, the component must use the same frameworks as the app.
* Components can't add frameworks to the app - the app must "pre-load" all necessary frameworks.
* In all cases framework assemblies (and thus types) will be shared between the app and the component. Sharing of other assemblies depends on the isolation strategy used - see above.
* Pretty much all settings in `.runtimeconfig.json` and `.runtimeconfig.dev.json` will be ignored with the exception of runtime version (probably done through TFM) and additional probing paths.