Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Add startup hook design document #4421

Merged
merged 12 commits into from
Aug 31, 2018
Merged

Add startup hook design document #4421

merged 12 commits into from
Aug 31, 2018

Conversation

sbomer
Copy link
Member

@sbomer sbomer commented Aug 1, 2018

This adds a proposal for an early startup hook that allows managed
code to run before the entry point.

@jeffschwMSFT @vitek-karas @swaroop-sridhar @AaronRobinsonMSFT @richlander @jkotas

This adds a proposal for an early startup hook that allows managed
code to run before the entry point.
@jkotas
Copy link
Member

jkotas commented Aug 1, 2018

@brianrob Does this look like something you would want to use for injecting Performance Controller into existing apps?


Specifically, hostpolicy starts up coreclr and sets up a new AppDomain
with `ManagedHost.dll` on the TPA list. If it read these settings from
environment variables, it clears them so that child processes do not

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.


## Example

This could be used with AssemblyLoadContext APIs to resolve

This comment was marked as spam.

This comment was marked as spam.


Specifically, hostpolicy starts up coreclr and sets up a new AppDomain
with `ManagedHost.dll` on the TPA list. If it read these settings from
environment variables, it clears them so that child processes do not

This comment was marked as spam.

The producer of `ManagedHost.dll` needs to ensure that
`ManagedHost.dll` is compatible with the dependencies specified in the
main application's deps.json, since those dependencies are put on the
TPA list when the runtime is started.

This comment was marked as spam.

This comment was marked as spam.


Specifically, hostpolicy starts up coreclr and sets up a new AppDomain
with `ManagedHost.dll` on the TPA list. If it read these settings from
environment variables, it clears them so that child processes do not

This comment was marked as spam.


## Example

This could be used with AssemblyLoadContext APIs to resolve

This comment was marked as spam.

centralized, while still allowing user code to do its own thing if it
so desires.

The producer of `ManagedHost.dll` needs to ensure that

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

```
{
"runtimeOptions": {
"managedHostAssembly": "/path/to/ManagedHost.dll",

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.


This would allow hosting providers to put configuration and policy in
a single location, including settings that potentially influence load
behavior of the main entry point such as the AssemblyLoadContext

This comment was marked as spam.

## Proposed behavior

Environment variables or `<appname>.runtimeconfig.json` can be used to
specify a managed assembly and type that contains an `Initialize`

This comment was marked as spam.


The environment variables, if set, take precedence over the
`<appname>.runtimeconfig.json` settings. These settings result in
`ManagedHostType.Initialize()` being called in hostpolicy, before the

This comment was marked as spam.

This comment was marked as spam.

@brianrob
Copy link
Member

brianrob commented Aug 1, 2018

This does seem like something that could help with injecting the performance controller. One thing that comes to mind:

This mechanism will probably need to support multiple entries into the start-up hook. People will use these hooks in different ways, and I can see them wanting to have two or more.

For example, one hook is used to set assembly loading policy, while the other is used to start-up the performance controller or some other telemetry service before Main() is called. These two cases might be implemented by two different parties (say the assembly loading policy is implemented by the hoster while the performance controller policy is implemented by Microsoft). Thus, they can't be combined.

FYI @vancem, @valenis

@vancem
Copy link

vancem commented Aug 1, 2018

My biggest feedback is the same as @brianrob, which is that users that are injecting their startup code have not reason to believe they are the ONLY ones. Thus the scheme has to work when more than one system is doing this. Thus it is not a DLL/Method name, but a LIST of those specifications.

Thus I would recommend instead having NOT having a tuple, but a syntax. I would recommend that you have PATH!Type, but I would also recommend making a default type name (e.g. DotNetStartup). This allows people to only specify the PATH (since they will be building the type specificaly for this purpose, so can conform to whatever convention we wish).

In either case as single string represents the 'beforeStartup' point. The environment variable/config entry is then a semicolon separated list of these.

This does bring up the question of ordering of these things (everybody wants to be first). Certainly we would order them by the order in the semicolon delimited list, but if we get this data from multiple places (some from env vars, some from config), then it is not so clear. Even if it was clear, it is not obvious that the software updating the config/env vars knows enough to do a good job.

To address this, we could have convention if the DLL name (before suffix) ends in a _N* where N is a digit, that it is a desire to be point in that order (lexographic) Thus X_12.dll comes before A_2.dll ...).

The main point, however is that multiple 'things' will want to use this hook, and they need to be able to do so without collision.

Comments?

@brianrob
Copy link
Member

brianrob commented Aug 1, 2018

With respect to what @vancem says about ordering, there is precedent for using a priority value to handle ordering. GCC does this with usages of __attribute__((constructor)) and __attribute__((destructor)) which are responsible for running before and after entrypoint execution.

From https://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Function-Attributes.html:

You may provide an optional integer priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes).

I'd recommend making it part of the config and not part of the DLL name, just because it will be easier to change when needed.

@jeffschwMSFT
Copy link
Member

@vancem @brianrob we discussed multiple possible entries and decided to go with a similar model to what we have in .NET framework. https://msdn.microsoft.com/en-us/library/system.appdomainmanager(v=vs.110).aspx Do you know how the performance scenarios operate today in .NET Framework? Given this is an advanced scenario it may be that the developer has to wrap these 3rd party providers and call them in the order they choose explicitly. This is not ideal, but hopefully this is not incredibly common.

@yizhang82
Copy link

@vancem @brianrob @jeffschwMSFT
In the future it would be very nice to have nuget package have their own managed host and flow that into deps.json which contains all the combinations, and .NET Core loader can find the correct managed host depending on which assembly is being loaded.

Having said that, we (service fabric) need a stop-gap solution where there is a single application/orchestration platform that launches the user micro-service and allow them to load shared service fabric assemblies from the platform (rather than app-local), as if we are carrying our own GAC for those user services. We are also hoping this feature can be back ported to .NET Core 2.0, so a simpler solution with less dependencies/complexity is preferable at this point.

@jkotas
Copy link
Member

jkotas commented Aug 1, 2018

FWIW, I agree with @vancem that we should allow more than one of these to be specified (as you know I have mentioned it during our discussions).

I do not think we need to have a way to tweak the order. The moment you get multiple of these and they are colliding, you are in trouble.

@sbomer
Copy link
Member Author

sbomer commented Aug 1, 2018

Thanks all for the feedback!

@brianrob, @vancem I can see why people would want multiple such hooks, and I don't think the native host is the right place. Rather, I'd hope that the ManagedHost.dll can set up further hooks in managed code if that's needed.

This startup hook exists to solve the bootstrapping problem of "how can we (host provider or user code) control, in managed code, how the entry point dependencies are loaded, given that those dependencies are usually loaded before any managed code runs?". This hook makes that possible by separating the concepts of "user entry point" and "first managed code to run", while imposing some constraints on the host implementation (it needs to be compatible with the app's dependencies).

In other words, this setting specifies which managed code runs first, and we're recommending it as a solution for folks who need to solve the bootstrapping problem because they don't control the managed entry point. Anyone who does control the entry point needn't do this - and anyone who uses this setting is expected to understand that it only works in a cooperative environment.

I agree that if our recommendation is ignored (if a bunch of folks go and try to use this hook to control which managed code runs first), we'll hit issues like this. Perhaps we can design this in a way that would allow chained "first managed code" settings (which inevitably has the same problem - who is first of the first?), but I hope we never have to go there.

@vancem How about this: we put the setting in one variable, so it just becomes a path to an assembly and a type. This would make it possible to extend with additional syntax if we ever need to. But in this iteration we only allow one "first managed code" setting.

Thoughts?

edit: thanks @AaronRobinsonMSFT for our chat in the hall about this!

@sbomer
Copy link
Member Author

sbomer commented Aug 1, 2018

@yizhang82 I think the long-term solution that allows managed hosts to be shipped as nuget packages is just to provide a managed hosting library that has to be explicitly initialized from Main. This feature is intended more as the mentioned stop-gap solution for cases where, for whatever reason, the entry point is a bad place to put this.

@jkotas
Copy link
Member

jkotas commented Aug 1, 2018

This startup hook exists to solve the bootstrapping problem of "how can we (host provider or user code) control, in managed code, how the entry point dependencies are loaded,

This startup hook is very powerful. I had seen people coming to me with all sorts of problems that could have been solved by it. I am pretty sure it will be used for more just the scenario motivating it.

stop-gap solution

I do not see this as stop-gap solution. If this is a stop-gap solution, we should not build this.

@sbomer
Copy link
Member Author

sbomer commented Aug 1, 2018

@jkotas I should say "narrow use case" rather than "stop-gap" in my previous post. But if a hook like this is likely to be used for other reasons, I agree that we need to consider those. Do you have any examples of other problems solved by it, that wouldn't be better solved by adding managed hooks to user code?

@vancem
Copy link

vancem commented Aug 1, 2018

@sbomer - The obvious use case for this is telemetry injection. Tools which to be able to 'install' (wrap) an application so that it logs telemetry automatically. They can do all the magic once they are in the process, but they need to be started, and the sooner the better. Note these tools do not wish to modify the apps (but modifying config is acceptable).

Note that there could be multiple telemetry tools, each of which want their hook. A Debugger is a kind of telemetry tool and would probably want its own hook. But even without two telemetry systems, the telemetry system wants to be able to inject even if you are already using the injection point.

I know that there is a desire to keep things 'simple' but frankly we are not talking about alot of code here. If we simpy change the syntax (@sbomer seemed to be OK with that (It is better anyway, modifying one thing is less error prone that modifying two things).

GIven that, then the actual implementation difference is trivial (you split the string and make it a loop). It is not hard. Yes, there are issues, but as you point out most of them go away in the singleton case. Even in the multi-client case, often it is not a problem (last modifier gets to make the choice (and it will probably be that his is first)).

We can defer the question of priorities for now (the mechanism I described before can be added after the fact if we need it).

So I don't see the problem. There is an obvious way forward.

@ashishnegi
Copy link

ashishnegi commented Aug 2, 2018

One requirement : (a) Is there any existing api in .Net Core that gives precedence to a path over app local (as mentioned in deps.json) to load dlls ? This is what GAC does in .Net Framework.

Scenario: We have existing deployed coreclr apps that have X.dll in app local. In .Net Framework X.dll is GACed.

Without (a), upgrading to next Service Fabric runtime (which use this feature) will break existing apps.

This is required for our Linux cluster where customers have deployed FDD coreclr apps and using dotnet installed by Service Fabric runtime.

For new apps, we can change our SDK to put X.dll in ref folder of nupkg. X.dll will not become part of deps.json. So, this problem does not occur with new apps.

@jkotas
Copy link
Member

jkotas commented Aug 2, 2018

@ashishnegi Why can't you leave the existing service fabric applications alone and not use any of this for them?

@ashishnegi
Copy link

@jkotas Leaving older coreclr apps as is mean :
a) either Not shipping new features in platform runtime.
b) Or not upgrading a cluster with newer runtime when it may have only one coreclr app. The cluster may have many other non-coreclr apps. This is equivalent to (a) for apps running on this cluster.

I believe one of the motivation for this startup hook is to solve following scenario :

We have sdk_X.dll and impl_X.dll. 
sdk_X.dll is part of our SDK. 
impl_X.dll is part of platform runtime and has actual implementation code.

In .Net Framework, sdk_X.dll is GACed and hence for all .net framework apps, 
they get same version of sdk_X.dll and impl_X.dll. They just works.

In .Net Core, sdk_X.dll is app local while with platform runtime upgrade, impl_X.dll is newer.
impl_X.dll will try to access new properties in sdk_X.dll which are missing in older sdk_X.dll present within apps. Thus, older apps fail.

As you see, sdk_X.dll (i.e. coreclr apps) is coupled with runtime (impl_X.dll).

Not breaking older coreclr apps, mean Not shipping new features (accessing new properties) in runtime.

@jkotas
Copy link
Member

jkotas commented Aug 2, 2018

The cluster may have many other non-coreclr apps

How are you dealing with this problem for the non-coreclr apps?

Not breaking older coreclr apps

It sounds like that you are rolling forward all coreclr apps to the latest .NET Core runtime today. It has potential to break them. If you want to have high confidence in not breaking older coreclr apps, you should keep running them against the runtime that they were built for. We do not guarantee bug-for-bug compatibility between major .NET Core versions.

How does this work for non .NET apps? If I have say Java service fabric app, are you going to force a particular Java runtime version on me, or can I choose which Java runtime to run on?

@yizhang82
Copy link

How are you dealing with this problem for the non-coreclr apps?

In full desktop everything is loaded from GAC - even though some SDK assemblies are app-local they are still loaded from GAC.

@jkotas
Copy link
Member

jkotas commented Aug 2, 2018

I meant non-.NET apps - e.g. Java apps?

@yizhang82
Copy link

Java apps don't have this problem today because they are on an entirely different technology stack that is based on our native store/replicator/logger. The "managed" SDK layer is very thin and the majority work is done through JNI into native layer with a well defined ABI. So having everything app local works great there, and native libraries are loaded from platform path (from enviroment variable when process is launched)

The new .NET SDK is also based on the native store/replicator/logger so it is also app-local and mostly p/invoke to native code.

As you can see, the new model is better suited for CoreCLR and all "managed" componets can be app-local.

However, the context here is the existing "old" .NET SDK where the implementation is mostly managed (yes, we have both managed and native implementation of the same thing, and the native implementation being the new thing) and exist in platform folder (internally we call it runtime folder but I intentionally avoid it here...), and that's where the app-local vs platform folder diverge happens. I can imagine if we built this technology entirely in Java it would have the same problem/requirement.

of any startup hook.

-- Invalid syntax. This will throw an `ArgumentException` with "The

This comment was marked as spam.

This comment was marked as spam.

existing structured dependency injection framework in place. An
example of such a use case is a hook that injects logging, telemetry,
or profiling into an existing deployed application at run tim.e

This comment was marked as spam.

It is prone to ordering issues when multiple hooks are used, and does
nothing to attempt to make dependencies of hooks easy to
manage. Multiple hooks, if used, should be fairly independent of each

This comment was marked as spam.

dependency manager that loads components in a specific order. If this
kind of behavior is required, a proper dependency injection framework
should be used instead of multpile startup hooks.

This comment was marked as spam.

@sbomer
Copy link
Member Author

sbomer commented Aug 17, 2018

@davidfowl similar issues exist for this startup hook. The hook on its own does not introduce any loading policy, so whoever is deploying the hook/app needs to ensure that their load policies work well together. This feature is more low-level than --additionalDeps and doesn't even try to make loading of transitive closures just work. From what I've heard, --additionalDeps made this sound easy but was difficult to use in practice. The startup hook makes no claim that this will be easy in the first place. :)

those are the sort of things that people don't think through when building these plugins

I'm hoping that folks won't see this as a solution to their plugin problems. I think the use case for the startup hooks is a bit more narrow than that, and more work needs to go into providing a proper plugin model.

I think we need strong guidance for the types of code people should put in these startup hooks

I added a section with some guidance to start with. Please let me know if you think something needs to be added, changed, clarified, etc.!


Problems with the startup hook should be fairly straightforward to
diagnose. All of these exceptionts will contain the startup hook path
(`System.StartupHookProvider.ProcessStartupHooks`) on the stack

This comment was marked as spam.

This comment was marked as spam.

}
```

## Error handling details

This comment was marked as spam.

This comment was marked as spam.

The type must now be called StartupHook, and can not be set by the
environment variable. The descriptions and examples were updated to
reflect this.

Also:
- add a few notes about the visibility of this type
- describe behavior of uncaught exceptions in startup hooks
- give example with platform-specific separators
- fix some typos
@sbomer
Copy link
Member Author

sbomer commented Aug 23, 2018

@jeffschwMSFT please give this another look when you get a chance.

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Instead of the host creating a delegate and inoking it,
ExecuteMainMethod in the runtime will call the System.Private.CoreLib
function that processes startup hooks. This allows the thread
apartment state to be set based on any attributes in the Main method,
before startup hooks execute.
in the `Main` method of the app, before startup hooks execute. As a
result, attemps to explicitly set the thread apartment state in a
startup hook will likely fail.

This comment was marked as spam.

@@ -51,13 +51,14 @@ will be inherited by child processes by default. It is up to the
desired.

Specifically, hostpolicy starts up coreclr and sets up a new
AppDomain. It then invokes a private method in
`System.Private.CoreLib`, which will call each
AppDomain, passing in the startup hook variable if it was

This comment was marked as spam.

@sbomer
Copy link
Member Author

sbomer commented Aug 29, 2018

Two more concerns:

  1. Unfortunately, it looks like it won't be possible to use the startup hook to forcibly preload app dependencies - LoadFromAssemblyPath will resolve the assembly's simple name to the path it was given on the TPA list.
  2. Even if we make some changes to the clr assembly loading logic to let LoadFromAssemblyPath override TPA paths, moving the hook to ExecuteMainMethod (after reading the main method's attributes) means that the main dll's static dependencies will already be loaded by the time the startup hook executes.

The takeaway is that we're back in a place where the startup hook alone won't be useful as a way to modify load behavior of dependencies of main on the TPA list.

@yizhang82 for Service Fabric, it could be used to set a resolving event that applies to non-TPA assemblies. It sounds like this would still enable you to add new API surface for newly published apps by having them reference the new assembly, without putting it on the TPA list or deploying it app-local. Can you confirm that this is your intended use?

@jeffschwMSFT
Copy link
Member

@sbomer can we share the modified components with @yizhang82? I was talking with @jkotas about the Service Fabric scenario and it seems they may have a viable way of using this feature to enable their scenario.

@sbomer
Copy link
Member Author

sbomer commented Aug 30, 2018

@jeffschwMSFT That's what I was mentioning. I'll get in touch with Service Fabric and see if they want to test some private bits. I'd still like confirmation from @yizhang82 that I've understood their use case.

@sbomer sbomer closed this Aug 30, 2018
@sbomer sbomer reopened this Aug 30, 2018
@sbomer
Copy link
Member Author

sbomer commented Aug 30, 2018

Talked to @yizhang82 - so far it sounds like we'll enable their scenario as I described. I'm going to work with them to test this, since this area has given us a few unexpected surprises so far...

@sbomer sbomer merged commit c245dc4 into dotnet:master Aug 31, 2018
@samsosa
Copy link

samsosa commented Sep 26, 2018

From a security perspective this could be a major risk factor. Malicious dll injection made easy? Are there any plans to protect against unwanted loading?

@AaronRobinsonMSFT
Copy link
Member

Malicious dll injection made easy?

These kind of vectors already exist in coreclr. See the CORECLR_PROFILER_PATH environment variable. A discussion was had regarding this issue.

sbomer added a commit that referenced this pull request Oct 17, 2018
* Add startup hook design document (#4421)

* Add startup hook design document

This adds a proposal for an early startup hook that allows managed
code to run before the entry point.

* Change env var behavior and clarify version constraints

* Fix formatting, specify Initialize() signature

* Update spec for multiple hooks in environment variable

- Move the configuration to a single environment variable,
  DOTNET_STARTUP_HOOKS
- Introduce syntax to allow multiple hooks to be defined
- Remove .runtimeconfig file options
  This will only support the environment variable, since
  this is a low-level feature that should only be used when implicit
  injection of code is desired.
- Update implementation details

* Remove default type name and other fixes

- Require an explicit type name for each startup hook
- Fix Initialize() signature (it must be static)
- Clarify that the Initialize() methods are called synchronously

* Require full path, clarify syntax, and change TPA behavior

- Relative paths to the assemblies are not allowed
- Use platform-specific path separator
- Clarify restrictions on the syntax (no empty entries, trailing separator)
- Startup assemblies will not be placed on TPA list
  Instead, they will be loaded directly using LoadFromAssemblyPath in
  the implementation.

* Add guidance, error behavior, and thread behavior

* Remove specific error text, fix typos and formatting

* Remove type name from synax

The type must now be called StartupHook, and can not be set by the
environment variable. The descriptions and examples were updated to
reflect this.

Also:
- add a few notes about the visibility of this type
- describe behavior of uncaught exceptions in startup hooks
- give example with platform-specific separators
- fix some typos

* Change startup hook to run closer to Main

Instead of the host creating a delegate and inoking it,
ExecuteMainMethod in the runtime will call the System.Private.CoreLib
function that processes startup hooks. This allows the thread
apartment state to be set based on any attributes in the Main method,
before startup hooks execute.

* Clarify apartment threading state behavior

* Specify the property name passed to the runtime

* Add host startup hook and tests (#4465)

* Expose coreclr_create_delegate function

* Add startup hook call

Read DOTNET_STARTUP_HOOKS environment variable, and pass it along to a
private function in
System.Private.CoreLib (StartupHookProvider.ProcessStartupHookS).

* Add startup hook tests

* Add some more syntax checks

* Reorder the tests

* Add test for exclamation mark in assembly path

* Fix error behavior for missing methods

MissingMethodException is now only thrown when the Initialize method
doesn't exist at all. This also reorganizes the existing tests for the
invalid signature, and adds some cases.

* Change test behavior to not expect eager file exists check

* Expect full paths for startup hook.

Fixes this in existing tests, and adds a new check for the error
behavior when a relative path is passed.

* Add test for startup hook trace output

* Update tests to use new syntax (without type name)

* Pass STARTUP_HOOKS property key to coreclr_initialize

STARTUP_HOOKS is now passed to the clr during initialization, instead
of using create_delegate in the host. This will allow the clr to call
the startup hooks closer to the execution of the main method, allowing
the threading apartment state to be set based on the main method's
attributes.

* Allow non-public Initialize method

* Remove coreclr::create_delegate

* Add comments clarifying purpose of startup hook test projects

* Add test with ALC assembly resolve event

The startup hook will inject a dependency into the app.

* Fix the ALC startup hook testcase

- Don't set ReferenceOutputAssembly in the projectreference, because
  that results in it not being published.
- Dispose the new projects
- Don't throw exceptions from the resolver, since they are swallowed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.