Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multicore JIT in the compilers #23173

Merged
merged 1 commit into from
Dec 8, 2017
Merged

Conversation

agocke
Copy link
Member

@agocke agocke commented Nov 15, 2017

Here is the output from my perf testing tool currently in review that
shows a significant difference in csc.exe after the change:

// * Detailed results *
PlaceholderBenchmarkRunner.PlaceholderMethod: Job-TGYVCW(Toolchain=Perf.ExternalProcessToolchain, LaunchCount=0, RunStrategy=Monitoring, TargetCount=25, WarmupCount=0) [Commit=HEAD^]
Runtime = ; GC =
Mean = 4.8018 s, StdErr = 0.0215 s (0.45%); N = 25, StdDev = 0.1073 s
Min = 4.6717 s, Q1 = 4.7297 s, Median = 4.7892 s, Q3 = 4.8438 s, Max = 5.1998 s
IQR = 0.1141 s, LowerFence = 4.5585 s, UpperFence = 5.0149 s
ConfidenceInterval = [4.7214 s; 4.8822 s] (CI 99.9%), Margin = 0.0804 s (1.67% of Mean)
Skewness = 1.87, Kurtosis = 7.98

PlaceholderBenchmarkRunner.PlaceholderMethod: Job-TGYVCW(Toolchain=Perf.ExternalProcessToolchain, LaunchCount=0, RunStrategy=Monitoring, TargetCount=25, WarmupCount=0) [Commit=HEAD]
Runtime = ; GC =
Mean = 3.6409 s, StdErr = 0.0650 s (1.78%); N = 25, StdDev = 0.3249 s
Min = 3.4381 s, Q1 = 3.5510 s, Median = 3.5781 s, Q3 = 3.6189 s, Max = 5.1720 s
IQR = 0.0679 s, LowerFence = 3.4491 s, UpperFence = 3.7209 s
ConfidenceInterval = [3.3975 s; 3.8842 s] (CI 99.9%), Margin = 0.2433 s (6.68% of Mean)
Skewness = 4.16, Kurtosis = 19.76

Total time: 00:07:11 (431.13 sec)

// * Summary *

BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 1 (10.0.14393)
Processor=Intel Xeon CPU E5645 2.40GHz, ProcessorCount=24
Frequency=2337892 Hz, Resolution=427.7358 ns, Timer=TSC
.NET Core SDK=2.1.1-preview-007094
  [Host] : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT

Toolchain=Perf.ExternalProcessToolchain  LaunchCount=0  RunStrategy=Monitoring
TargetCount=25  WarmupCount=0

            Method | Commit |    Mean |    Error |   StdDev | Rank |
------------------ |------- |--------:|---------:|---------:|-----:|
 PlaceholderMethod |   HEAD | 3.641 s | 0.2433 s | 0.3249 s |    1 |
 PlaceholderMethod |  HEAD^ | 4.802 s | 0.0804 s | 0.1073 s |    2 |

// * Legends *
  Commit : Value of the 'Commit' parameter
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  Rank   : Relative position of current benchmark mean among all benchmarks (Arabic style)
  1 s    : 1 Second (1 sec)

@agocke agocke requested review from jaredpar and a team November 15, 2017 01:00
@Pilchie
Copy link
Member

Pilchie commented Nov 15, 2017

But are both of those runs are with compilers that are neither ngen'd or crossgen'd

@agocke
Copy link
Member Author

agocke commented Nov 15, 2017

@Pilchie that is true. Do you think a full matrix is necessary?

@Pilchie
Copy link
Member

Pilchie commented Nov 15, 2017

To be clear, I'm not against the change, I just don't think we will see nearly as much of an effect in practice.

@agocke
Copy link
Member Author

agocke commented Nov 15, 2017

@Pilchie Somewhat true. Notably, if we override the compiler via the Microsoft.NETCore.Compilers NuGet package, those binaries never get crossgened, so this is pretty representative of that scenario (without the xplat compiler server).

@Pilchie
Copy link
Member

Pilchie commented Nov 15, 2017

Fair point. Like I said, I'm not against this change :) There is also some support for Tiered JIT in .NET Core that we should evaluate for similar scenarios. I think it's still experimental, but it might be worth the experiment.

@agocke
Copy link
Member Author

agocke commented Nov 15, 2017

test windows_release_unit32_prtest please

@agocke
Copy link
Member Author

agocke commented Nov 15, 2017

I've looked into getting crossgen running -- it's quite a bit of work. I'd recommend not blocking this PR. I'm going to open an issue on CLI to get better support for this.

@Pilchie
Copy link
Member

Pilchie commented Nov 16, 2017

Fine with me :)

@agocke
Copy link
Member Author

agocke commented Nov 16, 2017

ping @dotnet/roslyn-compiler for review

@jasonmalinowski jasonmalinowski changed the base branch from post-dev15.5-contrib to master November 17, 2017 17:46
@@ -3,6 +3,8 @@
using Microsoft.CodeAnalysis;
using System;
using System.IO;
using System.Reflection;
using System.Runtime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only change is addition of usings. Consider reverting changes to this file, if the usings are not required.

out bool hasShared,
out string keepAlive,
out string sessionKey,
out string errorMessage))
Copy link
Contributor

@AlekseyTs AlekseyTs Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out string errorMessage)) [](start = 20, length = 25)

out string errorMessage)) [](start = 20, length = 25)

IMHO, changes like that do not make code more readable. I would say quite the opposite. #Closed

{
textWriter.WriteLine(errorMessage);
return RunCompilationResult.Failed;
}

if (!TryEnableMulticoreJitting(out errorMessage))
{
textWriter.WriteLine(errorMessage);
Copy link
Contributor

@AlekseyTs AlekseyTs Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

textWriter [](start = 16, length = 10)

textWriter [](start = 16, length = 10)

It looks like the textWriter can be null. Why is it safe to call WriteLine of off it? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this textWriter can ever be null. We check for null at the top of the file and, if it is, we set it to Console.Out, which can never be null.

@@ -86,6 +89,37 @@ internal RunCompilationResult RunCompilation(IEnumerable<string> originalArgumen
return new RunCompilationResult(exitCode);
}

private static bool TryEnableMulticoreJitting(out string errorMessage)
Copy link
Contributor

@AlekseyTs AlekseyTs Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out string errorMessage [](start = 54, length = 23)

Consider passing the TextWriter and writing to it instead. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to centralize access to I/O as much as possible. I'm not even that pleased at directly accessing a TextWriter in this method, but due to the multitargeting and reference restrictions there isn't really a suitable abstraction I could put this in.

// Enable multi-core JITing
// https://blogs.msdn.microsoft.com/dotnet/2012/10/18/an-easy-solution-for-improving-app-launch-performance/
var profileRoot = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
Copy link
Contributor

@AlekseyTs AlekseyTs Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalApplicationData [](start = 72, length = 20)

Is this a per user folder? Is there an alternative that can be shared across all users? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a per-user folder. Shared temporary storage isn't usually accessible for non-admin users. I could see this being a security risk, so I think we should be conservative for now.

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Nov 17, 2017

Done with review pass (iteration 1). #Closed

Here is the output from my perf testing tool currently in review that
shows a significant difference in csc.exe after the change:

```
// * Detailed results *
PlaceholderBenchmarkRunner.PlaceholderMethod: Job-TGYVCW(Toolchain=Perf.ExternalProcessToolchain, LaunchCount=0, RunStrategy=Monitoring, TargetCount=25, WarmupCount=0) [Commit=HEAD^]
Runtime = ; GC =
Mean = 4.8018 s, StdErr = 0.0215 s (0.45%); N = 25, StdDev = 0.1073 s
Min = 4.6717 s, Q1 = 4.7297 s, Median = 4.7892 s, Q3 = 4.8438 s, Max = 5.1998 s
IQR = 0.1141 s, LowerFence = 4.5585 s, UpperFence = 5.0149 s
ConfidenceInterval = [4.7214 s; 4.8822 s] (CI 99.9%), Margin = 0.0804 s (1.67% of Mean)
Skewness = 1.87, Kurtosis = 7.98

PlaceholderBenchmarkRunner.PlaceholderMethod: Job-TGYVCW(Toolchain=Perf.ExternalProcessToolchain, LaunchCount=0, RunStrategy=Monitoring, TargetCount=25, WarmupCount=0) [Commit=HEAD]
Runtime = ; GC =
Mean = 3.6409 s, StdErr = 0.0650 s (1.78%); N = 25, StdDev = 0.3249 s
Min = 3.4381 s, Q1 = 3.5510 s, Median = 3.5781 s, Q3 = 3.6189 s, Max = 5.1720 s
IQR = 0.0679 s, LowerFence = 3.4491 s, UpperFence = 3.7209 s
ConfidenceInterval = [3.3975 s; 3.8842 s] (CI 99.9%), Margin = 0.2433 s (6.68% of Mean)
Skewness = 4.16, Kurtosis = 19.76

Total time: 00:07:11 (431.13 sec)

// * Summary *

BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 1 (10.0.14393)
Processor=Intel Xeon CPU E5645 2.40GHz, ProcessorCount=24
Frequency=2337892 Hz, Resolution=427.7358 ns, Timer=TSC
.NET Core SDK=2.1.1-preview-007094
  [Host] : .NET Core 2.0.0 (Framework 4.6.00001.0), 64bit RyuJIT

Toolchain=Perf.ExternalProcessToolchain  LaunchCount=0  RunStrategy=Monitoring
TargetCount=25  WarmupCount=0

            Method | Commit |    Mean |    Error |   StdDev | Rank |
------------------ |------- |--------:|---------:|---------:|-----:|
 PlaceholderMethod |   HEAD | 3.641 s | 0.2433 s | 0.3249 s |    1 |
 PlaceholderMethod |  HEAD^ | 4.802 s | 0.0804 s | 0.1073 s |    2 |

// * Legends *
  Commit : Value of the 'Commit' parameter
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  Rank   : Relative position of current benchmark mean among all benchmarks (Arabic style)
  1 s    : 1 Second (1 sec)

```
Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agocke
Copy link
Member Author

agocke commented Nov 29, 2017

test this please

@agocke
Copy link
Member Author

agocke commented Nov 29, 2017

@jaredpar For ask-mode approval

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agocke
Copy link
Member Author

agocke commented Nov 30, 2017

@MeiChin-Tsai for ask-mode approval

@MeiChin-Tsai
Copy link

Andy, multi-core jit is also guided with training data, correct? if so, how do we intend to keep these IBC data up to date?

Otherwise, approved. However I would consider to train with property NGEN configuration so we carry only necessary IBC data.

@agocke
Copy link
Member Author

agocke commented Nov 30, 2017

@MeiChin-Tsai I'm not sure what you're asking. To my knowledge, we regularly produce IBC data for the desktop compiler via OptProf runs before inserting into Visual Studio. Is that sufficient to keep IBC data up-to-date?

@MeiChin-Tsai
Copy link

@agocke - Let me follow up with Brian Robin off line. I believe that multi-core JIT also uses IBC data to predetermine what gets JIT in the multi-core while user program is running. IBC data is also driving the partial NGEN. When you combine both technology, you will hope that most of JIT time for startup is saved by NGEN. multi-core JIT is trying to cover what is not being NGEN ahead of time. So naively I will assume that they should be different set of IBC data.

@MeiChin-Tsai
Copy link

@agocke - multi-core jit does not use IBC data to guide but use previous run to guide the background jitting. So you are fine here. thanks.

@agocke agocke merged commit eed147d into dotnet:master Dec 8, 2017
@agocke agocke deleted the multicore-jit branch December 8, 2017 19:04
333fred added a commit to 333fred/roslyn that referenced this pull request Dec 9, 2017
* dotnet/master: (39 commits)
  Prefer by-val methods over in methods in overload resolution (dotnet#23122)
  Update build to account for additional Mono.Cecil assemblies
  Fix build break
  Enable multicore JIT in the compilers (dotnet#23173)
  Separate debugging workspace service and EnC service (dotnet#23630)
  UseInferredMemberName: use one code style option and share more code (dotnet#23506)
  Add negative test cases
  Remove unnecessary Mono.Cecil reference
  Updated formatting of decompiler license notice
  Use sentence case for the decompiler legal notice
  Use ConcurrentDictionary to avoid locks in IsGeneratedCode and HasHiddenRegions
  Recognize condition with logical negation
  Refactor so that GetSymbolInfo can be called last
  Change the code fix to find the expression so that we don't need to unwrap the argument
  Add check for null argument syntax
  Fix argument names
  Formatting adjustments
  Add VB tests for omitted arguments
  Add WorkItem attributes
  Pool the Stopwatch instance used in the analyzer driver
  ...
@jcouv jcouv added this to the 15.6 milestone Jan 9, 2018
jaredpar added a commit that referenced this pull request Feb 20, 2018
Revert "Enable multicore JIT in the compilers (#23173)"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants