Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize BigInteger.Divide #96895

Merged
merged 31 commits into from
Feb 4, 2025
Merged

Optimize BigInteger.Divide #96895

merged 31 commits into from
Feb 4, 2025

Conversation

kzrnm
Copy link
Contributor

@kzrnm kzrnm commented Jan 12, 2024

The current BigInteger.Divide, which is implemented in the "grammar-school" algorithm, runs in $O(n^2)$ where n is the number of digits. The implementation of the pull request is implemented in Burnikel and Ziegler's Fast Recursive Division algorithm. It runs in $O(n^{\log 3} + n \log n)$.

Burnikel-Ziegler Algorithm is used in other languages.



BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-alpha.1.23615.4
  [Host]     : .NET 9.0.0 (9.0.23.61410), X64 RyuJIT AVX2
  Job-DGWAFF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ITISLD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2


Method Job Toolchain arguments Mean Error StdDev Ratio RatioSD Code Size Gen0 Allocated Alloc Ratio
Divide Job-DGWAFF \main\corerun.exe 1024,512 bits 298.2 ns 1.47 ns 1.38 ns 1.00 0.00 1,781 B 0.0143 184 B 1.00
Divide Job-ITISLD \pr\corerun.exe 1024,512 bits 312.6 ns 6.05 ns 8.68 ns 1.04 0.03 2,735 B 0.0143 184 B 1.00
Divide Job-DGWAFF \main\corerun.exe 262144,131072 bits 66,281,748.3 ns 1,185,267.18 ns 1,108,699.66 ns 1.00 0.00 5,584 B - 32824 B 1.00
Divide Job-ITISLD \pr\corerun.exe 262144,131072 bits 3,749,637.7 ns 74,025.70 ns 69,243.68 ns 0.06 0.00 5,625 B - 32816 B 1.00
Divide Job-DGWAFF \main\corerun.exe 65536,32768 bits 4,452,706.5 ns 81,142.27 ns 71,930.51 ns 1.00 0.00 5,385 B - 8248 B 1.00
Divide Job-ITISLD \pr\corerun.exe 65536,32768 bits 452,050.8 ns 5,390.30 ns 5,042.09 ns 0.10 0.00 5,659 B 0.4883 8248 B 1.00

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 12, 2024
@ghost
Copy link

ghost commented Jan 12, 2024

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

The current BigInteger.Divide, which is implemented in the "grammar-school" algorithm, runs in $O(n^2)$ where n is the number of digits. The implementation of the pull request is implemented in Burnikel and Ziegler's Fast Recursive Division algorithm. It runs in $O(n^{\log 3} + n \log n)$.

Burnikel-Ziegler Algorithm is used in other languages.

Author: kzrnm
Assignees: -
Labels:

area-System.Numerics, community-contribution

Milestone: -

@tannergooding
Copy link
Member

There's a few different BigInteger PRs open right now, this is on my list to get to but I want to try and get some of the other ones completed/merged first

@lskyum
Copy link

lskyum commented Jan 25, 2024

Will this optimized Divide also result in a faster GreatestCommonDivisor method?

I'm on the verge of rewriting a lot of C# code to Rust, only because of the BigInteger performance.
So if this PR helps (a lot) it would save me months of work :-)

@danmoseley
Copy link
Member

@lskyum Just curious, what do you use big integer for?

@lskyum
Copy link

lskyum commented Jan 26, 2024

@danmoseley I'm using two BigInteger's to represent a "BigFraction", when performing various geometric calculations.
By using fractions it possible to guarentee 100% precision in contrast to, for example double's. The algorithms becomes much easier to reason about and it guarantees correctness. It is the same approach CGAL (https://www.cgal.org/) uses and it uses GMP for the integer/fraction calculations, which sadly is a lot faster than the C# equivalents.

@kzrnm
Copy link
Contributor Author

kzrnm commented Jan 26, 2024

@lskyum The division also affects a GCD method.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.101
  [Host]   : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3  

Method N Mean Error StdDev Ratio
Orig 10000 4.171 ms 0.7849 ms 0.0430 ms 1.00
New 10000 1.232 ms 0.1088 ms 0.0060 ms 0.30
Orig 100000 422.306 ms 225.2295 ms 12.3456 ms 1.00
New 100000 85.521 ms 9.6445 ms 0.5286 ms 0.20
    [GlobalSetup]
    public void Setup()
    {
        o1 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(5, N);
        o2 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(23, N);
        m1 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(5, N);
        m2 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(23, N);
    }

Environment: https://github.com/kzrnm/BigInteger/tree/main/Benchmark

Benchmark code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Toolchains.CsProj;

using MyBigInteger = Kzrnm.Numerics.BigInteger;
using OrigBigInteger = System.Numerics.BigInteger;

public class BenchmarkConfig : ManualConfig
{
    static void Main(string[] args)
    {
#if DEBUG
        BenchmarkSwitcher.FromAssembly(typeof(BenchmarkConfig).Assembly).Run(args, new DebugInProcessConfig());
#else
        _ = BenchmarkRunner.Run(typeof(BenchmarkConfig).Assembly);
#endif
    }
    public BenchmarkConfig()
    {
        AddExporter(BenchmarkDotNet.Exporters.MarkdownExporter.GitHub);
        AddJob(Job.ShortRun.WithToolchain(CsProjCoreToolchain.NetCoreApp80));

        SummaryStyle = SummaryStyle.Default
        .WithRatioStyle(BenchmarkDotNet.Columns.RatioStyle.Value)
        ;
    }
}


[Config(typeof(BenchmarkConfig))]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByParams)]
public class BigIntegerGcd
{
    [Params(10000, 100000)]
    public int N;
    public OrigBigInteger o1, o2;
    public MyBigInteger m1, m2;

    [GlobalSetup]
    public void Setup()
    {
        o1 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(5, N);
        o2 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(23, N);
        m1 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(5, N);
        m2 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(23, N);
    }

    [Benchmark(Baseline = true)]
    public OrigBigInteger Orig() => OrigBigInteger.GreatestCommonDivisor(o1, o2);

    [Benchmark]
    public MyBigInteger New() => MyBigInteger.GreatestCommonDivisor(m1, m2);
}

@lskyum
Copy link

lskyum commented Jan 26, 2024

@kzrnm That looks really awesome! Thanks for making this possible.
I hope that will be available in .NET 9 ;-)

@danmoseley
Copy link
Member

danmoseley commented Jan 26, 2024

Do we have enough coverage in dotnet/performance to protect this change?


return ActualLength(bits.Slice(0, modulus.Length));
}
return bits.Length;
}

[Conditional("DEBUG")]
public static void DummyForDebug(Span<uint> bits)
Copy link
Member

@tannergooding tannergooding Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a better name, like InitializeForDebug or InitializeAsCleanMemoryForDebug, as this is the constant used for "clean memory" in MSVC

Alternatively it shouldn't exist at all as we typically don't do such debug vs release differences today. Any such support should rather be centralized across the .NET libraries or done automatically by the runtime.

Comment on lines 482 to 490
Debug.Assert((uint)sigmaSmall <= 32);
int carryShift = 32 - sigmaSmall;
uint carry = 0;
for (int i = rt.Length - 1; i >= 0; i--)
{
remainder[i] = rt[i] >> sigmaSmall | carry;
carry = rt[i] << carryShift;
}
Debug.Assert(carry == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the code here and other places is very squashed together. This makes it harder to read

We should allow for inserting newlines to help break apart the considerations and help improve the legibility. This applies to several cases in this PR, but this one I found personally troublesome to parse. A better formatting that helps me is:

                    Debug.Assert((uint)sigmaSmall <= 32);
                    
                    int carryShift = 32 - sigmaSmall;
                    uint carry = 0;
                    
                    for (int i = rt.Length - 1; i >= 0; i--)
                    {
                        remainder[i] = rt[i] >> sigmaSmall | carry;
                        carry = rt[i] << carryShift;
                    }
                    
                    Debug.Assert(carry == 0);

Potentially inserting explanatory comments for why things like the asserts are expected to hold true, as that helps future readers who are not as familiar with the algorithm or its details

@tannergooding
Copy link
Member

Code looks correct to me, but there are some necessary formatting changes (required by the coding-style guidelines) before this can be merged.

I've also suggested a couple renames of helper methods (required) and opportunities to improve readability of the code (not required, but appreciated).

Once those are resolved, this can get merged. Sorry again for the delay in getting it reviewed, it ended up being too late in the cycle for .NET 9 and work in other areas kept deprioritizing the ability to get the review finished.

@kzrnm
Copy link
Contributor Author

kzrnm commented Feb 4, 2025

@tannergooding Thank you for your review! I've made formatting changes.

@tannergooding tannergooding merged commit 37b1764 into dotnet:main Feb 4, 2025
86 checks passed
@kzrnm kzrnm deleted the BigIntegerDivide branch February 5, 2025 01:28
grendello added a commit to grendello/runtime that referenced this pull request Feb 5, 2025
* main:
  JIT: Set PGO data inconsistent when flow disappears in cast expansion (dotnet#112147)
  [H/3] Fix handling H3_NO_ERROR (dotnet#112125)
  Change some workflows using `pull_request` to use `pull_request_target` instead (dotnet#112161)
  Annotate ConfiguredCancelableAsyncEnumerable T with allows ref struct and update extensions (dotnet#111953)
  Delete copy of performance pipelines in previous location (dotnet#112113)
  Optimize BigInteger.Divide (dotnet#96895)
  Use current STJ in HostModel and remove unnecessary audit suppressions (dotnet#109852)
  JIT: Unify handling of InstParam argument during inlining (dotnet#112119)
  Remove unneeded DiagnosticSource content (dotnet#112116)
  Improve compare-and-branch sequences produced by Emitter (dotnet#111797)
  Jit: Conditional Escape Analysis and Cloning (dotnet#111473)
  Re-enable HKDF-SHA3 on Azure Linux
  Remove fstream usage from corehost (dotnet#111859)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants