Optimize BigInteger.Divide #96895

kzrnm · 2024-01-12T13:29:02Z

The current BigInteger.Divide, which is implemented in the "grammar-school" algorithm, runs in $O(n^2)$ where n is the number of digits. The implementation of the pull request is implemented in Burnikel and Ziegler's Fast Recursive Division algorithm. It runs in $O(n^{\log 3} + n \log n)$.

Burnikel-Ziegler Algorithm is used in other languages.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 9.0.100-alpha.1.23615.4
  [Host]     : .NET 9.0.0 (9.0.23.61410), X64 RyuJIT AVX2
  Job-DGWAFF : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ITISLD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Job	Toolchain	arguments	Mean	Error	StdDev	Ratio	RatioSD	Code Size	Gen0	Allocated	Alloc Ratio
Divide	Job-DGWAFF	\main\corerun.exe	1024,512 bits	298.2 ns	1.47 ns	1.38 ns	1.00	0.00	1,781 B	0.0143	184 B	1.00
Divide	Job-ITISLD	\pr\corerun.exe	1024,512 bits	312.6 ns	6.05 ns	8.68 ns	1.04	0.03	2,735 B	0.0143	184 B	1.00

Divide	Job-DGWAFF	\main\corerun.exe	262144,131072 bits	66,281,748.3 ns	1,185,267.18 ns	1,108,699.66 ns	1.00	0.00	5,584 B	-	32824 B	1.00
Divide	Job-ITISLD	\pr\corerun.exe	262144,131072 bits	3,749,637.7 ns	74,025.70 ns	69,243.68 ns	0.06	0.00	5,625 B	-	32816 B	1.00

Divide	Job-DGWAFF	\main\corerun.exe	65536,32768 bits	4,452,706.5 ns	81,142.27 ns	71,930.51 ns	1.00	0.00	5,385 B	-	8248 B	1.00
Divide	Job-ITISLD	\pr\corerun.exe	65536,32768 bits	452,050.8 ns	5,390.30 ns	5,042.09 ns	0.10	0.00	5,659 B	0.4883	8248 B	1.00

ghost · 2024-01-12T13:29:14Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

The current BigInteger.Divide, which is implemented in the "grammar-school" algorithm, runs in $O(n^2)$ where n is the number of digits. The implementation of the pull request is implemented in Burnikel and Ziegler's Fast Recursive Division algorithm. It runs in $O(n^{\log 3} + n \log n)$.

Burnikel-Ziegler Algorithm is used in other languages.

Author:	kzrnm
Assignees:	-
Labels:	`area-System.Numerics`, `community-contribution`
Milestone:	-

tannergooding · 2024-01-12T18:35:56Z

There's a few different BigInteger PRs open right now, this is on my list to get to but I want to try and get some of the other ones completed/merged first

lskyum · 2024-01-25T07:46:13Z

Will this optimized Divide also result in a faster GreatestCommonDivisor method?

I'm on the verge of rewriting a lot of C# code to Rust, only because of the BigInteger performance.
So if this PR helps (a lot) it would save me months of work :-)

danmoseley · 2024-01-25T16:13:27Z

@lskyum Just curious, what do you use big integer for?

lskyum · 2024-01-26T06:36:57Z

@danmoseley I'm using two BigInteger's to represent a "BigFraction", when performing various geometric calculations.
By using fractions it possible to guarentee 100% precision in contrast to, for example double's. The algorithms becomes much easier to reason about and it guarantees correctness. It is the same approach CGAL (https://www.cgal.org/) uses and it uses GMP for the integer/fraction calculations, which sadly is a lot faster than the C# equivalents.

kzrnm · 2024-01-26T13:36:53Z

@lskyum The division also affects a GCD method.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.101
  [Host]   : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3

Method	N	Mean	Error	StdDev	Ratio
Orig	10000	4.171 ms	0.7849 ms	0.0430 ms	1.00
New	10000	1.232 ms	0.1088 ms	0.0060 ms	0.30

Orig	100000	422.306 ms	225.2295 ms	12.3456 ms	1.00
New	100000	85.521 ms	9.6445 ms	0.5286 ms	0.20

    [GlobalSetup]
    public void Setup()
    {
        o1 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(5, N);
        o2 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(23, N);
        m1 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(5, N);
        m2 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(23, N);
    }

Environment: https://github.com/kzrnm/BigInteger/tree/main/Benchmark

Benchmark code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Toolchains.CsProj;

using MyBigInteger = Kzrnm.Numerics.BigInteger;
using OrigBigInteger = System.Numerics.BigInteger;

public class BenchmarkConfig : ManualConfig
{
    static void Main(string[] args)
    {
#if DEBUG
        BenchmarkSwitcher.FromAssembly(typeof(BenchmarkConfig).Assembly).Run(args, new DebugInProcessConfig());
#else
        _ = BenchmarkRunner.Run(typeof(BenchmarkConfig).Assembly);
#endif
    }
    public BenchmarkConfig()
    {
        AddExporter(BenchmarkDotNet.Exporters.MarkdownExporter.GitHub);
        AddJob(Job.ShortRun.WithToolchain(CsProjCoreToolchain.NetCoreApp80));

        SummaryStyle = SummaryStyle.Default
        .WithRatioStyle(BenchmarkDotNet.Columns.RatioStyle.Value)
        ;
    }
}


[Config(typeof(BenchmarkConfig))]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByParams)]
public class BigIntegerGcd
{
    [Params(10000, 100000)]
    public int N;
    public OrigBigInteger o1, o2;
    public MyBigInteger m1, m2;

    [GlobalSetup]
    public void Setup()
    {
        o1 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(5, N);
        o2 = OrigBigInteger.Pow(3, N) * OrigBigInteger.Pow(23, N);
        m1 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(5, N);
        m2 = MyBigInteger.Pow(3, N) * MyBigInteger.Pow(23, N);
    }

    [Benchmark(Baseline = true)]
    public OrigBigInteger Orig() => OrigBigInteger.GreatestCommonDivisor(o1, o2);

    [Benchmark]
    public MyBigInteger New() => MyBigInteger.GreatestCommonDivisor(m1, m2);
}

lskyum · 2024-01-26T13:53:16Z

@kzrnm That looks really awesome! Thanks for making this possible.
I hope that will be available in .NET 9 ;-)

danmoseley · 2024-01-26T21:40:28Z

Do we have enough coverage in dotnet/performance to protect this change?

kzrnm · 2024-01-27T06:19:35Z

https://github.com/dotnet/performance/blob/8fe05289b450e96c3f2c646af979d20110c46db2/src/benchmarks/micro/libraries/System.Runtime.Numerics/Perf.BigInteger.cs#L69-L74

https://github.com/dotnet/performance/blob/8fe05289b450e96c3f2c646af979d20110c46db2/src/benchmarks/micro/libraries/System.Runtime.Numerics/Perf.BigInteger.cs#L87-L90

The 65536 bits correspond to a uint array of length 4096, which would cover this change.

tannergooding · 2025-02-04T16:36:23Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.Utils.cs


                return ActualLength(bits.Slice(0, modulus.Length));
            }
            return bits.Length;
        }
+
+        [Conditional("DEBUG")]
+        public static void DummyForDebug(Span<uint> bits)


This needs a better name, like InitializeForDebug or InitializeAsCleanMemoryForDebug, as this is the constant used for "clean memory" in MSVC

Alternatively it shouldn't exist at all as we typically don't do such debug vs release differences today. Any such support should rather be centralized across the .NET libraries or done automatically by the runtime.

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.DivRem.cs

tannergooding · 2025-02-04T16:48:34Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.DivRem.cs

+                    Debug.Assert((uint)sigmaSmall <= 32);
+                    int carryShift = 32 - sigmaSmall;
+                    uint carry = 0;
+                    for (int i = rt.Length - 1; i >= 0; i--)
+                    {
+                        remainder[i] = rt[i] >> sigmaSmall | carry;
+                        carry = rt[i] << carryShift;
+                    }
+                    Debug.Assert(carry == 0);


nit: the code here and other places is very squashed together. This makes it harder to read

We should allow for inserting newlines to help break apart the considerations and help improve the legibility. This applies to several cases in this PR, but this one I found personally troublesome to parse. A better formatting that helps me is:

Debug.Assert((uint)sigmaSmall <= 32); int carryShift = 32 - sigmaSmall; uint carry = 0; for (int i = rt.Length - 1; i >= 0; i--) { remainder[i] = rt[i] >> sigmaSmall | carry; carry = rt[i] << carryShift; } Debug.Assert(carry == 0);

Potentially inserting explanatory comments for why things like the asserts are expected to hold true, as that helps future readers who are not as familiar with the algorithm or its details

tannergooding · 2025-02-04T16:51:51Z

Code looks correct to me, but there are some necessary formatting changes (required by the coding-style guidelines) before this can be merged.

I've also suggested a couple renames of helper methods (required) and opportunities to improve readability of the code (not required, but appreciated).

Once those are resolved, this can get merged. Sorry again for the delay in getting it reviewed, it ended up being too late in the cycle for .NET 9 and work in other areas kept deprioritizing the ability to get the review finished.

kzrnm · 2025-02-04T17:15:00Z

@tannergooding Thank you for your review! I've made formatting changes.

* main: JIT: Set PGO data inconsistent when flow disappears in cast expansion (dotnet#112147) [H/3] Fix handling H3_NO_ERROR (dotnet#112125) Change some workflows using `pull_request` to use `pull_request_target` instead (dotnet#112161) Annotate ConfiguredCancelableAsyncEnumerable T with allows ref struct and update extensions (dotnet#111953) Delete copy of performance pipelines in previous location (dotnet#112113) Optimize BigInteger.Divide (dotnet#96895) Use current STJ in HostModel and remove unnecessary audit suppressions (dotnet#109852) JIT: Unify handling of InstParam argument during inlining (dotnet#112119) Remove unneeded DiagnosticSource content (dotnet#112116) Improve compare-and-branch sequences produced by Emitter (dotnet#111797) Jit: Conditional Escape Analysis and Cloning (dotnet#111473) Re-enable HKDF-SHA3 on Azure Linux Remove fstream usage from corehost (dotnet#111859)

dotnet-issue-labeler bot added the area-System.Numerics label Jan 12, 2024

ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 12, 2024

kzrnm force-pushed the BigIntegerDivide branch from 27070b7 to dd63906 Compare January 12, 2024 13:29

build-analysis bot mentioned this pull request Jan 12, 2024

[browser][MT] various WebWorkerTest CI failures #96628

Closed

kzrnm force-pushed the BigIntegerDivide branch from dd63906 to 5079b54 Compare January 12, 2024 18:37

kzrnm added 8 commits January 18, 2024 10:30

Add divide tests

919f4e6

Divide by Burnikel-Ziegler algorithm

e5948e9

Copy from UInt128

e84cbb6

fix

d270ae8

Test

8d5040b

Assert(CompareActual())

cfcd62e

OuterLoop

7d64f49

RunWithFakeThreshold

c72c0b6

kzrnm force-pushed the BigIntegerDivide branch from 1ac62fc to c72c0b6 Compare January 18, 2024 17:31

build-analysis bot mentioned this pull request Jan 18, 2024

CI error on WasmTestOnBrowser-System.Numerics.Tensors.Net8.Tests #97124

Open

Use ActualLength

48c2f75

build-analysis bot mentioned this pull request Jan 22, 2024

Tests crashing in CI with no dump: exit code 137 means SIGKILL Killed #97049

Closed

Clear

29f2fb1

Merge branch 'main' into BigIntegerDivide

4f087e3

This was referenced Jan 29, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

[wasi] System.Globalization.Tests AOT build fails with LLVM ERROR: out of memory on Windows #95365

Closed

tannergooding added 2 commits February 3, 2025 07:16

Merge remote-tracking branch 'dotnet/main' into BigIntegerDivide

c71a482

Resolve merge conflicts

801462a

build-analysis bot mentioned this pull request Feb 3, 2025

[WASI] Sockets - unknown handle index #108726

Open