Skip to content

Rework u128 #885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mborland opened this issue Feb 28, 2025 · 27 comments · May be fixed by #887
Open

Rework u128 #885

mborland opened this issue Feb 28, 2025 · 27 comments · May be fixed by #887
Assignees
Labels
enhancement New feature or request optimization

Comments

@mborland
Copy link
Member

Right now it has macros to define operators and is entirely missing a bunch of comparison operators. This can also be the time to explore inserting intrinsics into say the mul operator

@mborland mborland added enhancement New feature or request optimization labels Feb 28, 2025
@mborland mborland self-assigned this Feb 28, 2025
@mborland
Copy link
Member Author

mborland commented Mar 5, 2025

Some results on ARM64 where old is our current emulated128 and new is the one I'm building now:

---------------------------
Two Word Operations
---------------------------

comparisons<builtin    >: 141153     us (s=299999985)
comparisons<old        >: 166175     us (s=299999985)
comparisons<new        >: 128813     us (s=299999985)

add<Builtin    >: 20494      us (s=6185515908288643546)
add<Old        >: 19391      us (s=6185515908288643546)
add<New        >: 18791      us (s=6185515908288643546)

sub<Builtin    >: 20559      us (s=6843120887274356308)
sub<Old        >: 19733      us (s=6843120887274356308)
sub<New        >: 18890      us (s=6843120887274356308)

mul<Builtin    >: 20399      us (s=4541227745721732167)
mul<Old        >: 19099      us (s=4541227745721732167)
mul<New        >: 18995      us (s=4541227745721732167)

---------------------------
One Word Operations
---------------------------

comparisons<builtin    >: 141399     us (s=299999985)
comparisons<old        >: 162425     us (s=299999985)
comparisons<new        >: 126388     us (s=299999985)

add<Builtin    >: 24094      us (s=7061247353260042742)
add<Old        >: 19478      us (s=7061247353260042742)
add<New        >: 18831      us (s=7061247353260042742)

sub<Builtin    >: 20488      us (s=18357599163646591540)
sub<Old        >: 19613      us (s=18357599163646591540)
sub<New        >: 18680      us (s=18357599163646591540)

mul<Builtin    >: 20308      us (s=1771533121646374909)
mul<Old        >: 19077      us (s=1771533121646374909)
mul<New        >: 19075      us (s=1771533121646374909)

@mborland
Copy link
Member Author

mborland commented Mar 5, 2025

Here are x64 results:

---------------------------
Two Word Operations
---------------------------

comparisons<builtin    >: 149900     us (s=299999985)
comparisons<old        >: 135456     us (s=299999985)
comparisons<new        >: 73474      us (s=299999985)

add<Builtin    >: 33581      us (s=6185515908288643546)
add<Old        >: 33157      us (s=6185515908288643546)
add<New        >: 32833      us (s=6185515908288643546)

sub<Builtin    >: 34504      us (s=6843120887274356308)
sub<Old        >: 56222      us (s=6843120887274356308)
sub<New        >: 37566      us (s=6843120887274356308)

mul<Builtin    >: 32725      us (s=4541227745721732167)
mul<Old        >: 58778      us (s=4541227745721732167)
mul<New        >: 58626      us (s=4541227745721732167)

---------------------------
One Word Operations
---------------------------

comparisons<builtin    >: 154713     us (s=299999985)
comparisons<old        >: 87551      us (s=299999985)
comparisons<new        >: 119382     us (s=299999985)

add<Builtin    >: 33533      us (s=7061247353260042742)
add<Old        >: 32923      us (s=7061247353260042742)
add<New        >: 32776      us (s=7061247353260042742)

sub<Builtin    >: 33148      us (s=18357599163646591540)
sub<Old        >: 57439      us (s=18357599163646591540)
sub<New        >: 37660      us (s=18357599163646591540)

mul<Builtin    >: 33921      us (s=1771533121646374909)
mul<Old        >: 59083      us (s=1771533121646374909)
mul<New        >: 58890      us (s=1771533121646374909)

@mborland
Copy link
Member Author

mborland commented Mar 6, 2025

Here's where we sit with Windows (ARM64). I'm happy with the big speedup in comps and mul since those are the main things we use:

---------------------------
Two Word Operations
---------------------------

comparisons<old        >: 186732     us (s=299999985)
comparisons<new        >: 137874     us (s=299999985)

add<Old        >: 34412      us (s=6185515908288643546)
add<New        >: 33252      us (s=6185515908288643546)

sub<Old        >: 34090      us (s=6843120887274356308)
sub<New        >: 32985      us (s=6843120887274356308)

mul<Old        >: 64059      us (s=4541227745721732167)
mul<New        >: 56197      us (s=4541227745721732167)

---------------------------
One Word Operations
---------------------------

comparisons<old        >: 154393     us (s=299999985)
comparisons<new        >: 136043     us (s=299999985)

add<Old        >: 33103      us (s=7061247353260042742)
add<New        >: 32469      us (s=7061247353260042742)

sub<Old        >: 34273      us (s=18357599163646591540)
sub<New        >: 34619      us (s=18357599163646591540)

mul<Old        >: 56143      us (s=1771533121646374909)
mul<New        >: 56426      us (s=1771533121646374909)

@ckormanyos
Copy link
Collaborator

happy with the big speedup in comps and mul

Me too. Great work, Matt (@mborland)! Thank you for charging ahead with this.

@mborland
Copy link
Member Author

mborland commented Mar 6, 2025

happy with the big speedup in comps and mul

Me too. Great work, Matt (@mborland)! Thank you for charging ahead with this.

Thanks. I'm going to poke at u256 afterwards. Writing ADX assembly didn't help at all here because it's only 2 adds with one carry, but maybe in the 256 bit case it'll help out? We'll see.

@mborland
Copy link
Member Author

mborland commented Mar 6, 2025

I've added additional benchmarks and division is also available now. Here is ARM64 (M4 Mac):

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 162032     us (s=299999985)
comp<old        >: 164867     us (s=299999985)
comp<new        >: 120265     us (s=299999985)

add <Builtin    >: 21432      us (s=7061247353260042742)
add <Old        >: 18793      us (s=7061247353260042742)
add <New        >: 18070      us (s=7061247353260042742)

sub <Builtin    >: 20066      us (s=18357599163646591540)
sub <Old        >: 18886      us (s=18357599163646591540)
sub <New        >: 18373      us (s=18357599163646591540)

mul <Builtin    >: 19881      us (s=1771533121646374909)
mul <Old        >: 18218      us (s=1771533121646374909)
mul <New        >: 18396      us (s=1771533121646374909)

div <Builtin    >: 313312     us (s=889346420)
div <Old        >: 985225     us (s=889346420)
div <New        >: 308621     us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 135642     us (s=299999985)
comp<old        >: 157072     us (s=299999985)
comp<new        >: 118712     us (s=299999985)

add <Builtin    >: 20148      us (s=6185515908288643546)
add <Old        >: 19108      us (s=6185515908288643546)
add <New        >: 18169      us (s=6185515908288643546)

sub <Builtin    >: 20179      us (s=6843120887274356308)
sub <Old        >: 19114      us (s=6843120887274356308)
sub <New        >: 18451      us (s=6843120887274356308)

mul <Builtin    >: 20070      us (s=4541227745721732167)
mul <Old        >: 18691      us (s=4541227745721732167)
mul <New        >: 18519      us (s=4541227745721732167)

div <Builtin    >: 477020     us (s=842110335)
div <Old        >: 1099519    us (s=842110335)
div <New        >: 472018     us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 134397     us (s=299999985)
comp<old        >: 157708     us (s=299999985)
comp<new        >: 120280     us (s=299999985)

add <Builtin    >: 19855      us (s=343956012596809285)
add <Old        >: 18766      us (s=343956012596809285)
add <New        >: 18214      us (s=343956012596809285)

sub <Builtin    >: 20099      us (s=14793232657927033331)
sub <Old        >: 18988      us (s=14793232657927033331)
sub <New        >: 18309      us (s=14793232657927033331)

mul <Builtin    >: 19729      us (s=14323223738778550995)
mul <Old        >: 18440      us (s=14323223738778550995)
mul <New        >: 18479      us (s=14323223738778550995)

div <Builtin    >: 585675     us (s=2824396256559164129)
div <Old        >: 1927253    us (s=2824396256559164129)
div <New        >: 584919     us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 136776     us (s=299999985)
comp<old        >: 157084     us (s=299999985)
comp<new        >: 118580     us (s=299999985)

add <Builtin    >: 20132      us (s=10615944226330202903)
add <Old        >: 19018      us (s=10615944226330202903)
add <New        >: 18461      us (s=10615944226330202903)

sub <Builtin    >: 20293      us (s=7057049957990609625)
sub <Old        >: 19231      us (s=7057049957990609625)
sub <New        >: 18502      us (s=7057049957990609625)

mul <Builtin    >: 19836      us (s=15103756929938757885)
mul <Old        >: 18539      us (s=15103756929938757885)
mul <New        >: 18572      us (s=15103756929938757885)

div <Builtin    >: 588030     us (s=1998080879450088739)
div <Old        >: 1942631    us (s=1998080879450088739)
div <New        >: 589220     us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 136650     us (s=299999985)
comp<old        >: 159143     us (s=299999985)
comp<new        >: 127717     us (s=299999985)

add <Builtin    >: 20486      us (s=905614408289710516)
add <Old        >: 19730      us (s=905614408289710516)
add <New        >: 18850      us (s=905614408289710516)

sub <Builtin    >: 20465      us (s=16550818054324285228)
sub <Old        >: 19248      us (s=16550818054324285228)
sub <New        >: 18634      us (s=16550818054324285228)

mul <Builtin    >: 20072      us (s=18365287940902824468)
mul <Old        >: 18640      us (s=18365287940902824468)
mul <New        >: 19072      us (s=18365287940902824468)

div <Builtin    >: 647462     us (s=12691594570327189327)
div <Old        >: 1769816    us (s=12691594570327189327)
div <New        >: 579125     us (s=12691594570327189327)

@mborland
Copy link
Member Author

mborland commented Mar 7, 2025

Here's the complete new x64. Looks like some heuristics can be applied to DIV to bring timing down:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 146119     us (s=299999985)
comp<old        >: 85314      us (s=299999985)
comp<new        >: 112340     us (s=299999985)

add <Builtin    >: 32555      us (s=7061247353260042742)
add <Old        >: 32225      us (s=7061247353260042742)
add <New        >: 31332      us (s=7061247353260042742)

sub <Builtin    >: 31295      us (s=18357599163646591540)
sub <Old        >: 54086      us (s=18357599163646591540)
sub <New        >: 35400      us (s=18357599163646591540)

mul <Builtin    >: 32145      us (s=1771533121646374909)
mul <Old        >: 55385      us (s=1771533121646374909)
mul <New        >: 33999      us (s=1771533121646374909)

div <Builtin    >: 275466     us (s=889346420)
div <Old        >: 1488078    us (s=889346420)
div <New        >: 255974     us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 140107     us (s=299999985)
comp<old        >: 129186     us (s=299999985)
comp<new        >: 67301      us (s=299999985)

add <Builtin    >: 30927      us (s=6185515908288643546)
add <Old        >: 30693      us (s=6185515908288643546)
add <New        >: 29692      us (s=6185515908288643546)

sub <Builtin    >: 30905      us (s=6843120887274356308)
sub <Old        >: 51305      us (s=6843120887274356308)
sub <New        >: 34946      us (s=6843120887274356308)

mul <Builtin    >: 31123      us (s=4541227745721732167)
mul <Old        >: 54385      us (s=4541227745721732167)
mul <New        >: 32975      us (s=4541227745721732167)

div <Builtin    >: 3272623    us (s=842110335)
div <Old        >: 1403884    us (s=842110335)
div <New        >: 3046486    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 135740     us (s=299999985)
comp<old        >: 122577     us (s=299999985)
comp<new        >: 65118      us (s=299999985)

add <Builtin    >: 29506      us (s=343956012596809285)
add <Old        >: 30174      us (s=343956012596809285)
add <New        >: 29271      us (s=343956012596809285)

sub <Builtin    >: 29713      us (s=14793232657927033331)
sub <Old        >: 49998      us (s=14793232657927033331)
sub <New        >: 34299      us (s=14793232657927033331)

mul <Builtin    >: 29580      us (s=14323223738778550995)
mul <Old        >: 52238      us (s=14323223738778550995)
mul <New        >: 31492      us (s=14323223738778550995)

div <Builtin    >: 3637171    us (s=2824396256559164129)
div <Old        >: 2330502    us (s=2824396256559164129)
div <New        >: 3607949    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 135058     us (s=299999985)
comp<old        >: 122363     us (s=299999985)
comp<new        >: 64995      us (s=299999985)

add <Builtin    >: 29757      us (s=10615944226330202903)
add <Old        >: 29304      us (s=10615944226330202903)
add <New        >: 29067      us (s=10615944226330202903)

sub <Builtin    >: 29780      us (s=7057049957990609625)
sub <Old        >: 48974      us (s=7057049957990609625)
sub <New        >: 33914      us (s=7057049957990609625)

mul <Builtin    >: 29721      us (s=15103756929938757885)
mul <Old        >: 52064      us (s=15103756929938757885)
mul <New        >: 31357      us (s=15103756929938757885)

div <Builtin    >: 3623782    us (s=1998080879450088739)
div <Old        >: 2322563    us (s=1998080879450088739)
div <New        >: 3644116    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 135089     us (s=299999985)
comp<old        >: 182899     us (s=299999985)
comp<new        >: 181299     us (s=299999985)

add <Builtin    >: 29387      us (s=3475573289369278391)
add <Old        >: 29083      us (s=3475573289369278391)
add <New        >: 29078      us (s=3475573289369278391)

sub <Builtin    >: 29475      us (s=11626342892911575103)
sub <Old        >: 49457      us (s=11626342892911575103)
sub <New        >: 34019      us (s=11626342892911575103)

mul <Builtin    >: 29483      us (s=1850423372688224498)
mul <Old        >: 53186      us (s=1850423372688224498)
mul <New        >: 31437      us (s=1850423372688224498)

div <Builtin    >: 2768382    us (s=17660450816698112606)
div <Old        >: 2139310    us (s=17660450816698112606)
div <New        >: 2735098    us (s=17660450816698112606)

@mborland
Copy link
Member Author

mborland commented Mar 7, 2025

Here is a look at MSVC x64. With MUL I tried using the MSVC intrinsics, but clearly that's not the way to go so I'll pull those out. I'm not sure what's up with the DIV result

Updated with generic mul:

---------------------------
Two Word Operations
---------------------------

comp<old        >: 376094     us (s=299999985)
comp<new        >: 367328     us (s=299999985)

add <Old        >: 84285      us (s=7061247353260042742)
add <New        >: 61871      us (s=7061247353260042742)

sub <Old        >: 87463      us (s=18357599163646591540)
sub <New        >: 52636      us (s=18357599163646591540)

mul <Old        >: 85130      us (s=1771533121646374909)
mul <New        >: 63792      us (s=1771533121646374909)

div <Old        >: 2377815    us (s=889346420)
div <New        >: 418839     us (s=0)

---------------------------
One Word Operations
---------------------------

comp<old        >: 204363     us (s=299999985)
comp<new        >: 197124     us (s=299999985)

add <Old        >: 78715      us (s=6185515908288643546)
add <New        >: 57128      us (s=6185515908288643546)

sub <Old        >: 77899      us (s=6843120887274356308)
sub <New        >: 54039      us (s=6843120887274356308)

mul <Old        >: 87235      us (s=4541227745721732167)
mul <New        >: 61332      us (s=4541227745721732167)

div <Old        >: 2526123    us (s=842110335)
div <New        >: 975136     us (s=0)

---------------------------
Two-One Word Operations
---------------------------

comp<old        >: 407026     us (s=299999985)
comp<new        >: 196189     us (s=299999985)

add <Old        >: 77896      us (s=343956012596809285)
add <New        >: 54463      us (s=343956012596809285)

sub <Old        >: 79604      us (s=14793232657927033331)
sub <New        >: 54076      us (s=14793232657927033331)

mul <Old        >: 87553      us (s=14323223738778550995)
mul <New        >: 64497      us (s=14323223738778550995)

div <Old        >: 3490812    us (s=2824396256559164129)
div <New        >: 598835     us (s=0)

---------------------------
One-Two Word Operations
---------------------------

comp<old        >: 325907     us (s=299999985)
comp<new        >: 203823     us (s=299999985)

add <Old        >: 85719      us (s=10615944226330202903)
add <New        >: 56350      us (s=10615944226330202903)

sub <Old        >: 79766      us (s=7057049957990609625)
sub <New        >: 53787      us (s=7057049957990609625)

mul <Old        >: 88789      us (s=15103756929938757885)
mul <New        >: 65172      us (s=15103756929938757885)

div <Old        >: 3484921    us (s=1998080879450088739)
div <New        >: 594093     us (s=0)

---------------------------
Random Width Operations
---------------------------

comp<old        >: 310876     us (s=299999985)
comp<new        >: 366935     us (s=299999985)

add <Old        >: 78047      us (s=3475573289369278391)
add <New        >: 55983      us (s=3475573289369278391)

sub <Old        >: 79852      us (s=11626342892911575103)
sub <New        >: 54450      us (s=11626342892911575103)

mul <Old        >: 88133      us (s=1850423372688224498)
mul <New        >: 64098      us (s=1850423372688224498)

div <Old        >: 3325449    us (s=17660450816698112606)
div <New        >: 1018627    us (s=0)

@mborland
Copy link
Member Author

All tests are passing on all platforms now so we can have better results. I think the main ones that we are worried about are speeding up comps and divs since we use big integer div for decimal64 and decimal128.

@mborland
Copy link
Member Author

mborland commented Mar 11, 2025

Windows MSVC 14.3:

ARM64:

---------------------------
Two Word Operations
---------------------------

comp<old        >: 173877     us (s=299999985)
comp<new        >: 148565     us (s=299999985)

add <Old        >: 35166      us (s=7061247353260042742)
add <New        >: 35107      us (s=7061247353260042742)

sub <Old        >: 35002      us (s=18357599163646591540)
sub <New        >: 34794      us (s=18357599163646591540)

mul <Old        >: 60528      us (s=1771533121646374909)
mul <New        >: 60501      us (s=1771533121646374909)

div <Old        >: 1316872    us (s=889346420)
div <New        >: 1119430    us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<old        >: 174043     us (s=299999985)
comp<new        >: 150594     us (s=299999985)

add <Old        >: 35883      us (s=6185515908288643546)
add <New        >: 34664      us (s=6185515908288643546)

sub <Old        >: 34824      us (s=6843120887274356308)
sub <New        >: 35409      us (s=6843120887274356308)

mul <Old        >: 60998      us (s=4541227745721732167)
mul <New        >: 60907      us (s=4541227745721732167)

div <Old        >: 1451122    us (s=842110335)
div <New        >: 1352091    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<old        >: 185961     us (s=299999985)
comp<new        >: 159078     us (s=299999985)

add <Old        >: 37418      us (s=343956012596809285)
add <New        >: 38181      us (s=343956012596809285)

sub <Old        >: 36390      us (s=14793232657927033331)
sub <New        >: 35240      us (s=14793232657927033331)

mul <Old        >: 58794      us (s=14323223738778550995)
mul <New        >: 58914      us (s=14323223738778550995)

div <Old        >: 2145583    us (s=2824396256559164129)
div <New        >: 2001601    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<old        >: 170448     us (s=299999985)
comp<new        >: 145279     us (s=299999985)

add <Old        >: 33637      us (s=10615944226330202903)
add <New        >: 33724      us (s=10615944226330202903)

sub <Old        >: 33867      us (s=7057049957990609625)
sub <New        >: 33804      us (s=7057049957990609625)

mul <Old        >: 59492      us (s=15103756929938757885)
mul <New        >: 58832      us (s=15103756929938757885)

div <Old        >: 2159803    us (s=1998080879450088739)
div <New        >: 2175189    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<old        >: 330214     us (s=299999985)
comp<new        >: 156183     us (s=299999985)

add <Old        >: 39529      us (s=3475573289369278391)
add <New        >: 41987      us (s=3475573289369278391)

sub <Old        >: 37020      us (s=11626342892911575103)
sub <New        >: 37130      us (s=11626342892911575103)

mul <Old        >: 60967      us (s=1850423372688224498)
mul <New        >: 61969      us (s=1850423372688224498)

div <Old        >: 2227530    us (s=17660450816698112606)
div <New        >: 1925437    us (s=17660450816698112606)

X64:

---------------------------
Two Word Operations
---------------------------

comp<old        >: 647348     us (s=299999985)
comp<new        >: 558806     us (s=299999985)

add <Old        >: 107479     us (s=7061247353260042742)
add <New        >: 81873      us (s=7061247353260042742)

sub <Old        >: 87273      us (s=18357599163646591540)
sub <New        >: 82428      us (s=18357599163646591540)

mul <Old        >: 132217     us (s=1771533121646374909)
mul <New        >: 95896      us (s=1771533121646374909)

div <Old        >: 2889371    us (s=889346420)
div <New        >: 2317152    us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<old        >: 363595     us (s=299999985)
comp<new        >: 383242     us (s=299999985)

add <Old        >: 155546     us (s=6185515908288643546)
add <New        >: 107404     us (s=6185515908288643546)

sub <Old        >: 150381     us (s=6843120887274356308)
sub <New        >: 62178      us (s=6843120887274356308)

mul <Old        >: 123702     us (s=4541227745721732167)
mul <New        >: 87946      us (s=4541227745721732167)

div <Old        >: 3125916    us (s=842110335)
div <New        >: 3226387    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<old        >: 421952     us (s=299999985)
comp<new        >: 338194     us (s=299999985)

add <Old        >: 89156      us (s=343956012596809285)
add <New        >: 64043      us (s=343956012596809285)

sub <Old        >: 88954      us (s=14793232657927033331)
sub <New        >: 64253      us (s=14793232657927033331)

mul <Old        >: 99294      us (s=14323223738778550995)
mul <New        >: 69488      us (s=14323223738778550995)

div <Old        >: 4293431    us (s=2824396256559164129)
div <New        >: 3782136    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<old        >: 397084     us (s=299999985)
comp<new        >: 264753     us (s=299999985)

add <Old        >: 93488      us (s=10615944226330202903)
add <New        >: 63848      us (s=10615944226330202903)

sub <Old        >: 93765      us (s=7057049957990609625)
sub <New        >: 63121      us (s=7057049957990609625)

mul <Old        >: 101558     us (s=15103756929938757885)
mul <New        >: 73450      us (s=15103756929938757885)

div <Old        >: 4181032    us (s=1998080879450088739)
div <New        >: 3570025    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<old        >: 344062     us (s=299999985)
comp<new        >: 420603     us (s=299999985)

add <Old        >: 91640      us (s=3475573289369278391)
add <New        >: 65164      us (s=3475573289369278391)

sub <Old        >: 91395      us (s=11626342892911575103)
sub <New        >: 62938      us (s=11626342892911575103)

mul <Old        >: 98816      us (s=1850423372688224498)
mul <New        >: 68533      us (s=1850423372688224498)

div <Old        >: 3559419    us (s=17660450816698112606)
div <New        >: 4086796    us (s=17660450816698112606)

@mborland
Copy link
Member Author

macOS ARM64:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 133917     us (s=299999985)
comp<old        >: 174007     us (s=299999985)
comp<new        >: 134537     us (s=299999985)

add <Builtin    >: 24122      us (s=7061247353260042742)
add <Old        >: 23163      us (s=7061247353260042742)
add <New        >: 22852      us (s=7061247353260042742)

sub <Builtin    >: 25488      us (s=18357599163646591540)
sub <Old        >: 23776      us (s=18357599163646591540)
sub <New        >: 23226      us (s=18357599163646591540)

mul <Builtin    >: 26116      us (s=1771533121646374909)
mul <Old        >: 22454      us (s=1771533121646374909)
mul <New        >: 24439      us (s=1771533121646374909)

div <Builtin    >: 358775     us (s=889346420)
div <Old        >: 994108     us (s=889346420)
div <New        >: 346423     us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 134251     us (s=299999985)
comp<old        >: 170523     us (s=299999985)
comp<new        >: 130376     us (s=299999985)

add <Builtin    >: 23837      us (s=6185515908288643546)
add <Old        >: 22334      us (s=6185515908288643546)
add <New        >: 22529      us (s=6185515908288643546)

sub <Builtin    >: 23749      us (s=6843120887274356308)
sub <Old        >: 23590      us (s=6843120887274356308)
sub <New        >: 22328      us (s=6843120887274356308)

mul <Builtin    >: 23353      us (s=4541227745721732167)
mul <Old        >: 21819      us (s=4541227745721732167)
mul <New        >: 22256      us (s=4541227745721732167)

div <Builtin    >: 538638     us (s=842110335)
div <Old        >: 1152888    us (s=842110335)
div <New        >: 537637     us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 129896     us (s=299999985)
comp<old        >: 167367     us (s=299999985)
comp<new        >: 127164     us (s=299999985)

add <Builtin    >: 21138      us (s=343956012596809285)
add <Old        >: 20311      us (s=343956012596809285)
add <New        >: 19515      us (s=343956012596809285)

sub <Builtin    >: 21021      us (s=14793232657927033331)
sub <Old        >: 20233      us (s=14793232657927033331)
sub <New        >: 19538      us (s=14793232657927033331)

mul <Builtin    >: 20926      us (s=14323223738778550995)
mul <Old        >: 19748      us (s=14323223738778550995)
mul <New        >: 19714      us (s=14323223738778550995)

div <Builtin    >: 662538     us (s=2824396256559164129)
div <Old        >: 2013325    us (s=2824396256559164129)
div <New        >: 664455     us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 130871     us (s=299999985)
comp<old        >: 168450     us (s=299999985)
comp<new        >: 129036     us (s=299999985)

add <Builtin    >: 23062      us (s=10615944226330202903)
add <Old        >: 22138      us (s=10615944226330202903)
add <New        >: 21197      us (s=10615944226330202903)

sub <Builtin    >: 22933      us (s=7057049957990609625)
sub <Old        >: 22349      us (s=7057049957990609625)
sub <New        >: 21367      us (s=7057049957990609625)

mul <Builtin    >: 22781      us (s=15103756929938757885)
mul <Old        >: 20973      us (s=15103756929938757885)
mul <New        >: 21399      us (s=15103756929938757885)

div <Builtin    >: 663730     us (s=1998080879450088739)
div <Old        >: 1989705    us (s=1998080879450088739)
div <New        >: 676634     us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 128581     us (s=299999985)
comp<old        >: 165428     us (s=299999985)
comp<new        >: 128034     us (s=299999985)

add <Builtin    >: 22647      us (s=905614408289710516)
add <Old        >: 21859      us (s=905614408289710516)
add <New        >: 20408      us (s=905614408289710516)

sub <Builtin    >: 22227      us (s=16550818054324285228)
sub <Old        >: 22965      us (s=16550818054324285228)
sub <New        >: 20985      us (s=16550818054324285228)

mul <Builtin    >: 21882      us (s=18365287940902824468)
mul <Old        >: 21100      us (s=18365287940902824468)
mul <New        >: 21135      us (s=18365287940902824468)

div <Builtin    >: 648208     us (s=12691594570327189327)
div <Old        >: 1768911    us (s=12691594570327189327)
div <New        >: 646529     us (s=12691594570327189327)

@mborland
Copy link
Member Author

mborland commented Mar 11, 2025

Ubuntu 24.04:

X64:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 145871     us (s=299999985)
comp<old        >: 85382      us (s=299999985)
comp<new        >: 111150     us (s=299999985)

add <Builtin    >: 31955      us (s=7061247353260042742)
add <Old        >: 33017      us (s=7061247353260042742)
add <New        >: 32842      us (s=7061247353260042742)

sub <Builtin    >: 33451      us (s=18357599163646591540)
sub <Old        >: 56770      us (s=18357599163646591540)
sub <New        >: 37607      us (s=18357599163646591540)

mul <Builtin    >: 36561      us (s=1771533121646374909)
mul <Old        >: 57339      us (s=1771533121646374909)
mul <New        >: 35217      us (s=1771533121646374909)

div <Builtin    >: 283144     us (s=889346420)
div <Old        >: 1474282    us (s=889346420)
div <New        >: 897783     us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 135853     us (s=299999985)
comp<old        >: 124536     us (s=299999985)
comp<new        >: 65166      us (s=299999985)

add <Builtin    >: 29473      us (s=6185515908288643546)
add <Old        >: 29440      us (s=6185515908288643546)
add <New        >: 29602      us (s=6185515908288643546)

sub <Builtin    >: 30050      us (s=6843120887274356308)
sub <Old        >: 50298      us (s=6843120887274356308)
sub <New        >: 33660      us (s=6843120887274356308)

mul <Builtin    >: 32453      us (s=4541227745721732167)
mul <Old        >: 52978      us (s=4541227745721732167)
mul <New        >: 32008      us (s=4541227745721732167)

div <Builtin    >: 3156460    us (s=842110335)
div <Old        >: 1430959    us (s=842110335)
div <New        >: 3162280    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 136092     us (s=299999985)
comp<old        >: 124363     us (s=299999985)
comp<new        >: 65519      us (s=299999985)

add <Builtin    >: 29569      us (s=343956012596809285)
add <Old        >: 29391      us (s=343956012596809285)
add <New        >: 29022      us (s=343956012596809285)

sub <Builtin    >: 29396      us (s=14793232657927033331)
sub <Old        >: 49804      us (s=14793232657927033331)
sub <New        >: 33827      us (s=14793232657927033331)

mul <Builtin    >: 32520      us (s=14323223738778550995)
mul <Old        >: 52138      us (s=14323223738778550995)
mul <New        >: 31423      us (s=14323223738778550995)

div <Builtin    >: 3779437    us (s=2824396256559164129)
div <Old        >: 2281934    us (s=2824396256559164129)
div <New        >: 1715287    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 135726     us (s=299999985)
comp<old        >: 124476     us (s=299999985)
comp<new        >: 65026      us (s=299999985)

add <Builtin    >: 29611      us (s=10615944226330202903)
add <Old        >: 29244      us (s=10615944226330202903)
add <New        >: 29136      us (s=10615944226330202903)

sub <Builtin    >: 29431      us (s=7057049957990609625)
sub <Old        >: 49994      us (s=7057049957990609625)
sub <New        >: 33924      us (s=7057049957990609625)

mul <Builtin    >: 32535      us (s=15103756929938757885)
mul <Old        >: 52334      us (s=15103756929938757885)
mul <New        >: 32218      us (s=15103756929938757885)

div <Builtin    >: 3786019    us (s=1998080879450088739)
div <Old        >: 2269037    us (s=1998080879450088739)
div <New        >: 1723069    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 135424     us (s=299999985)
comp<old        >: 187219     us (s=299999985)
comp<new        >: 187435     us (s=299999985)

add <Builtin    >: 29619      us (s=3475573289369278391)
add <Old        >: 29091      us (s=3475573289369278391)
add <New        >: 29072      us (s=3475573289369278391)

sub <Builtin    >: 29547      us (s=11626342892911575103)
sub <Old        >: 50048      us (s=11626342892911575103)
sub <New        >: 33708      us (s=11626342892911575103)

mul <Builtin    >: 32760      us (s=1850423372688224498)
mul <Old        >: 52552      us (s=1850423372688224498)
mul <New        >: 31573      us (s=1850423372688224498)

div <Builtin    >: 2870703    us (s=17660450816698112606)
div <Old        >: 2133785    us (s=17660450816698112606)
div <New        >: 2167475    us (s=17660450816698112606)

S390x:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 2241824    us (s=299999985)
comp<old        >: 476535     us (s=299999985)
comp<new        >: 572941     us (s=299999985)

add <Builtin    >: 542360     us (s=7061247353260042742)
add <Old        >: 89138      us (s=7061247353260042742)
add <New        >: 88978      us (s=7061247353260042742)

sub <Builtin    >: 541055     us (s=18357599163646591540)
sub <Old        >: 55404      us (s=18357599163646591540)
sub <New        >: 54233      us (s=18357599163646591540)

mul <Builtin    >: 288418     us (s=1771533121646374909)
mul <Old        >: 300472     us (s=1771533121646374909)
mul <New        >: 304379     us (s=1771533121646374909)

div <Builtin    >: 3813538    us (s=889346420)
div <Old        >: 8909780    us (s=889346420)
div <New        >: 5025268    us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 1627379    us (s=299999985)
comp<old        >: 403873     us (s=299999985)
comp<new        >: 402687     us (s=299999985)

add <Builtin    >: 548904     us (s=6185515908288643546)
add <Old        >: 89665      us (s=6185515908288643546)
add <New        >: 90239      us (s=6185515908288643546)

sub <Builtin    >: 540064     us (s=6843120887274356308)
sub <Old        >: 54956      us (s=6843120887274356308)
sub <New        >: 54496      us (s=6843120887274356308)

mul <Builtin    >: 290764     us (s=4541227745721732167)
mul <Old        >: 301171     us (s=4541227745721732167)
mul <New        >: 300702     us (s=4541227745721732167)

div <Builtin    >: 3205567    us (s=842110335)
div <Old        >: 8682113    us (s=842110335)
div <New        >: 4356581    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 1509142    us (s=299999985)
comp<old        >: 400848     us (s=299999985)
comp<new        >: 399882     us (s=299999985)

add <Builtin    >: 542231     us (s=343956012596809285)
add <Old        >: 89441      us (s=343956012596809285)
add <New        >: 89290      us (s=343956012596809285)

sub <Builtin    >: 540711     us (s=14793232657927033331)
sub <Old        >: 55149      us (s=14793232657927033331)
sub <New        >: 54187      us (s=14793232657927033331)

mul <Builtin    >: 292292     us (s=14323223738778550995)
mul <Old        >: 301548     us (s=14323223738778550995)
mul <New        >: 299784     us (s=14323223738778550995)

div <Builtin    >: 3665889    us (s=2824396256559164129)
div <Old        >: 12834475   us (s=2824396256559164129)
div <New        >: 4981009    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 1546726    us (s=299999985)
comp<old        >: 409678     us (s=299999985)
comp<new        >: 408326     us (s=299999985)

add <Builtin    >: 555542     us (s=10615944226330202903)
add <Old        >: 92788      us (s=10615944226330202903)
add <New        >: 92724      us (s=10615944226330202903)

sub <Builtin    >: 552907     us (s=7057049957990609625)
sub <Old        >: 55302      us (s=7057049957990609625)
sub <New        >: 54165      us (s=7057049957990609625)

mul <Builtin    >: 294909     us (s=15103756929938757885)
mul <Old        >: 305164     us (s=15103756929938757885)
mul <New        >: 301817     us (s=15103756929938757885)

div <Builtin    >: 3634124    us (s=1998080879450088739)
div <Old        >: 12792697   us (s=1998080879450088739)
div <New        >: 4808343    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 1892592    us (s=299999985)
comp<old        >: 559044     us (s=299999985)
comp<new        >: 579941     us (s=299999985)

add <Builtin    >: 543686     us (s=3475573289369278391)
add <Old        >: 89496      us (s=3475573289369278391)
add <New        >: 89428      us (s=3475573289369278391)

sub <Builtin    >: 541261     us (s=11626342892911575103)
sub <Old        >: 55134      us (s=11626342892911575103)
sub <New        >: 54012      us (s=11626342892911575103)

mul <Builtin    >: 294269     us (s=1850423372688224498)
mul <Old        >: 302893     us (s=1850423372688224498)
mul <New        >: 302034     us (s=1850423372688224498)

div <Builtin    >: 3895271    us (s=17660450816698112606)
div <Old        >: 11311799   us (s=17660450816698112606)
div <New        >: 5031463    us (s=17660450816698112606)

PPC64LE

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 1103709    us (s=299999985)
comp<old        >: 408044     us (s=299999985)
comp<new        >: 468083     us (s=299999985)

add <Builtin    >: 108304     us (s=7061247353260042742)
add <Old        >: 81828      us (s=7061247353260042742)
add <New        >: 82737      us (s=7061247353260042742)

sub <Builtin    >: 111692     us (s=18357599163646591540)
sub <Old        >: 95341      us (s=18357599163646591540)
sub <New        >: 97155      us (s=18357599163646591540)

mul <Builtin    >: 75015      us (s=1771533121646374909)
mul <Old        >: 146690     us (s=1771533121646374909)
mul <New        >: 67541      us (s=1771533121646374909)

div <Builtin    >: 1730119    us (s=889346420)
div <Old        >: 4992296    us (s=889346420)
div <New        >: 1761127    us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 689369     us (s=299999985)
comp<old        >: 354374     us (s=299999985)
comp<new        >: 278792     us (s=299999985)

add <Builtin    >: 107202     us (s=6185515908288643546)
add <Old        >: 82116      us (s=6185515908288643546)
add <New        >: 82284      us (s=6185515908288643546)

sub <Builtin    >: 111697     us (s=6843120887274356308)
sub <Old        >: 94656      us (s=6843120887274356308)
sub <New        >: 96003      us (s=6843120887274356308)

mul <Builtin    >: 65210      us (s=4541227745721732167)
mul <Old        >: 146566     us (s=4541227745721732167)
mul <New        >: 66651      us (s=4541227745721732167)

div <Builtin    >: 1620306    us (s=842110335)
div <Old        >: 5010605    us (s=842110335)
div <New        >: 1665208    us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 537601     us (s=299999985)
comp<old        >: 348672     us (s=299999985)
comp<new        >: 277771     us (s=299999985)

add <Builtin    >: 106455     us (s=343956012596809285)
add <Old        >: 81568      us (s=343956012596809285)
add <New        >: 82650      us (s=343956012596809285)

sub <Builtin    >: 111162     us (s=14793232657927033331)
sub <Old        >: 95556      us (s=14793232657927033331)
sub <New        >: 96269      us (s=14793232657927033331)

mul <Builtin    >: 66385      us (s=14323223738778550995)
mul <Old        >: 145718     us (s=14323223738778550995)
mul <New        >: 68411      us (s=14323223738778550995)

div <Builtin    >: 2194214    us (s=2824396256559164129)
div <Old        >: 7194137    us (s=2824396256559164129)
div <New        >: 2261594    us (s=2824396256559164129)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 536774     us (s=299999985)
comp<old        >: 347291     us (s=299999985)
comp<new        >: 277606     us (s=299999985)

add <Builtin    >: 106857     us (s=10615944226330202903)
add <Old        >: 81739      us (s=10615944226330202903)
add <New        >: 81809      us (s=10615944226330202903)

sub <Builtin    >: 113409     us (s=7057049957990609625)
sub <Old        >: 95324      us (s=7057049957990609625)
sub <New        >: 99455      us (s=7057049957990609625)

mul <Builtin    >: 68295      us (s=15103756929938757885)
mul <Old        >: 147865     us (s=15103756929938757885)
mul <New        >: 67300      us (s=15103756929938757885)

div <Builtin    >: 2172582    us (s=1998080879450088739)
div <Old        >: 7204389    us (s=1998080879450088739)
div <New        >: 2245553    us (s=1998080879450088739)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 900668     us (s=299999985)
comp<old        >: 408368     us (s=299999985)
comp<new        >: 444490     us (s=299999985)

add <Builtin    >: 106906     us (s=3475573289369278391)
add <Old        >: 82084      us (s=3475573289369278391)
add <New        >: 82256      us (s=3475573289369278391)

sub <Builtin    >: 111508     us (s=11626342892911575103)
sub <Old        >: 95974      us (s=11626342892911575103)
sub <New        >: 96618      us (s=11626342892911575103)

mul <Builtin    >: 66254      us (s=1850423372688224498)
mul <Old        >: 146078     us (s=1850423372688224498)
mul <New        >: 66899      us (s=1850423372688224498)

div <Builtin    >: 2241612    us (s=17660450816698112606)
div <Old        >: 6602686    us (s=17660450816698112606)
div <New        >: 2283165    us (s=17660450816698112606)

@ckormanyos
Copy link
Collaborator

Ubuntu 24.04: X64:

Everything got better. Some trivially, some profoundly. This is really a nice result @mborland and I know you worked hard for this.

@mborland
Copy link
Member Author

Ubuntu 24.04: X64:

Everything got better. Some trivially, some profoundly. This is really a nice result @mborland and I know you worked hard for this.

Thanks. It looks like the other 64 bit platforms can take the same optimizations as x64. I need to find some good docker images for ARM32 and x86. The ubuntu ones fail to build, and that's where the emulation is important since they won't have native types.

@ckormanyos
Copy link
Collaborator

ckormanyos commented Mar 11, 2025

Thanks. It looks like the other 64 bit platforms can take the same optimizations as x64. I need to find some good docker images for ARM32 and x86.

Hmmm. Do you/we really need to do that? I don't really see this as a clearly sensible use of development time. I think the improvements compared to earlier are clear enough?

@mborland
Copy link
Member Author

Thanks. It looks like the other 64 bit platforms can take the same optimizations as x64. I need to find some good docker images for ARM32 and x86.

Hmmm. Do you/we really need to do that? I don't really see this as a clearly sensible use of development time. I think the improvements compared to earlier are clear enough?

Probably not worth optimizing for, but checking the results are the same. I assume the majority of users are running x64 Linux or Win64 where we are sitting well.

@mborland
Copy link
Member Author

Looks like we're sitting good for 32 bits, and they pass all tests:

I386:

---------------------------
Two Word Operations
---------------------------

comp<old        >: 866424     us (s=299999985)
comp<new        >: 1290285    us (s=299999985)

add <Old        >: 461570     us (s=1111190006)
add <New        >: 462066     us (s=1111190006)

sub <Old        >: 482867     us (s=908706356)
sub <New        >: 472322     us (s=908706356)

mul <Old        >: 449556     us (s=4153552893)
mul <New        >: 449556     us (s=4153552893)

div <Old        >: 12304477   us (s=889346420)
div <New        >: 8139990    us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<old        >: 857424     us (s=299999985)
comp<new        >: 827889     us (s=299999985)

add <Old        >: 463251     us (s=1108509146)
add <New        >: 463823     us (s=1108509146)

sub <Old        >: 469621     us (s=3372957268)
sub <New        >: 468990     us (s=3372957268)

mul <Old        >: 452998     us (s=65293383)
mul <New        >: 452415     us (s=65293383)

div <Old        >: 13527674   us (s=842110335)
div <New        >: 13503405   us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<old        >: 858257     us (s=299999985)
comp<new        >: 826322     us (s=299999985)

add <Old        >: 461546     us (s=3442560581)
add <New        >: 461449     us (s=3442560581)

sub <Old        >: 468543     us (s=2004013555)
sub <New        >: 466397     us (s=2004013555)

mul <Old        >: 450299     us (s=1280738003)
mul <New        >: 450971     us (s=1280738003)

div <Old        >: 16674292   us (s=1495722721)
div <New        >: 12222917   us (s=1495722721)

---------------------------
One-Two Word Operations
---------------------------

comp<old        >: 847754     us (s=299999985)
comp<new        >: 833615     us (s=299999985)

add <Old        >: 463075     us (s=902124311)
add <New        >: 462762     us (s=902124311)

sub <Old        >: 468962     us (s=70726361)
sub <New        >: 469175     us (s=70726361)

mul <Old        >: 452290     us (s=2530744573)
mul <New        >: 453577     us (s=2530744573)

div <Old        >: 16680469   us (s=1008690467)
div <New        >: 12204816   us (s=1008690467)

---------------------------
Random Width Operations
---------------------------

comp<old        >: 1188569    us (s=299999985)
comp<new        >: 1159680    us (s=299999985)

add <Old        >: 463769     us (s=129046455)
add <New        >: 462798     us (s=129046455)

sub <Old        >: 469098     us (s=417447999)
sub <New        >: 469547     us (s=417447999)

mul <Old        >: 451930     us (s=3844698354)
mul <New        >: 451734     us (s=3844698354)

div <Old        >: 15275905   us (s=3733740126)
div <New        >: 12725964   us (s=3733740126)

Arm32v7:

---------------------------
Two Word Operations
---------------------------

comp<old        >: 1251838    us (s=299999985)
comp<new        >: 1292324    us (s=299999985)

add <Old        >: 422043     us (s=1111190006)
add <New        >: 420447     us (s=1111190006)

sub <Old        >: 421715     us (s=908706356)
sub <New        >: 422350     us (s=908706356)

mul <Old        >: 462382     us (s=4153552893)
mul <New        >: 449793     us (s=4153552893)

div <Old        >: 33819139   us (s=889346420)
div <New        >: 11002887   us (s=889346420)

---------------------------
One Word Operations
---------------------------

comp<old        >: 1156955    us (s=299999985)
comp<new        >: 1093539    us (s=299999985)

add <Old        >: 428802     us (s=1108509146)
add <New        >: 421936     us (s=1108509146)

sub <Old        >: 429674     us (s=3372957268)
sub <New        >: 421542     us (s=3372957268)

mul <Old        >: 462212     us (s=65293383)
mul <New        >: 448538     us (s=65293383)

div <Old        >: 49624598   us (s=842110335)
div <New        >: 48223278   us (s=842110335)

---------------------------
Two-One Word Operations
---------------------------

comp<old        >: 1048464    us (s=299999985)
comp<new        >: 953298     us (s=299999985)

add <Old        >: 428412     us (s=3442560581)
add <New        >: 419315     us (s=3442560581)

sub <Old        >: 428438     us (s=2004013555)
sub <New        >: 419841     us (s=2004013555)

mul <Old        >: 462928     us (s=1280738003)
mul <New        >: 450968     us (s=1280738003)

div <Old        >: 54098253   us (s=1495722721)
div <New        >: 35339832   us (s=1495722721)

---------------------------
One-Two Word Operations
---------------------------

comp<old        >: 1049147    us (s=299999985)
comp<new        >: 944754     us (s=299999985)

add <Old        >: 431144     us (s=902124311)
add <New        >: 420639     us (s=902124311)

sub <Old        >: 428091     us (s=70726361)
sub <New        >: 419535     us (s=70726361)

mul <Old        >: 461926     us (s=2530744573)
mul <New        >: 446754     us (s=2530744573)

div <Old        >: 54055245   us (s=1008690467)
div <New        >: 35262008   us (s=1008690467)

---------------------------
Random Width Operations
---------------------------

comp<old        >: 1315549    us (s=299999985)
comp<new        >: 1232650    us (s=299999985)

add <Old        >: 429616     us (s=129046455)
add <New        >: 419289     us (s=129046455)

sub <Old        >: 428217     us (s=417447999)
sub <New        >: 420339     us (s=417447999)

mul <Old        >: 461853     us (s=3844698354)
mul <New        >: 447205     us (s=3844698354)

div <Old        >: 48434926   us (s=3733740126)
div <New        >: 33323145   us (s=3733740126)

@mborland
Copy link
Member Author

I've made some improvements to the 64-bit platforms, and edited their posts rather than making more. We're almost universally better now. The only outstanding TODO I have is worthwhile to specialize 128 by 64 bit division? Passing it to the wide integer knuth might be overkill, especially because the first few steps of that method is to find information we already know (e.g. where is the head word, div by 0, etc). We also still have the old 128 by 32 bit division which uses 4 integer divs, and 4 integer mods.

I also re-investigated speeding up the digit counting methods since that is one of the slower operations we have that is used frequently (it's the basis for normalization). Below is the benchmark of just that operation on Mac ARM64:

digits<builtin    >: 564861     us (s=2903591855)
digits<old        >: 561131     us (s=2903591855)
digits<new        >: 188835     us (s=2903591855)

@mborland mborland linked a pull request Mar 13, 2025 that will close this issue
@mborland
Copy link
Member Author

@NAThompson here are the benchmarks as run on a native ARM64 Graviton runner:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 1271311    us (s=299999985)
comp<old        >: 558755     us (s=299999985)
comp<new        >: 376748     us (s=299999985)

add <Builtin    >: 195503     us (s=7061247353260042742)
add <Old        >: 184308     us (s=7061247353260042742)
add <New        >: 95243      us (s=7061247353260042742)

sub <Builtin    >: 170684     us (s=18357599163646591540)
sub <Old        >: 99113      us (s=18357599163646591540)
sub <New        >: 152775     us (s=18357599163646591540)

mul <Builtin    >: 333176     us (s=1771533121646374909)
mul <Old        >: 579740     us (s=1771533121646374909)
mul <New        >: 518064     us (s=1771533121646374909)

div <Builtin    >: 1463935    us (s=889346420)
div <Old        >: 4421613    us (s=889346420)
div <New        >: 1432862    us (s=889346420)

digits<builtin    >: 550380     us (s=1939775395)
digits<old        >: 438990     us (s=1939775395)
digits<new        >: 2523727    us (s=1939775395)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 972214     us (s=299999985)
comp<old        >: 255140     us (s=299999985)
comp<new        >: 227751     us (s=299999985)

add <Builtin    >: 166249     us (s=6185515908288643546)
add <Old        >: 97040      us (s=6185515908288643546)
add <New        >: 170820     us (s=6185515908288643546)

sub <Builtin    >: 165595     us (s=6843120887274356308)
sub <Old        >: 95942      us (s=6843120887274356308)
sub <New        >: 173736     us (s=6843120887274356308)

mul <Builtin    >: 220516     us (s=4541227745721732167)
mul <Old        >: 634923     us (s=4541227745721732167)
mul <New        >: 485485     us (s=4541227745721732167)

div <Builtin    >: 1595020    us (s=842110335)
div <Old        >: 4766472    us (s=842110335)
div <New        >: 1599813    us (s=842110335)

digits<builtin    >: 1569186    us (s=3867358930)
digits<old        >: 1934104    us (s=3867358930)
digits<new        >: 1342841    us (s=3867358930)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 751798     us (s=299999985)
comp<old        >: 271200     us (s=299999985)
comp<new        >: 241056     us (s=299999985)

add <Builtin    >: 143640     us (s=343956012596809285)
add <Old        >: 109960     us (s=343956012596809285)
add <New        >: 154556     us (s=343956012596809285)

sub <Builtin    >: 97091      us (s=14793232657927033331)
sub <Old        >: 171567     us (s=14793232657927033331)
sub <New        >: 93762      us (s=14793232657927033331)

mul <Builtin    >: 318639     us (s=14323223738778550995)
mul <Old        >: 505322     us (s=14323223738778550995)
mul <New        >: 541140     us (s=14323223738778550995)

div <Builtin    >: 2297794    us (s=2824396256559164129)
div <Old        >: 7422717    us (s=2824396256559164129)
div <New        >: 2287030    us (s=2824396256559164129)

digits<builtin    >: 1121159    us (s=2903578155)
digits<old        >: 1287406    us (s=2903578155)
digits<new        >: 1934852    us (s=2903578155)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 659659     us (s=299999985)
comp<old        >: 290216     us (s=299999985)
comp<new        >: 327173     us (s=299999985)

add <Builtin    >: 103051     us (s=10615944226330202903)
add <Old        >: 192832     us (s=10615944226330202903)
add <New        >: 195161     us (s=10615944226330202903)

sub <Builtin    >: 119546     us (s=7057049957990609625)
sub <Old        >: 91639      us (s=7057049957990609625)
sub <New        >: 168182     us (s=7057049957990609625)

mul <Builtin    >: 188052     us (s=15103756929938757885)
mul <Old        >: 585182     us (s=15103756929938757885)
mul <New        >: 478544     us (s=15103756929938757885)

div <Builtin    >: 2219340    us (s=1998080879450088739)
div <Old        >: 7323251    us (s=1998080879450088739)
div <New        >: 2411335    us (s=1998080879450088739)

digits<builtin    >: 1082472    us (s=2903549675)
digits<old        >: 1216029    us (s=2903549675)
digits<new        >: 1736676    us (s=2903549675)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 1369702    us (s=299999985)
comp<old        >: 501133     us (s=299999985)
comp<new        >: 584821     us (s=299999985)

add <Builtin    >: 194236     us (s=3475573289369278391)
add <Old        >: 185396     us (s=3475573289369278391)
add <New        >: 116206     us (s=3475573289369278391)

sub <Builtin    >: 148666     us (s=11626342892911575103)
sub <Old        >: 199357     us (s=11626342892911575103)
sub <New        >: 168057     us (s=11626342892911575103)

mul <Builtin    >: 240310     us (s=1850423372688224498)
mul <Old        >: 585867     us (s=1850423372688224498)
mul <New        >: 577053     us (s=1850423372688224498)

div <Builtin    >: 2118526    us (s=17660450816698112606)
div <Old        >: 6344489    us (s=17660450816698112606)
div <New        >: 2265787    us (s=17660450816698112606)

mod <Builtin    >: 2239968    us (s=10795064728928622325)
mod <Old        >: 6671733    us (s=10795064728928622325)
div <New        >: 2306170    us (s=10795064728928622325)

digits<builtin    >: 1635038    us (s=2903488885)
digits<old        >: 1864824    us (s=2903488885)
digits<new        >: 2435372    us (s=2903488885)

Apparently the CLZ instruction is horrible since on M4 the speedup was ~3x. I can probably get the mul timing value down using the same memcpy trick that is used on x64.

@mborland
Copy link
Member Author

We should check the MSVC comparison vs: https://github.com/microsoft/STL/blob/main/stl/inc/__msvc_int128.hpp. It extensively uses intrinsics which at least on ARM I found to perform worse than the regular scalar implementations.

@mborland
Copy link
Member Author

mborland commented Mar 24, 2025

Here is the data with MSVC's std::_Unsigned128 as the "builtin" type:

====== BEGIN OUTPUT ======

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 169203     us (s=299999985)
comp<old        >: 154939     us (s=299999985)
comp<new        >: 132799     us (s=299999985)

add <Builtin    >: 32247      us (s=7061247353260042742)
add <Old        >: 32427      us (s=7061247353260042742)
add <New        >: 32867      us (s=7061247353260042742)

sub <Builtin    >: 32078      us (s=18357599163646591540)
sub <Old        >: 32036      us (s=18357599163646591540)
sub <New        >: 32095      us (s=18357599163646591540)

mul <Builtin    >: 68753      us (s=1771533121646374909)
mul <Old        >: 57355      us (s=1771533121646374909)
mul <New        >: 55972      us (s=1771533121646374909)

div <Builtin    >: 218817     us (s=889346420)
div <Old        >: 1171086    us (s=889346420)
div <New        >: 1023186    us (s=889346420)

digits<builtin    >: 6718177    us (s=1939775395)
digits<old        >: 147770     us (s=1939775395)
digits<new        >: 147605     us (s=1939775395)

---------------------------
One Word Operations
---------------------------

comp<builtin    >: 173118     us (s=299999985)
comp<old        >: 156262     us (s=299999985)
comp<new        >: 132809     us (s=299999985)

add <Builtin    >: 32626      us (s=12166396133763276947)
add <Old        >: 32433      us (s=6185515908288643546)
add <New        >: 32385      us (s=6185515908288643546)

sub <Builtin    >: 31962      us (s=15111948464598141735)
sub <Old        >: 32787      us (s=6843120887274356308)
sub <New        >: 32121      us (s=6843120887274356308)

mul <Builtin    >: 68793      us (s=8866576161942358014)
mul <Old        >: 57709      us (s=4541227745721732167)
mul <New        >: 56394      us (s=4541227745721732167)

div <Builtin    >: 632194     us (s=810869830)
div <Old        >: 1340088    us (s=842110335)
div <New        >: 1264835    us (s=842110335)

digits<builtin    >: 13133880   us (s=3867359015)
digits<old        >: 614813     us (s=3867358930)
digits<new        >: 420807     us (s=3867358930)

---------------------------
Two-One Word Operations
---------------------------

comp<builtin    >: 167014     us (s=299999985)
comp<old        >: 157464     us (s=299999985)
comp<new        >: 134517     us (s=299999985)

add <Builtin    >: 32729      us (s=10615944226330202903)
add <Old        >: 32191      us (s=343956012596809285)
add <New        >: 33029      us (s=343956012596809285)

sub <Builtin    >: 32603      us (s=7057049957990609625)
sub <Old        >: 32656      us (s=14793232657927033331)
sub <New        >: 33808      us (s=14793232657927033331)

mul <Builtin    >: 68888      us (s=15103756929938757885)
mul <Old        >: 57817      us (s=14323223738778550995)
mul <New        >: 57560      us (s=14323223738778550995)

div <Builtin    >: 1223753    us (s=13606058108451423992)
div <Old        >: 2051149    us (s=2824396256559164129)
div <New        >: 1935158    us (s=2824396256559164129)

digits<builtin    >: 9928638    us (s=2903566375)
digits<old        >: 358512     us (s=2903578155)
digits<new        >: 284569     us (s=2903578155)

---------------------------
One-Two Word Operations
---------------------------

comp<builtin    >: 168388     us (s=299999985)
comp<old        >: 158795     us (s=299999985)
comp<new        >: 135786     us (s=299999985)

add <Builtin    >: 33171      us (s=8319793830730920280)
add <Old        >: 33505      us (s=10615944226330202903)
add <New        >: 33174      us (s=10615944226330202903)

sub <Builtin    >: 33300      us (s=5050418025115660168)
sub <Old        >: 32578      us (s=7057049957990609625)
sub <New        >: 32349      us (s=7057049957990609625)

mul <Builtin    >: 68152      us (s=16085279806510168732)
mul <Old        >: 57601      us (s=15103756929938757885)
mul <New        >: 57023      us (s=15103756929938757885)

div <Builtin    >: 1231018    us (s=4377107715507523416)
div <Old        >: 2094170    us (s=1998080879450088739)
div <New        >: 1972436    us (s=1998080879450088739)

digits<builtin    >: 9899819    us (s=2903567855)
digits<old        >: 350320     us (s=2903549675)
digits<new        >: 283570     us (s=2903549675)

---------------------------
Random Width Operations
---------------------------

comp<builtin    >: 299803     us (s=299999985)
comp<old        >: 288517     us (s=299999985)
comp<new        >: 137333     us (s=299999985)

add <Builtin    >: 33106      us (s=1060231826472181678)
add <Old        >: 33081      us (s=3475573289369278391)
add <New        >: 32981      us (s=3475573289369278391)

sub <Builtin    >: 32800      us (s=7322099092371254548)
sub <Old        >: 33093      us (s=11626342892911575103)
sub <New        >: 33089      us (s=11626342892911575103)

mul <Builtin    >: 67990      us (s=8981209912368794188)
mul <Old        >: 58014      us (s=1850423372688224498)
mul <New        >: 57602      us (s=1850423372688224498)

div <Builtin    >: 1070137    us (s=14500198671149722481)
div <Old        >: 1986726    us (s=17660450816698112606)
div <New        >: 1874920    us (s=17660450816698112606)

mod <Builtin    >: 1176824    us (s=6196585683429563738)
mod <Old        >: 2043130    us (s=10795064728928622325)
mod <New        >: 1788533    us (s=10795064728928622325)

digits<builtin    >: 10650533   us (s=2903487550)
digits<old        >: 568803     us (s=2903488885)
digits<new        >: 549706     us (s=2903488885)

EXIT STATUS: 1
====== END OUTPUT ======

I am going to have to see what they are doing for the div operation since it's about 2-5x faster. I'm sure we are having some loss by converting to wide-integer and back

@mborland
Copy link
Member Author

Took a peek into the source for MSVC. It looks like they are also using Knuth 4.3.1D, but specialized per operation e.g. 128 by 128 or 128 by 64 div. I think we can benefit from doing the same. It would also port well to a future portable u128 library since then we don't have to package all of wide-integer along with it.

@mborland
Copy link
Member Author

Made some simplification on the two word path:

---------------------------
Two Word Operations
---------------------------

comp<builtin    >: 176404     us (s=299999985)
comp<old        >: 162005     us (s=299999985)
comp<new        >: 138623     us (s=299999985)

add <Builtin    >: 32573      us (s=7061247353260042742)
add <Old        >: 33318      us (s=7061247353260042742)
add <New        >: 32986      us (s=7061247353260042742)

sub <Builtin    >: 33053      us (s=18357599163646591540)
sub <Old        >: 36167      us (s=18357599163646591540)
sub <New        >: 35251      us (s=18357599163646591540)

mul <Builtin    >: 70159      us (s=1771533121646374909)
mul <Old        >: 58049      us (s=1771533121646374909)
mul <New        >: 57253      us (s=1771533121646374909)

div <Builtin    >: 222323     us (s=889346420)
div <Old        >: 1191085    us (s=889346420)
div <New        >: 216874     us (s=889346420)

My current concern with the benchmarks of MSVC on the other operations is that the results are not matching our numerical results. Old and new both have identical values of s (as they should), but MSVC does not.

@ckormanyos
Copy link
Collaborator

ckormanyos commented Mar 25, 2025

It would also port well to a future portable u128 library since then we don't have to package all of wide-integer along with it.

Using portions of wide-integer was, indeed, a stepping stone to get the right division answers. It was also a mash-up because I had to separate into half-limbs for 64-bit limbs.

We can certainly do better than that in the future. And you are getting this for sure. Also 128/64 is can be specialized. The limb-counting is not needed for this known case.

There will be (as you've found out) a lot more cycles to optimize in long division.

I uncovered an older work I had started in 2003 that packages all of uint128_t, but it's quality is way too poor to use at the moment. It crackered out on the first fuzzing attempts. If this work ever gets the quality we need, we can see if there are any parts to mix-and-match.

Thanks Matt (@mborland)

@mborland
Copy link
Member Author

We can certainly do better than that in the future. And you are getting this for sure. Also 128/64 is can be specialized. The limb-counting is not needed for this known case.

I think for MSVC we need both 128 by 128 and 128 by 64. The reason for their numerical differences is their div follows _udiv128 which assumes that the return value can fit in a u64 which is clearly not always the case. I found a bug in the benchmarks where the two word runs were actually one and vice versa. Need to keep hammering on Knuth

@ckormanyos
Copy link
Collaborator

found a bug in the benchmarks

Need to keep hammering on Knuth

Hi Matt (@mborland) this concerns me at tad. And I need to query. Did you find a bug in something related to my / our Knuth long-division? Or did you find what seems like a bug in a third party work?

@mborland
Copy link
Member Author

No. I just had the benchmarks mislabeled. What should have been one word testing was two word and vice versa. The implementations of everything is totally fine. Just changed what I needed to look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request optimization
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants