Improve uniform `ShortByteString` #116

lehins · 2021-08-31T20:13:11Z

random-1.2.0 contained a shortcut: architecture-independent generation of bytes relied on `bytestring`'s builder functionality, which forced us to generate `ShortByteString` as pinned. This PR fixes that technical debt.

This is a non-breaking change.

Stop relying on `bytestring` for architecture independent generation of `ShortByteString` and `ByteString`

Bodigrim · 2021-08-31T20:45:57Z

Any idea how to test it on BE arch?

lehins · 2021-08-31T20:53:14Z

> Any idea how to test it on BE arch?

Besides spinning up an AMD server from some cloud provider (Hetzner, for instance) and running the tests there, I have no other ideas.

Bodigrim · 2021-08-31T20:56:04Z

Which AMD machines are big-endian? I thought all of them are little-endian, as well as modern ARM.

Bodigrim · 2021-08-31T20:59:05Z

@juhp I recall you raising s390-related issues on the GHC bug tracker. Do you possibly have access to a big-endian machine to test this patch?

lehins · 2021-08-31T20:59:37Z

Oh look at that. AMD is LE too. In that case I have no clue how to get hold of BE hardware :D

Bodigrim · 2021-08-31T21:00:45Z

src/System/Random/Internal.hs

@@ -105,6 +103,8 @@ import GHC.ForeignPtr
 import Data.ByteString (ByteString)
 #endif

+#include "MachDeps.h"


We can avoid CPP: there is `GHC.ByteOrder`, plus the `ghc-byteorder` package for old GHCs:

    if impl(ghc < 8.4)
      build-depends: ghc-byteorder

lehins · 2021-08-31

Thank you for the suggestion. Done.

lehins · 2021-09-08

For the sake of anyone stumbling on this suggestion: this will not work. See https://gitlab.haskell.org/ghc/ghc/-/issues/20338
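For readers unfamiliar with the suggestion, here is a minimal sketch of what CPP-free detection via `GHC.ByteOrder` could look like. The helper `toLE64` is a hypothetical name, not a function from `random`, and per the GHC issue linked above `targetByteOrder` cannot be trusted on the affected GHC versions, so this only illustrates the idea.

```haskell
import Data.Word (Word64, byteSwap64)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- | Hypothetical helper: reorder a host-order word so that its in-memory
-- layout is little-endian. On a little-endian target this is a no-op.
toLE64 :: Word64 -> Word64
toLE64 = case targetByteOrder of
  LittleEndian -> id
  BigEndian    -> byteSwap64
```

Whichever branch is selected, applying `toLE64` twice returns the original word, since both `id` and `byteSwap64` are their own inverses.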

juhp · 2021-09-01T04:25:45Z

I can probably do a test build in the Fedora buildsystem.

You can see our latest release build here for example.

So you just want me to build this change? Getting the test suite to run looks rather tricky.

juhp · 2021-09-01T08:25:05Z

Here is a scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=74908079
(using ghc-8.10.5 and LTS 18 packages basically).

lehins · 2021-09-01T12:18:05Z

@juhp Thank you for your help. Unfortunately it is the random:spec test suite that needs to be run in order to confirm that big/little endian compatibility works as expected.

juhp · 2021-09-02T07:49:45Z

Okay, I feared as much. Perhaps I can get temporary access to a Fedora s390x instance... Otherwise, in the worst case, we will find out later I guess ;-)

juhp · 2021-09-02T11:54:44Z

~~Good news I ran the testsuite on Fedora 34 s390x and it passed:~~

lehins · 2021-09-02T11:59:32Z

@juhp Awesome!!! Thank you very much for verifying this PR!!!

juhp · 2021-09-02T12:00:43Z

Sorry please wait - that was the wrong log... rechecking now

juhp · 2021-09-02T12:07:44Z

I have now run the test suite correctly on Fedora 34 s390x from your branch, and I am afraid there was 1 test failure:

    genByteString/ShortByteString consistency:  FAIL
      test/Spec.hs:118:
      expected: [78,232,117,189]
       but got: [189,117,232,78]

(Sorry for the false confirmation earlier)
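The two lists in the failure are exact reversals of each other, which is consistent with a 4-byte chunk being stored in the opposite byte order. A small sketch that checks this with `base` only (`fromBytesLE` is a hypothetical helper, not part of the test suite):

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32, byteSwap32)

-- | Hypothetical helper: assemble a Word32 from bytes, treating the first
-- list element as the least significant byte.
fromBytesLE :: [Word8] -> Word32
fromBytesLE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | The byte lists reported by the failing test.
expectedBytes, observedBytes :: [Word8]
expectedBytes = [78, 232, 117, 189]
observedBytes = [189, 117, 232, 78]

-- | True when the observed bytes are the byte-swapped form of the expected ones.
isByteSwapped :: Bool
isByteSwapped = byteSwap32 (fromBytesLE expectedBytes) == fromBytesLE observedBytes
```

Here `isByteSwapped` evaluates to `True`: the expected bytes assemble to `0xBD75E84E`, and swapping its bytes gives `0x4EE875BD`, which is what the observed bytes assemble to.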

lehins · 2021-09-02T12:16:17Z

@juhp Dammit, that is unfortunate. Thank you for rechecking it!

Bodigrim · 2021-09-03T19:23:56Z

@lehins actually, what is our goal here? To produce the same random numbers from the same seed on both LE and BE platforms? Why does it not suffice to just produce some random numbers, not necessarily the same ones?

lehins · 2021-09-03T19:26:22Z

> To produce the same random numbers from the same seed on both LE and BE platforms?

Yes.

> Why does it not suffice to just produce some random numbers, not necessarily the same ones?

I don't follow.

lehins · 2021-09-03T19:32:24Z

@Bodigrim Not random numbers, but a sequence of random bytes.

lehins · 2021-09-03T20:25:45Z

@Bodigrim I'll describe what is going on in a little more detail:

In order to generate a `ByteString` (or a `ShortByteString`) we could do something like `genByteStringM n g = pack <$> replicateM n (uniformM g)`.

However, this would generate 64 bits for every byte that gets used, which is extremely wasteful and inefficient.

What we do instead is generate one `Word64` at a time and write it into a mutable buffer until we fill the buffer up. Writing it in a BE/LE-agnostic manner ensures that the generated `ByteString` will be the same on all architectures for the same generator.

There is also an extra issue at the end of a `ByteString`: we will often have a tail that is smaller than a `Word64` (when `mod n 8 /= 0`), so we need to write the first few bytes of the last word into the end of the `ByteString` in the same manner across architectures as well.
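To make the idea concrete, here is a simplified, list-based sketch of the scheme described above. It is not the code from this PR, which writes into a mutable buffer; the names `word64ToBytesLE` and `fillBytes` are made up for illustration. Because the bytes are extracted with shifts rather than by reinterpreting memory, the result does not depend on the host byte order.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)

-- | Split a Word64 into its 8 bytes, least significant byte first.
-- Only arithmetic shifts are used, so the host byte order is irrelevant.
word64ToBytesLE :: Word64 -> [Word8]
word64ToBytesLE w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7]]

-- | Produce exactly n bytes from a stream of 64-bit words. When n is not a
-- multiple of 8, only the first (n `mod` 8) bytes of the last word are kept.
fillBytes :: Int -> [Word64] -> [Word8]
fillBytes n ws = take n (concatMap word64ToBytesLE ws)
```

For instance, `fillBytes 10 [0x0102030405060708, 0x1112131415161718]` yields `[8,7,6,5,4,3,2,1,24,23]`: all eight bytes of the first word, followed by the two lowest bytes of the second.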

So the failing test reported in #116 (comment) shows that there is a problem somewhere in the logic (or in my assumptions about how it works), and we get bytes in a different order on BE vs LE machines.

Now, all I need is to figure out how I can get my hands on a BE machine so I can experiment with this; I can't be constantly bugging Jens to verify whether a change works or not. I suspect the problem was there prior to this PR as well, except that the test was not present until now, and if anyone ever ran random on a BE machine the random bytes would still be ... random. So the problem is not very pronounced, but it is still there.

curiousleo · 2021-09-06T06:34:48Z

> Now, all I need is to figure out how I can get my hands on a BE machine so I can experiment with this; I can't be constantly bugging Jens to verify whether a change works or not.

It looks like this project lets you run an emulated s390x Ubuntu with QEMU + Docker:

    $ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
    $ docker run --rm -t s390x/ubuntu uname -m
    s390x

https://github.com/multiarch/qemu-user-static#getting-started

That might help.
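Once inside such a container, one way to double-check which byte order the running program actually sees is to store a known word and read back its first byte. A minimal sketch using only `base` (the name `isLittleEndianHost` is made up for illustration):

```haskell
import Data.Word (Word8, Word32)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)

-- | Store the 32-bit value 1 in temporary memory and inspect the byte at the
-- lowest address: it is 1 on a little-endian host and 0 on a big-endian one.
isLittleEndianHost :: IO Bool
isLittleEndianHost =
  with (1 :: Word32) $ \p -> do
    firstByte <- peek (castPtr p) :: IO Word8
    pure (firstByte == 1)
```

On x86-64 this should yield `True`, and on s390x it should yield `False`. Since it inspects memory at run time, it does not rely on what the compiler reports about the target.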

lehins · 2021-09-06T21:08:43Z

Hallelujah! I am not going crazy! @curiousleo, an enormous thank you for the Docker suggestion. It was a bit painful to get it to work: anything complicated like cabal or stack will not work in that container, because it uses up memory like crazy.

So it appears my sanity is fine and GHC is not reporting ByteOrder correctly: https://gitlab.haskell.org/ghc/ghc/-/issues/20338

@Bodigrim thank you for suggesting that we avoid CPP; otherwise we would not have found that bug. But now I need to bring the CPP back.

@juhp thank you again for helping debug this. If you don't mind I'll ask you again in a little bit to run the test suite one last time, just to be sure. For now I'll need to bring back the CPP approach first.
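For reference, a CPP-based check in the spirit of the `#include "MachDeps.h"` line from the diff might look roughly like the sketch below. It assumes that the header defines `WORDS_BIGENDIAN` on big-endian targets; `toLE64` is a hypothetical helper and not necessarily what `random` uses.

```haskell
{-# LANGUAGE CPP #-}
#include "MachDeps.h"

import Data.Word (Word64, byteSwap64)

-- | Hypothetical helper: reorder a host-order word so that its in-memory
-- layout is little-endian. The branch is chosen at compile time by CPP.
toLE64 :: Word64 -> Word64
#ifdef WORDS_BIGENDIAN
toLE64 = byteSwap64
#else
toLE64 = id
#endif
```

The design trade-off is that the decision comes from the C headers shipped with GHC rather than from a Haskell-level constant, which is why it sidesteps the `ByteOrder` misreporting mentioned above.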

Bodigrim · 2021-09-06T21:13:37Z

> GHC is not reporting ByteOrder correctly

Oh, that's pretty big. Thanks for debugging it.

curiousleo · 2021-09-07T06:03:50Z

Damn, nice find @lehins.

Bodigrim · 2021-09-07T23:48:50Z

> It was a bit painful to get it to work: anything complicated like cabal or stack will not work in that container, because it uses up memory like crazy.

I was able to use `cabal -j1`. It eats up to 16 GB of RAM and is slow as hell, literally hours and hours to build dependencies, but it succeeds.

lehins · 2021-09-08T00:01:27Z

😄 As I said, painful indeed

lehins added 3 commits August 31, 2021 19:34

Addition of runStateGenST_

cbb3658

Add (Short)ByteString consistency tests

05d9475

Make sure ShortByteString is generated unpinned.

0e7b49f

Stop relying on `bytestring` for architecture independent generation of `ShortByteString` and `ByteString`

Bodigrim reviewed Aug 31, 2021


Switch to GHC.ByteOrder for architecture detection

fdd882a

lehins merged commit d819629 into master Sep 2, 2021

lehins mentioned this pull request Sep 5, 2021

Improve uniform ShortByteString (fixup) #118

Merged

lehins deleted the improve-uniform-shortbytestring branch September 8, 2021 22:52

Bodigrim mentioned this pull request Sep 11, 2021

s390x issues haskell/bytestring#421

Closed

Bodigrim mentioned this pull request Sep 18, 2023

Add big-endian CI job #143

Merged
