Update spec data in parallel #1743
Conversation
Using a queue and threads. The bulk of the logic is from the queue example given in https://docs.python.org/3/library/queue.html#queue.Queue.join. Usage of os.sched_getaffinity is explained in https://docs.python.org/3/library/os.html?highlight=os#os.cpu_count.

I manually verified the correctness of this by comparing bikeshed/spec-data/ of the current update with the one generated with this patch applied.

$ time bikeshed update # current
real 2m17.526s
user 0m21.730s
sys 0m1.682s

$ time bikeshed update # with this patch, ran on 28 cores
real 0m13.869s
user 0m32.152s
sys 0m3.807s

We can probably speed this up more, maybe with multiprocessing? But this looks good enough to me for now.

Fixed #1741.
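For illustration, here is a minimal sketch of the queue + worker-thread pattern this patch follows, adapted from the queue docs example linked above. The fetch callable and the function/variable names are illustrative placeholders, not the actual Bikeshed code.

import os
import queue
import threading

def updateInParallel(filePaths, fetch):
    # fetch: a callable that downloads one spec-data file (blocking HTTP call).
    updateQueue = queue.Queue()
    success = True

    def worker():
        nonlocal success
        while True:
            i, filePath = updateQueue.get()
            try:
                fetch(filePath)
            except Exception:
                success = False
            finally:
                updateQueue.task_done()

    # One worker thread per usable CPU, as in the patch.
    for _ in range(len(os.sched_getaffinity(0))):
        threading.Thread(target=worker, daemon=True).start()

    for i, filePath in enumerate(filePaths):
        updateQueue.put((i, filePath))

    updateQueue.join()  # blocks until every queued item has been marked task_done()
    return success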
# This worker will download from remote, and update success if it fails.
def worker():
    # Needed to update status and progress.
    nonlocal success
Even Python has race conditions between threads, so results should be sent back using a queue also.
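A minimal sketch of that suggestion, with hypothetical names (not code from this PR): each worker puts its outcome on a result queue, and the main thread aggregates after the join instead of sharing a mutable flag.

import queue

def worker(updateQueue, resultQueue, fetch):
    # fetch: hypothetical callable that downloads one file; raises on failure.
    while True:
        filePath = updateQueue.get()
        try:
            fetch(filePath)
            resultQueue.put((filePath, True))
        except Exception:
            resultQueue.put((filePath, False))
        finally:
            updateQueue.task_done()

def overallSuccess(resultQueue):
    # Called on the main thread after updateQueue.join(); drains the results.
    results = []
    while not resultQueue.empty():
        results.append(resultQueue.get())
    return all(ok for _, ok in results)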
Good point. There is only one result: success is initialized to True, any thread that sees a failure sets it to False, and no thread touches it otherwise. Since we use a .join() below, any thread that encountered a failure will have set success to False, and by the time we read success in the main thread the threads will have joined, so the value is consistent. Lmk if my understanding of this is wrong. Thanks!
Yeah, since the only effect of worker() on the sync world is (a) possibly setting success to False, and (b) modifying a unique file, I think we're safe here.
And the race on lastMsgTime will probably just result in writing the "Updated n/m" message twice. And, of course, the number of files the output claims to have already fetched can actually go down if HTTP responses come back out of order.
Ah yup the printing is indeed racy.
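One way to avoid that race (not something this PR does) would be to guard the counter and the message under a lock. A minimal sketch, with hypothetical names:

import threading
import time

progressLock = threading.Lock()
lastMsgTime = 0.0
fetchedCount = 0

def reportProgress(total):
    # Called by each worker after it finishes a file.
    global lastMsgTime, fetchedCount
    with progressLock:
        fetchedCount += 1
        now = time.time()
        if now - lastMsgTime >= 1:  # rate-limit to roughly one message per second
            lastMsgTime = now
            print(f"Updated {fetchedCount}/{total}")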
# Create as many threads as we can.
for i in range(len(os.sched_getaffinity(0))):
    t = threading.Thread(target=worker, daemon=True).start()
https://docs.aiohttp.org/en/stable/ suggests that we can avoid threads entirely by using a different fetching library.
Can you point to where on that page it mentions avoiding threads?
Also, I'm hesitant to pull in an entire library just to make this parallel, but if @tabatkins is okay with it, I can do that!
These threads don't seem too complicated, so I'm fine with using them straight for now.
https://docs.aiohttp.org/en/stable/glossary.html#term-asyncio mentions that it's single-threaded, which reduces how many race conditions you need to handle.
updateQueue.put((i, filePath))

# Create as many threads as we can.
for i in range(len(os.sched_getaffinity(0))):
Whether or not you use threads, the number of parallel HTTP fetches shouldn't be based on the number of usable CPUs. HTTP handling isn't CPU-bound, and Python threads run under a global lock that prevents them from using more than 1 net CPU anyway. (That's why multiprocessing exists, but again, because HTTP handling isn't CPU-bound, it won't help here.)
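For illustration only (not from this PR), the fetch concurrency could be a tunable constant that is independent of CPU count, e.g. via a thread pool; MAX_PARALLEL_FETCHES and the fetch callable here are assumed names.

from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_FETCHES = 16  # tuned for the network and the server, not for CPUs

def updateAll(filePaths, fetch):
    # fetch: callable that downloads one file and returns True on success.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_FETCHES) as pool:
        results = list(pool.map(fetch, filePaths))
    return all(results)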
Whether or not you use threads, the number of parallel HTTP fetches shouldn't be based on the number of usable CPUs.
Sure, this is merely a heuristic. If I have 10 cores, I should be able to run up to 10 threads that do blocking work. It's a quick way of running blocking network calls in parallel.
Python threads run under a global lock that prevents them from using more than 1 net CPU anyway.
I don't know how the GIL interacts here, but from my testing, I see a speedup. If the GIL prevents these threads from being parallelized, why do we see a speedup?
HTTP handling isn't CPU-bound, it won't help here
Yeah, it's a network call, but in this case the network call blocks, so you can think of it as a CPU busy-wait (for the network response).
Sorry, I didn't write that clearly. The GIL prevents multiple Python threads from using CPU concurrently, but it gets released while a Python thread is waiting on an OS function, which is what happens while one of these HTTP fetches waits for the response to come back. That's not a busy-wait.
So, even with just 1 core, you can efficiently run lots of threads here to get that many HTTP fetches to be sent in parallel. They can't actually process results in parallel, but that's fine because processing results takes much less wall-time than waiting for the server to respond.
Okay, I ended up rewriting this myself, relying on aiohttp (and aiofiles for the file creation). Thanks so much for the impetus to get this done, @ngzhian, and the pointer to aiohttp, @jyasskin! On my laptop, updating went from ~3 files/second to, uh, about 200 files/second. I can regen the entire set of files now (about 1700) in less than 10 seconds. This is astonishing, I had no idea I was blocking so long on network like that, phew. Anyway, this was great! Yay!
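For reference, a minimal sketch of what an aiohttp + aiofiles fetch-and-write loop can look like. This is illustrative only, not the actual Bikeshed implementation; the URLs, paths, and function names are placeholders.

import asyncio
import aiofiles
import aiohttp

async def fetchOne(session, url, path):
    # Download one file and write it to disk without blocking the event loop.
    async with session.get(url) as resp:
        data = await resp.text()
    async with aiofiles.open(path, "w") as fh:
        await fh.write(data)

async def updateAll(entries):
    # entries: iterable of (url, localPath) pairs.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetchOne(session, url, path) for url, path in entries))

# Example usage (placeholder URL and path):
# asyncio.run(updateAll([("https://example.com/spec.json", "spec-data/spec.json")]))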
Amazing, thanks Tab and Jeffrey!
Heads up, I see that some of our builds are hanging; see https://travis-ci.org/github/WebAssembly/simd/builds/716629866 for example. Could be related to this.
A local run of
Seems like f539c4a fixed it; we were pulling a broken alpha version (maybe).