Update spec data in parallel #1743

Closed · wants to merge 1 commit

Conversation

ngzhian commented Aug 5, 2020

Using queue and threads. The bulk of the logic is from the queue example
given in https://docs.python.org/3/library/queue.html#queue.Queue.join.
Usage of os.sched_getaffinity is explained
in https://docs.python.org/3/library/os.html?highlight=os#os.cpu_count.
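
For reference, that stdlib example has roughly this shape (adapted here with the sched_getaffinity sizing used in this patch; process and work_items are placeholders for the real per-file work and task list):

```python
import os
import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()
        process(item)  # placeholder for the real per-file update work
        q.task_done()

# One worker thread per usable CPU.
for _ in range(len(os.sched_getaffinity(0))):
    threading.Thread(target=worker, daemon=True).start()

for item in work_items:  # placeholder iterable of tasks
    q.put(item)

q.join()  # block until every queued task is marked done
```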

I manually verified the correctness of this by comparing
bikeshed/spec-data/ of the current update with the one generated with
this patch applied.

 
```
$ time bikeshed update    # current
real    2m17.526s
user    0m21.730s
sys     0m1.682s

$ time bikeshed update    # with this patch, ran on 28 cores
real    0m13.869s
user    0m32.152s
sys     0m3.807s
```

We can probably speed this up more, maybe with multiprocessing? But this looks good enough for now.

Fixed #1741.

```python
# This worker downloads from remote, and sets success to False if a download fails.
def worker():
    # Needed to update status and progress.
    nonlocal success
```

Collaborator:

Even Python has race conditions between threads, so results should be sent back using a queue also.
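
A sketch of that shape, reusing this patch's updateQueue (and its queue/threading imports); fetch and numTasks are hypothetical names for the download helper and the count of queued files:

```python
resultQueue = queue.Queue()

def worker():
    while True:
        i, filePath = updateQueue.get()
        try:
            fetch(filePath)  # hypothetical download helper
            resultQueue.put(True)
        except Exception:
            resultQueue.put(False)
        finally:
            updateQueue.task_done()

updateQueue.join()
# Each put() happens before its task_done(), so after join() all results are in.
success = all(resultQueue.get() for _ in range(numTasks))
```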

Author:

Good point. There is only one result: success is initialized to True, any thread that sees a failure sets it to False, and no thread touches it otherwise.
Since we use a .join() below, any thread that encountered a failure will have set success to False before the main thread reads it, so by the time the threads have joined the value is consistent. Let me know if my understanding of this is wrong. Thanks!

Collaborator:

Yeah, since the only effects of worker() on the shared world are (a) possibly setting success to False, and (b) modifying a unique file, I think we're safe here.

Collaborator:

And the race on lastMsgTime will probably just result in writing the "Updated n/m" message twice. And, of course, the number of files the output claims to have already fetched can actually go down if HTTP responses come back out of order.

Collaborator:

Ah, yup, the printing is indeed racy.
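
One hypothetical fix for the progress reporting (the lock and counter names are illustrative, not from this patch): guard the count and the print with a single lock so the printed n/m only moves forward.

```python
progressLock = threading.Lock()
fetchedCount = 0

def reportProgress(total):
    global fetchedCount
    with progressLock:
        fetchedCount += 1
        # Counting completions under the lock keeps the printed n/m monotonic,
        # even when HTTP responses come back out of order.
        print("Updated {}/{}".format(fetchedCount, total))
```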


```python
# Create as many threads as we can.
for i in range(len(os.sched_getaffinity(0))):
    t = threading.Thread(target=worker, daemon=True)
    t.start()
```

Collaborator:

https://docs.aiohttp.org/en/stable/ suggests that we can avoid threads entirely by using a different fetching library.

Author:

Can you point to where on that page it mentions avoiding threads?
Also, I'm hesitant to pull in an entire library just to make this parallel, but if @tabatkins is okay with it, I can do that!

Collaborator:

These threads don't seem too complicated, so I'm fine with using them straight for now.

Collaborator:

https://docs.aiohttp.org/en/stable/glossary.html#term-asyncio mentions that it's single-threaded, which reduces how many race conditions you need to handle.
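
A minimal sketch of that single-threaded shape, assuming aiohttp is available (fetchAll and urls are illustrative names):

```python
import asyncio
import aiohttp

async def fetchAll(urls):
    async with aiohttp.ClientSession() as session:
        async def fetchOne(url):
            async with session.get(url) as resp:
                return await resp.text()
        # All fetches interleave on one event-loop thread; no locks needed.
        return await asyncio.gather(*(fetchOne(u) for u in urls))

bodies = asyncio.run(fetchAll(urls))
```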

```python
    updateQueue.put((i, filePath))

# Create as many threads as we can.
for i in range(len(os.sched_getaffinity(0))):
```

Collaborator:

Whether or not you use threads, the number of parallel HTTP fetches shouldn't be based on the number of usable CPUs. HTTP handling isn't CPU-bound, and Python threads run under a global lock that prevents them from using more than 1 net CPU anyway. (That's why multiprocessing exists, but again, because HTTP handling isn't CPU-bound, it won't help here.)

Author:

> Whether or not you use threads, the number of parallel HTTP fetches shouldn't be based on the number of usable CPUs.

Sure, this is merely a heuristic: if I have 10 cores, I should be able to run up to 10 threads that do blocking work. It's a quick way of running blocking network calls in parallel.

> Python threads run under a global lock that prevents them from using more than 1 net CPU anyway.

I don't know how the GIL interacts here, but from my testing, I see a speedup. If the GIL prevents these threads from being parallelized, why do we see a speedup?

> HTTP handling isn't CPU-bound, it won't help here

Yeah, it's a network call, but in this case the network call blocks; you can imagine it as a CPU busy-wait (for the network response).

Collaborator:

Sorry, I didn't write that clearly. The GIL prevents multiple Python threads from using CPU concurrently, but it gets released while a Python thread is waiting on an OS function, which is what happens while one of these HTTP fetches waits for the response to come back. That's not a busy-wait.

So, even with just 1 core, you can efficiently run lots of threads here to get that many HTTP fetches to be sent in parallel. They can't actually process results in parallel, but that's fine because processing results takes much less wall-time than waiting for the server to respond.
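
So a hypothetical sizing rule for an I/O-bound pool would use a concurrency cap rather than the CPU count (the cap of 32 is an illustrative number, not anything from this patch; updateQueue and worker are reused from it):

```python
MAX_PARALLEL_FETCHES = 32  # illustrative: a polite cap on in-flight requests
numThreads = min(MAX_PARALLEL_FETCHES, updateQueue.qsize() or 1)
for _ in range(numThreads):
    threading.Thread(target=worker, daemon=True).start()
```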

@tabatkins closed this in 615efd8 on Aug 10, 2020

@tabatkins (Collaborator) commented:

Okay, I ended up rewriting this myself, relying on aiohttp (and aiofiles for the file creation). Thanks so much for the impetus to get this done, @ngzhian, and for the pointer to aiohttp, @jyasskin!

On my laptop, updating went from ~3 files/second to, uh, about 200 files/second. I can now regenerate the entire set of files (about 1700) in less than 10 seconds. This is astonishing; I had no idea I was blocking so long on the network like that, phew.

Anyway, this was great! Yay!
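
The general shape of such a rewrite might look like this (a sketch only, not the actual code from 615efd8; updateFile, updateAll, and jobs are illustrative names):

```python
import asyncio
import aiohttp
import aiofiles

async def updateFile(session, url, path):
    # Fetch the spec data, then write it out without blocking the event loop.
    async with session.get(url) as resp:
        data = await resp.text()
    async with aiofiles.open(path, "w") as f:
        await f.write(data)

async def updateAll(jobs):  # jobs: iterable of (url, path) pairs
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(updateFile(session, u, p) for u, p in jobs))

asyncio.run(updateAll(jobs))
```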

@ngzhian ngzhian deleted the threading-update branch August 10, 2020 15:59

ngzhian commented Aug 10, 2020

Amazing, thanks Tab and Jeffrey!


ngzhian commented Aug 10, 2020

Heads up, I see that some of our builds are hanging, see https://travis-ci.org/github/WebAssembly/simd/builds/716629866 for example. Could be related to this.


ngzhian commented Aug 10, 2020

A local run of bikeshed update hangs at Updating 44 files...


ngzhian commented Aug 10, 2020

Seems like f539c4a fixed it; we were (maybe) pulling in a broken alpha version.
