
Mempool fix #2741

Merged: 2 commits merged into OP-TEE:master from mempool_fix on Jan 16, 2019

Conversation

@jenswi-linaro (Contributor)

No description provided.

@jforissier (Contributor)

This adds some more complexity :( Why not simply revert d4f909c? Do we have numbers showing the optimization is significant?

@jenswi-linaro (Contributor, author)

Here are my results on Hikey:
github/master (edbb89f):

4006  real      1m 41.11s
4007  real      1m 14.51s
4008  real      0m 0.13s
4009  real      1m 5.68s

Revert "mempool: optimize reference counting":

4006  real      3m 27.78s
4007  real      0m 50.03s
4008  real      0m 0.13s
4009  real      2m 24.07s

Using this script:

for a in 4006 4007 4008 4009 ; do \
echo -n $a " " >> time.txt ;\
time -o time.txt.tmp xtest -l 15 $a || break ;\
grep real time.txt.tmp >> time.txt
done
cat time.txt

@jforissier (Contributor)

Thanks for the numbers. So there is quite a big difference indeed... How does the code in this PR compare?

Assuming the code proposed here is still much faster, that makes me worry about the performance of our mutexes. The original code, with a mutex and a condvar, is a typical pattern... Is there a way we could optimize the mutex common case (locking a mutex that is currently unlocked)? With an atomic_load() maybe?

@jenswi-linaro (Contributor, author)

"mempool: fix race in get_pool()"

4006  real      1m 37.51s
4007  real      0m 56.67s
4008  real      0m 0.09s
4009  real      1m 3.18s

Second run

4006  real      1m 37.61s
4007  real      0m 35.32s
4008  real      0m 0.13s
4009  real      1m 3.15s

So I'd say it's, if anything, slightly faster.

I think we should try to improve the fast path of the mutex. I'm not so sure we'll get numbers that can compete with these, but it will be useful anyway. I guess we could base the fast path on the atomic_cas_ushort() function.
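
A minimal sketch of what such a CAS-based fast path could look like, using a GCC/Clang builtin as a stand-in for atomic_cas_ushort(); the struct, the state encoding and the slow-path hook are hypothetical, not OP-TEE's actual mutex implementation:

#include <stdbool.h>

/* Hypothetical mutex with a one-word state: 0 = unlocked, 1 = locked */
struct sketch_mutex {
        unsigned short state;
};

/* Stand-in for atomic_cas_ushort(): swap *p from oval to nval atomically */
static bool sketch_cas_ushort(unsigned short *p, unsigned short oval,
                              unsigned short nval)
{
        return __atomic_compare_exchange_n(p, &oval, nval, false,
                                           __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Existing contended path (wait queue, scheduling, ...), not shown here */
void sketch_mutex_lock_slow(struct sketch_mutex *m);

static void sketch_mutex_lock(struct sketch_mutex *m)
{
        /* Fast path: an uncontended mutex is taken with a single CAS */
        if (sketch_cas_ushort(&m->state, 0, 1))
                return;

        /* Slow path: someone else holds the mutex */
        sketch_mutex_lock_slow(m);
}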

@jenswi-linaro (Contributor, author)

What we need here is a recursive mutex.
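
For reference, a recursive mutex in this context means roughly the following. This is a hypothetical sketch on top of OP-TEE's plain mutex, with made-up names; a real version would also read the owner field atomically, as this PR does with atomic_load_int():

#include <kernel/mutex.h>
#include <kernel/thread.h>

/* Hypothetical recursive mutex: a plain mutex plus owner and nesting count */
struct sketch_rmutex {
        struct mutex lock;
        int owner;              /* thread id of the current owner, -1 if free */
        unsigned int count;     /* nesting depth for the owning thread */
};

static void sketch_rmutex_lock(struct sketch_rmutex *m)
{
        int self = thread_get_id();

        /* Simplified: a real version would load m->owner atomically */
        if (m->owner == self) {
                m->count++;     /* already held by this thread, nest deeper */
                return;
        }
        mutex_lock(&m->lock);
        m->owner = self;
        m->count = 1;
}

static void sketch_rmutex_unlock(struct sketch_rmutex *m)
{
        if (--m->count)
                return;         /* still nested, keep holding the lock */
        m->owner = -1;
        mutex_unlock(&m->lock);
}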

@jforissier (Contributor) commented Jan 16, 2019

OK, thanks for running the benchmark again. I agree that manipulating mutex::state with atomic_cas_ushort() sounds like a good idea for the fast path. But that would be for a future PR I suppose. I'm fine with this for the coming release. Could you perhaps add the performance results to the commit text for the record?

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>

* any value but our thread id.
*/
if (atomic_load_int(&pool->owner) == thread_get_id()) {
if (!refcount_inc(&pool->refc))
@lorc (Contributor):

I think that this code is still prone to a race. You have two atomic operations there, but you need only one.
Suppose that Thread 0 is at line 107 and Thread 1 is at L82. Thread 1 passes the check and thinks that it owns the pool. But Thread 0 resets the pool owner.

I can't see how this can be fixed with atomic variables. Basically, you need to execute the operation "check owner and increase refcount" atomically. But you are doing two atomic operations instead: "check owner" and "increase refcount".

I propose to drop the atomics altogether and introduce a spinlock. With a spinlock you really can operate atomically on owner and refcount.
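
A rough sketch of that spinlock-based proposal, assuming OP-TEE's cpu_spin_lock_xsave()/cpu_spin_unlock_xrestore() helpers; the struct layout and function names are made up, and waiting for a pool owned by another thread is left out:

#include <stdbool.h>
#include <stdint.h>
#include <kernel/spinlock.h>
#include <kernel/thread.h>

struct sketch_spool {
        unsigned int lock;      /* spinlock protecting owner and refc */
        int owner;              /* thread id of the owner, -1 when free */
        unsigned int refc;      /* reference count */
};

static bool sketch_spool_try_get(struct sketch_spool *pool)
{
        bool ok = false;
        uint32_t exceptions = cpu_spin_lock_xsave(&pool->lock);

        /* Owner check and refcount update happen as one atomic step */
        if (pool->owner == thread_get_id() || pool->owner == -1) {
                pool->owner = thread_get_id();
                pool->refc++;
                ok = true;
        }
        cpu_spin_unlock_xrestore(&pool->lock, exceptions);

        /* On false the pool is owned by another thread; the caller would
           have to wait and retry (omitted in this sketch) */
        return ok;
}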

@jenswi-linaro (Contributor, author):

Only the thread owning the pool may set it to free. The case you're describing is not supposed to happen; it's like unlocking a mutex from the wrong thread. The code doesn't depend on refcount_inc() and refcount_dec() being atomic, it just happens to be a convenient interface.
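
A condensed sketch of that invariant (names, the free-owner marker and the include paths are approximations, not the exact PR code): the only transition of owner back to free happens in the put path, and that path runs only on the owning thread.

#include <atomic.h>
#include <kernel/refcount.h>
#include <kernel/thread.h>

#define SKETCH_OWNER_NONE       -1      /* stand-in for "no owner" */

struct sketch_pool {
        struct refcount refc;
        int owner;
};

/* Called only by the thread whose id is currently stored in pool->owner */
static void sketch_put_pool(struct sketch_pool *pool)
{
        if (refcount_dec(&pool->refc)) {
                /*
                 * Last reference dropped: only here, and only by the owning
                 * thread, is owner reset. So once get_pool() has observed
                 * owner == thread_get_id(), no other thread can change it.
                 */
                atomic_store_int(&pool->owner, SKETCH_OWNER_NONE);
        }
}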

@lorc (Contributor):

Ah yes, you are right.

@jenswi-linaro (Contributor, author)

Updated the commit message and added the tag. Please hold off on the merge a bit so we can see whether @lorc agrees.

@jforissier (Contributor)

@jenswi-linaro of course.

@lorc (Contributor) commented Jan 16, 2019

Reviewed-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>

Commit 1: Adds atomic_load_int() and atomic_store_int().

Reviewed-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>

Commit 2: Fixes a race in get_pool() which could leave the pool with zero references
but still owned by the last thread using the pool.

Some performance numbers on Hikey with the default configuration:
github/master (edbb89f, before this commit):

4006  real      1m 41.11s
4007  real      1m 14.51s
4008  real      0m 0.13s
4009  real      1m 5.68s

Revert "mempool: optimize reference counting", before this commit:
4006  real      3m 27.78s
4007  real      0m 50.03s
4008  real      0m 0.13s
4009  real      2m 24.07s

With this commit, two runs:
4006  real      1m 37.51s
4007  real      0m 56.67s
4008  real      0m 0.09s
4009  real      1m 3.18s

4006  real      1m 37.61s
4007  real      0m 35.32s
4008  real      0m 0.13s
4009  real      1m 3.15s

Numbers are gathered with this script:
for a in 4006 4007 4008 4009 ; do \
echo -n $a " " >> time.txt ;\
time -o time.txt.tmp xtest -l 15 $a || break ;\
grep real time.txt.tmp >> time.txt
done
cat time.txt

Reviewed-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
@jenswi-linaro (Contributor, author)

Tag applied.

@jforissier merged commit 60b3990 into OP-TEE:master Jan 16, 2019
@jenswi-linaro deleted the mempool_fix branch January 16, 2019 14:52