
Commit 65d8fc7

Mel Gorman authored and Ingo Molnar committed
futex: Remove requirement for lock_page() in get_futex_key()
When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock:

1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock.

2) For inode-based keys, we can check under the RCU read lock whether the page mapping is still valid and take a reference to the inode.

This just leaves one rare race that requires the page lock in the slow path when examining the swapcache.

Additionally, realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
      lock_page()
     ---> preempted

Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbounded time.

With this patch, we pretty much have a lockless get_futex_key().

Experiments show that this patch can boost/speed up the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such changes) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to their private counterpart.

Signed-off-by: Mel Gorman <mgorman@suse.de>
[ Ported on top of thp refcount rework, changelog, comments, fixes. ]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Mason <clm@fb.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 8ad7b37 commit 65d8fc7
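
For the file-backed case, the patch pins the inode by bumping inode->i_count only if the count has not already dropped to zero (atomic_inc_not_zero()), retrying the lookup otherwise, instead of holding the page lock across the operation. Below is a minimal, hypothetical user-space sketch of that "take a reference unless the object is about to be freed" pattern using C11 atomics; struct obj and the helper names are illustrative stand-ins, not kernel APIs.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical refcounted object standing in for the kernel's inode. */
struct obj {
	atomic_int refcount;		/* analogous to inode->i_count */
};

/*
 * Take a reference only if the object is not already on its way to being
 * freed (refcount has not reached zero). This mirrors the semantics of
 * the atomic_inc_not_zero() call the patch relies on to pin the inode
 * without the page lock; on failure the caller must retry the lookup.
 */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On CAS failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;	/* reference taken */
	}
	return false;			/* object is being freed */
}

static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		printf("last reference dropped, object may be freed\n");
}

int main(void)
{
	struct obj o = { .refcount = 1 };

	if (obj_get_not_zero(&o))
		printf("pinned object, refcount is now %d\n",
		       atomic_load(&o->refcount));

	obj_put(&o);	/* drop the pin taken above */
	obj_put(&o);	/* drop the initial reference */
	return 0;
}

In the kernel code below, the failure path additionally warns (WARN_ON_ONCE), since the only way to hit it is an inode being truncated in parallel, and then retries via "goto again".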

1 file changed: +91 −8 lines

kernel/futex.c

@@ -520,7 +520,20 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	else
 		err = 0;
 
-	lock_page(page);
+	/*
+	 * The treatment of mapping from this point on is critical. The page
+	 * lock protects many things but in this context the page lock
+	 * stabilizes mapping, prevents inode freeing in the shared
+	 * file-backed region case and guards against movement to swap cache.
+	 *
+	 * Strictly speaking the page lock is not needed in all cases being
+	 * considered here and page lock forces unnecessary serialization.
+	 * From this point on, mapping will be re-verified if necessary and
+	 * page lock will be acquired only if it is unavoidable.
+	 */
+	page = compound_head(page);
+	mapping = READ_ONCE(page->mapping);
+
 	/*
 	 * If page->mapping is NULL, then it cannot be a PageAnon
 	 * page; but it might be the ZERO_PAGE or in the gate area or
@@ -536,19 +549,31 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	 * shmem_writepage move it from filecache to swapcache beneath us:
 	 * an unlikely race, but we do need to retry for page->mapping.
 	 */
-	mapping = compound_head(page)->mapping;
-	if (!mapping) {
-		int shmem_swizzled = PageSwapCache(page);
+	if (unlikely(!mapping)) {
+		int shmem_swizzled;
+
+		/*
+		 * Page lock is required to identify which special case above
+		 * applies. If this is really a shmem page then the page lock
+		 * will prevent unexpected transitions.
+		 */
+		lock_page(page);
+		shmem_swizzled = PageSwapCache(page) || page->mapping;
 		unlock_page(page);
 		put_page(page);
+
 		if (shmem_swizzled)
 			goto again;
+
 		return -EFAULT;
 	}
 
 	/*
 	 * Private mappings are handled in a simple way.
 	 *
+	 * If the futex key is stored on an anonymous page, then the associated
+	 * object is the mm which is implicitly pinned by the calling process.
+	 *
 	 * NOTE: When userspace waits on a MAP_SHARED mapping, even if
 	 * it's a read-only handle, it's expected that futexes attach to
 	 * the object not the particular process.
@@ -566,16 +591,74 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
+
+		get_futex_key_refs(key); /* implies smp_mb(); (B) */
+
 	} else {
+		struct inode *inode;
+
+		/*
+		 * The associated futex object in this case is the inode and
+		 * the page->mapping must be traversed. Ordinarily this should
+		 * be stabilised under page lock but it's not strictly
+		 * necessary in this case as we just want to pin the inode, not
+		 * update the radix tree or anything like that.
+		 *
+		 * The RCU read lock is taken as the inode is finally freed
+		 * under RCU. If the mapping still matches expectations then the
+		 * mapping->host can be safely accessed as being a valid inode.
+		 */
+		rcu_read_lock();
+
+		if (READ_ONCE(page->mapping) != mapping) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		inode = READ_ONCE(mapping->host);
+		if (!inode) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		/*
+		 * Take a reference unless it is about to be freed. Previously
+		 * this reference was taken by ihold under the page lock
+		 * pinning the inode in place so i_lock was unnecessary. The
+		 * only way for this check to fail is if the inode was
+		 * truncated in parallel so warn for now if this happens.
+		 *
+		 * We are not calling into get_futex_key_refs() in file-backed
+		 * cases, therefore a successful atomic_inc return below will
+		 * guarantee that get_futex_key() will still imply smp_mb(); (B).
+		 */
+		if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
+			rcu_read_unlock();
+			put_page(page);
+
+			goto again;
+		}
+
+		/* Should be impossible but let's be paranoid for now */
+		if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
+			err = -EFAULT;
+			rcu_read_unlock();
+			iput(inode);
+
+			goto out;
+		}
+
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
-		key->shared.inode = mapping->host;
+		key->shared.inode = inode;
 		key->shared.pgoff = basepage_index(page);
+		rcu_read_unlock();
 	}
 
-	get_futex_key_refs(key); /* implies smp_mb(); (B) */
-
 out:
-	unlock_page(page);
 	put_page(page);
 	return err;
 }
