summaryrefslogtreecommitdiff
path: root/include/linux
diff options
context:
space:
mode:
authorTim Chen <tim.c.chen@linux.intel.com>2017-08-25 19:13:54 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2017-09-14 19:56:17 +0300
commit2554db916586b228ce93e6f74a12fd7fe430a004 (patch)
treecebbd0d46f172da6a690ecc573069d3f21fdac29 /include/linux
parent46c1e79fee417f151547aa46fae04ab06cb666f4 (diff)
downloadlinux-2554db916586b228ce93e6f74a12fd7fe430a004.tar.xz
sched/wait: Break up long wake list walk
We encountered workloads that have very long wake up list on large systems. A waker takes a long time to traverse the entire wake list and execute all the wake functions. We saw page wait list that are up to 3700+ entries long in tests of large 4 and 8 socket systems. It took 0.8 sec to traverse such list during wake up. Any other CPU that contends for the list spin lock will spin for a long time. It is a result of the numa balancing migration of hot pages that are shared by many threads. Multiple CPUs waking are queued up behind the lock, and the last one queued has to wait until all CPUs did all the wakeups. The page wait list is traversed with interrupt disabled, which caused various problems. This was the original cause that triggered the NMI watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only extending the NMI watch dog timer there helped. This patch bookmarks the waker's scan position in wake list and break the wake up walk, to allow access to the list before the waker resume its walk down the rest of the wait list. It lowers the interrupt and rescheduling latency. This patch also provides a performance boost when combined with the next patch to break up page wakeup list walk. We saw 22% improvement in the will-it-scale file pread2 test on a Xeon Phi system running 256 threads. [ v2: Merged in Linus' changes to remove the bookmark_wake_function, and simply access to flags. ] Reported-by: Kan Liang <kan.liang@intel.com> Tested-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include/linux')
-rw-r--r--include/linux/wait.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/include/linux/wait.h b/include/linux/wait.h
index dc19880c02f5..78401ef02d29 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -18,6 +18,7 @@ int default_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int
/* wait_queue_entry::flags */
#define WQ_FLAG_EXCLUSIVE 0x01
#define WQ_FLAG_WOKEN 0x02
+#define WQ_FLAG_BOOKMARK 0x04
/*
* A single wait-queue entry structure: