summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-05-08futex: Implement lockless wakeupsDavidlohr Bueso1-16/+17
Given the overall futex architecture, any chance of reducing hb->lock contention is welcome. In this particular case, using wake-queues to enable lockless wakeups addresses very much real world performance concerns, even cases of soft-lockups in cases of large amounts of blocked tasks (which is not hard to find in large boxes, using but just a handful of futex). At the lowest level, this patch can reduce latency of a single thread attempting to acquire hb->lock in highly contended scenarios by a up to 2x. At lower counts of nr_wake there are no regressions, confirming, of course, that the wake_q handling overhead is practically non existent. For instance, while a fair amount of variation, the extended pef-bench wakeup benchmark shows for a 20 core machine the following avg per-thread time to wakeup its share of tasks: nr_thr ms-before ms-after 16 0.0590 0.0215 32 0.0396 0.0220 48 0.0417 0.0182 64 0.0536 0.0236 80 0.0414 0.0097 96 0.0672 0.0152 Naturally, this can cause spurious wakeups. However there is no core code that cannot handle them afaict, and furthermore tglx does have the point that other events can already trigger them anyway. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: George Spelvin <linux@horizon.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched: Implement lockless wake-queuesPeter Zijlstra2-0/+92
This is useful for locking primitives that can effect multiple wakeups per operation and want to avoid lock internal lock contention by delaying the wakeups until we've released the lock internal locks. Alternatively it can be used to avoid issuing multiple wakeups, and thus save a few cycles, in packet processing. Queue all target tasks and wakeup once you've processed all packets. That way you avoid waking the target task multiple times if there were multiple packets for the same task. Properties of a wake_q are: - Lockless, as queue head must reside on the stack. - Being a queue, maintains wakeup order passed by the callers. This can be important for otherwise, in scenarios where highly contended locks could affect any reliance on lock fairness. - A queued task cannot be added again until it is woken up. This patch adds the needed infrastructure into the scheduler code and uses the new wake_list to delay the futex wakeups until after we've released the hash bucket locks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [tweaks, adjustments, comments, etc.] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Mason <clm@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: George Spelvin <linux@horizon.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430494072-30283-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched, timer: Use the atomic task_cputime in thread_group_cputimerJason Low4-23/+19
Recent optimizations were made to thread_group_cputimer to improve its scalability by keeping track of cputime stats without a lock. However, the values were open coded to the structure, causing them to be at a different abstraction level from the regular task_cputime structure. Furthermore, any subsequent similar optimizations would not be able to share the new code, since they are specific to thread_group_cputimer. This patch adds the new task_cputime_atomic data structure (introduced in the previous patch in the series) to thread_group_cputimer for keeping track of the cputime atomically, which also helps generalize the code. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1430251224-5764-6-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched, timer: Provide an atomic 'struct task_cputime' data structureJason Low1-0/+17
This patch adds an atomic variant of the 'struct task_cputime' data structure, which can be used to store and update task_cputime statistics without needing to do locking. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1430251224-5764-5-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched, timer: Replace spinlocks with atomics in thread_group_cputimer(), to ↵Jason Low5-52/+62
improve scalability While running a database workload, we found a scalability issue with itimers. Much of the problem was caused by the thread_group_cputimer spinlock. Each time we account for group system/user time, we need to obtain a thread_group_cputimer's spinlock to update the timers. On larger systems (such as a 16 socket machine), this caused more than 30% of total time spent trying to obtain this kernel lock to update these group timer stats. This patch converts the timers to 64-bit atomic variables and use atomic add to update them without a lock. With this patch, the percent of total time spent updating thread group cputimer timers was reduced from 30% down to less than 1%. Note: On 32-bit systems using the generic 64-bit atomics, this causes sample_group_cputimer() to take locks 3 times instead of just 1 time. However, we tested this patch on a 32-bit system ARM system using the generic atomics and did not find the overhead to be much of an issue. An explanation for why this isn't an issue is that 32-bit systems usually have small numbers of CPUs, and cacheline contention from extra spinlocks called periodically is not really apparent on smaller systems. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/numa: Document usages of mm->numa_scan_seqJason Low1-0/+13
The p->mm->numa_scan_seq is accessed using READ_ONCE/WRITE_ONCE and modified without exclusive access. It is not clear why it is accessed this way. This patch provides some documentation on that. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Waiman Long <waiman.long@hp.com> Link: http://lkml.kernel.org/r/1430440094.2475.61.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to ↵Jason Low12-26/+26
READ_ONCE()/WRITE_ONCE() ACCESS_ONCE doesn't work reliably on non-scalar types. This patch removes the rest of the existing usages of ACCESS_ONCE() in the scheduler, and use the new READ_ONCE() and WRITE_ONCE() APIs as appropriate. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/core: Remove unnecessary down/up conversionNicholas Mc Guire1-2/+2
'rt_period_us' is automatically type converted from u64 to long and then cast back to u64 - this down/up conversion is unnecessary and can be removed to improve readability. This will also help us not truncate 'rt_period_us' to 32 bits on 32-bit kernels, should we ever have so large values. (unlikely, not the least due to procfs.) Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430643116-24049-1-git-send-email-hofrat@osadl.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08signals, ptrace, sched: Fix a misaligned load inside ptrace_attach()Palmer Dabbelt1-1/+1
The misaligned load exception arises when running ptrace_attach() on the RISC-V (which hasn't been upstreamed yet). The problem is that wait_on_bit() takes a void* but then proceeds to call test_bit(), which takes a long*. This allows an int-aligned pointer to be passed to test_bit(), which promptly fails. This will manifest on any other asm-generic port where unaligned loads trap, where sizeof(long) > sizeof(int), and where task_struct.jobctl ends up not being long-aligned. This patch changes task_struct.jobctl to be a long, which ensures it has the correct alignment. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bobby.prani@gmail.com Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: richard@nod.at Cc: vdavydov@parallels.com Link: http://lkml.kernel.org/r/1430453997-32459-2-git-send-email-palmer@dabbelt.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/wait: Change wait_on_bit*() to take an unsigned long *, not a void *Palmer Dabbelt1-7/+10
The implementations of wait_on_bit*() will only work with long-aligned memory on systems that don't support misaligned loads and stores. This patch changes the function prototypes to ensure that the compiler will enforce alignment. Running make defconfig make KFLAGS="-Werror" seems to indicate that, as of c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()"), there are now no users of non-long-aligned calls to wait_on_bit*(). I additionally tried a few "make randconfig" attempts, none of which failed to compile for this reason. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bobby.prani@gmail.com Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: richard@nod.at Cc: vdavydov@parallels.com Link: http://lkml.kernel.org/r/1430453997-32459-3-git-send-email-palmer@dabbelt.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08signals, sched: Change all uses of JOBCTL_* from 'int' to 'long'Palmer Dabbelt2-12/+12
c56fb6564dcd ("Fix a misaligned load inside ptrace_attach()") makes jobctl an "unsigned long". It makes sense to have the masks applied to it match that type. This is currently just a cosmetic change, but it will prevent the mask from being unexpectedly truncated if we ever end up with masks with more bits. One instance of "signr" is an int, but I left this alone because the mask ensures that it will never overflow. Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bobby.prani@gmail.com Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: richard@nod.at Cc: vdavydov@parallels.com Link: http://lkml.kernel.org/r/1430453997-32459-4-git-send-email-palmer@dabbelt.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched: Move the loadavg code to a more obvious locationPeter Zijlstra6-219/+222
I could not find the loadavg code.. turns out it was hidden in a file called proc.c. It further got mingled up with the cruft per rq load indexes (which we really want to get rid of). Move the per rq load indexes into the fair.c load-balance code (that's the only thing that uses them) and rename proc.c to loadavg.c so we can find it again. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> [ Did minor cleanups to the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08Merge branch 'sched/urgent' into sched/core, before applying new patchesIngo Molnar3-36/+37
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08perf/x86/intel: Fix SLM cache event listKan Liang1-4/+3
iTLB-load-misses and LLC-load-misses count incorrectly on SLM. There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK should be used to count iTLB-load-misses. This event counts when an instruction (I) page walk is completed or started. Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks. DMND_DATA_RD counts both demand and DCU prefetch data reads. However, LLC-load-misses should only count demand reads. There is no way to not include prefetches with a single counter on SLM. So the LLC-load-misses support should be removed on SLM. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08perf: Annotate inherited event ctx->mutex recursionPeter Zijlstra1-8/+33
While fuzzing Sasha tripped over another ctx->mutex recursion lockdep splat. Annotate this. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/core: Remove __cpuinit section tag that crept back inPaul Gortmaker1-1/+1
We removed __cpuinit support (leaving no-op stubs) quite some time ago. However this one crept back in as of commit a803f0261bb2bb57aab ("sched: Initialize rq->age_stamp on processor start") Since we want to clobber the stubs too, get this removed now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Corey Minyard <cminyard@mvista.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430174880-27958-2-git-send-email-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched/core: Fix regression in cpuset_cpu_inactive() for suspendOmar Sandoval1-16/+12
Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that caused a regression in setting a CPU inactive during suspend. I ran into this when a program was failing pthread_setaffinity_np() with EINVAL after a suspend+wake up. A simple reproducer: $ ./a.out sched_setaffinity: Success $ systemctl suspend $ ./a.out sched_setaffinity: Invalid argument ... where ./a.out is: #define _GNU_SOURCE #include <errno.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> int main(void) { long num_cores; cpu_set_t cpu_set; int ret; num_cores = sysconf(_SC_NPROCESSORS_ONLN); CPU_ZERO(&cpu_set); CPU_SET(num_cores - 1, &cpu_set); errno = 0; ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set); perror("sched_setaffinity"); return ret ? EXIT_FAILURE : EXIT_SUCCESS; } The mistake is that suspend is handled in the action == CPU_DOWN_PREPARE_FROZEN case of the switch statement in cpuset_cpu_inactive(). However, the commit in question masked out CPU_TASKS_FROZEN from the action, making this case dead. The fix is straightforward. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()") Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08sched: Handle priority boosted tasks proper in setscheduler()Thomas Gleixner3-20/+25
Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08Merge branch 'sched/urgent' into sched/coreIngo Molnar1-4/+0
So this isn't really a fix but a cleanup that can wait for v4.2. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08md/raid5: fix handling of degraded stripes in batches.NeilBrown1-14/+3
There is no need for special handling of stripe-batches when the array is degraded. There may be if there is a failure in the batch, but STRIPE_DEGRADED does not imply an error. So don't set STRIPE_BATCH_ERR in ops_run_io just because the array is degraded. This actually causes a bug: the STRIPE_DEGRADED flag gets cleared in check_break_stripe_batch_list() and so the bitmap bit gets cleared when it shouldn't. So in check_break_stripe_batch_list(), split the batch up completely - again STRIPE_DEGRADED isn't meaningful. Also don't set STRIPE_BATCH_ERR when there is a write error to a replacement device. This simply removes the replacement device and requires no extra handling. Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-08md/raid5: fix allocation of 'scribble' array.NeilBrown1-22/+43
As the new 'scribble' array is sized based on chunk size, we need to make sure the size matches the largest of 'old' and 'new' chunk sizes when the array is undergoing reshape. We also potentially need to resize it even when not resizing the stripe cache, as chunk size can change without changing number of devices. So move the 'resize' code into a separate function, and consider old and new sizes when allocating. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: 46d5b785621a ("raid5: use flex_array for scribble data")
2015-05-08md/raid5: don't record new size if resize_stripes fails.NeilBrown1-1/+2
If any memory allocation in resize_stripes fails we will return -ENOMEM, but in some cases we update conf->pool_size anyway. This means that if we try again, the allocations will be assumed to be larger than they are, and badness results. So only update pool_size if there is no error. This bug was introduced in 2.6.17 and the patch is suitable for -stable. Fixes: ad01c9e3752f ("[PATCH] md: Allow stripes to be expanded in preparation for expanding an array") Cc: stable@vger.kernel.org (v2.6.17+) Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-08md/raid5: avoid reading parity blocks for full-stripe write to degraded arrayNeilBrown1-1/+3
When performing a reconstruct write, we need to read all blocks that are not being over-written .. except the parity (P and Q) blocks. The code currently reads these (as they are not being over-written!) unnecessarily. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: ea664c8245f3 ("md/raid5: need_this_block: tidy/fix last condition.")
2015-05-08md/raid5: more incorrect BUG_ON in handle_stripe_fill.NeilBrown1-1/+1
It is not incorrect to call handle_stripe_fill() when a batch of full-stripe writes is active. It is, however, a BUG if fetch_block() then decides it needs to actually fetch anything. So move the 'BUG_ON' to where it belongs. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: 59fc630b8b5f ("RAID5: batch adjacent full stripe write")
2015-05-08md/raid5: new alloc_stripe() to allocate an initialize a stripe.NeilBrown1-14/+18
The new batch_lock and batch_list fields are being initialized in grow_one_stripe() but not in resize_stripes(). This causes a crash on resize. So separate the core initialization into a new function and call it from both allocation sites. Signed-off-by: NeilBrown <neilb@suse.de> Fixes: 59fc630b8b5f ("RAID5: batch adjacent full stripe write")
2015-05-08md-raid0: conditional mddev->queue access to suit dm-raidHeinz Mauelshagen1-2/+3
This patch is a prerequisite for dm-raid "raid0" support to allow dm-raid to access the MD RAID0 personality doing unconditional accesses to mddev->queue, which is NULL in case of dm-raid stacked on top of MD. Most of the conditional mddev->queue accesses made it to upstream but this missing one, which prohibits md raid0 to set disk stack limits (being done in dm core in case of md underneath dm). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-08mmc: dw_mmc: dw_mci_get_cd check MMC_CAP_NONREMOVABLEZhangfei Gao1-1/+2
When non-removable is used for emmc, MMC_CAP_NONREMOVABLE should also be checked, otherwise detection fail since present=0 Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-05-08mmc: dw_mmc: init desc in dw_mci_idmac_initZhangfei Gao1-1/+3
Set 0 to des1 in 32bit case. Otherwise the random value of des1 will be used in dw_mci_translate_sglist: IDMAC_SET_BUFFER1_SIZE(desc, length) Signed-off-by: Fei Wang <w.f@huawei.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2015-05-08staging: rtl8712: freeing an ERR_PTRDan Carpenter1-10/+7
If memdup_user() fails then "pparmbuf" is an error pointer and we can't pass it to kfree(). I changed the "goto _r871x_mp_ioctl_hdl_exit" to a direct return. I changed the earlier goto to a direct return as well for consistency and removed the "pparmbuf = NULL" initializer since it's no longer needed. Fixes: 45de432775d6 ('Staging: rtl8712: Use memdup_user() instead of copy_from_user()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-08staging: sm750: remove incorrect __exit annotationArnd Bergmann1-1/+1
The lynxfb_pci_remove function is used as the 'remove' callback of the driver, and must not be discarded: lynxfb_pci_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o This removes the extraneous annotation. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-08Merge tag 'pm+acpi-4.1-rc3' of ↵Linus Torvalds6-7/+51
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These include three regression fixes (PCI resources management, ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI documentation fixes related to GPIO. Specifics: - Fix for a PCI resources management regression introduced during the 4.0 cycle and related to the handling of ACPI resources' Producer/Consumer flags that turn out to be useless (Jiang Liu) - Fix for a MacBook regression related to the Smart Battery Subsystem (SBS) driver causing various problems (stalls on boot, failure to detect or report battery) to happen and introduced during the 3.18 cycle (Chris Bainbridge) - Fix for an ACPI/PNP device enumeration regression introduced during the 3.16 cycle caused by failing to include two PNP device IDs into the list of IDs that PNP device objects need to be created for (Witold Szczeponik) - Fixes for two minor mistakes in the ACPI GPIO properties documentation (Antonio Ospite, Rafael J Wysocki)" * tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PNP: add two IDs to list for PNPACPI device enumeration ACPI / documentation: Fix ambiguity in the GPIO properties document ACPI / documentation: fix a sentence about GPIO resources ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
2015-05-08iio: kfifo: Set update_needed to false only if a buffer was allocatedGabriele Mazzotta1-1/+2
Check whether the allocation of a new kfifo buffer failed or not before setting the update_needed flag to false. This will make iio_request_update_kfifo() try to allocate a new buffer the next time a buffer update is requested. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2015-05-07Merge branches 'acpi-resources', 'acpi-battery', 'acpi-doc' and 'acpi-pnp'Rafael J. Wysocki230-1649/+2700
* acpi-resources: x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus * acpi-battery: ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook * acpi-doc: ACPI / documentation: Fix ambiguity in the GPIO properties document ACPI / documentation: fix a sentence about GPIO resources * acpi-pnp: ACPI / PNP: add two IDs to list for PNPACPI device enumeration
2015-05-07ACPI / init: Fix the ordering of acpi_reserve_resources()Rafael J. Wysocki1-4/+2
Since acpi_reserve_resources() is defined as a device_initcall(), there's no guarantee that it will be executed in the right order with respect to the rest of the ACPI initialization code. On some systems this leads to breakage if, for example, the address range that should be reserved for the ACPI fixed registers is given to the PCI host bridge instead if the race is won by the wrong code path. Fix this by turning acpi_reserve_resources() into a void function and calling it directly from within the ACPI initialization sequence. Reported-and-tested-by: George McCollister <george.mccollister@gmail.com> Link: http://marc.info/?t=143092384600002&r=1&w=2 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-07Merge tag 'for-f2fs-4.1-rc3' of ↵Linus Torvalds4-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Fix a performance regression and a bug" * tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix wrong error hanlder in f2fs_follow_link Revert "f2fs: enhance multi-threads performance"
2015-05-07ARM: multi_v7_defconfig: Select more FSL SoCsFabio Estevam1-0/+3
Select IMX50, IMX6SX and LS1021A SoC support. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-05-07MAINTAINERS: replace an AT91 maintainerNicolas Ferre2-2/+8
As some help is needed from an active maintainer, replace Andrew Victor by Alexandre Belloni in the ARM/Atmel MAINTAINERS' entry (aka AT91). Add an entry to the CREDITS file. Thanks Andrew for the great role you played during the early days of this product family. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Andrew Victor <linux@maxim.org.za> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-05-07drivers: CCI: fix used_mask init in validate_group()Mark Salter1-1/+1
Currently in validate_group(), there is a static initializer for fake_pmu.used_mask which is based on CPU_BITS_NONE but the used_mask array size is based on CCI_PMU_MAX_HW_EVENTS. CCI_PMU_MAX_HW_EVENTS is not based on NR_CPUS, so CPU_BITS_NONE is not correct and will cause a build failure if NR_CPUS is set high enough to make CPU_BITS_NONE larger than used_mask. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-05-07Merge tag 'stericsson-fixes-v4.1' of ↵Arnd Bergmann3-17/+28
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into fixes Merge "Ux500 fixes" from Linus Walleij: This fixes an MMC/SD configuration issue present for some time in the Ux500 DT but triggered by proper error handling in v4.1-rc1. * tag 'stericsson-fixes-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson: ARM: ux500: Enable GPIO regulator for SD-card for snowball ARM: ux500: Enable GPIO regulator for SD-card for HREF boards ARM: ux500: Move GPIO regulator for SD-card into board DTSs
2015-05-07Merge tag 'renesas-fixes-for-v4.1' of ↵Arnd Bergmann1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes Merge "Renesas ARM Based SoC Fixes for v4.1" from Simon Horman: * Fix adv7511 IRQ sensing on koelsch board * tag 'renesas-fixes-for-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: koelsch: Fix adv7511 IRQ sensing
2015-05-07Merge tag 'omap-for-v4.1/fixes-rc1' of ↵Arnd Bergmann16-32/+81
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge "omap fixes against v4.1-rc1" from Tony Lindgren: Fixes for omaps, mostly a fix for power power consumption creeping up during idle, and two l3-noc device fixes: - Fix power consumption creeping up with I2C4 staying on - Fix n900 microphone bias voltages - Fix dra7 l3-noc for host clock - Fix omap5 l3-noc id address decoding The rest are all just minor dts fixes: - Fix changed EXTCON_USB_GPIO_USB in defconfig - Fix missing isp and iva #iommu-cells property - Various beagle x15 dts fixes for pre-production changes - Fix am437x-sk display dts entries * tag 'omap-for-v4.1/fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: bus: omap_l3_noc: Fix master id address decoding for OMAP5 bus: omap_l3_noc: Fix offset for DRA7 CLK1_HOST_CLK1_2 instance ARM: dts: dra7: Fix efuse register size for ABB ARM: dts: am57xx-beagle-x15: Switch GPIO fan number ARM: dts: am57xx-beagle-x15: Switch UART mux pins ARM: dts: am437x-sk: reduce col-scan-delay-us ARM: dts: am437x-sk: fix for new newhaven display module revision ARM: dts: am57xx-beagle-x15: Fix RTC aliases ARM: dts: am57xx-beagle-x15: Fix IRQ type for mcp7941x ARM: dts: omap3: Add #iommu-cells to isp and iva iommu ARM: omap2plus_defconfig: Enable EXTCON_USB_GPIO ARM: dts: OMAP3-N900: Add microphone bias voltages ARM: OMAP2+: Fix omap off idle power consumption creeping up
2015-05-07Merge tag 'arm-soc/for-4.1/maintainers' of ↵Arnd Bergmann1-2/+2
http://github.com/broadcom/stblinux into fixes Merge "MAINTAINERS update for Broadcom SoCs for 4.1 #2" from Florian Fainelli: This pull request contains 3 changes to the MAINTAINERS file for Broadcom SoCs: - add Ray and Scott for mach-bcm - remove Christian for mach-bcm - remove Marc for brcmstb * tag 'arm-soc/for-4.1/maintainers' of http://github.com/broadcom/stblinux: MAINTAINERS: Update brcmstb entry MAINTAINERS: Remove Christian Daudt for mach-bcm MAINTAINERS: Update mach-bcm maintainers list
2015-05-07Merge tag 'mvebu-fixes-4.1' of git://git.infradead.org/linux-mvebu into fixesArnd Bergmann1-0/+4
Pull "mvebu fix for 4.1" from Gregory CLEMENT: Disable the unused internal RTC in the dts of the OpenBlock AX3 * tag 'mvebu-fixes-4.1' of git://git.infradead.org/linux-mvebu: ARM: mvebu: armada-xp-openblocks-ax3-4: Disable internal RTC
2015-05-07Merge tag 'fixes-for-v4.1-rc2' of https://github.com/rjarzmik/linux into fixesArnd Bergmann7-172/+274
Merged "ARM: pxa: fixes for v4.1-rc2" from Robert Jarzmik: These fixes reenable the lubbock(pxa25x) and mainstone(pxa27x) platforms, which were broken since the gpio handling was converted to a driver, and the interrupt ordering broke the external interrupts of these systems. * tag 'fixes-for-v4.1-rc2' of https://github.com/rjarzmik/linux: ARM: pxa: lubbock: use new pxa_cplds driver ARM: pxa: mainstone: use new pxa_cplds driver ARM: pxa: pxa_cplds: add lubbock and mainstone IO
2015-05-07Merge tag 'imx-fixes-4.1' of ↵Arnd Bergmann7-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Pull "The i.MX fixes for 4.1" from Shawn Guo: - A couple of imx23-olinuxino device tree fixes regarding to LED GPIO polarity and USB dr_mode setting - One i.MX28 device tree fix on AUART4 TX-DMA interrupt name - Add missing pwm-cells to PWM4 for i.MX25 device tree - Fix imx6q-phytec device tree to get correct USB VBUS control - Drop invalid pinctrl-assert-gpios property from imx6qdl-sabreauto device tree, which was sneaked in from vendor device tree - One fix on Wolfram's broken email address * tag 'imx-fixes-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6qdl-sabreauto: remove pinctrl-assert-gpios ARM: dts: imx28: Fix AUART4 TX-DMA interrupt name ARM: dts: imx25: Add #pwm-cells to pwm4 ARM: dts: imx6: phyFLEX: USB VBUS control is active-high ARM: mach-imx: devices: platform-sdhci-esdhc-imx: fix broken email address ARM: dts: imx23-olinuxino: Fix dr_mode of usb0 ARM: dts: imx23-olinuxino: Fix polarity of LED GPIO
2015-05-07Merge tag 'v4.1-rockchip-socfixes1' of ↵Arnd Bergmann3-0/+60
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes Merge "ARM: rockchip: some soc-level fixes for 4.1" from Heiko Stübner: Two fixes from Chris Zhong, fixing some suspend oddities. And I've given up on the timer7 issue. While I initially thought devices would either have both the grave mmu issue requiring a uboot update and the timer7 issue or none, it looks like in all units in the field the mmu issue got fixed while the timer7 issue stayed on. So instead of making everybody wanting to use mainline jump through a hoop just make sure timer7 is on on boot before we init the arch-timer. * tag 'v4.1-rockchip-socfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: rockchip: make sure timer7 is enabled on rk3288 platforms ARM: rockchip: fix undefined instruction of reset_ctrl_regs ARM: rockchip: disable dapswjdp during suspend
2015-05-07Merge tag 'pinctrl-v4.1-3' of ↵Linus Torvalds7-13/+16
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here is a smallish set of pin control fixes for the v4.1 cycle, collected the last two weeks: - fix a real nasty legacy bug that has screwed up the protection of adding pinctrl maps dynamically. Normally this didn't happen so much but Dough Anderson ran into it and fixed it, kudos! - minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers" * tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: Don't just pretend to protect pinctrl_maps, do it for real pinctrl: mediatek: mtk-common: initialize unmask pinctrl: qcom-spmi-mpp: Fix input value report pinctrl: qcom-spmi: Fix pin direction configuration pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)
2015-05-07Merge tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfioLinus Torvalds2-4/+25
Pull vfio fixes from Alex Williamson: "Fix some undesirable behavior with the vfio device request interface: - increase verbosity of device request channel (Alex Williamson) - fix runaway interruptible timeout (Alex Williamson)" * tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio: vfio: Fix runaway interruptible timeout vfio-pci: Log device requests more verbosely
2015-05-07drm/radeon: stop trying to suspend UVD sessionsChristian König2-21/+19
Saving the current UVD state on suspend and restoring it on resume just doesn't work reliable. Just close cleanup all sessions on suspend. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2015-05-07drm/radeon: more strictly validate the UVD codecChristian König1-2/+31
MPEG 2/4 are only supported since UVD3. Signed-off-by: Christian König <christian.koenig@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>