<feed xmlns='http://www.w3.org/2005/Atom'>
<title>BMC/Intel-BMC/linux.git/kernel/sched/core.c, branch dev-4.7</title>
<subtitle>Intel OpenBMC Linux kernel source tree (mirror)</subtitle>
<id>https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7</id>
<link rel='self' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/atom?h=dev-4.7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/'/>
<updated>2016-10-07T13:21:18+00:00</updated>
<entry>
<title>sched/cputime: Fix prev steal time accouting during CPU hotplug</title>
<updated>2016-10-07T13:21:18+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2016-06-13T10:32:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=afca668faa80cbd97ca767d41c2845a175d931c2'/>
<id>urn:sha1:afca668faa80cbd97ca767d41c2845a175d931c2</id>
<content type='text'>
commit 3d89e5478bf550a50c99e93adf659369798263b0 upstream.

Commit:

  e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")

... set rq-&gt;prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq-&gt;prev_steal_time.

However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch.  Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq-&gt;prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.

Since the root cause has been fixed, we can just revert commit e9532e69b8d1.

Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/core: Fix a race between try_to_wake_up() and a woken up task</title>
<updated>2016-09-24T08:09:36+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>bsingharora@gmail.com</email>
</author>
<published>2016-09-05T03:16:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=00f179bf9ca41d436c13246d449d3504512db3c3'/>
<id>urn:sha1:00f179bf9ca41d436c13246d449d3504512db3c3</id>
<content type='text'>
commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.

The origin of the issue I've seen is related to
a missing memory barrier between check for task-&gt;state and
the check for task-&gt;on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

	do {
		schedule()
		set_current_state(TASK_(UN)INTERRUPTIBLE);
	} while (!cond);

The waker, actually gets stuck doing the following in
try_to_wake_up():

	while (p-&gt;on_cpu)
		cpu_relax();

Analysis:

The instance I've seen involves the following race:

 CPU1					CPU2

 while () {
   if (cond)
     break;
   do {
     schedule();
     set_current_state(TASK_UN..)
   } while (!cond);
					wakeup_routine()
					  spin_lock_irqsave(wait_lock)
   raw_spin_lock_irqsave(wait_lock)	  wake_up_process()
 }					  try_to_wake_up()
 set_current_state(TASK_RUNNING);	  ..
 list_del(&amp;waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:

 CPU3
 wakeup_routine()
 raw_spin_lock_irqsave(wait_lock)
 if (!list_empty)
   wake_up_process()
   try_to_wake_up()
   raw_spin_lock_irqsave(p-&gt;pi_lock)
   ..
   if (p-&gt;on_rq &amp;&amp; ttwu_wakeup())
   ..
   while (p-&gt;on_cpu)
     cpu_relax()
   ..

CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.

CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p-&gt;on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p-&gt;on_rq to be 0. This was the most confusing bit of the analysis,
but p-&gt;on_rq is changed under runqueue lock, rq_lock, the p-&gt;on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p-&gt;on_rq change is not
done uder the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
[ Updated comment to clarify matching barriers. Many
  architectures do not have a full barrier in switch_to()
  so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Nicholas Piggin &lt;nicholas.piggin@gmail.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/core: Correct off by one bug in load migration calculation</title>
<updated>2016-07-13T12:58:20+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-07-12T16:33:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=d60585c5766e9620d5d83e2b25dc042c7bdada2c'/>
<id>urn:sha1:d60585c5766e9620d5d83e2b25dc042c7bdada2c</id>
<content type='text'>
The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result a cpu unplug leakes a load of 1 into the global load
accounting mechanism.

Fix it by adjusting for the currently running thread which calls
calc_load_migrate().

Reported-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Vaidyanathan Srinivasan &lt;svaidy@linux.vnet.ibm.com&gt;
Cc: rt@linutronix.de
Cc: shreyas@linux.vnet.ibm.com
Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Allow kthreads to fall back to online &amp;&amp; !active cpus</title>
<updated>2016-06-24T06:26:53+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>htejun@gmail.com</email>
</author>
<published>2016-06-16T19:35:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=feb245e304f343cf5e4f9123db36354144dce8a4'/>
<id>urn:sha1:feb245e304f343cf5e4f9123db36354144dce8a4</id>
<content type='text'>
During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is
online but not active.  A CPU_ONLINE callback may create or bind a
kthread so that its cpus_allowed mask only allows the CPU which is
being brought online.  The kthread may start executing before the CPU
is made active and can end up in select_fallback_rq().

In such cases, the expected behavior is selecting the CPU which is
coming online; however, because select_fallback_rq() only chooses from
active CPUs, it determines that the task doesn't have any viable CPU
in its allowed mask and ends up overriding it to cpu_possible_mask.

CPU_ONLINE callbacks should be able to put kthreads on the CPU which
is coming online.  Update select_fallback_rq() so that it follows
cpu_online() rather than cpu_active() for kthreads.

Reported-by: Gautham R Shenoy &lt;ego@linux.vnet.ibm.com&gt;
Tested-by: Gautham R. Shenoy &lt;ego@linux.vnet.ibm.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-team@fb.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160616193504.GB3262@mtj.duckdns.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w</title>
<updated>2016-06-14T10:48:38+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2016-06-09T12:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=57675cb976eff977aefb428e68e4e0236d48a9ff'/>
<id>urn:sha1:57675cb976eff977aefb428e68e4e0236d48a9ff</id>
<content type='text'>
Lengthy output of sysrq-w may take a lot of time on slow serial console.

Currently we reset NMI-watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.

So, reset watchdogs CPUs earlier, in for_each_process_thread() loop.

Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix post_init_entity_util_avg() serialization</title>
<updated>2016-06-14T08:58:34+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-06-09T13:07:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=b7fa30c9cc48c4f55663420472505d3b4f6e1705'/>
<id>urn:sha1:b7fa30c9cc48c4f55663420472505d3b4f6e1705</id>
<content type='text'>
Chris Wilson reported a divide by 0 at:

 post_init_entity_util_avg():

 &gt;    725	if (cfs_rq-&gt;avg.util_avg != 0) {
 &gt;    726		sa-&gt;util_avg  = cfs_rq-&gt;avg.util_avg * se-&gt;load.weight;
 &gt; -&gt; 727		sa-&gt;util_avg /= (cfs_rq-&gt;avg.load_avg + 1);
 &gt;    728
 &gt;    729		if (sa-&gt;util_avg &gt; cap)
 &gt;    730			sa-&gt;util_avg = cap;
 &gt;    731	} else {

Which given the lack of serialization, and the code generated from
update_cfs_rq_load_avg() is entirely possible:

	if (atomic_long_read(&amp;cfs_rq-&gt;removed_load_avg)) {
		s64 r = atomic_long_xchg(&amp;cfs_rq-&gt;removed_load_avg, 0);
		sa-&gt;load_avg = max_t(long, sa-&gt;load_avg - r, 0);
		sa-&gt;load_sum = max_t(s64, sa-&gt;load_sum - r * LOAD_AVG_MAX, 0);
		removed_load = 1;
	}

turns into:

  ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
  ffffffff8108706b:       48 85 c0                test   %rax,%rax
  ffffffff8108706e:       74 40                   je     ffffffff810870b0
  ffffffff81087070:       4c 89 f8                mov    %r15,%rax
  ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
  ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
  ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
  ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
  ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
  ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx

Which you'll note ends up with 'sa-&gt;load_avg - r' in memory at
ffffffff8108707a.

By calling post_init_entity_util_avg() under rq-&gt;lock we're sure to be
fully serialized against PELT updates and cannot observe intermediate
state like this.

Reported-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Yuyang Du &lt;yuyang.du@intel.com&gt;
Cc: bsegall@google.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a bounded value")
Link: http://lkml.kernel.org/r/20160609130750.GQ30909@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)</title>
<updated>2016-06-10T19:10:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-06-10T19:10:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=f5364c150aa645b3d7daa21b5c0b9feaa1c9cd6d'/>
<id>urn:sha1:f5364c150aa645b3d7daa21b5c0b9feaa1c9cd6d</id>
<content type='text'>
Merge filesystem stacking fixes from Jann Horn.

* emailed patches from Jann Horn &lt;jannh@google.com&gt;:
  sched: panic on corrupted stack end
  ecryptfs: forbid opening files without mmap handler
  proc: prevent stacking filesystems on top
</content>
</entry>
<entry>
<title>sched: panic on corrupted stack end</title>
<updated>2016-06-10T19:09:43+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2016-06-01T09:55:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=29d6455178a09e1dc340380c582b13356227e8df'/>
<id>urn:sha1:29d6455178a09e1dc340380c582b13356227e8df</id>
<content type='text'>
Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g.  via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).

Just panic directly.

Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: Fix 'schedstats=enable' cmdline option</title>
<updated>2016-06-08T12:33:05+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2016-06-07T19:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=4698f88c06b893f2acc0b443004a53bf490fde7c'/>
<id>urn:sha1:4698f88c06b893f2acc0b443004a53bf490fde7c</id>
<content type='text'>
The 'schedstats=enable' option doesn't work, and also produces the
following warning during boot:

  WARNING: CPU: 0 PID: 0 at /home/jpoimboe/git/linux/kernel/jump_label.c:61 static_key_slow_inc+0x8c/0xa0
  static_key_slow_inc used before call to jump_label_init
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc1+ #25
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   0000000000000086 3ae3475a4bea95d4 ffffffff81e03da8 ffffffff8143fc83
   ffffffff81e03df8 0000000000000000 ffffffff81e03de8 ffffffff810b1ffb
   0000003d00000096 ffffffff823514d0 ffff88007ff197c8 0000000000000000
  Call Trace:
   [&lt;ffffffff8143fc83&gt;] dump_stack+0x85/0xc2
   [&lt;ffffffff810b1ffb&gt;] __warn+0xcb/0xf0
   [&lt;ffffffff810b207f&gt;] warn_slowpath_fmt+0x5f/0x80
   [&lt;ffffffff811e9c0c&gt;] static_key_slow_inc+0x8c/0xa0
   [&lt;ffffffff810e07c6&gt;] static_key_enable+0x16/0x40
   [&lt;ffffffff8216d633&gt;] setup_schedstats+0x29/0x94
   [&lt;ffffffff82148a05&gt;] unknown_bootoption+0x89/0x191
   [&lt;ffffffff810d8617&gt;] parse_args+0x297/0x4b0
   [&lt;ffffffff82148d61&gt;] start_kernel+0x1d8/0x4a9
   [&lt;ffffffff8214897c&gt;] ? set_init_arg+0x55/0x55
   [&lt;ffffffff82148120&gt;] ? early_idt_handler_array+0x120/0x120
   [&lt;ffffffff821482db&gt;] x86_64_start_reservations+0x2f/0x31
   [&lt;ffffffff82148427&gt;] x86_64_start_kernel+0x14a/0x16d

The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.

Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.

Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default")
Link: http://lkml.kernel.org/r/453775fe3433bed65731a583e228ccea806d18cd.1465322027.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix remote wakeups</title>
<updated>2016-05-25T06:35:18+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-05-23T09:19:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/BMC/Intel-BMC/linux.git/commit/?id=b7e7ade34e6188bee2e3b0d42b51d25137d9e2a5'/>
<id>urn:sha1:b7e7ade34e6188bee2e3b0d42b51d25137d9e2a5</id>
<content type='text'>
Commit:

  b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")

... introduced a bug: Mike Galbraith found that it introduced a
performance regression, while Paul E. McKenney reported lost
wakeups and bisected it to this commit.

The reason is that I mis-read ttwu_queue() such that I assumed any
wakeup that got a remote queue must have had the task migrated.

Since this is not so; we need to transfer this information between
queueing the wakeup and actually doing the wakeup. Use a new
task_struct::sched_flag for this, we already write to
sched_contributes_to_load in the wakeup path so this is a hot and
modified cacheline.

Reported-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reported-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Tested-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Hunter &lt;ahh@google.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Fleming &lt;matt@codeblueprint.co.uk&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Pavan Kondeti &lt;pkondeti@codeaurora.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Quentin Casasnovas &lt;quentin.casasnovas@oracle.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: byungchul.park@lge.com
Fixes: b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration")
Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
