<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/x86, branch linux-5.9.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.9.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-12-21T12:28:21+00:00</updated>
<entry>
<title>x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled</title>
<updated>2020-12-21T12:28:21+00:00</updated>
<author>
<name>Xiaochen Shen</name>
<email>xiaochen.shen@intel.com</email>
</author>
<published>2020-12-04T06:27:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17678a355a74b0c92ae71e18b7bcb961bcf2b127'/>
<id>urn:sha1:17678a355a74b0c92ae71e18b7bcb961bcf2b127</id>
<content type='text'>
commit 06c5fe9b12dde1b62821f302f177c972bb1c81f9 upstream

The MBA software controller (mba_sc) is a feedback loop which
periodically reads MBM counters and tries to restrict the bandwidth
below a user-specified value. It piggybacks on the MBM counter overflow
handler to do the updates, at a 1s interval, in mbm_update() and
update_mba_bw().

The purpose of mbm_update() is to periodically read the MBM counters to
make sure that the hardware counter doesn't wrap around more than once
between user samplings. mbm_update() calls __mon_event_count() for the
local bandwidth update when mba_sc is not enabled, but calls
mbm_bw_count() instead when mba_sc is enabled. __mon_event_count() is
therefore not called for the local bandwidth update in the MBM counter
overflow handler, but it is still called when reading the MBM local
bandwidth counter file 'mbm_local_bytes'; the call path is as below:

  rdtgroup_mondata_show()
    mon_event_read()
      mon_event_count()
        __mon_event_count()

In __mon_event_count(), m-&gt;chunks is updated by the delta chunks
calculated from the previous MSR value (m-&gt;prev_msr) and the current
MSR value. When mba_sc is enabled, m-&gt;chunks is also updated in
mbm_update(), by mistake, by the delta chunks calculated from
m-&gt;prev_bw_msr instead of m-&gt;prev_msr. Yet m-&gt;chunks is not even
used by update_mba_bw() in the mba_sc feedback loop.

So when reading the MBM local bandwidth counter file, m-&gt;chunks has
been changed unexpectedly by mbm_bw_count(). As a result, an incorrect
local bandwidth counter, calculated from the corrupted m-&gt;chunks, is
shown to the user.

Fix this by removing the incorrect m-&gt;chunks update from
mbm_bw_count() in the MBM counter overflow handler, and by always
calling __mon_event_count() in mbm_update() to make sure that the
hardware local bandwidth counter doesn't wrap around.
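
A minimal sketch of the resulting shape of the local-bandwidth branch
in mbm_update() (names taken from the resctrl code; the exact context
in this tree may differ slightly):

  if (is_mbm_local_enabled()) {
          rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;

          /* Always update m-&gt;chunks from m-&gt;prev_msr here, so the
           * local counter cannot wrap unnoticed between samplings. */
          __mon_event_count(rmid, &amp;rr);

          /* Call the MBA software controller only for control groups
           * and when the user has enabled it explicitly.
           * mbm_bw_count() now updates only the bandwidth state
           * (m-&gt;prev_bw_msr etc.) and no longer touches m-&gt;chunks. */
          if (is_mba_sc(NULL))
                  mbm_bw_count(rmid, &amp;rr);
  }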

Test steps:
  # Run workload with aggressive memory bandwidth (e.g., 10 GB/s)
  git clone https://github.com/intel/intel-cmt-cat &amp;&amp; cd intel-cmt-cat &amp;&amp; make
  ./tools/membw/membw -c 0 -b 10000 --read

  # Enable MBA software controller
  mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

  # Create control group c1
  mkdir /sys/fs/resctrl/c1

  # Set MB throttle to 6 GB/s
  echo "MB:0=6000;1=6000" &gt; /sys/fs/resctrl/c1/schemata

  # Write PID of the workload to tasks file
  echo `pidof membw` &gt; /sys/fs/resctrl/c1/tasks

  # Read the local bytes counter twice with a 1s interval; the calculated
  # local bandwidth is not as expected (it should approach 6 GB/s):
  local_1=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  sleep 1
  local_2=`cat /sys/fs/resctrl/c1/mon_data/mon_L3_00/mbm_local_bytes`
  echo "local b/w (bytes/s):" `expr $local_2 - $local_1`

Before fix:
  local b/w (bytes/s): 11076796416

After fix:
  local b/w (bytes/s): 5465014272

Fixes: ba0f26d8529c ("x86/intel_rdt/mba_sc: Prepare for feedback loop")
Signed-off-by: Xiaochen Shen &lt;xiaochen.shen@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/1607063279-19437-1-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/resctrl: Remove unused struct mbm_state::chunks_bw</title>
<updated>2020-12-21T12:28:21+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2020-07-08T16:39:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=329f550cb940fcba9dc28392824e1e771f5104eb'/>
<id>urn:sha1:329f550cb940fcba9dc28392824e1e771f5104eb</id>
<content type='text'>
commit abe8f12b44250d02937665033a8b750c1bfeb26e upstream

Nothing reads struct mbm_state's chunks_bw value; it's a copy of
chunks. Remove it.
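
For reference, a paraphrased sketch of the struct after the removal
(field set approximated from the resctrl internals of this era, not
quoted verbatim):

  struct mbm_state {
          u64     chunks;        /* total data in memory transactions */
          u64     prev_msr;      /* last MSR value read */
          /* u64  chunks_bw;        removed: never read, a copy of chunks */
          u64     prev_bw_msr;   /* last MSR value used for bandwidth */
          u32     prev_bw;       /* last bandwidth value computed */
          u32     delta_bw;
          bool    delta_comp;
  };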

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lkml.kernel.org/r/20200708163929.2783-2-james.morse@arm.com
Signed-off-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>KVM: mmu: Fix SPTE encoding of MMIO generation upper half</title>
<updated>2020-12-21T12:28:20+00:00</updated>
<author>
<name>Maciej S. Szmigiero</name>
<email>maciej.szmigiero@oracle.com</email>
</author>
<published>2020-12-05T00:48:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d717f6aa9d49585432e2a363d1c4127a9116457a'/>
<id>urn:sha1:d717f6aa9d49585432e2a363d1c4127a9116457a</id>
<content type='text'>
commit 34c0f6f2695a2db81e09a3ab7bdb2853f45d4d3d upstream.

Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
cleaned up the computation of MMIO generation SPTE masks; however, it
introduced a bug in how the upper part was encoded:
SPTE bits 52-61 were supposed to contain bits 10-19 of the current
generation number, but a missing shift encoded bits 1-10 there instead
(mostly duplicating the lower part of the encoded generation number,
which then consisted of bits 1-9).

In the meantime, the upper part was shrunk by one bit and moved by
subsequent commits to become the upper half of the encoded generation
number (bits 9-17 of the 18-bit generation number encoded in an SPTE).

In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
changed the SPTE bit range assigned to encode the generation number and
the total number of bits encoded, but did not update them in the
comment attached to their defines, nor in the KVM MMU doc.
Let's do that here, too, since it is too trivial a change to warrant a
separate commit.
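
An illustrative sketch of the intended encoding, with everything
derived from the bit ranges in the spirit of the reorganization noted
below (constants and names paraphrased, not quoted from the tree):

  #define MMIO_SPTE_GEN_LOW_START    3
  #define MMIO_SPTE_GEN_LOW_END      10
  #define MMIO_SPTE_GEN_HIGH_START   52
  #define MMIO_SPTE_GEN_HIGH_END     61

  #define MMIO_SPTE_GEN_LOW_MASK \
          GENMASK_ULL(MMIO_SPTE_GEN_LOW_END, MMIO_SPTE_GEN_LOW_START)
  #define MMIO_SPTE_GEN_HIGH_MASK \
          GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, MMIO_SPTE_GEN_HIGH_START)
  #define MMIO_SPTE_GEN_LOW_BITS \
          (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)

  static u64 generation_mmio_spte_mask(u64 gen)
  {
          u64 mask;

          /* generation bits [0, LOW_BITS) go into the low SPTE field */
          mask  = (gen &lt;&lt; MMIO_SPTE_GEN_LOW_START) &amp; MMIO_SPTE_GEN_LOW_MASK;

          /* generation bits [LOW_BITS, ...) go into the high SPTE
           * field; the bug was the missing "&gt;&gt; MMIO_SPTE_GEN_LOW_BITS"
           * below, which re-encoded the low generation bits into the
           * upper SPTE bits instead of the intended upper half. */
          mask |= ((gen &gt;&gt; MMIO_SPTE_GEN_LOW_BITS)
                   &lt;&lt; MMIO_SPTE_GEN_HIGH_START) &amp; MMIO_SPTE_GEN_HIGH_MASK;

          return mask;
  }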

Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
Signed-off-by: Maciej S. Szmigiero &lt;maciej.szmigiero@oracle.com&gt;
Message-Id: &lt;156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096.git.maciej.szmigiero@oracle.com&gt;
Cc: stable@vger.kernel.org
[Reorganize macros so that everything is computed from the bit ranges. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86/kprobes: Fix optprobe to detect INT3 padding correctly</title>
<updated>2020-12-16T09:58:38+00:00</updated>
<author>
<name>Masami Hiramatsu</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2020-12-11T07:04:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4f949439d2748209b004b4003e21285e580909d'/>
<id>urn:sha1:d4f949439d2748209b004b4003e21285e580909d</id>
<content type='text'>
commit 0d07c0ec4381f630c801539c79ad8dcc627f6e4a upstream.

Commit

  7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")

changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function and finds an INT3, it gives up
on the jump optimization.

Instead of giving up on any INT3 detection, check whether the rest of
the bytes up to the end of the function are all INT3. If they are, they
come from the linker; in that case, continue with the optprobe jump
optimization.
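
A hedged sketch of the check (modeled on the upstream helper; the exact
accessor used to read kernel text may differ in this tree):

  /* Return true if all bytes in [addr, eaddr) are INT3 (0xcc),
   * i.e. linker padding rather than real code. */
  static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
  {
          unsigned char ops;

          for (; addr &lt; eaddr; addr++) {
                  if (get_kernel_nofault(ops, (void *)addr) &lt; 0 ||
                      ops != INT3_INSN_OPCODE)
                          return false;
          }

          return true;
  }

In can_optimize(), hitting an INT3 then breaks out of the decode loop
when the padding check succeeds, instead of returning 0.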

 [ bp: Massage commit message. ]

Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki &lt;pi3@pi3.com.pl&gt;
Signed-off-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86/apic/vector: Fix ordering in vector assignment</title>
<updated>2020-12-16T09:58:38+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-12-10T20:18:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c12baff9e1e6b60d693448a5111f2e66241982e'/>
<id>urn:sha1:6c12baff9e1e6b60d693448a5111f2e66241982e</id>
<content type='text'>
commit 190113b4c6531c8e09b31d5235f9b5175cbb0f72 upstream.

Prarit reported that, depending on the affinity setting, the

 ' irq $N: Affinity broken due to vector space exhaustion.'

message shows up in dmesg, even though the vector space on the CPUs in
the affinity mask is definitely not exhausted.

Shung-Hsi provided traces and analysis which pinpoints the problem:

The ordering of trying to assign an interrupt vector in
assign_irq_vector_any_locked() is simply wrong if the interrupt data has a
valid node assigned. It does:

 1) Try the intersection of affinity mask and node mask
 2) Try the node mask
 3) Try the full affinity mask
 4) Try the full online mask

Obviously #2 and #3 are in the wrong order as the requested affinity
mask has to take precedence.

In the observed cases #1 failed because the affinity mask did not contain
CPUs from node 0. That made it allocate a vector from node 0, thereby
breaking affinity and emitting the misleading message.

Revert the order of #2 and #3 so that the full affinity mask without
the node intersection is tried before affinity is actually broken.

If no node is assigned, only the full affinity mask is tried and, if
that fails, the full online mask.
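
The resulting order in assign_irq_vector_any_locked() is roughly as
below (a sketch following the description above; helper and variable
names paraphrased from the x86 vector code):

  if (node != NUMA_NO_NODE) {
          /* 1) Try the intersection of affinity mask and node mask */
          cpumask_and(vector_searchmask, affmsk, cpumask_of_node(node));
          if (!assign_vector_locked(irqd, vector_searchmask))
                  return 0;
  }

  /* 2) Try the full affinity mask: it has to take precedence ... */
  cpumask_and(vector_searchmask, affmsk, cpu_online_mask);
  if (!assign_vector_locked(irqd, vector_searchmask))
          return 0;

  /* 3) ... over the node mask, which already breaks affinity */
  if (node != NUMA_NO_NODE &amp;&amp;
      !assign_vector_locked(irqd, cpumask_of_node(node)))
          return 0;

  /* 4) Last resort: try the full online mask */
  return assign_vector_locked(irqd, cpu_online_mask);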

Fixes: d6ffc6ac83b1 ("x86/vector: Respect affinity mask in irq descriptor")
Reported-by: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reported-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87ft4djtyp.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86/membarrier: Get rid of a dubious optimization</title>
<updated>2020-12-16T09:58:38+00:00</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@kernel.org</email>
</author>
<published>2020-12-04T05:07:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f114de3c03bd1406c0c110a5dee589365dc88a68'/>
<id>urn:sha1:f114de3c03bd1406c0c110a5dee589365dc88a68</id>
<content type='text'>
commit a493d1ca1a03b532871f1da27f8dbda2b28b04c4 upstream.

sync_core_before_usermode() had an incorrect optimization.  If the kernel
returns from an interrupt, it can get to usermode without IRET. It just has
to schedule to a different task in the same mm and do SYSRET.  Fortunately,
there were no callers of sync_core_before_usermode() that could have had
in_irq() or in_nmi() equal to true, because it's only ever called from the
scheduler.
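
In sketch form, the removed check looked like this (paraphrased from
sync_core_before_usermode(); comments condensed):

  static inline void sync_core_before_usermode(void)
  {
          /* With PTI, we unconditionally serialize before running
           * user code anyway. */
          if (static_cpu_has(X86_FEATURE_PTI))
                  return;

          /* REMOVED by this fix: the early return assumed the only
           * way back to usermode from irq/NMI context is IRET, but
           * the task can schedule and leave via SYSRET instead:
           *
           *      if (in_irq() || in_nmi())
           *              return;
           */
          sync_core();
  }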

While at it, clarify a related comment.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/5afc7632be1422f91eaf7611aaaa1b5b8580a086.1607058304.git.luto@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP</title>
<updated>2020-12-16T09:58:38+00:00</updated>
<author>
<name>Arvind Sankar</name>
<email>nivedita@alum.mit.edu</email>
</author>
<published>2020-11-11T16:09:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c31a444c48981106aab63ca4297d285e47d260cf'/>
<id>urn:sha1:c31a444c48981106aab63ca4297d285e47d260cf</id>
<content type='text'>
commit 29ac40cbed2bc06fa218ca25d7f5e280d3d08a25 upstream.

The PAT bit is in different locations for 4k and 2M/1G page table
entries.

Add a definition for _PAGE_LARGE_CACHE_MASK to represent the three
caching bits (PWT, PCD, PAT), similar to _PAGE_CACHE_MASK for 4k pages,
and use it in the definition of PMD_FLAGS_DEC_WP to get the correct PAT
index for write-protected pages.
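
In sketch form, assuming the usual x86 page-flag names (paraphrased,
not a verbatim diff):

  /* PWT, PCD and the large-page PAT bit (bit 12 in 2M/1G entries) */
  #define _PAGE_LARGE_CACHE_MASK  (_PAGE_PWT | _PAGE_PCD | _PAGE_PAT_LARGE)

  /* Previously this cleared _PAGE_CACHE_MASK (the 4k layout) and set
   * _PAGE_PAT, which sits at the wrong bit position in a 2M entry. */
  #define PMD_FLAGS_DEC_WP        ((PMD_FLAGS_DEC &amp; ~_PAGE_LARGE_CACHE_MASK) | \
                                   (_PAGE_PAT_LARGE | _PAGE_PWT))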

Fixes: 6ebcb060713f ("x86/mm: Add support to encrypt the kernel in-place")
Signed-off-by: Arvind Sankar &lt;nivedita@alum.mit.edu&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Tested-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20201111160946.147341-1-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/idle: Fix arch_cpu_idle() vs tracing</title>
<updated>2020-12-16T09:58:32+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-11-20T10:50:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb82d3ac7ec1ffb5f4758b025bfdcd1b82eeddd5'/>
<id>urn:sha1:fb82d3ac7ec1ffb5f4758b025bfdcd1b82eeddd5</id>
<content type='text'>
[ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ]

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the lockdep, RCU, and
tracing state like we do in the entry code.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)
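
As a sketch, each implementation moves to this shape
(cpu_wait_for_interrupt() is a hypothetical stand-in for the
arch-specific idle instruction):

  void arch_cpu_idle(void)
  {
          /* entered with IRQs disabled and RCU not watching */
          cpu_wait_for_interrupt();  /* hypothetical arch idle op, e.g. wfi */

          /* A plain local_irq_enable() would call into tracing, which
           * relies on RCU; the raw_ variant skips the tracing and
           * lockdep hooks. */
          raw_local_irq_enable();    /* returns with IRQs on, see the XXX */
  }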

Reported-by: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS</title>
<updated>2020-12-16T09:58:29+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2020-11-26T11:09:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a88ba2a67e47de7014199b8a4583e2fe45fff9c'/>
<id>urn:sha1:3a88ba2a67e47de7014199b8a4583e2fe45fff9c</id>
<content type='text'>
[ Upstream commit 5debf02131227d39988e44adf5090fb796fa8466 ]

Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this. It seems x86_pmu_stop() can be
called recursively (e.g. when it loses some samples), as below:

  x86_pmu_stop
    intel_pmu_disable_event  (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_nhm  (x86_pmu_drain_pebs_buffer)
          x86_pmu_stop

While commit 35d1ce6bec13 ("perf/x86/intel/ds: Fix x86_pmu_stop
warning for large PEBS") fixed it for the normal cases, there is
another path that calls x86_pmu_stop() recursively when a PEBS error is
detected (e.g. when two or more counters overflow at the same time).

As in Kan's previous fix, we can skip the interrupt accounting for
large PEBS, so check iregs, which is set only for a PMI.
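
In sketch form, the error path in the PEBS drain code becomes
(paraphrased; iregs is NULL when not called from the PMI handler):

  if (error[bit]) {
          perf_log_lost_samples(event, error[bit]);

          /* Only account the interrupt (and possibly stop the event)
           * when invoked from the PMI handler; for large PEBS, iregs
           * is NULL, which avoids the recursive x86_pmu_stop(). */
          if (iregs &amp;&amp; perf_event_account_interrupt(event))
                  x86_pmu_stop(event, 0);
  }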

Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck &lt;jsperbeck@google.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20201126110922.317681-1-namhyung@kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes</title>
<updated>2020-12-11T12:22:13+00:00</updated>
<author>
<name>Masami Hiramatsu</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2020-12-03T04:50:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7a9b51b7e1fc8ca42856d1ecbba76165d29f55f'/>
<id>urn:sha1:e7a9b51b7e1fc8ca42856d1ecbba76165d29f55f</id>
<content type='text'>
commit 12cb908a11b2544b5f53e9af856e6b6a90ed5533 upstream

Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be

  insn.prefixes.bytes[i] != 0 and i &lt; 4

instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
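
In sketch form, the macro bounds the walk by both the array size and a
zero byte (paraphrased from the upstream patch):

  #define for_each_insn_prefix(insn, idx, prefix)                 \
          for (idx = 0;                                           \
               idx &lt; ARRAY_SIZE(insn-&gt;prefixes.bytes) &amp;&amp;          \
               (prefix = insn-&gt;prefixes.bytes[idx]) != 0;         \
               idx++)

Callers then replace open-coded loops over insn.prefixes.nbytes with
for_each_insn_prefix(&amp;insn, i, p).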

Debugged by Kees Cook &lt;keescook@chromium.org&gt;.

 [ bp: Massage commit message. ]

Fixes: 32d0b95300db ("x86/insn-eval: Add utility functions to get segment selector")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
