<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/asm-generic/pgtable.h, branch v3.16.61</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.16.61</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.16.61'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-10-03T03:10:05+00:00</updated>
<entry>
<title>x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures</title>
<updated>2018-10-03T03:10:05+00:00</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2018-07-14T19:56:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b3dc998f1ca71c91f0b0e077a360405f0550a511'/>
<id>urn:sha1:b3dc998f1ca71c91f0b0e077a360405f0550a511</id>
<content type='text'>
commit 6c26fcd2abfe0a56bbd95271fce02df2896cfd24 upstream.

pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
!__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
ia64) and breaks build:

    include/asm-generic/pgtable.h: Assembler messages:
    include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
    include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
    include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
    include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
    arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle

Move those two static inlines into the !__ASSEMBLY__ section so that they
don't confuse the asm build pass.

Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
[groeck: Context changes]
Signed-off-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings</title>
<updated>2018-10-03T03:10:02+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-06-13T22:48:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=74a430116636754bf42e7ab08fdd9629bd00ffc1'/>
<id>urn:sha1:74a430116636754bf42e7ab08fdd9629bd00ffc1</id>
<content type='text'>
commit 42e4089c7890725fcd329999252dc489b72f2921 upstream.

For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure to point to not point an unmapped entry to valid cached memory.

Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such an high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.

To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.

Valid page mappings are allowed because the system is then unsafe anyways.

It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
the check for root.

For mmaps this is straight forward and can be handled in vm_insert_pfn and
in remap_pfn_range().

For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is a uncommon case use a separate early page talk
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.

[dwmw2: Backport to 4.9]
[groeck: Backport to 4.4]

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Acked-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>asm-generic: drop unused pte_file* helpers</title>
<updated>2018-10-03T03:09:52+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:10:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=53df494aaa7e2fef9d6f3b272d42e8272327f119'/>
<id>urn:sha1:53df494aaa7e2fef9d6f3b272d42e8272327f119</id>
<content type='text'>
commit 5064c8e19dc215afae8ffae95570e7f22062d49c upstream.

All users are gone.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels</title>
<updated>2014-06-04T23:53:55+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-06-04T23:06:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c46a7c817e662a820373bb76b88d0ad67d6abe5d'/>
<id>urn:sha1:c46a7c817e662a820373bb76b88d0ad67d6abe5d</id>
<content type='text'>
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86.  Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults.  This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Steven Noonan &lt;steven@uplinklabs.net&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use paravirt friendly ops for NUMA hinting ptes</title>
<updated>2014-04-18T23:40:09+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2014-04-18T22:07:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=29c7787075c92ca8af353acd5301481e6f37082f'/>
<id>urn:sha1:29c7787075c92ca8af353acd5301481e6f37082f</id>
<content type='text'>
David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

	Xen PV guest page tables require that their entries use machine
	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
	successful migration) non-present PTEs must use pseudo-physical
	addresses.  This is because on migration MFNs in present PTEs are
	translated to PFNs (canonicalised) so they may be translated back
	to the new MFN in the destination domain (uncanonicalised).

	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
	pte_clear_flags(), etc.

	In a Xen PV guest, these functions must translate MFNs to PFNs
	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
	_PAGE_PRESENT.

His suggested fix converted p[te|md]_[set|clear]_flags to using
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this is
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: David Vrabel &lt;david.vrabel@citrix.com&gt;
Tested-by: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Steven Noonan &lt;steven@uplinklabs.net&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add support for discard of unused ptes</title>
<updated>2014-02-21T07:50:18+00:00</updated>
<author>
<name>Konstantin Weitz</name>
<email>konstantin.weitz@gmail.com</email>
</author>
<published>2013-04-17T11:59:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45961722f8e30ceab9d135b1ddc0947d53aef7c3'/>
<id>urn:sha1:45961722f8e30ceab9d135b1ddc0947d53aef7c3</id>
<content type='text'>
In a virtualized environment and given an appropriate interface the guest
can mark pages as unused while they are free (for the s390 implementation
see git commit 45e576b1c3d00206 "guest page hinting light"). For the host
the unused state is a property of the pte.

This patch adds the primitive 'pte_unused' and code to the host swap out
handler so that pages marked as unused by all mappers are not swapped out
but discarded instead, thus saving one IO for swap out and potentially
another one for swap in.

[ Martin Schwidefsky: patch reordering and simplification ]

Signed-off-by: Konstantin Weitz &lt;konstantin.weitz@gmail.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit</title>
<updated>2014-02-17T00:19:36+00:00</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.vnet.ibm.com</email>
</author>
<published>2014-02-12T03:43:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56eecdb912b536a4fa97fb5bfe5a940a54d79be6'/>
<id>urn:sha1:56eecdb912b536a4fa97fb5bfe5a940a54d79be6</id>
<content type='text'>
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
a hash table MMU for various reasons (the flush is handled as part of
the PTE modification when necessary).

ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

Additionally ppc64 require the tlb flushing to be batched within ptl locks.

The reason to do that is to ensure that the hash page table is in sync with
linux page table.

We track the hpte index in linux pte and if we clear them without flushing
hash and drop the ptl lock, we can have another cpu update the pte and can
end up with duplicate entry in the hash table, which is fatal.

We also want to keep set_pte_at simpler by not requiring them to do hash
flush for performance reason. We do that by assuming that set_pte_at() is
never *ever* called on a PTE that is already valid.

This was the case until the NUMA code went in which broke that assumption.

Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
using set_pte_at() and a powerpc specific one using the appropriate
mechanism needed to keep the hash table in sync.

Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</content>
</entry>
<entry>
<title>powerpc/thp: Fix crash on mremap</title>
<updated>2014-01-15T04:46:38+00:00</updated>
<author>
<name>Aneesh Kumar K.V</name>
<email>aneesh.kumar@linux.vnet.ibm.com</email>
</author>
<published>2014-01-13T06:04:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b3084f4db3aeb991c507ca774337c7e7893ed04f'/>
<id>urn:sha1:b3084f4db3aeb991c507ca774337c7e7893ed04f</id>
<content type='text'>
This patch fix the below crash

NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58

On ppc64 we use the pgtable for storing the hpte slot information and
store address to the pgtable at a constant offset (PTRS_PER_PMD) from
pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset
from new pmd.

We also want to move the withdraw and deposit before the set_pmd so
that, when page fault find the pmd as trans huge we can be sure that
pgtable can be located at the offset.

Signed-off-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</content>
</entry>
<entry>
<title>mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support</title>
<updated>2013-12-20T20:17:03+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-12-20T13:10:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee53664bda169f519ce3c6a22d378f0b946c8178'/>
<id>urn:sha1:ee53664bda169f519ce3c6a22d378f0b946c8178</id>
<content type='text'>
Sasha Levin found a NULL pointer dereference that is due to a missing
page table lock, which in turn is due to the pmd entry in question being
a transparent huge-table entry.

The code - introduced in commit 1998cc048901 ("mm: make
madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
turns out that that function doesn't work correctly.

pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
trigger if the transparent hugepage bit was set, but it doesn't do that
if pmd_numa() is also set. Note that the NUMA bit only gets set on real
NUMA machines, so people trying to reproduce this on most normal
development systems would never actually trigger this.

Fix it by removing the very subtle (and subtly incorrect) expectation,
and instead just checking pmd_trans_huge() explicitly.

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Acked-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
[ Additionally remove the now stale test for pmd_trans_huge() inside the
  pmd_bad() case - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix TLB flush race between migration, and change_protection_range</title>
<updated>2013-12-19T03:04:51+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2013-12-19T01:08:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20841405940e7be0617612d521e206e4b6b325db'/>
<id>urn:sha1:20841405940e7be0617612d521e206e4b6b325db</id>
<content type='text'>
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
