summaryrefslogtreecommitdiff
path: root/arch/s390/include/asm/pgtable.h
AgeCommit message (Collapse)AuthorFilesLines
2018-10-03s390: drop pte_file()-related helpersKirill A. Shutemov1-25/+4
commit 6e76d4b20bf6b514408ab5bd07f4a76723259b64 upstream. We've replaced remap_file_pages(2) implementation with emulation. Nobody creates non-linear mapping anymore. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18s390/mm: fix CMMA vs KSM vs othersChristian Borntraeger1-0/+2
commit a8f60d1fadf7b8b54449fcc9d6b15248917478ba upstream. On heavy paging with KSM I see guest data corruption. Turns out that KSM will add pages to its tree, where the mapping return true for pte_unused (or might become as such later). KSM will unmap such pages and reinstantiate with different attributes (e.g. write protected or special, e.g. in replace_page or write_protect_page)). This uncovered a bug in our pagetable handling: We must remove the unused flag as soon as an entry becomes present again. Signed-of-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-10-06KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flagsChristian Borntraeger1-0/+1
commit 1951497d90d6754201af3e65241a06f9ef6755cd upstream. commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") triggered another paging/storage key corruption. There is an unhandled invalid->valid pte change where we have to set the real storage key from the pgste. When doing paging a guest page might be swapcache or swap and when faulted in it might be read-only and due to a parallel scan old. An do_wp_page will make it writeable and young. Due to software reference tracking this page was invalid and now becomes valid. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-06KVM: s390/mm: Fix storage key corruption during swappingChristian Borntraeger1-2/+3
commit 3e03d4c46daa849880837d802e41c14132a03ef9 upstream. Since 3.12 or more precisely commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") guest storage keys get corrupted during paging. This commit added another valid->invalid translation for page tables - namely ptep_test_and_clear_young. We have to transfer the storage key into the pgste in that case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-22KVM: s390/mm: new gmap_test_and_clear_dirty functionDominik Dingel1-0/+2
For live migration kvm needs to test and clear the dirty bit of guest pages. That for is ptep_test_and_clear_user_dirty, to be sure we are not racing with other code, we protect the pte. This needs to be done within the architecture memory management code. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390/mm: use software dirty bit detection for user dirty trackingMartin Schwidefsky1-79/+56
Switch the user dirty bit detection used for migration from the hardware provided host change-bit in the pgste to a fault based detection method. This reduced the dependency of the host from the storage key to a point where it becomes possible to enable the RCP bypass for KVM guests. The fault based dirty detection will only indicate changes caused by accesses via the guest address space. The hardware based method can detect all changes, even those caused by I/O or accesses via the kernel page table. The KVM/qemu code needs to take this into account. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390: Allow skeys to be enabled for the current processDominik Dingel1-0/+1
Introduce a new function s390_enable_skey(), which enables storage key handling via setting the use_skey flag in the mmu context. This function is only useful within the context of kvm. Note that enabling storage keys will cause a one-time hickup when walking the page table; however, it saves us special effort for cases like clear reset while making it possible for us to be architecture conform. s390_enable_skey() takes the page table lock to prevent reseting storage keys triggered from multiple vcpus. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390: Adding skey bit to mmu contextDominik Dingel1-14/+27
For lazy storage key handling, we need a mechanism to track if the process ever issued a storage key operation. This patch adds the basic infrastructure for making the storage key handling optional, but still leaves it enabled for now by default. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-08Merge branch 'for-linus' of ↵Linus Torvalds1-36/+92
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second set of s390 patches from Martin Schwidefsky: "The second part of Heikos uaccess rework, the page table walker for uaccess is now a thing of the past (yay!) The code change to fix the theoretical TLB flush problem allows us to add a TLB flush optimization for zEC12, this machine has new instructions that allow to do CPU local TLB flushes for single pages and for all pages of a specific address space. Plus the usual bug fixing and some more cleanup" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uaccess: rework uaccess code - fix locking issues s390/mm,tlb: optimize TLB flushing for zEC12 s390/mm,tlb: safeguard against speculative TLB creation s390/irq: Use defines for external interruption codes s390/irq: Add defines for external interruption codes s390/sclp: add timeout for queued requests kvm/s390: also set guest pages back to stable on kexec/kdump lcs: Add missing destroy_timer_on_stack() s390/tape: Add missing destroy_timer_on_stack() s390/tape: Use del_timer_sync() s390/3270: fix crash with multiple reset device requests s390/bitops,atomic: add missing memory barriers s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
2014-04-03s390/mm,tlb: optimize TLB flushing for zEC12Martin Schwidefsky1-36/+92
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcasted to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in the need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-03Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...
2014-03-21s390/mm: remove unecessary parameter from pgste_ipte_notifyDominik Dingel1-8/+7
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-21s390/mm: remove unnecessary parameter from gmap_do_ipte_notifyDominik Dingel1-2/+2
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21s390/kvm: support collaborative memory managementKonstantin Weitz1-0/+26
This patch enables Collaborative Memory Management (CMM) for kvm on s390. CMM allows the guest to inform the host about page usage (see arch/s390/mm/cmm.c). The host uses this information to avoid swapping in unused pages in the page fault handler. Further, a CPU provided list of unused invalid pages is processed to reclaim swap space of not yet accessed unused pages. [ Martin Schwidefsky: patch reordering and cleanup ] Signed-off-by: Konstantin Weitz <konstantin.weitz@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entriesMartin Schwidefsky1-23/+37
Git commit 050eef364ad70059 "[S390] fix tlb flushing vs. concurrent /proc accesses" introduced the attach counter to avoid using the mm_users value to decide between IPTE for every PTE and lazy TLB flushing with IDTE. That fixed the problem with mm_users but it introduced another subtle race, fortunately one that is very hard to hit. The background is the requirement of the architecture that a valid PTE may not be changed while it can be used concurrently by another cpu. The decision between IPTE and lazy TLB flushing needs to be done while the PTE is still valid. Now if the virtual cpu is temporarily stopped after the decision to use lazy TLB flushing but before the invalid bit of the PTE has been set, another cpu can attach the mm, find that flush_mm is set, do the IDTE, return to userspace, and recreate a TLB that uses the PTE in question. When the first, stopped cpu continues it will change the PTE while it is attached on another cpu. The first cpu will do another IDTE shortly after the modification of the PTE which makes the race window quite short. To fix this race the CPU that wants to attach the address space of a user space thread needs to wait for the end of the PTE modification. The number of concurrent TLB flushers for an mm is tracked in the upper 16 bits of the attach_count and finish_arch_post_lock_switch is used to wait for the end of the flush operation if required. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-30KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest faultDominik Dingel1-0/+2
In the case of a fault, we will retry to exit sie64 but with gmap fault indication for this thread set. This makes it possible to handle async page faults. Based on a patch from Martin Schwidefsky. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2013-10-15s390/mm,kvm: fix software dirty bits vs. kvm for old machinesMartin Schwidefsky1-1/+3
For machines without enhanced supression on protection the software dirty bit code forces the pte dirty bit and clears the page protection bit in pgste_set_pte. This is done for all pte types, the check for present ptes is missing. As a result swap ptes and other not-present ptes can get corrupted. Add a check for the _PAGE_PRESENT bit to pgste_set_pte before modifying the pte value. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-05Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+11
Pull KVM updates from Gleb Natapov: "The highlights of the release are nested EPT and pv-ticketlocks support (hypervisor part, guest part, which is most of the code, goes through tip tree). Apart of that there are many fixes for all arches" Fix up semantic conflicts as discussed in the pull request thread.. * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits) ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256 ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs ARM: KVM: vgic: fix GICD_ICFGRn access ARM: KVM: vgic: simplify vgic_get_target_reg KVM: MMU: remove unused parameter KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: x86: update masterclock when kvmclock_offset is calculated (v2) KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation KVM: x86: add comments where MMIO does not return to the emulator KVM: vmx: count exits to userspace during invalid guest emulation KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp kvm: optimize away THP checks in kvm_is_mmio_pfn() ...
2013-08-29s390/mm: implement software referenced bitsMartin Schwidefsky1-148/+197
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-28s390/pgtable: fix mprotect for single-threaded KVM guestsMartin Schwidefsky1-0/+1
For a single-threaded KVM guest ptep_modify_prot_start will not use IPTE, the invalid bit will therefore not be set. If DEBUG_VM is set pgste_set_key called by ptep_modify_prot_commit will complain about the missing invalid bit. ptep_modify_prot_start should set the invalid bit in all cases. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/mm: introduce ptep_flush_lazy helperMartin Schwidefsky1-16/+15
Isolate the logic of IDTE vs. IPTE flushing of ptes in two functions, ptep_flush_lazy and __tlb_flush_mm_lazy. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/pgtable: add pgste_get helperMartin Schwidefsky1-1/+10
ptep_modify_prot_start uses the pgste_set helper to store the pgste, while ptep_modify_prot_commit uses its own pointer magic to retrieve the value again. Add the pgste_get help function to keep things symmetrical and improve readability. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/pgtable: skip pgste updates on full flushMartin Schwidefsky1-4/+3
On process exit there is no more need for the pgste information. Skip the expensive storage key operations which should speed up termination of KVM processes. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/mm: cleanup page table definitionsMartin Schwidefsky1-156/+144
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-07-29KVM: s390: allow sie enablement for multi-threaded programsMartin Schwidefsky1-0/+11
Improve the code to upgrade the standard 2K page tables to 4K page tables with PGSTEs to allow the operation to happen when the program is already multi-threaded. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-04Merge branch 'next' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-43/+40
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
2013-06-29consolidate io_remap_pfn_range definitionsAl Viro1-3/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20mm/THP: add pmd args to pgtable deposit and withdraw APIsAneesh Kumar K.V1-2/+3
This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-05s390/pgtable: make pgste lock an explicit barrierChristian Borntraeger1-2/+3
Getting and Releasing the pgste lock has lock semantics. Make the code an explicit barrier. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-05s390/pgtable: Save pgste during modify_prot_start/commitChristian Borntraeger1-1/+10
In modify_prot_start we update the pgste value but never store it back into the original location. Lets save the calculated result, since modify_prot_commit will use the value of the pgste. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-05s390/pgtable: Fix guest overindication for change bitChristian Borntraeger1-7/+9
When doing the transition invalid->valid in the host page table for a guest, then the guest view of C/R is in the pgste. After validation the view is pgste OR real key. We must zero out the real key C/R to avoid guest over-indication for change (and reference). Touching the real key is ok also for the host: The change bit is tracked via write protection and the reference bit is also ok because set_pte_at was called and the page will be touched anyway soon. Furthermore architecture defines reference as "substantially accurate", over- and underindication are ok. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-28s390/pgtable: Fix check for pgste/storage key handlingChristian Borntraeger1-4/+11
pte_present might return true on PAGE_TYPE_NONE, even if the invalid bit is on. Modify the existing check of the pgste functions to avoid crashes. [ Martin Schwidefsky: added ptep_modify_prot_[start|commit] bits ] Reported-by: Martin Schwidefky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> CC: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-21s390/kvm: Kick guests out of sie if prefix page host pte is touchedChristian Borntraeger1-0/+1
The guest prefix pages must be mapped writeable all the time while SIE is running, otherwise the guest might see random behaviour. (pinned at the pte level) Turns out that mlocking is not enough, the page table entry (not the page) might change or become r/o. This patch uses the gmap notifiers to kick guest cpus out of SIE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21s390/kvm: rename RCP_xxx defines to PGSTE_xxxMartin Schwidefsky1-43/+39
The RCP byte is a part of the PGSTE value, the existing RCP_xxx names are inaccurate. As the defines describe bits and pieces of the PGSTE, the names should start with PGSTE_. The KVM_UR_BIT and KVM_UC_BIT are part of the PGSTE as well, give them better names as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21s390/pgtable: fix ipte notify bitChristian Borntraeger1-2/+2
Dont use the same bit as user referenced. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-17s390/pgtable: fix ipte notify bitChristian Borntraeger1-2/+2
Dont use the same bit as user referenced. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2013-05-03s390/mm: add pte invalidation notifier for kvmMartin Schwidefsky1-10/+56
Add a notifier for kvm to get control before a page table entry is invalidated. The notifier is only called for ptes of an address space with pgstes that have been explicitly marked to require notification. Kvm will use this to get control before prefix pages of virtual CPU are unmapped. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-30mm/hugetlb: add more arch-defined huge_pte functionsGerald Schaefer1-55/+40
Commit abf09bed3cce ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29Merge branch 'for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 update from Martin Schwidefsky: "This is the first batch of s390 patches for the 3.10 merge window. Included are some performance enhancements: storage key initialization, zero page cache synonyms, system call micro optimization and the speedup patches for dasdfmt. Sebastian managed to get rid of the special casing for the console device in the cio layer. And the usual bunch of bug fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (59 commits) s390/pci: use pci_scan_root_bus s390/scm_blk: fix memleak in init function s390/scm_blk: allow more cluster size values s390/cio: fix irq statistics s390/memory hotplug: prevent offline of active memory increments s390: remove small stack config option s390: system call path micro optimization s390: lowcore stack pointer offsets s390/uapi: change struct statfs[64] member types to unsigned values s390/pci: return correct dma address for offset > PAGE_SIZE s390/ptrace: remove empty ifdefs s390/compat: remove ptrace compat definitions from uapi header file s390/compat: fix compile error for !COMPAT s390/compat: fix compat_sys_statfs() memory corruption s390/zcore: Fix HSA copy length for last block s390/mm,gmap: segment mapping race s390/mm,gmap: implement gmap_translate() s390/pci: remove disable_device implementation s390/pci: disable per default s390/pci: return error after failed pci ops ...
2013-04-23s390/mm,gmap: implement gmap_translate()Heiko Carstens1-0/+2
Implement gmap_translate() function which translates a guest absolute address to a user space process address without establishing the guest page table entries. This is useful for kvm guest address translations where no memory access is expected to happen soon (e.g. tprot exception handler). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-17s390: move dummy io_remap_pfn_range() to asm/pgtable.hLinus Torvalds1-0/+4
Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added a helper function wrapper around io_remap_pfn_range(), and every other architecture defined it in <asm/pgtable.h>. The s390 choice of <asm/io.h> may make sense, but is not very convenient for this case, and gratuitous differences like that cause unexpected errors like this: mm/memory.c: In function 'vm_iomap_memory': mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration] Glory be the kbuild test robot who noticed this, bisected it, and reported it to the guilty parties (ie me). Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-02s390/mm: provide emtpy check_pgt_cache() functionHeiko Carstens1-1/+2
All architectures need to provide a check_pgt_cache() function. The s390 one got lost somewhere. So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's idle loop rework patches get merged. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-02s390/uaccess: fix page table walkHeiko Carstens1-0/+1
When translating user space addresses to kernel addresses the follow_table() function had two bugs: - PROT_NONE mappings could be read accessed via the kernel mapping. That is e.g. putting a filename into a user page, then protecting the page with PROT_NONE and afterwards issuing the "open" syscall with a pointer to the filename would incorrectly succeed. - when walking the page tables it used the pgd/pud/pmd/pte primitives which with dynamic page tables give no indication which real level of page tables is being walked (region2, region3, segment or page table). So in case of an exception the translation exception code passed to __handle_fault() is not necessarily correct. This is not really an issue since __handle_fault() doesn't evaluate the code. Only in case of e.g. a SIGBUS this code gets passed to user space. If user space can do something sane with the value is a different question though. To fix these issues don't use any Linux primitives. Only walk the page tables like the hardware would do it, however we leave quite some checks away since we know that we only have full size page tables and each index is within bounds. In theory this should fix all issues... Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-02-28s390/page table dumper: add support for change-recording override bitHeiko Carstens1-0/+2
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-02-14s390/mm: implement software dirty bitsMartin Schwidefsky1-43/+88
The s390 architecture is unique in respect to dirty page detection, it uses the change bit in the per-page storage key to track page modifications. All other architectures track dirty bits by means of page table entries. This property of s390 has caused numerous problems in the past, e.g. see git commit ef5d437f71afdf4a "mm: fix XFS oops due to dirty pages without buffers on s390". To avoid future issues in regard to per-page dirty bits convert s390 to a fault based software dirty bit detection mechanism. All user page table entries which are marked as clean will be hardware read-only, even if the pte is supposed to be writable. A write by the user process will trigger a protection fault which will cause the user pte to be marked as dirty and the hardware read-only bit is removed. With this change the dirty bit in the storage key is irrelevant for Linux as a host, but the storage key is still required for KVM guests. The effect is that page_test_and_clear_dirty and the related code can be removed. The referenced bit in the storage key is still used by the page_test_and_clear_young primitive to provide page age information. For page cache pages of mappings with mapping_cap_account_dirty there will not be any change in behavior as the dirty bit tracking already uses read-only ptes to control the amount of dirty pages. Only for swap cache pages and pages of mappings without mapping_cap_account_dirty there can be additional protection faults. To avoid an excessive number of additional faults the mk_pte primitive checks for PageDirty if the pgprot value allows for writes and pre-dirties the pte. That avoids all additional faults for tmpfs and shmem pages until these pages are added to the swap cache. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-02-14s390/mm: provide PAGE_SHARED defineHeiko Carstens1-0/+1
Only needed to make some drivers compile... Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-01-22s390/thp: implement pmdp_set_wrprotect()Gerald Schaefer1-0/+12
On s390, an architecture-specific implementation of the function pmdp_set_wrprotect() is missing and the generic version is currently being used. The generic version does not flush the tlb as it would be needed on s390 when modifying an active pmd, which can lead to subtle tlb errors on s390 when using transparent hugepages. This patch adds an s390-specific implementation of pmdp_set_wrprotect() including the missing tlb flush. Cc: stable@vger.kernel.org Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-01-12s390/mm: fix pmd_pfn() for thpGerald Schaefer1-4/+1
The pfn calculation in pmd_pfn() is broken for thp, because it uses HPAGE_SHIFT instead of the normal PAGE_SHIFT. This is fixed by removing the distinction between thp and normal pmds in that function, and always using PAGE_SHIFT. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-12-14Merge branch 'for-linus' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 update from Martin Schwidefsky: "Add support to generate code for the latest machine zEC12, MOD and XOR instruction support for the BPF jit compiler, the dasd safe offline feature and the big one: the s390 architecture gets PCI support!! Right before the world ends on the 21st ;-)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/qdio: rename the misleading PCI flag of qdio devices s390/pci: remove obsolete email addresses s390/pci: speed up __iowrite64_copy by using pci store block insn s390/pci: enable NEED_DMA_MAP_STATE s390/pci: no msleep in potential IRQ context s390/pci: fix potential NULL pointer dereference in dma_free_seg_table() s390/pci: use kmem_cache_zalloc instead of kmem_cache_alloc/memset s390/bpf,jit: add support for XOR instruction s390/bpf,jit: add support MOD instruction s390/cio: fix pgid reserved check vga: compile fix, disable vga for s390 s390/pci: add PCI Kconfig options s390/pci: s390 specific PCI sysfs attributes s390/pci: PCI hotplug support via SCLP s390/pci: CHSC PCI support for error and availability events s390/pci: DMA support s390/pci: PCI adapter interrupts for MSI/MSI-X s390/bitops: find leftmost bit instruction support s390/pci: CLP interface s390/pci: base support ...