summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)AuthorFilesLines
2015-10-13s390/compat: correct uc_sigmask of the compat signal frameMartin Schwidefsky1-4/+26
commit 8d4bd0ed0439dfc780aab801a085961925ed6838 upstream. The uc_sigmask in the ucontext structure is an array of words to keep the 64 signal bits (or 1024 if you ask glibc but the kernel sigset_t only has 64 bits). For 64 bit the sigset_t contains a single 8 byte word, but for 31 bit there are two 4 byte words. The compat signal handler code uses a simple copy of the 64 bit sigset_t to the 31 bit compat_sigset_t. As s390 is a big-endian architecture this is incorrect, the two words in the 31 bit sigset_t array need to be swapped. Reported-by: Stefan Liebler <stli@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: - Introduce local compat_sigset_t in setup_frame32() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-08-12s390/process: fix sfpc inline assemblyHeiko Carstens1-1/+1
commit e47994dd44bcb4a77b4152bd0eada585934703c0 upstream. The sfpc inline assembly within execve_tail() may incorrectly set bits 28-31 of the sfpc instruction to a value which is not zero. These bits however are currently unused and therefore should be zero so we won't get surprised if these bits will be used in the future. Therefore remove the second operand from the inline assembly. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-08-12s390/sclp: clear upper register halves in _sclp_print_earlyMartin Schwidefsky1-0/+4
commit f9c87a6f46d508eae0d9ae640be98d50f237f827 upstream. If the kernel is compiled with gcc 5.1 and the XZ compression option the decompress_kernel function calls _sclp_print_early in 64-bit mode while the content of the upper register half of %r6 is non-zero. This causes a specification exception on the servc instruction in _sclp_servc. The _sclp_print_early function saves and restores the upper registers halves but it fails to clear them for the 31-bit code of the mini sclp driver. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-08-07crypto: s390/ghash - Fix incorrect ghash icv buffer handling.Harald Freudenberger1-12/+13
commit a1cae34e23b1293eccbcc8ee9b39298039c3952a upstream. Multitheaded tests showed that the icv buffer in the current ghash implementation is not handled correctly. A move of this working ghash buffer value to the descriptor context fixed this. Code is tested and verified with an multithreaded application via af_alg interface. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-08-07s390/hibernate: fix save and restore of kernel text sectionHeiko Carstens1-0/+6
commit d74419495633493c9cd3f2bbeb7f3529d0edded6 upstream. Sebastian reported a crash caused by a jump label mismatch after resume. This happens because we do not save the kernel text section during suspend and therefore also do not restore it during resume, but use the kernel image that restores the old system. This means that after a suspend/resume cycle we lost all modifications done to the kernel text section. The reason for this is the pfn_is_nosave() function, which incorrectly returns that read-only pages don't need to be saved. This is incorrect since we mark the kernel text section read-only. We still need to make sure to not save and restore pages contained within NSS and DCSS segment. To fix this add an extra case for the kernel text section and only save those pages if they are not contained within an NSS segment. Fixes the following crash (and the above bugs as well): Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0 Found: c0 04 00 00 00 00 Expected: c0 f4 00 00 00 11 New: c0 04 00 00 00 00 Kernel panic - not syncing: Corrupted kernel text CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4 Call Trace: [<0000000000113972>] show_stack+0x72/0xf0 [<000000000081f15e>] dump_stack+0x6e/0x90 [<000000000081c4e8>] panic+0x108/0x2b0 [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108 [<0000000000112176>] __jump_label_transform+0x9e/0xd0 [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50 [<00000000001d1136>] multi_cpu_stop+0x12e/0x170 [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168 [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0 [<0000000000158baa>] kthread+0x10a/0x110 [<0000000000824a86>] kernel_thread_starter+0x6/0xc Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: add necessary #include directives] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-08-07KVM: s390: Zero out current VMDB of STSI before including level3 data.Ekaterina Tumanova1-0/+1
commit b75f4c9afac2604feb971441116c07a24ecca1ec upstream. s390 documentation requires words 0 and 10-15 to be reserved and stored as zeros. As we fill out all other fields, we can memset the full structure. Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10Revert "KVM: s390: flush CPU on load control"Ben Hutchings1-2/+0
This reverts commit 823f14022fd2335affc8889a9c7e1b60258883a3, which was commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream. It depends on functionality that is not present in 3.2.y. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-10KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand1-1/+1
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream. The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds1-0/+7
commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream. The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust filenames, context - Drop arc, metag, nios2 and lustre changes - For sh, patch both 32-bit and 64-bit implementations to use goto bad_area - For s390, pass int_code and trans_exc_code as arguments to do_no_context() and do_sigsegv()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20crypto: prefix module autoloading with "crypto-"Kees Cook6-9/+9
commit 5d26a105b5a73e5635eae0629b42fa0a90e07b7b upstream. This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: - Adjust filenames - Drop changes to algorithms and drivers we don't have - Add aliases to generic C implementations that didn't need them before] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20KVM: s390: flush CPU on load controlChristian Borntraeger1-0/+2
commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream. some control register changes will flush some aspects of the CPU, e.g. POP explicitely mentions that for CR9-CR11 "TLBs may be cleared". Instead of trying to be clever and only flush on specific CRs, let play safe and flush on all lctl(g) as future machines might define new bits in CRs. Load control intercept should not happen that often. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> [bwh: Backported to 3.2: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01s390,time: revert direct ktime path for s390 clockevent deviceMartin Schwidefsky1-15/+4
commit 8adbf78ec4839c1dc4ff20c9a1f332a7bc99e6e6 upstream. Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da "s390: Use direct ktime path for s390 clockevent device" makes use of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta calculation with ktime_get() in clockevents_program_event and the get_tod_clock() in s390_next_event. This is based on the assumption that the difference between the internal ktime and the hardware clock is reflected in the wall_to_monotonic delta. But this is not true, the ntp corrections are applied via changes to the tk->mult multiplier and this is not reflected in wall_to_monotonic. In theory this could be solved by using the raw monotonic clock but it is simpler to switch back to the standard clock delta calculation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: s/get_tod_clock()/get_clock()/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14KVM: s390: unintended fallthrough for external callChristian Borntraeger1-0/+1
commit f346026e55f1efd3949a67ddd1dcea7c1b9a615e upstream. We must not fallthrough if the conditions for external call are not met. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-11-05KVM: s390: Fix user triggerable bug in dead codeChristian Borntraeger1-10/+0
commit 614a80e474b227cace52fd6e3c790554db8a396e upstream. In the early days, we had some special handling for the KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not just a subset). Now this switch statement is just a sanity check for userspace not messing with the kvm_run structure. Unfortunately, this allows userspace to trigger a kernel BUG. Let's just remove this switch statement. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-09-14s390/locking: Reenable optimistic spinningChristian Borntraeger1-0/+1
commit 36e7fdaa1a04fcf65b864232e1af56a51c7814d6 upstream. commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc (locking/mutex: Disable optimistic spinning on some architectures) fenced spinning for architectures without proper cmpxchg. There is no need to disable mutex spinning on s390, though: The instructions CS,CSG and friends provide the proper guarantees. (We dont implement cmpxchg with locks). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-08-06s390/ptrace: fix PSW mask checkMartin Schwidefsky1-2/+7
commit dab6cf55f81a6e16b8147aed9a843e1691dcd318 upstream. The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect. For the default user_mode=home address space layout the psw_user_bits variable has the home space address-space-control bits set. But the PSW_MASK_USER contains PSW_MASK_ASC, the ptrace validity check for the PSW mask will therefore always fail. Fixes CVE-2014-3534 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-07-11s390/lowcore: reserve 96 bytes for IRB in lowcoreChristian Borntraeger1-5/+6
commit 993072ee67aa179c48c85eb19869804e68887d86 upstream. The IRB might be 96 bytes if the extended-I/O-measurement facility is used. This feature is currently not used by Linux, but struct irb already has the emw defined. So let's make the irb in lowcore match the size of the internal data structure to be future proof. We also have to add a pad, to correctly align the paste. The bigger irb field also circumvents a bug in some QEMU versions that always write the emw field on test subchannel and therefore destroy the paste definitions of this CPU. Running under these QEMU version broke some timing functions in the VDSO and all users of these functions, e.g. some JREs. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> [bwh: Backported to 3.2: offsets of the affected fields in the 64-bit version of struct _lowcore are 128 bytes smaller] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-09s390: fix kernel crash due to linkage stack instructionsMartin Schwidefsky1-2/+5
commit 8d7f6690cedb83456edd41c9bd583783f0703bf0 upstream. The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02s390/dump: Fix dump memory detectionMichael Holzheu1-0/+10
commit d7736ff5be31edaa4fe5ab62810c64529a24b149 upstream. Dumps created by kdump or zfcpdump can contain invalid memory holes when dumping z/VM systems that have memory pressure. For example: # zgetdump -i /proc/vmcore. Memory map: 0000000000000000 - 0000000000bfffff (12 MB) 0000000000e00000 - 00000000014fffff (7 MB) 000000000bd00000 - 00000000f3bfffff (3711 MB) The memory detection function find_memory_chunks() issues tprot to find valid memory chunks. In case of CMM it can happen that pages are marked as unstable via set_page_unstable() in arch_free_page(). If z/VM has released that pages, tprot returns -EFAULT and indicates a memory hole. So fix this and switch off CMM in case of kdump or zfcpdump. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02crypto: s390 - fix des and des3_ede ctr concurrency issueHarald Freudenberger1-21/+48
commit ee97dc7db4cbda33e4241c2d85b42d1835bc8a35 upstream. In s390 des and 3des ctr mode there is one preallocated page used to speed up the en/decryption. This page is not protected against concurrent usage and thus there is a potential of data corruption with multiple threads. The fix introduces locking/unlocking the ctr page and a slower fallback solution at concurrency situations. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02crypto: s390 - fix des and des3_ede cbc concurrency issueHarald Freudenberger1-12/+14
commit adc3fcf1552b6e406d172fd9690bbd1395053d13 upstream. In s390 des and des3_ede cbc mode the iv value is not protected against concurrency access and modifications from another running en/decrypt operation which is using the very same tfm struct instance. This fix copies the iv to the local stack before the crypto operation and stores the value back when done. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02crypto: s390 - fix concurrency issue in aes-ctr modeHarald Freudenberger1-19/+46
commit 0519e9ad89e5cd6e6b08398f57c6a71d9580564c upstream. The aes-ctr mode uses one preallocated page without any concurrency protection. When multiple threads run aes-ctr encryption or decryption this can lead to data corruption. The patch introduces locking for the page and a fallback solution with slower en/decryption performance in concurrency situations. Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02s390/crypto: Don't panic after crypto instruction failuresJan Glauber4-21/+39
commit 36eb2caa7bace31b7868a57f77cb148e58d1c9f9 upstream. Remove the BUG_ON's that check for failure or incomplete results of the s390 hardware crypto instructions. Rather report the errors as -EIO to the crypto layer. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03crypto: s390 - Fix aes-xts parameter corruptionGerald Schaefer1-14/+17
commit 9dda2769af4f3f3093434648c409bb351120d9e8 upstream. Some s390 crypto algorithms incorrectly use the crypto_tfm structure to store private data. As the tfm can be shared among multiple threads, this can result in data corruption. This patch fixes aes-xts by moving the xts and pcc parameter blocks from the tfm onto the stack (48 + 96 bytes). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03crypto: s390 - Fix aes-cbc IV corruptionHerbert Xu1-7/+12
commit f262f0f5cad0c9eca61d1d383e3b67b57dcbe5ea upstream. The cbc-aes-s390 algorithm incorrectly places the IV in the tfm data structure. As the tfm is shared between multiple threads, this introduces a possibility of data corruption. This patch fixes this by moving the parameter block containing the IV and key onto the stack (the block is 48 bytes long). The same bug exists elsewhere in the s390 crypto system and they will be fixed in subsequent patches. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-11-28compiler/gcc4: Add quirk for 'asm goto' miscompilation bugIngo Molnar1-1/+1
commit 3f0116c3238a96bc18ad4b4acefe4e7be32fa861 upstream. Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: - Drop inapplicable changes - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-09-10KVM: s390: move kvm_guest_enter,exit closer to sieDominik Dingel1-6/+11
commit 2b29a9fdcb92bfc6b6f4c412d71505869de61a56 upstream. Any uaccess between guest_enter and guest_exit could trigger a page fault, the page fault handler would handle it as a guest fault and translate a user address as guest address. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context and add the rc variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-05-13s390: move dummy io_remap_pfn_range() to asm/pgtable.hLinus Torvalds1-0/+4
commit 4f2e29031e6c67802e7370292dd050fd62f337ee upstream. Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added a helper function wrapper around io_remap_pfn_range(), and every other architecture defined it in <asm/pgtable.h>. The s390 choice of <asm/io.h> may make sense, but is not very convenient for this case, and gratuitous differences like that cause unexpected errors like this: mm/memory.c: In function 'vm_iomap_memory': mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration] Glory be the kbuild test robot who noticed this, bisected it, and reported it to the guilty parties (ie me). Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: the macro was not defined, so this is an addition and not a move] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-27signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorerBen Hutchings1-0/+1
flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk>
2013-03-27s390/mm: fix flush_tlb_kernel_range()Heiko Carstens1-2/+0
commit f6a70a07079518280022286a1dceb797d12e1edf upstream. Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm() with &init_mm as argument. __tlb_flush_mm() however will only flush tlbs for the passed in mm if its mm_cpumask is not empty. For the init_mm however its mm_cpumask has never any bits set. Which in turn means that our flush_tlb_kernel_range() implementation doesn't work at all. This can be easily verified with a vmalloc/vfree loop which allocates a page, writes to it and then frees the page again. A crash will follow almost instantly. To fix this remove the cpumask_empty() check in __tlb_flush_mm() since there shouldn't be too many mms with a zero mm_cpumask, besides the init_mm of course. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06s390/timer: avoid overflow when programming clock comparatorHeiko Carstens1-0/+3
commit d911e03d097bdc01363df5d81c43f69432eb785c upstream. Since ed4f209 "s390/time: fix sched_clock() overflow" a new helper function is used to avoid overflows when converting TOD format values to nanosecond values. The kvm interrupt code formerly however only worked by accident because of an overflow. It tried to program a timer that would expire in more than ~29 years. Because of the old TOD-to-nanoseconds overflow bug the real expiry value however was much smaller, but now it isn't anymore. This however triggers yet another bug in the function that programs the clock comparator s390_next_ktime(): if the absolute "expires" value is after 2042 this will result in an overflow and the programmed value is lower than the current TOD value which immediatly triggers a clock comparator (= timer) interrupt. Since the timer isn't expired it will be programmed immediately again and so on... the result is a dead system. To fix this simply program the maximum possible value if an overflow is detected. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-03-06s390/kvm: Fix store status for ACRS/FPRSChristian Borntraeger1-0/+8
commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream. On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: The sie state descriptor doesnt have fields for guest ACRS,FPRS, those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Lets collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> [bwh: Backported to 3.2 as done in 3.0 by Jiri Slaby] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Jiri Slaby <jslaby@suse.cz>
2013-02-06s390/time: fix sched_clock() overflowHeiko Carstens3-2/+30
commit ed4f20943cd4c7b55105c04daedf8d63ab6d499c upstream. Converting a 64 Bit TOD format value to nanoseconds means that the value must be divided by 4.096. In order to achieve that we multiply with 125 and divide by 512. When used within sched_clock() this triggers an overflow after appr. 417 days. Resulting in a sched_clock() return value that is much smaller than previously and therefore may cause all sort of weird things in subsystems that rely on a monotonic sched_clock() behaviour. To fix this implement a tod_to_ns() helper function which converts TOD values without overflow and call this function from both places that open coded the conversion: sched_clock() and kvm_s390_handle_wait(). Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-01-03s390/kvm: dont announce RRBM supportChristian Borntraeger1-1/+1
commit 87cac8f879a5ecd7109dbe688087e8810b3364eb upstream. Newer kernels (linux-next with the transparent huge page patches) use rrbm if the feature is announced via feature bit 66. RRBM will cause intercepts, so KVM does not handle it right now, causing an illegal instruction in the guest. The easy solution is to disable the feature bit for the guest. This fixes bugs like: Kernel BUG at 0000000000124c2a [verbose debug info unavailable] illegal operation: 0001 [#1] SMP Modules linked in: virtio_balloon virtio_net ipv6 autofs4 CPU: 0 Not tainted 3.5.4 #1 Process fmempig (pid: 659, task: 000000007b712fd0, ksp: 000000007bed3670) Krnl PSW : 0704d00180000000 0000000000124c2a (pmdp_clear_flush_young+0x5e/0x80) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 00000000003cc000 0000000000000004 0000000000000000 0000000079800000 0000000000040000 0000000000000000 000000007bed3918 000000007cf40000 0000000000000001 000003fff7f00000 000003d281a94000 000000007bed383c 000000007bed3918 00000000005ecbf8 00000000002314a6 000000007bed36e0 Krnl Code:>0000000000124c2a: b9810025 ogr %r2,%r5 0000000000124c2e: 41343000 la %r3,0(%r4,%r3) 0000000000124c32: a716fffa brct %r1,124c26 0000000000124c36: b9010022 lngr %r2,%r2 0000000000124c3a: e3d0f0800004 lg %r13,128(%r15) 0000000000124c40: eb22003f000c srlg %r2,%r2,63 [ 2150.713198] Call Trace: [ 2150.713223] ([<00000000002312c4>] page_referenced_one+0x6c/0x27c) [ 2150.713749] [<0000000000233812>] page_referenced+0x32a/0x410 [...] CC: Alex Graf <agraf@suse.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-12-06s390/gup: add missing TASK_SIZE check to get_user_pages_fast()Heiko Carstens1-1/+1
commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream. When walking page tables we need to make sure that everything is within bounds of the ASCE limit of the task's address space. Otherwise we might calculate e.g. a pud pointer which is not within a pud and dereference it. So check against TASK_SIZE (which is the ASCE limit) before walking page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-12-06s390/signal: set correct address space controlMartin Schwidefsky4-7/+27
commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream. If user space is running in primary mode it can switch to secondary or access register mode, this is used e.g. in the clock_gettime code of the vdso. If a signal is delivered to the user space process while it has been running in access register mode the signal handler is executed in access register mode as well which will result in a crash most of the time. Set the address space control bits in the PSW to the default for the execution of the signal handler and make sure that the previous address space control is restored on signal return. Take care that user space can not switch to the kernel address space by modifying the registers in the signal frame. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: - Adjust filename - The RI bit is not included in PSW_MASK_USER] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-10-31s390: fix linker script for 31 bit buildsHeiko Carstens2-2/+2
commit c985cb37f1b39c2c8035af741a2a0b79f1fbaca7 upstream. Because of a change in the s390 arch backend of binutils (commit 23ecd77 "Pick the default arch depending on the target size" in binutils repo) 31 bit builds will fail since the linker would now try to create 64 bit binary output. Fix this by setting OUTPUT_ARCH to s390:31-bit instead of s390. Thanks to Andreas Krebbel for figuring out the issue. Fixes this build error: LD init/built-in.o s390x-4.7.2-ld: s390:31-bit architecture of input file `arch/s390/kernel/head.o' is incompatible with s390:64-bit output Cc: Andreas Krebbel <Andreas.Krebbel@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-19s390/compat: fix mmap compat system callsHeiko Carstens1-2/+0
commit e85871218513c54f7dfdb6009043cb638f2fecbe upstream. The native 31 bit and the compat behaviour for the mmap system calls differ: In native 31 bit mode the passed in address for the mmap system call will be unmodified passed to sys_mmap_pgoff(). In compat mode however the passed in address will be modified with compat_ptr() which masks out the most significant bit. The result is that in native 31 bit mode each mmap request (with MAP_FIXED) will fail where the most significat bit is set, while in compat mode it may succeed. This odd behaviour was introduced with d3815898 "[S390] mmap: add missing compat_ptr conversion to both mmap compat syscalls". To restore a consistent behaviour accross native and compat mode this patch functionally reverts the above mentioned commit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-19s390/compat: fix compat wrappers for process_vm system callsHeiko Carstens1-2/+2
commit 82aabdb6f1eb61e0034ec23901480f5dd23db7c4 upstream. The compat wrappers incorrectly called the non compat versions of the system process_vm system calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10s390/mm: fix fault handling for page table walk caseHeiko Carstens1-6/+7
commit 008c2e8f247f0a8db1e8e26139da12f3a3abcda0 upstream. Make sure the kernel does not incorrectly create a SIGBUS signal during user space accesses: For user space accesses in the switched addressing mode case the kernel may walk page tables and access user address space via the kernel mapping. If a page table entry is invalid the function __handle_fault() gets called in order to emulate a page fault and trigger all the usual actions like paging in a missing page etc. by calling handle_mm_fault(). If handle_mm_fault() returns with an error fixup handling is necessary. For the switched addressing mode case all errors need to be mapped to -EFAULT, so that the calling uaccess function can return -EFAULT to user space. Unfortunately the __handle_fault() incorrectly calls do_sigbus() if VM_FAULT_SIGBUS is set. This however should only happen if a page fault was triggered by a user space instruction. For kernel mode uaccesses the correct action is to only return -EFAULT. So user space may incorrectly see SIGBUS signals because of this bug. For current machines this would only be possible for the switched addressing mode case in conjunction with futex operations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: do_exception() and do_sigbus() parameters differ] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10s390/mm: downgrade page table after fork of a 31 bit processMartin Schwidefsky4-8/+25
commit 0f6f281b731d20bfe75c13f85d33f3f05b440222 upstream. The downgrade of the 4 level page table created by init_new_context is currently done only in start_thread31. If a 31 bit process forks the new mm uses a 4 level page table, including the task size of 2<<42 that goes along with it. This is incorrect as now a 31 bit process can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade after fork. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-02s390/idle: fix sequence handling vs cpu hotplugHeiko Carstens2-3/+2
commit 0008204ffe85d23382d6fd0f971f3f0fbe70bae2 upstream. The s390 idle accounting code uses a sequence counter which gets used when the per cpu idle statistics get updated and read. One assumption on read access is that only when the sequence counter is even and did not change while reading all values the result is valid. On cpu hotplug however the per cpu data structure gets initialized via a cpu hotplug notifier on CPU_ONLINE. CPU_ONLINE however is too late, since the onlined cpu is already running and might access the per cpu data. Worst case is that the data structure gets initialized while an idle thread is updating its idle statistics. This will result in an uneven sequence counter after an update. As a result user space tools like top, which access /proc/stat in order to get idle stats, will busy loop waiting for the sequence counter to become even again, which will never happen until the queried cpu will update its idle statistics again. And even then the sequence counter will only have an even value for a couple of cpu cycles. Fix this by moving the initialization of the per cpu idle statistics to cpu_init(). I prefer that solution in favor of changing the notifier to CPU_UP_PREPARE, which would be a different solution to the problem. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-31s390/pfault: fix task state raceHeiko Carstens1-2/+12
commit d5e50a51ccbda36b379aba9d1131a852eb908dda upstream. When setting the current task state to TASK_UNINTERRUPTIBLE this can race with a different cpu. The other cpu could set the task state after it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again. This race was always present in the pfault interrupt code but didn't cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug vs missing completion interrupts" which relied on the fact that after setting the task state to TASK_UNINTERRUPTIBLE the task would really sleep. Since this is not necessarily the case the result may be a list corruption of the pfault_list or, as observed, a use-after-free bug while trying to access the task_struct of a task which terminated itself already. To fix this, we need to get a reference of the affected task when receiving the initial pfault interrupt and add special handling if we receive yet another initial pfault interrupt when the task is already enqueued in the pfault list. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-31KVM: s390: Sanitize fpc registers for KVM_SET_FPUChristian Borntraeger1-1/+1
(cherry picked from commit 851755871c1f3184f4124c466e85881f17fa3226) commit 7eef87dc99e419b1cc051e4417c37e4744d7b661 (KVM: s390: fix register setting) added a load of the floating point control register to the KVM_SET_FPU path. Lets make sure that the fpc is valid. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-31KVM: s390: do store status after handling STOP_ON_STOP bitJens Freimann1-8/+12
(cherry picked from commit 9e0d5473e2f0ba2d2fe9dab9408edef3060b710e) In handle_stop() handle the stop bit before doing the store status as described for "Stop and Store Status" in the Principles of Operation. We have to give up the local_int.lock before calling kvm store status since it calls gmap_fault() which might sleep. Since local_int.lock only protects local_int.* and not guest memory we can give up the lock. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-04-23fix tlb flushing for page table pagesMartin Schwidefsky4-28/+61
commit cd94154cc6a28dd9dc271042c1a59c08d26da886 upstream. Git commit 36409f6353fc2d7b6516e631415f938eadd92ffa "use generic RCU page-table freeing code" introduced a tlb flushing bug. Partially revert the above git commit and go back to s390 specific page table flush code. For s390 the TLB can contain three types of entries, "normal" TLB page-table entries, TLB combined region-and-segment-table (CRST) entries and real-space entries. Linux does not use real-space entries which leaves normal TLB entries and CRST entries. The CRST entries are intermediate steps in the page-table translation called translation paths. For example a 4K page access in a three-level page table setup will create two CRST TLB entries and one page-table TLB entry. The advantage of that approach is that a page access next to the previous one can reuse the CRST entries and needs just a single read from memory to create the page-table TLB entry. The disadvantage is that the TLB flushing rules are more complicated, before any page-table may be freed the TLB needs to be flushed. In short: the generic RCU page-table freeing code is incorrect for the CRST entries, in particular the check for mm_users < 2 is troublesome. This is applicable to 3.0+ kernels. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12S390: KEYS: Enable the compat keyctl wrapper on s390xDavid Howells1-0/+3
commit 1d057720609ed052a6371fe1d53300e5e6328e94 upstream. Enable the compat keyctl wrapper on s390x so that 32-bit s390 userspace can call the keyctl() syscall. There's an s390x assembly wrapper that truncates all the register values to 32-bits and this then calls compat_sys_keyctl() - but the latter only exists if CONFIG_KEYS_COMPAT is enabled, and the s390 Kconfig doesn't enable it. Without this patch, 32-bit calls to the keyctl() syscall are given an ENOSYS error: [root@devel4 ~]# keyctl show Session Keyring -3: key inaccessible (Function not implemented) Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: dan@danny.cz Cc: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: linux-s390@vger.kernel.org Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12compat: fix compile breakage on s390Heiko Carstens7-13/+3
commit 048cd4e51d24ebf7f3552226d03c769d6ad91658 upstream. The new is_compat_task() define for the !COMPAT case in include/linux/compat.h conflicts with a similar define in arch/s390/include/asm/compat.h. This is the minimal patch which fixes the build issues. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-01S390: correct ktime to tod clock comparator conversionMartin Schwidefsky1-2/+5
commit cf1eb40f8f5ea12c9e569e7282161fc7f194fd62 upstream. The conversion of the ktime to a value suitable for the clock comparator does not take changes to wall_to_monotonic into account. In fact the conversion just needs the boot clock (sched_clock_base_cc) and the total_sleep_time. This is applicable to 3.2+ kernels. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2011-12-30procfs: do not confuse jiffies with cputime64_tAndreas Schwab1-0/+2
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time for nohz") did not take into account that one some architectures jiffies and cputime use different units. This causes get_idle_time() to return numbers in the wrong units, making the idle time fields in /proc/stat wrong. Instead of converting the usec value returned by get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function usecs_to_cputime64 to convert it to the correct unit of cputime64_t. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Artem S. Tashkinov" <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>