path: root/arch
Age | Commit message | Author | Files | Lines
2015-05-10 | x86, mm/ASLR: Fix stack randomization on 64-bit systems | Hector Marco-Gisbert | 1 | -3/+3
commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream.

The issue is that the stack for processes is not properly randomized on 64-bit architectures due to an integer overflow.

The affected function is randomize_stack_top() in file "fs/binfmt_elf.c":

  static unsigned long randomize_stack_top(unsigned long stack_top)
  {
          unsigned int random_variable = 0;

          if ((current->flags & PF_RANDOMIZE) &&
              !(current->personality & ADDR_NO_RANDOMIZE)) {
                  random_variable = get_random_int() & STACK_RND_MASK;
                  random_variable <<= PAGE_SHIFT;
          }
  #ifdef CONFIG_STACK_GROWSUP
          return PAGE_ALIGN(stack_top) + random_variable;
  #else
          return PAGE_ALIGN(stack_top) - random_variable;
  #endif
  }

Note that it declares "random_variable" as "unsigned int". The result of shifting STACK_RND_MASK (0x3fffff on x86_64, 22 bits) left by PAGE_SHIFT (12 on x86_64):

  random_variable <<= PAGE_SHIFT;

needs 34 bits (22 + 12), so the two leftmost bits are dropped when the result is stored in "random_variable". The two dropped bits reduce the entropy of the process stack to one fourth of what is expected: from 2^30 down to 2^28.

This patch restores the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size().

The successful fix can be tested with:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
  7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0   [stack]
  7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0   [stack]
  7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0   [stack]
  7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0   [stack]
  ...

Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff.

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
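The overflow is easy to demonstrate in isolation: shifting the 22-bit STACK_RND_MASK value left by PAGE_SHIFT needs 34 bits, so a 32-bit unsigned int silently drops the top two bits while an unsigned long keeps them. A minimal user-space sketch (illustrative only, not kernel code):

  #include <stdio.h>

  #define STACK_RND_MASK 0x3fffffUL  /* 22 bits, as quoted above for x86_64 */
  #define PAGE_SHIFT     12

  int main(void)
  {
          unsigned int  narrow = STACK_RND_MASK;  /* the buggy type */
          unsigned long wide   = STACK_RND_MASK;  /* the fixed type */

          narrow <<= PAGE_SHIFT;  /* truncated to 32 bits: 0xfffff000        */
          wide   <<= PAGE_SHIFT;  /* full 34-bit value on 64-bit: 0x3fffff000 */

          printf("unsigned int:  %#lx\n", (unsigned long)narrow);
          printf("unsigned long: %#lx\n", wide);
          return 0;
  }

The two missing high bits are exactly the entropy the fix recovers.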
2015-05-10 | ARM: 8284/1: sa1100: clear RCSR_SMR on resume | Dmitry Eremin-Solenikov | 1 | -0/+1
commit e461894dc2ce7778ccde1c3483c9b15a85a7fc5f upstream. StrongARM core uses RCSR SMR bit to tell to bootloader that it was reset by entering the sleep mode. After we have resumed, there is little point in having that bit enabled. Moreover, if this bit is set before reboot, the bootloader can become confused. Thus clear the SMR bit on resume just before clearing the scratchpad (resume address) register. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | KVM: s390: base hrtimer on a monotonic clock | David Hildenbrand | 1 | -1/+1
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream. The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | axonram: Fix bug in direct_access | Matthew Wilcox | 1 | -1/+1
commit 91117a20245b59f70b563523edbf998a62fc6383 upstream. The 'pfn' returned by axonram was completely bogus, and has been since 2008. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | hx4700: regulator: declare full constraints | Martin Vajnar | 1 | -0/+2
commit a52d209336f8fc7483a8c7f4a8a7d2a8e1692a6c upstream. Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped working. This patch enables the "replacement" for REGULATOR_DUMMY and allows the touchscreen to work even though there is no regulator for "vcc". Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | ARM: pxa: add regulator_has_full_constraints to spitz board file | Dmitry Eremin-Solenikov | 1 | -0/+2
commit baad2dc49c5d970ea881d92981a1b76c94a7b7a1 upstream. Add regulator_has_full_constraints() call to spitz board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on spitz if regulators are enabled: ads7846 spi2.0: unable to get regulator: -517 spi spi2.0: Driver ads7846 requests probe deferral Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | ARM: pxa: add regulator_has_full_constraints to poodle board file | Dmitry Eremin-Solenikov | 1 | -0/+2
commit 9bc78f32c2e430aebf6def965b316aa95e37a20c upstream. Add regulator_has_full_constraints() call to poodle board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on poodle if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-10 | ARM: pxa: add regulator_has_full_constraints to corgi board file | Dmitry Eremin-Solenikov | 1 | -0/+3
commit 271e80176aae4e5b481f4bb92df9768c6075bbca upstream. Add regulator_has_full_constraints() call to corgi board file to let regulator core know that we do not have any additional regulators left. This lets it substitute unprovided regulators with dummy ones. This fixes the following warnings that can be seen on corgi if regulators are enabled: ads7846 spi1.0: unable to get regulator: -517 spi spi1.0: Driver ads7846 requests probe deferral wm8731 0-001b: Failed to get supply 'AVDD': -517 wm8731 0-001b: Failed to request supplies: -517 wm8731 0-001b: ASoC: failed to probe component -517 corgi-audio corgi-audio: ASoC: failed to instantiate card -517 Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-03-06 | MIPS: Fix kernel lockup or crash after CPU offline/online | Hemmo Nieminen | 1 | -1/+1
commit c7754e75100ed5e3068ac5085747f2bfc386c8d6 upstream. As printk() invocation can cause e.g. a TLB miss, printk() cannot be called before the exception handlers have been properly initialized. This can happen e.g. when netconsole has been loaded as a kernel module and the TLB table has been cleared when a CPU was offline. Call cpu_report() in start_secondary() only after the exception handlers have been initialized to fix this. Without the patch the kernel will randomly either lockup or crash after a CPU is onlined and the console driver is a module. Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/8953/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-03-06 | MIPS: IRQ: Fix disable_irq on CPU IRQs | Felix Fietkau | 1 | -0/+4
commit a3e6c1eff54878506b2dddcc202df9cc8180facb upstream. If the irq_chip does not define .irq_disable, any call to disable_irq will defer disabling the IRQ until it fires while marked as disabled. This assumes that the handler function checks for this condition, which handle_percpu_irq does not. In this case, calling disable_irq leads to an IRQ storm, if the interrupt fires while disabled. This optimization is only useful when disabling the IRQ is slow, which is not true for the MIPS CPU IRQ. Disable this optimization by implementing .irq_disable and .irq_enable Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8949/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
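The shape of the fix, sketched with the generic irqchip API (the handler names below are placeholders, not the actual arch/mips symbols): giving the chip explicit .irq_disable/.irq_enable callbacks makes disable_irq() mask the line immediately instead of deferring it, which handle_percpu_irq cannot cope with.

  #include <linux/irq.h>

  static void cpu_irq_mask(struct irq_data *d)   { /* mask the CPU IRQ line */ }
  static void cpu_irq_unmask(struct irq_data *d) { /* unmask the CPU IRQ line */ }

  static struct irq_chip cpu_irq_chip = {
          .name        = "CPU-sketch",
          .irq_mask    = cpu_irq_mask,
          .irq_unmask  = cpu_irq_unmask,
          /* With these present, the core no longer lazily defers disable_irq(),
           * so a masked IRQ can never storm through handle_percpu_irq. */
          .irq_disable = cpu_irq_mask,
          .irq_enable  = cpu_irq_unmask,
  };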
2015-03-06 | x86: mm/fault: Fix semaphore imbalance | Ben Hutchings | 1 | -1/+1
When backporting commit 33692f27597f ('vm: add VM_FAULT_SIGSEGV handling support') I didn't notice that it depended on a recent change to the locking context of mm_fault_error() (commit 7fb08eca4527, 'x86: mm: move mmap_sem unlock from mm_fault_error() to caller'). That isn't easily applicable to 3.2, so instead make sure we drop mm->mmap_sem on the new branch of mm_fault_error(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | KVM: x86: SYSENTER emulation is broken | Nadav Amit | 1 | -17/+8
commit f3747379accba8e95d70cec0eae0582c8c182050 upstream. SYSENTER emulation is broken in several ways: 1. It misses the case of 16-bit code segments completely (CVE-2015-0239). 2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can still be set without causing #GP). 3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in legacy-mode. 4. There is some unneeded code. Fix it. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | KVM: x86 emulator: reject SYSENTER in compatibility mode on AMD guests | Avi Kivity | 1 | -0/+19
commit 1a18a69b762374c423305772500f36eb8984ca52 upstream. If the guest thinks it's an AMD, it will not have prepared the SYSENTER MSRs, and if the guest executes SYSENTER in compatibility mode, it will fails. Detect this condition and #UD instead, like the spec says. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86, cpu, amd: Add workaround for family 16h, erratum 793 | Borislav Petkov | 2 | -0/+11
commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream.

This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885.

Erratum text:

[Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013]

793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang

Description
Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely.

Potential Effect on System
Processor core hang.

Suggested Workaround
BIOS should set MSR C001_1020[15] = 1b.

Fix Planned
No fix planned

[ hpa: updated description, fixed typo in MSR name ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2:
 - Adjust filename
 - Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to avoid crashing on KVM. This was fixed upstream by commit 8f86a7373a1c ("x86, AMD: Convert to the new bit access MSR accessors") but that's too much trouble to backport. Here we must use {rd,wr}msrl_amd_safe().]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Moritz Muehlenhoff <jmm@debian.org>
Cc: Venkatesh Srinivas <venkateshs@google.com>
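The workaround quoted above boils down to setting one MSR bit when the BIOS has not already done so. A sketch of that shape using the generic safe accessors (the 3.2 backport itself uses the {rd,wr}msrl_amd_safe() variants, as noted, and the constant name below is illustrative):

  #include <asm/msr.h>

  #define MSR_F16H_ERRATUM_793 0xc0011020 /* MSR C001_1020 from the erratum text */

  static void apply_erratum_793(void)
  {
          u64 val;

          if (rdmsrl_safe(MSR_F16H_ERRATUM_793, &val))
                  return; /* MSR not readable, e.g. under a hypervisor */
          if (!(val & (1ULL << 15)))
                  wrmsrl_safe(MSR_F16H_ERRATUM_793, val | (1ULL << 15));
  }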
2015-02-20Revert "x86, 64bit, mm: Mark data/bss/brk to nx"Ben Hutchings1-4/+3
This reverts commit e105c8187b7101e8a8a54ac0218c9d9c9463c636 which was commit 72212675d1c96f5db8ec6fb35701879911193158 upstream. This caused suspend/resume to stop working on at least some systems - specifically, the system would reboot when woken. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org>
2015-02-20Revert "x86, mm: Set NX across entire PMD at boot"Ben Hutchings1-10/+1
This reverts commit a5c187d92d2ce30315f333b9dff33af832e8b443 which was commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream. The previous commit caused suspend/resume to stop working on at least some systems - specifically, the system would reboot when woken. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org>
2015-02-20 | vm: add VM_FAULT_SIGSEGV handling support | Linus Torvalds | 24 | -1/+52
commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream. The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust filenames, context - Drop arc, metag, nios2 and lustre changes - For sh, patch both 32-bit and 64-bit implementations to use goto bad_area - For s390, pass int_code and trans_exc_code as arguments to do_no_context() and do_sigsegv()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86, tls: Interpret an all-zero struct user_desc as "no segment" | Andy Lutomirski | 2 | -2/+36
commit 3669ef9fa7d35f573ec9c0e0341b29251c2734a7 upstream.

The Witcher 2 did something like this to allocate a TLS segment index:

  struct user_desc u_info;
  bzero(&u_info, sizeof(u_info));
  u_info.entry_number = (uint32_t)-1;

  syscall(SYS_set_thread_area, &u_info);

Strictly speaking, this code was never correct. It should have set read_exec_only and seg_not_present to 1 to indicate that it wanted to find a free slot without putting anything there, or it should have put something sensible in the TLS slot if it wanted to allocate a TLS entry for real. The actual effect of this code was to allocate a bogus segment that could be used to exploit espfix.

The set_thread_area hardening patches changed the behavior, causing set_thread_area to return -EINVAL and crashing the game.

This changes set_thread_area to interpret this as a request to find a free slot and to leave it empty, which isn't *quite* what the game expects but should be close enough to keep it working. In particular, using the code above to allocate two segments will allocate the same segment both times.

According to FrostbittenKing on Github, this fixes The Witcher 2.

If this somehow still causes problems, we could instead allocate a limit==0 32-bit data segment, but that seems rather ugly to me.

Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
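For contrast, this is what a "reserve a free slot without installing a segment" request looks like, following the rule the text above spells out (read_exec_only and seg_not_present set to 1); a user-space sketch only:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <asm/ldt.h> /* struct user_desc */

  static int reserve_tls_slot(void)
  {
          struct user_desc u_info;

          memset(&u_info, 0, sizeof(u_info));
          u_info.entry_number    = (unsigned int)-1; /* "pick any free slot"     */
          u_info.read_exec_only  = 1;                /* nothing to read/execute  */
          u_info.seg_not_present = 1;                /* do not install a segment */

          if (syscall(SYS_set_thread_area, &u_info) != 0)
                  return -1;
          return u_info.entry_number; /* slot chosen by the kernel */
  }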
2015-02-20 | x86, tls, ldt: Stop checking lm in LDT_empty | Andy Lutomirski | 1 | -7/+2
commit e30ab185c490e9a9381385529e0fd32f0a399495 upstream. 32-bit programs don't have an lm bit in their ABI, so they can't reliably cause LDT_empty to return true without resorting to memset. They shouldn't need to do this. This should fix a longstanding, if minor, issue in all 64-bit kernels as well as a potential regression in the TLS hardening code. Fixes: 41bdc78544b8 x86/tls: Validate TLS entries to protect espfix Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86, hyperv: Mark the Hyper-V clocksource as being continuous | K. Y. Srinivasan | 1 | -0/+1
commit 32c6590d126836a062b3140ed52d898507987017 upstream. The Hyper-V clocksource is continuous; mark it accordingly. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: jasowang@redhat.com Cc: gregkh@linuxfoundation.org Cc: devel@linuxdriverproject.org Cc: olaf@aepfle.de Cc: apw@canonical.com Link: http://lkml.kernel.org/r/1421108762-3331-1-git-send-email-kys@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing | Steven Rostedt (Red Hat) | 1 | -5/+15
commit 237d28db036e411f22c03cfd5b0f6dc2aa9bf3bc upstream.

If the function graph tracer traces a jprobe callback, the system will crash. This can easily be demonstrated by compiling the jprobe sample module that is in the kernel tree, loading it and running the function graph tracer.

  # modprobe jprobe_example.ko
  # echo function_graph > /sys/kernel/debug/tracing/current_tracer
  # ls

The first two commands end up in a nice crash after the first fork. (do_fork has a jprobe attached to it, so "ls" just triggers that fork)

The problem is caused by the jprobe_return() that all jprobe callbacks must end with. The way jprobes works is that the function a jprobe is attached to has a breakpoint placed at the start of it (or it uses ftrace if fentry is supported). The breakpoint handler (or ftrace callback) will copy the stack frame and change the ip address to return to the jprobe handler instead of the function. The jprobe handler must end with jprobe_return() which swaps the stack and does an int3 (breakpoint). This breakpoint handler will then put back the saved stack frame, simulate the instruction at the beginning of the function it added a breakpoint to, and then continue on.

For function tracing to work, it hijakes the return address from the stack frame, and replaces it with a hook function that will trace the end of the call. This hook function will restore the return address of the function call.

If the function tracer traces the jprobe handler, the hook function for that handler will not be called, and its saved return address will be used for the next function. This will result in a kernel crash.

To solve this, pause function tracing before the jprobe handler is called and unpause it before it returns back to the function it probed.

Some other updates:

Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the code look a bit cleaner and easier to understand (various tries to fix this bug required this change).

Note, if fentry is being used, jprobes will change the ip address before the function graph tracer runs and it will not be able to trace the function that the jprobe is probing.

Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
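The pairing described above ("pause function tracing before the jprobe handler is called and unpause it before it returns") maps onto the existing pause_graph_tracing()/unpause_graph_tracing() helpers. In the real fix the two calls live in different kprobes callbacks (the pre-handler that redirects to the jprobe handler, and the breakpoint handler reached via jprobe_return()); they are shown side by side here only as a sketch:

  #include <linux/ftrace.h>

  static void before_jumping_to_jprobe_handler(void)
  {
          /* Keep the graph tracer from hijacking the borrowed stack frame. */
          pause_graph_tracing();
  }

  static void after_jprobe_return_breakpoint(void)
  {
          /* Safe to trace again once the saved stack frame is restored. */
          unpause_graph_tracing();
  }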
2015-02-20 | crypto: include crypto- module prefix in template | Kees Cook | 1 | -0/+3
commit 4943ba16bbc2db05115707b3ff7b4874e9e3c560 upstream. This adds the module loading prefix "crypto-" to the template lookup as well. For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly includes the "crypto-" prefix at every level, correctly rejecting "vfat": net-pf-38 algif-hash crypto-vfat(blowfish) crypto-vfat(blowfish)-all crypto-vfat Reported-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: drop changes to cmac and mcryptd which we don't have] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | crypto: prefix module autoloading with "crypto-" | Kees Cook | 15 | -24/+24
commit 5d26a105b5a73e5635eae0629b42fa0a90e07b7b upstream. This prefixes all crypto module loading with "crypto-" so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: - Adjust filenames - Drop changes to algorithms and drivers we don't have - Add aliases to generic C implementations that didn't need them before] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
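The user-visible rule after this change: a module that provides an algorithm must carry a "crypto-" prefixed alias so the crypto API's request_module() calls can still find it. The MODULE_ALIAS_CRYPTO() helper this series adds emits both the bare and the prefixed alias; a sketch with a placeholder algorithm name:

  #include <linux/module.h>
  #include <linux/crypto.h>

  /* "example_cipher" is a placeholder; real modules use their algorithm name,
   * e.g. MODULE_ALIAS_CRYPTO("aes") in the generic AES implementation. */
  MODULE_ALIAS_CRYPTO("example_cipher");
  MODULE_LICENSE("GPL");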
2015-02-20 | x86_64, vdso: Fix the vdso address randomization algorithm | Andy Lutomirski | 1 | -14/+27
commit 394f56fe480140877304d342dec46d50dc823d46 upstream.

The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements.

The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD.

This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.)

It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias.

In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged.

As an easy test, doing this:

  for i in `seq 10000`
  do
      grep -P vdso /proc/self/maps |cut -d- -f1
  done |sort |uniq -d

used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this:

  7fffdfffe000
  7fffe01fe000
  7fffe05fe000
  7fffe07fe000
  7fffe09fe000
  7fffe0bfe000
  7fffe0dfe000

Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[bwh: Backported to 3.2:
 - Adjust context
 - The whole file is only built for x86_64; adjust comment for this]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86/tls: Don't validate lm in set_thread_area() after all | Andy Lutomirski | 2 | -6/+7
commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream.

It turns out that there's a lurking ABI issue. GCC, when compiling this in a 32-bit program:

  struct user_desc desc = {
          .entry_number    = idx,
          .base_addr       = base,
          .limit           = 0xfffff,
          .seg_32bit       = 1,
          .contents        = 0, /* Data, grow-up */
          .read_exec_only  = 0,
          .limit_in_pages  = 1,
          .seg_not_present = 0,
          .useable         = 0,
  };

will leave .lm uninitialized. This means that anything in the kernel that reads user_desc.lm for 32-bit tasks is unreliable.

Revert the .lm check in set_thread_area(). The value never did anything in the first place.

Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86/tls: Disallow unusual TLS segments | Andy Lutomirski | 1 | -0/+22
commit 0e58af4e1d2166e9e33375a0f121e4867010d4f8 upstream. Users have no business installing custom code segments into the GDT, and segments that are not present but are otherwise valid are a historical source of interesting attacks. For completeness, block attempts to set the L bit. (Prior to this patch, the L bit would have been silently dropped.) This is an ABI break. I've checked glibc, musl, and Wine, and none of them look like they'll have any trouble. Note to stable maintainers: this is a hardening patch that fixes no known bugs. Given the possibility of ABI issues, this probably shouldn't be backported quickly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | x86_64, switch_to(): Load TLS descriptors before switching DS and ES | Andy Lutomirski | 1 | -28/+73
commit f647d7c155f069c1a068030255c300663516420e upstream.

Otherwise, if buggy user code points DS or ES into the TLS array, they would be corrupted after a context switch.

This also significantly improves the comments and documents some gotchas in the code.

Before this patch, the both tests below failed. With this patch, the es test passes, although the gsbase test still fails.

 ----- begin es test -----

  /*
   * Copyright (c) 2014 Andy Lutomirski
   * GPL v2
   */

  static unsigned short GDT3(int idx)
  {
          return (idx << 3) | 3;
  }

  static int create_tls(int idx, unsigned int base)
  {
          struct user_desc desc = {
                  .entry_number    = idx,
                  .base_addr       = base,
                  .limit           = 0xfffff,
                  .seg_32bit       = 1,
                  .contents        = 0, /* Data, grow-up */
                  .read_exec_only  = 0,
                  .limit_in_pages  = 1,
                  .seg_not_present = 0,
                  .useable         = 0,
          };

          if (syscall(SYS_set_thread_area, &desc) != 0)
                  err(1, "set_thread_area");

          return desc.entry_number;
  }

  int main()
  {
          int idx = create_tls(-1, 0);
          printf("Allocated GDT index %d\n", idx);

          unsigned short orig_es;
          asm volatile ("mov %%es,%0" : "=rm" (orig_es));

          int errors = 0;
          int total = 1000;
          for (int i = 0; i < total; i++) {
                  asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
                  usleep(100);

                  unsigned short es;
                  asm volatile ("mov %%es,%0" : "=rm" (es));
                  asm volatile ("mov %0,%%es" : : "rm" (orig_es));
                  if (es != GDT3(idx)) {
                          if (errors == 0)
                                  printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
                                         GDT3(idx), es);
                          errors++;
                  }
          }

          if (errors) {
                  printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
                  return 1;
          } else {
                  printf("[OK]\tES was preserved\n");
                  return 0;
          }
  }

 ----- end es test -----

 ----- begin gsbase test -----

  /*
   * gsbase.c, a gsbase test
   * Copyright (c) 2014 Andy Lutomirski
   * GPL v2
   */

  static unsigned char *testptr, *testptr2;

  static unsigned char read_gs_testvals(void)
  {
          unsigned char ret;
          asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
          return ret;
  }

  int main()
  {
          int errors = 0;

          testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
          if (testptr == MAP_FAILED)
                  err(1, "mmap");

          testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
          if (testptr2 == MAP_FAILED)
                  err(1, "mmap");

          *testptr = 0;
          *testptr2 = 1;

          if (syscall(SYS_arch_prctl, ARCH_SET_GS,
                      (unsigned long)testptr2 - (unsigned long)testptr) != 0)
                  err(1, "ARCH_SET_GS");

          usleep(100);

          if (read_gs_testvals() == 1) {
                  printf("[OK]\tARCH_SET_GS worked\n");
          } else {
                  printf("[FAIL]\tARCH_SET_GS failed\n");
                  errors++;
          }

          asm volatile ("mov %0,%%gs" : : "r" (0));

          if (read_gs_testvals() == 0) {
                  printf("[OK]\tWriting 0 to gs worked\n");
          } else {
                  printf("[FAIL]\tWriting 0 to gs failed\n");
                  errors++;
          }

          usleep(100);

          if (read_gs_testvals() == 0) {
                  printf("[OK]\tgsbase is still zero\n");
          } else {
                  printf("[FAIL]\tgsbase was corrupted\n");
                  errors++;
          }

          return errors == 0 ? 0 : 1;
  }

 ----- end gsbase test -----

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | bus: omap_l3_noc: Correct returning IRQ_HANDLED unconditionally in the irq handler | Keerthy | 1 | -3/+7
commit c4cf0935a2d8fe6d186bf4253ea3c4b4a8a8a710 upstream. Correct returning IRQ_HANDLED unconditionally in the irq handler. Return IRQ_NONE for some interrupt which we do not expect to be handled in this handler. This prevents kernel stalling with back to back spurious interrupts. Fixes: 2722e56de6 ("OMAP4: l3: Introduce l3-interconnect error handling driver") Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> [bwh: Backported to 3.2: adjust filename, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-02-20 | KVM: s390: flush CPU on load control | Christian Borntraeger | 1 | -0/+2
commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream. some control register changes will flush some aspects of the CPU, e.g. POP explicitely mentions that for CR9-CR11 "TLBs may be cleared". Instead of trying to be clever and only flush on specific CRs, let play safe and flush on all lctl(g) as future machines might define new bits in CRs. Load control intercept should not happen that often. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> [bwh: Backported to 3.2: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only | Paolo Bonzini | 3 | -2/+16
commit c1118b3602c2329671ad5ec8bdf8e374323d6343 upstream. On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: Chris Webb <chris@arachsys.com> Tested-by: Chris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
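The mechanism is boot-time alternatives patching: emit VMCALL by default and let the patcher rewrite it to VMMCALL when the CPU advertises it, so nothing ever writes to read-only kernel text at runtime. A sketch of that shape (opcode bytes 0f 01 c1 = VMCALL, 0f 01 d9 = VMMCALL; the feature flag name follows the upstream patch and may differ in a backport):

  #include <asm/alternative.h>
  #include <asm/cpufeature.h>

  #define KVM_HYPERCALL \
          ALTERNATIVE(".byte 0x0f,0x01,0xc1", ".byte 0x0f,0x01,0xd9", \
                      X86_FEATURE_VMMCALL)

  static inline long kvm_hypercall0(unsigned int nr)
  {
          long ret;

          asm volatile(KVM_HYPERCALL
                       : "=a"(ret)
                       : "a"(nr)
                       : "memory");
          return ret;
  }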
2015-01-01 | crypto: ghash-clmulni-intel - use C implementation for setkey() | Ard Biesheuvel | 2 | -31/+11
commit 8ceee72808d1ae3fb191284afc2257a2be964725 upstream. The GHASH setkey() function uses SSE registers but fails to call kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and then having to deal with the restriction that they cannot be called from interrupt context, move the setkey() implementation to the C domain. Note that setkey() does not use any particular SSE features and is not expected to become a performance bottleneck. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | s390,time: revert direct ktime path for s390 clockevent device | Martin Schwidefsky | 1 | -15/+4
commit 8adbf78ec4839c1dc4ff20c9a1f332a7bc99e6e6 upstream. Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da "s390: Use direct ktime path for s390 clockevent device" makes use of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta calculation with ktime_get() in clockevents_program_event and the get_tod_clock() in s390_next_event. This is based on the assumption that the difference between the internal ktime and the hardware clock is reflected in the wall_to_monotonic delta. But this is not true, the ntp corrections are applied via changes to the tk->mult multiplier and this is not reflected in wall_to_monotonic. In theory this could be solved by using the raw monotonic clock but it is simpler to switch back to the standard clock delta calculation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: s/get_tod_clock()/get_clock()/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | move d_rcu from overlapping d_child to overlapping d_alias | Al Viro | 1 | -2/+2
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit | Andy Lutomirski | 2 | -2/+8
commit 29fa6825463c97e5157284db80107d1bfac5d77b upstream. paravirt_enabled has the following effects: - Disables the F00F bug workaround warning. There is no F00F bug workaround any more because Linux's standard IDT handling already works around the F00F bug, but the warning still exists. This is only cosmetic, and, in any event, there is no such thing as KVM on a CPU with the F00F bug. - Disables 32-bit APM BIOS detection. On a KVM paravirt system, there should be no APM BIOS anyway. - Disables tboot. I think that the tboot code should check the CPUID hypervisor bit directly if it matters. - paravirt_enabled disables espfix32. espfix32 should *not* be disabled under KVM paravirt. The last point is the purpose of this patch. It fixes a leak of the high 16 bits of the kernel stack address on 32-bit KVM paravirt guests. Fixes CVE-2014-8134. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | x86/tls: Validate TLS entries to protect espfix | Andy Lutomirski | 1 | -0/+23
commit 41bdc78544b8a93a9c6814b8bbbfef966272abbe upstream. Installing a 16-bit RW data segment into the GDT defeats espfix. AFAICT this will not affect glibc, Wine, or dosemu at all. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: security@kernel.org <security@kernel.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-01-01 | KVM: x86: Don't report guest userspace emulation error to userspace | Nadav Amit | 1 | -1/+1
commit a2b9e6c1a35afcc0973acb72e591c714e78885ff upstream. Commit fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to user-space") disabled the reporting of L2 (nested guest) emulation failures to userspace due to race-condition between a vmexit and the instruction emulator. The same rational applies also to userspace applications that are permitted by the guest OS to access MMIO area or perform PIO. This patch extends the current behavior - of injecting a #UD instead of reporting it to userspace - also for guest userspace code. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | MIPS: Loongson: Make platform serial setup always built-in. | Aaro Koskinen | 1 | -1/+2
commit 26927f76499849e095714452b8a4e09350f6a3b9 upstream. If SERIAL_8250 is compiled as a module, the platform specific setup for Loongson will be a module too, and it will not work very well. At least on Loongson 3 it will trigger a build failure, since loongson_sysconf is not exported to modules. Fix by making the platform specific serial code always built-in. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Huacai Chen <chenhc@lemote.com> Cc: Markos Chandras <Markos.Chandras@imgtec.com> Patchwork: https://patchwork.linux-mips.org/patch/8533/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs | Andy Lutomirski | 1 | -2/+2
commit 7ddc6a2199f1da405a2fb68c40db8899b1a8cd87 upstream. These functions can be executed on the int3 stack, so kprobes are dangerous. Tracing is probably a bad idea, too. Fixes: b645af2d5905 ("x86_64, traps: Rework bad_iret") Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/50e33d26adca60816f3ba968875801652507d0c4.1416870125.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: - Use __kprobes instead of NOKPROBE_SYMBOL() - Don't use __visible] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86_64, traps: Rework bad_iret | Andy Lutomirski | 2 | -29/+48
commit b645af2d5905c4e32399005b867987919cbfc3ae upstream. It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It's should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - We didn't use the _ASM_EXTABLE macro - Don't use __visible] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C | Andy Lutomirski | 2 | -32/+26
commit af726f21ed8af2cdaa4e93098dc211521218ae65 upstream. There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04aafd668686239349ea58f3314ea2af86b Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Keep using the paranoiderrorentry macro to generate the asm code - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86_64, traps: Stop using IST for #SS | Andy Lutomirski | 5 | -22/+7
commit 6f442be2fb22be02cafa606f1769fa1e6f894441 upstream. On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - No need to define trace_stack_segment - Use the errorentry macro to generate #SS asm code - Adjust context - Checked that this matches Luis's backport for Ubuntu] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | ARM: 8216/1: xscale: correct auxiliary register in suspend/resume | Dmitry Eremin-Solenikov | 1 | -2/+2
commit ef59a20ba375aeb97b3150a118318884743452a8 upstream. According to the manuals I have, XScale auxiliary register should be reached with opc_2 = 1 instead of crn = 1. cpu_xscale_proc_init correctly uses c1, c0, 1 arguments, but cpu_xscale_do_suspend and cpu_xscale_do_resume use c1, c1, 0. Correct suspend/resume functions to also use c1, c0, 1. The issue was primarily noticed thanks to qemu reporing "unsupported instruction" on the pxa suspend path. Confirmed in PXA210/250 and PXA255 XScale Core manuals and in PXA270 and PXA320 Developers Guides. Harware tested by me on tosa (pxa255). Robert confirmed on pxa270 board. Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
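For reference, the auxiliary control register access the text describes, written as an inline-assembly sketch (CP15 c1, c0 with opc_2 = 1, not c1, c1 with opc_2 = 0):

  /* Read the XScale auxiliary control register: CP15 c1, c0, opc_2 = 1. */
  static inline unsigned long xscale_read_auxcr(void)
  {
          unsigned long val;

          asm volatile("mrc p15, 0, %0, c1, c0, 1" : "=r" (val));
          return val;
  }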
2014-12-14 | MIPS: oprofile: Fix backtrace on 64-bit kernel | Aaro Koskinen | 1 | -1/+1
commit bbaf113a481b6ce32444c125807ad3618643ce57 upstream. Fix incorrect cast that always results in wrong address for the new frame on 64-bit kernels. Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8110/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86, mm: Set NX across entire PMD at boot | Kees Cook | 1 | -1/+10
commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream.

When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory.

Before:
  ---[ High Kernel Mapping ]---
  ...
  0xffffffff8202d000-0xffffffff82200000  1868K  RW      GLB NX pte
  0xffffffff82200000-0xffffffff82c00000    10M  RW  PSE GLB NX pmd
  0xffffffff82c00000-0xffffffff82df5000  2004K  RW      GLB NX pte
  0xffffffff82df5000-0xffffffff82e00000    44K  RW      GLB x  pte
  0xffffffff82e00000-0xffffffffc0000000   978M                 pmd

After:
  ---[ High Kernel Mapping ]---
  ...
  0xffffffff8202d000-0xffffffff82200000  1868K  RW      GLB NX pte
  0xffffffff82200000-0xffffffff82e00000    12M  RW  PSE GLB NX pmd
  0xffffffff82e00000-0xffffffffc0000000   978M                 pmd

[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86, 64bit, mm: Mark data/bss/brk to nx | Yinghai Lu | 1 | -3/+4
commit 72212675d1c96f5db8ec6fb35701879911193158 upstream.

HPA said, we should not have RW and +x set at the time.

for kernel layout:
  [    0.000000] Kernel Layout:
  [    0.000000]   .text: [0x01000000-0x021434f8]
  [    0.000000] .rodata: [0x02200000-0x02a13fff]
  [    0.000000]   .data: [0x02c00000-0x02dc763f]
  [    0.000000]   .init: [0x02dc9000-0x0312cfff]
  [    0.000000]    .bss: [0x0313b000-0x03dd6fff]
  [    0.000000]    .brk: [0x03dd7000-0x03dfffff]

before the patch, we have
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000    16M                 pmd
  0xffffffff81000000-0xffffffff82200000    18M  ro  PSE GLB x  pmd
  0xffffffff82200000-0xffffffff82c00000    10M  ro  PSE GLB NX pmd
  0xffffffff82c00000-0xffffffff82dc9000  1828K  RW      GLB x  pte
  0xffffffff82dc9000-0xffffffff82e00000   220K  RW      GLB NX pte
  0xffffffff82e00000-0xffffffff83000000     2M  RW  PSE GLB NX pmd
  0xffffffff83000000-0xffffffff8313a000  1256K  RW      GLB NX pte
  0xffffffff8313a000-0xffffffff83200000   792K  RW      GLB x  pte
  0xffffffff83200000-0xffffffff83e00000    12M  RW  PSE GLB x  pmd
  0xffffffff83e00000-0xffffffffa0000000   450M                 pmd

after patch, we get
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000    16M                 pmd
  0xffffffff81000000-0xffffffff82200000    18M  ro  PSE GLB x  pmd
  0xffffffff82200000-0xffffffff82c00000    10M  ro  PSE GLB NX pmd
  0xffffffff82c00000-0xffffffff82e00000     2M  RW      GLB NX pte
  0xffffffff82e00000-0xffffffff83000000     2M  RW  PSE GLB NX pmd
  0xffffffff83000000-0xffffffff83200000     2M  RW      GLB NX pte
  0xffffffff83200000-0xffffffff83e00000    12M  RW  PSE GLB NX pmd
  0xffffffff83e00000-0xffffffffa0000000   450M                 pmd

so data, bss, brk get NX ...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-33-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86: Require exact match for 'noxsave' command line option | Dave Hansen | 1 | -0/+2
commit 2cd3949f702692cf4c5d05b463f19cd706a92dd3 upstream.

We have some very similarly named command-line options:

  arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
  arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
  arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like "foo=bar" where you would have:

  __setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in the same way that "foo" matches "foo=bar". If you boot an old kernel that does not know about "noxsaves" with "noxsaves" on the command line, it will interpret the argument as "noxsave", which is not what you want at all.

This makes the "noxsave" handler only return success when it finds an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
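The exact-match requirement translates into a one-line guard in the __setup() handler: anything left over after "noxsave" means a longer option ("noxsaves", "noxsaveopt") was matched by mistake, so the handler must decline it. A sketch of that shape (setup_clear_cpu_cap() and X86_FEATURE_XSAVE are the real kernel symbols; the surrounding function is illustrative):

  #include <linux/init.h>
  #include <linux/string.h>
  #include <asm/cpufeature.h>

  static int __init x86_xsave_setup(char *s)
  {
          /* A non-empty "argument" means we really matched noxsaves/noxsaveopt. */
          if (strlen(s))
                  return 0;
          setup_clear_cpu_cap(X86_FEATURE_XSAVE);
          return 1;
  }
  __setup("noxsave", x86_xsave_setup);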
2014-12-14 | kvm: x86: don't kill guest on unknown exit reason | Michael S. Tsirkin | 2 | -6/+6
commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream. KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was triggered by a priveledged application. Let's not kill the guest: WARN and inject #UD instead. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | MIPS: ftrace: Fix a microMIPS build problem | Markos Chandras | 1 | -2/+2
commit aedd153f5bb5b1f1d6d9142014f521ae2ec294cc upstream.

Code before the .fixup section needs to have the .insn directive. This has no side effects on MIPS32/64 but it affects the way microMIPS loads the address for the return label.

Fixes the following build problem:

  mips-linux-gnu-ld: arch/mips/built-in.o: .fixup+0x4a0: Unsupported jump between ISA modes; consider recompiling with interlinking enabled.
  mips-linux-gnu-ld: final link failed: Bad value
  Makefile:819: recipe for target 'vmlinux' failed

The fix is similar to 1658f914ff91c3bf ("MIPS: microMIPS: Disable LL/SC and fix linker bug.")

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8117/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86, apic: Handle a bad TSC more gracefully | Andy Lutomirski | 2 | -3/+6
commit b47dcbdc5161d3d5756f430191e2840d9b855492 upstream. If the TSC is unusable or disabled, then this patch fixes: - Confusion while trying to clear old APIC interrupts. - Division by zero and incorrect programming of the TSC deadline timer. This fixes boot if the CPU has a TSC deadline timer but a missing or broken TSC. The failure to boot can be observed with qemu using -cpu qemu64,-tsc,+tsc-deadline This also happens to me in nested KVM for unknown reasons. With this patch, I can boot cleanly (although without a TSC). Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Bandan Das <bsd@redhat.com> Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 | x86: Conditionally update time when ack-ing pending irqs | Shai Fultheim | 1 | -5/+7
commit 42fa4250436304d4650fa271f37671f6cee24e08 upstream. On virtual environments, apic_read could take a long time. As a result, under certain conditions the ack pending loop may exit without any queued irqs left, but after more than one second. A warning will be printed needlessly in this case. If the loop is about to exit regardless of max_loops, don't update it. Signed-off-by: Shai Fultheim <shai@scalemp.com> [ rebased and reworded the commit message] Signed-off-by: Ido Yariv <ido@wizery.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>