summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2016-06-27block: convert to device_add_disk()Dan Williams1-2/+1
For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk *disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk *disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-22um: track 'parent' device in a local variableDan Williams1-2/+3
In preparation for the removal of 'driverfs_dev' from 'struct gendisk' use a local variable to track the parented vs un-parented case in ubd_disk_register(). Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie1-1/+1
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-05parisc: Move die_if_kernel() prototype into traps.h headerHelge Deller2-2/+3
Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-05parisc: Fix pagefault crash in unaligned __get_user() callHelge Deller1-1/+9
One of the debian buildd servers had this crash in the syslog without any other information: Unaligned handler failed, ret = -2 clock_adjtime (pid 22578): Unaligned data reference (code 28) CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111100000001111 Tainted: G E r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0 r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4 r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218 r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000 r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0 r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218 sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88 IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff ORIG_R28: 00000002369fe628 IAOQ[0]: compat_get_timex+0x2dc/0x3c0 IAOQ[1]: compat_get_timex+0x2e0/0x3c0 RP(r2): compat_get_timex+0x40/0x3c0 Backtrace: [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0 [<0000000040205024>] syscall_exit+0x0/0x14 This means the userspace program clock_adjtime called the clock_adjtime() syscall and then crashed inside the compat_get_timex() function. Syscalls should never crash programs, but instead return EFAULT. The IIR register contains the executed instruction, which disassebles into "ldw 0(sr3,r5),r9". This load-word instruction is part of __get_user() which tried to read the word at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The unaligned handler is able to emulate all ldw instructions, but it fails if it fails to read the source e.g. because of page fault. The following program reproduces the problem: #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/mman.h> int main(void) { /* allocate 8k */ char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); /* free second half (upper 4k) and make it invalid. */ munmap(ptr+4096, 4096); /* syscall where first int is unaligned and clobbers into invalid memory region */ /* syscall should return EFAULT */ return syscall(__NR_clock_adjtime, 0, ptr+4095); } To fix this issue we simply need to check if the faulting instruction address is in the exception fixup table when the unaligned handler failed. If it is, call the fixup routine instead of crashing. While looking at the unaligned handler I found another issue as well: The target register should not be modified if the handler was unsuccessful. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2016-06-05parisc: Fix printk time during bootHelge Deller2-7/+3
Avoid showing invalid printk time stamps during boot. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Aaro Koskinen <aaro.koskinen@iki.fi>
2016-06-04parisc: Fix backtrace on PA-RISCMikulas Patocka1-8/+14
This patch fixes backtrace on PA-RISC There were several problems: 1) The code that decodes instructions handles instructions that subtract from the stack pointer incorrectly. If the instruction subtracts the number X from the stack pointer the code increases the frame size by (0x100000000-X). This results in invalid accesses to memory and recursive page faults. 2) Because gcc reorders blocks, handling instructions that subtract from the frame pointer is incorrect. For example, this function int f(int a) { if (__builtin_expect(a, 1)) return a; g(); return a; } is compiled in such a way, that the code that decreases the stack pointer for the first "return a" is placed before the code for "g" call. If we recognize this decrement, we mistakenly believe that the frame size for the "g" call is zero. To fix problems 1) and 2), the patch doesn't recognize instructions that decrease the stack pointer at all. To further safeguard the unwind code against nonsense values, we don't allow frame size larger than Total_frame_size. 3) The backtrace is not locked. If stack dump races with module unload, invalid table can be accessed. This patch adds a spinlock when processing module tables. Note, that for correct backtrace, you need recent binutils. Binutils 2.18 from Debian 5 produce garbage unwind tables. Binutils 2.21 work better (it sometimes forgets function frames, but at least it doesn't generate garbage). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Helge Deller <deller@gmx.de>
2016-06-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: - a few simple fixes for fallout from the recent gic-v3 changes - a workaround for a Cavium thunderX erratum - a bugfix for the pic32 irqchip to make external interrupts work proper - a missing return value in the generic IPI management code * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-pic32-evic: Fix bug with external interrupts. irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144 irqchip/gic-v3: Fix quiescence check in gic_enable_redist irqchip/gic-v3: Fix copy+paste mistakes in defines irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask genirq: Fix missing return value in irq_destroy_ipi()
2016-06-04Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-1/+1
Pull ARM fix from Russell King: "Just one fix to the ptrace code, spotted by Simon Marchi, where if a thread migrates to a different CPU and the VFP registers are changed through ptrace, the application doesn't see the updated VFP registers" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: fix PTRACE_SETVFPREGS on SMP systems
2016-06-04Merge tag 'arm64-fixes' of ↵Linus Torvalds13-44/+74
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The main thing here is reviving hugetlb support using contiguous ptes, which we ended up reverting at the last minute in 4.5 pending a fix which went into the core mm/ code during the recent merge window. - Revert a previous revert and get hugetlb going with contiguous hints - Wire up missing compat syscalls - Enable CONFIG_SET_MODULE_RONX by default - Add missing line to our compat /proc/cpuinfo output - Clarify levels in our page table dumps - Fix booting with RANDOMIZE_TEXT_OFFSET enabled - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure) - Remove some dead code and update a comment" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled arm64: move {PAGE,CONT}_SHIFT into Kconfig arm64: mm: dump: log span level arm64: update stale PAGE_OFFSET comment drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg arm64: report CPU number in bad_mode arm64: unistd32.h: wire up missing syscalls for compat tasks arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks arm64: enable CONFIG_SET_MODULE_RONX by default arm64: Remove orphaned __addr_ok() definition Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"
2016-06-04Merge tag 'powerpc-4.7-2' of ↵Linus Torvalds6-38/+68
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Handle RTAS delay requests in configure_bridge from Russell Currey - Refactor the configure_bridge RTAS tokens from Russell Currey - Fix definition of SIAR and SDAR registers from Thomas Huth - Use privileged SPR number for MMCR2 from Thomas Huth - Update LPCR only if it is powernv from Aneesh Kumar K.V - Fix the reference bit update when handling hash fault from Aneesh Kumar K.V - Add missing tlb flush from Aneesh Kumar K.V - Add POWER8NVL support to ibm,client-architecture-support call from Thomas Huth * tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call powerpc/mm/radix: Add missing tlb flush powerpc/mm/hash: Fix the reference bit update when handling hash fault powerpc/mm/radix: Update LPCR only if it is powernv powerpc: Use privileged SPR number for MMCR2 powerpc: Fix definition of SIAR and SDAR registers powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
2016-06-03arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabledMark Rutland1-1/+3
With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the following issue on the boot: kernel BUG at arch/arm64/mm/mmu.c:480! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310 Hardware name: ARM Juno development board (r2) (DT) task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000 PC is at map_kernel_segment+0x44/0xb0 LR is at paging_init+0x84/0x5b0 pc : [<ffff000008c450b4>] lr : [<ffff000008c451a4>] pstate: 600002c5 Call trace: [<ffff000008c450b4>] map_kernel_segment+0x44/0xb0 [<ffff000008c451a4>] paging_init+0x84/0x5b0 [<ffff000008c42728>] setup_arch+0x198/0x534 [<ffff000008c40848>] start_kernel+0x70/0x388 [<ffff000008c401bc>] __primary_switched+0x30/0x74 Commit 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") removed the alignment between the .head.text and .text sections, and used the _text rather than the _stext interval for mapping the .text segment. Prior to this commit _stext was always section aligned and didn't cause any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that alignment has been removed and _text is used to map the .text segment, we need ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET is enabled. This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset is always aligned to the kernel page size. To ensure this, we rely on the PAGE_SHIFT being available via Kconfig. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: 7eb90f2ff7e3 ("arm64: cover the .head.text section in the .text segment mapping") Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03arm64: move {PAGE,CONT}_SHIFT into KconfigMark Rutland2-10/+14
In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would like to make use of PAGE_SHIFT outside of code that can include the usual header files. Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with ARM64_CONT_SHIFT for consistency. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03arm64: mm: dump: log span levelMark Rutland1-1/+7
The page table dump code logs spans of entries at the same level (pgd/pud/pmd/pte) which have the same attributes. While we log the (decoded) attributes, we don't log the level, which leaves the output ambiguous and/or confusing in some cases. For example: 0xffff800800000000-0xffff800980000000 6G RW NX SHD AF BLK UXN MEM/NORMAL If using 4K pages, this may describe a span of 6 1G block entries at the PGD/PUD level, or 3072 2M block entries at the PMD level. This patch adds the page table level to each output line, removing this ambiguity. For the example above, this will produce: 0xffffffc800000000-0xffffffc980000000 6G PUD RW NX SHD AF BLK UXN MEM/NORMAL When 3 level tables are in use, and we use the asm-generic/nopud.h definitions, the dump code treats each entry in the PGD as a 1 element table at the PUD level, and logs spans as being PUDs, which can be confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD" when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when CONFIG_PGTABLE_LEVELS <= 2. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Huang Shijie <shijie.huang@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03arm64: update stale PAGE_OFFSET commentMark Rutland1-1/+2
Commit ab893fb9f1b17f02 ("arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region") logically split KIMAGE_VADDR from PAGE_OFFSET, and since commit f9040773b7bbbd9e ("arm64: move kernel image to base of vmalloc area") the two have been distinct values. Unfortunately, neither commit updated the comment above these definitions, which now erroneously states that PAGE_OFFSET is the start of the kernel image rather than the start of the linear mapping. This patch fixes said comment, and introduces an explanation of KIMAGE_VADDR. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03arm64: report CPU number in bad_modeMark Rutland1-2/+3
If we take an exception we don't expect (e.g. SError), we report this in the bad_mode handler with pr_crit. Depending on the configured log level, we may or may not log additional information in functions called subsequently. Notably, the messages in dump_stack (including the CPU number) are printed with KERN_DEFAULT and may not appear. Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and knowing the CPU number is crucial to correctly decode them. To ensure that this is always possible, we should log the CPU number along with the ESR_ELx value, so we are not reliant on subsequent logs or additional printk configuration options. This patch logs the CPU number in bad_mode such that it is possible for a developer to decode these exceptions, provided access to sufficient documentation. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Al Grant <Al.Grant@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-31/+60
Pull KVM fixes from Radim Krčmář: "ARM: - two fixes for 4.6 vgic [Christoffer] (cc stable) - six fixes for 4.7 vgic [Marc] x86: - six fixes from syzkaller reports [Paolo] (two of them cc stable) - allow OS X to boot [Dmitry] - don't trust compilers [Nadav]" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR KVM: Handle MSR_IA32_PERF_CTL KVM: x86: avoid write-tearing of TDP KVM: arm/arm64: vgic-new: Removel harmful BUG_ON arm64: KVM: vgic-v3: Relax synchronization when SRE==1 arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1 arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value KVM: arm/arm64: vgic-v3: Always resample level interrupts KVM: arm/arm64: vgic-v2: Always resample level interrupts KVM: arm/arm64: vgic-v3: Clear all dirty LRs KVM: arm/arm64: vgic-v2: Clear all dirty LRs
2016-06-02irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144Ganapatrao Kulkarni1-0/+9
The erratum fixes the hang of ITS SYNC command by avoiding inter node io and collections/cpu mapping on thunderx dual-socket platform. This fix is only applicable for Cavium's ThunderX dual-socket platform. Reviewed-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-06-02KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGSPaolo Bonzini1-0/+5
MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480 [<ffffffff812418a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40 RSP <ffff88005836bd50> Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: fail KVM_SET_VCPU_EVENTS with invalid exception numberPaolo Bonzini1-0/+4
This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return EINVAL. It causes a WARN from exception_type: WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]() CPU: 3 PID: 16732 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2 ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm] [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm] [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 ---[ end trace b1a0391266848f50 ]--- Testcase (beautified/reduced from syzkaller output): #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <fcntl.h> #include <sys/ioctl.h> #include <linux/kvm.h> long r[31]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0); struct kvm_vcpu_events ve = { .exception.injected = 1, .exception.nr = 0xd4 }; r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve); r[30] = ioctl(r[7], KVM_RUN, 0); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUIDPaolo Bonzini1-10/+12
This causes an ugly dmesg splat. Beautified syzkaller testcase: #include <unistd.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <fcntl.h> #include <linux/kvm.h> long r[8]; int main() { struct kvm_cpuid2 c = { 0 }; r[2] = open("/dev/kvm", O_RDWR); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8); r[7] = ioctl(r[4], KVM_SET_CPUID, &c); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDRPaolo Bonzini1-1/+1
Found by syzkaller: WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]() CPU: 3 PID: 15175 Comm: a.out Tainted: G W 4.4.6-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e 0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2 00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000 Call Trace: [<ffffffff813b542e>] dump_stack+0x63/0x85 [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm] [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm] [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel] [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm] [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Testcase: #include <unistd.h> #include <sys/ioctl.h> #include <fcntl.h> #include <string.h> #include <linux/kvm.h> long r[8]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul); return 0; } Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: Handle MSR_IA32_PERF_CTLDmitry Bilunov1-0/+1
Intel CPUs having Turbo Boost feature implement an MSR to provide a control interface via rdmsr/wrmsr instructions. One could detect the presence of this feature by issuing one of these instructions and handling the #GP exception which is generated in case the referenced MSR is not implemented by the CPU. KVM's vCPU model behaves exactly as a real CPU in this case by injecting a fault when MSR_IA32_PERF_CTL is called (which KVM does not support). However, some operating systems use this register during an early boot stage in which their kernel is not capable of handling #GP correctly, causing #DP and finally a triple fault effectively resetting the vCPU. This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the crashes. Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02KVM: x86: avoid write-tearing of TDPNadav Amit1-4/+4
In theory, nothing prevents the compiler from write-tearing PTEs, or split PTE writes. These partially-modified PTEs can be fetched by other cores and cause mayhem. I have not really encountered such case in real-life, but it does seem possible. For example, the compiler may try to do something creative for kvm_set_pte_rmapp() and perform multiple writes to the PTE. Signed-off-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-06-02ARM: fix PTRACE_SETVFPREGS on SMP systemsRussell King1-1/+1
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be reloaded, because it undoes one of the effects of vfp_flush_hwstate(). Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to an invalid CPU number, but vfp_set() overwrites this with the original CPU number, thereby rendering the hardware state as apparently "valid", even though the software state is more recent. Fix this by reverting the previous change. Cc: <stable@vger.kernel.org> Fixes: 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Simon Marchi <simon.marchi@ericsson.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2016-06-01arm64: unistd32.h: wire up missing syscalls for compat tasksWill Deacon2-1/+9
We're missing entries for mlock2, copy_file_range, preadv2 and pwritev2 in our compat syscall table, so hook them up. Only the last two need compat wrappers. Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds11-100/+215
Pull sparc fixes from David Miller: "sparc64 mmu context allocation and trap return bug fixes" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix return from trap window fill crashes. sparc: Harden signal return frame checks. sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
2016-06-01powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support callThomas Huth1-0/+1
If we do not provide the PVR for POWER8NVL, a guest on this system currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU does not provide a generic PowerISA 2.07 mode yet. So some new instructions from POWER8 (like "mtvsrd") get disabled for the guest, resulting in crashes when using code compiled explicitly for POWER8 (e.g. with the "-mcpu=power8" option of GCC). Fixes: ddee09c099c3 ("powerpc: Add PVR for POWER8NVL processor") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01powerpc/mm/radix: Add missing tlb flushAneesh Kumar K.V1-4/+1
This should not have any impact on hash, because hash does tlb invalidate with every pte update and we don't implement flush_tlb_* functions for hash. With radix we should make an explicit call to flush tlb outside pte update. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01powerpc/mm/hash: Fix the reference bit update when handling hash faultAneesh Kumar K.V1-2/+20
When we converted the asm routines to C functions, we missed updating HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower bits from pte via. andi. r3,r30,0x1fe /* Get basic set of flags */ We also update the code such that we won't update the Change bit ('C' bit) always. This was added by commit c5cf0e30bf3d8 ("powerpc: Fix buglet with MMU hash management"). With hash64, we need to make sure that hardware doesn't do a pte update directly. This is because we do end up with entries in TLB with no hash page table entry. This happens because when we find a hash bucket full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". For more info look at commit 0608d692463("powerpc/mm: Always invalidate tlb on hpte invalidate and update"). Thus it's critical that valid hash PTEs always have reference bit set and writeable ones have change bit set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry. Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always. We don't really need to do that because we never map a RW pte entry without setting 'C' bit. On READ fault on a RW pte entry, we still map it READ only, hence a store update in the page will still cause a hash pte fault. This patch reverts the part of commit c5cf0e30bf3d8 ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain the updatepp part. - If we hit the updatepp path on native, the old code without that commit, would fail to set C bcause native_hpte_updatepp() was implemented to filter the same bits as H_PROTECT and not let C through thus we would "upgrade" a RO HPTE to RW without setting C thus causing the bug. So the real fix in that commit was the change to native_hpte_updatepp Fixes: 89ff725051d1 ("powerpc/mm: Convert __hash_page_64K to C") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01powerpc/mm/radix: Update LPCR only if it is powernvAneesh Kumar K.V1-13/+10
LPCR cannot be updated when running in guest mode. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-31arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasksCatalin Marinas2-3/+9
This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with the 32-bit ARM one by providing an additional line: model name : ARMv8 Processor rev X (v8l) Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31Merge branch 'for-linus' of ↵Linus Torvalds8-68/+103
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three bugs fixes and an update for the default configuration" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix info leak in do_sigsegv s390/config: update default configuration s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop s390/bpf: reduce maximum program size to 64 KB
2016-05-31arm64: KVM: vgic-v3: Relax synchronization when SRE==1Marc Zyngier1-7/+16
The GICv3 backend of the vgic is quite barrier heavy, in order to ensure synchronization of the system registers and the memory mapped view for a potential GICv2 guest. But when the guest is using a GICv3 model, there is absolutely no need to execute all these heavy barriers, and it is actually beneficial to avoid them altogether. This patch makes the synchonization conditional, and ensures that we do not change the EL1 SRE settings if we do not need to. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-31arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1Marc Zyngier1-4/+2
Both our GIC emulations are "strict", in the sense that we either emulate a GICv2 or a GICv3, and not a GICv3 with GICv2 legacy support. But when running on a GICv3 host, we still allow the guest to tinker with the ICC_SRE_EL1 register during its time slice: it can switch SRE off, observe that it is off, and yet on the next world switch, find the SRE bit to be set again. Not very nice. An obvious solution is to always trap accesses to ICC_SRE_EL1 (by clearing ICC_SRE_EL2.Enable), and to let the handler return the programmed value on a read, or ignore the write. That way, the guest can always observe that our GICv3 is SRE==1 only. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-31arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE valueMarc Zyngier1-1/+12
When we trap ICC_SRE_EL1, we handle it as RAZ/WI. It would be more correct to actual make it RO, and return the configured value when read. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-31KVM: arm/arm64: vgic-v3: Clear all dirty LRsChristoffer Dall1-4/+3
When saving the state of the list registers, it is critical to reset them zero, as we could otherwise leave unexpected EOI interrupts pending for virtual level interrupts. Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-05-31arm64: enable CONFIG_SET_MODULE_RONX by defaultMark Rutland1-12/+13
The SET_MODULE_RONX protections are effectively the same as the DEBUG_RODATA protections we enabled by default back in commit 57efac2f7108e325 ("arm64: enable CONFIG_DEBUG_RODATA by default"). It seems unusual to have one but not the other. As evidenced by the help text, the rationale appears to be that SET_MODULE_RONX interacts poorly with tracing and patching, but both of these make use of the insn framework, which takes SET_MODULE_RONX into account. Any remaining issues are bugs which should be fixed regardless of the default state of the option. This patch enables DEBUG_SET_MODULE_RONX by default, and replaces the help text with a new wording derived from the DEBUG_RODATA help text, which better describes the functionality. Previously, the DEBUG_RODATA entry was inconsistently indented with spaces, which are replaced with tabs as with the other Kconfig entries. Additionally, the wording of recommended defaults is made consistent for all options. These are placed in a new paragraph, unquoted, as a full sentence (with a period/full stop) as this appears to be the most common form per $(git grep 'in doubt'). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@fedoraproject.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31arm64: Remove orphaned __addr_ok() definitionRobin Murphy1-13/+0
Since commit 12a0ef7b0ac3 ("arm64: use generic strnlen_user and strncpy_from_user functions"), the definition of __addr_ok() has been languishing unused; eradicate the sucker. CC: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-31powerpc: Use privileged SPR number for MMCR2Thomas Huth1-1/+1
We are already using the privileged versions of MMCR0, MMCR1 and MMCRA in the kernel, so for MMCR2, we should better use the privileged versions, too, to be consistent. Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8") Cc: stable@vger.kernel.org # v3.10+ Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-31powerpc: Fix definition of SIAR and SDAR registersThomas Huth1-2/+2
The SIAR and SDAR registers are available twice, one time as SPRs 780 / 781 (unprivileged, but read-only), and one time as the SPRs 796 / 797 (privileged, but read and write). The Linux kernel code currently uses the unprivileged SPRs - while this is OK for reading, writing to that register of course does not work. Since the KVM code tries to write to this register, too (see the mtspr in book3s_hv_rmhandlers.S), the contents of this register sometimes get lost for the guests, e.g. during migration of a VM. To fix this issue, simply switch to the privileged SPR numbers instead. Cc: stable@vger.kernel.org Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-31Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"Will Deacon1-0/+14
This reverts commit ff7925848b50050732ac0401e0acf27e8b241d7b. Now that the contiguous-hint hugetlb regression has been debugged and fixed upstream by 66ee95d16a7f ("mm: exclude HugeTLB pages from THP page_mapped() logic"), we can revert the previous partial revert of this feature. Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-05-30powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokensRussell Currey1-16/+12
The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the same actions, however the former can skip configuration if unnecessary. The existing code treats them as different tokens even though only one will ever be called. Refactor this by making a single token that is assigned during init. Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-30powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridgeRussell Currey1-15/+36
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the spec states that values of 9900-9905 can be returned, indicating that software should delay for 10^x (where x is the last digit, i.e. 990x) milliseconds and attempt the call again. Currently, the kernel doesn't know about this, and respecting it fixes some PCI failures when the hypervisor is busy. The delay is capped at 0.2 seconds. Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-30sparc64: Fix return from trap window fill crashes.David S. Miller5-52/+116
We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those memory accesses generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. In this way, the fault handler doesn't have to know if it was a spill or a fill that it's handling the fault for. It just always branches to the last instruction in the parent trap's handler. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 or %g3, 0x7c, %g3 wrpr %g3, %tnpc done All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; ba,a,pt %xcc, user_rtt_fill_fixup; And oops, all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler sets up the arg registers one way, and then we branched to the fault handler which expects them setup another way. So the FAULT_TYPE_* value ends up basically being garbage, and randomly would generate the backtrace seen above. Reported-by: Nick Alcock <nix@esperi.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29sparc: Harden signal return frame checks.David S. Miller5-45/+92
All signal frames must be at least 16-byte aligned, because that is the alignment we explicitly create when we build signal return stack frames. All stack pointers must be at least 8-byte aligned. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds58-248/+353
Pull more MIPS updates from Ralf Baechle: "This is the secondnd batch of MIPS patches for 4.7. Summary: CPS: - Copy EVA configuration when starting secondary VPs. EIC: - Clear Status IPL. Lasat: - Fix a few off by one bugs. lib: - Mark intrinsics notrace. Not only are the intrinsics uninteresting, it would cause infinite recursion. MAINTAINERS: - Add file patterns for MIPS BRCM device tree bindings. - Add file patterns for mips device tree bindings. MT7628: - Fix MT7628 pinmux typos. - wled_an pinmux gpio. - EPHY LEDs pinmux support. Pistachio: - Enable KASLR VDSO: - Build microMIPS VDSO for microMIPS kernels. - Fix aliasing warning by building with `-fno-strict-aliasing' for debugging but also tracing them might result in recursion. Misc: - Add missing FROZEN hotplug notifier transitions. - Fix clk binding example for varioius PIC32 devices. - Fix cpu interrupt controller node-names in the DT files. - Fix XPA CPU feature separation. - Fix write_gc0_* macros when writing zero. - Add inline asm encoding helpers. - Add missing VZ accessor microMIPS encodings. - Fix little endian microMIPS MSA encodings. - Add 64-bit HTW fields and fix its configuration. - Fix sigreturn via VDSO on microMIPS kernel. - Lots of typo fixes. - Add definitions of SegCtl registers and use them" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (49 commits) MIPS: Add missing FROZEN hotplug notifier transitions MIPS: Build microMIPS VDSO for microMIPS kernels MIPS: Fix sigreturn via VDSO on microMIPS kernel MIPS: devicetree: fix cpu interrupt controller node-names MIPS: VDSO: Build with `-fno-strict-aliasing' MIPS: Pistachio: Enable KASLR MIPS: lib: Mark intrinsics notrace MIPS: Fix 64-bit HTW configuration MIPS: Add 64-bit HTW fields MAINTAINERS: Add file patterns for mips device tree bindings MAINTAINERS: Add file patterns for mips brcm device tree bindings MIPS: Simplify DSP instruction encoding macros MIPS: Add missing tlbinvf/XPA microMIPS encodings MIPS: Fix little endian microMIPS MSA encodings MIPS: Add missing VZ accessor microMIPS encodings MIPS: Add inline asm encoding helpers MIPS: Spelling fix lets -> let's MIPS: VR41xx: Fix typo MIPS: oprofile: Fix typo MIPS: math-emu: Fix typo ...
2016-05-29Merge branch 'hash' of git://ftp.sciencehorizons.net/linuxLinus Torvalds7-0/+204
Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6ca95 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was noting suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.) On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
2016-05-28h8300: Add <asm/hash.h>George Spelvin2-0/+54
This will improve the performance of hash_32() and hash_64(), but due to complete lack of multi-bit shift instructions on H8, performance will still be bad in surrounding code. Designing H8-specific hash algorithms to work around that is a separate project. (But if the maintainers would like to get in touch...) Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
2016-05-28microblaze: Add <asm/hash.h>George Spelvin2-0/+82
Microblaze is an FPGA soft core that can be configured various ways. If it is configured without a multiplier, the standard __hash_32() will require a call to __mulsi3, which is a slow software loop. Instead, use a shift-and-add sequence for the constant multiply. GCC knows how to do this, but it's not as clever as some. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>