summaryrefslogtreecommitdiff
path: root/arch/sparc
AgeCommit message (Collapse)AuthorFilesLines
2014-10-31sparc64: Fix inconsistent max-physical-address defines.David S. Miller1-2/+4
commit f998c9c0d663b013e3aa3ba78908396c8c497218 upstream. Some parts of the code use '41' others use '42', make them all use the same value. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-31sparc64: Document the shift counts used to validate linear kernel addresses.David S. Miller3-6/+24
commit bb7b435388b9f035ecfb16f42b5c6bf428359c63 upstream. This way we can see exactly what they are derived from, and in particular how they would change if we were to use a different PAGE_OFFSET value. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-31sparc64: Define PAGE_OFFSET in terms of physical address bits.David S. Miller1-1/+3
commit e0a45e3580a033669b24b04c3535515d69bb9702 upstream. This makes clearer the implications for a given choosen value. Based upon patches by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-31sparc64: Use PAGE_OFFSET instead of a magic constant.David S. Miller1-7/+7
commit 922631b988d8cbb821ebe2c67feffc0b95264894 upstream. This pertains to all of the computations of the kernel fast TLB miss xor values. Based upon a patch by Bob Picco. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-31sparc64: Clean up 64-bit mmap exclusion defines.David S. Miller3-7/+15
commit c920745e6964bd4b9315a17b018d83fad66010d3 upstream. Older UltraSPARC chips had an address space hole due to the MMU only supporting 44-bit virtual addresses. The top end of this hole also has the same value as the current definition of PAGE_OFFSET, so this can be confusing. Consolidate the defines for the userspace mmap exclusion range into page_64.h and use them in sys_sparc_64.c and hugetlbpage.c Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-31sparc64: Implement __get_user_pages_fast().David S. Miller1-0/+30
[ Upstream commit 06090e8ed89ea2113a236befb41f71d51f100e60 ] It is not sufficient to only implement get_user_pages_fast(), you must also implement the atomic version __get_user_pages_fast() otherwise you end up using the weak symbol fallback implementation which simply returns zero. This is dangerous, because it causes the futex code to loop forever if transparent hugepages are supported (see get_futex_key()). Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix register corruption in top-most kernel stack frame during boot.David S. Miller10-62/+42
[ Upstream commit ef3e035c3a9b81da8a778bc333d10637acf6c199 ] Meelis Roos reported that kernels built with gcc-4.9 do not boot, we eventually narrowed this down to only impacting machines using UltraSPARC-III and derivitive cpus. The crash happens right when the first user process is spawned: [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 54.451346] [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96 [ 54.666431] Call Trace: [ 54.698453] [0000000000762f8c] panic+0xb0/0x224 [ 54.759071] [000000000045cf68] do_exit+0x948/0x960 [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100 [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10 [ 54.978662] Press Stop-A (L1-A) to return to the boot prom [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 Further investigation showed that compiling only per_cpu_patch() with an older compiler fixes the boot. Detailed analysis showed that the function is not being miscompiled by gcc-4.9, but it is using a different register allocation ordering. With the gcc-4.9 compiled function, something during the code patching causes some of the %i* input registers to get corrupted. Perhaps we have a TLB miss path into the firmware that is deep enough to cause a register window spill and subsequent restore when we get back from the TLB miss trap. Let's plug this up by doing two things: 1) Stop using the firmware stack for client interface calls into the firmware. Just use the kernel's stack. 2) As soon as we can, call into a new function "start_early_boot()" to put a one-register-window buffer between the firmware's deepest stack frame and the top-most initial kernel one. Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Increase size of boot string to 1024 bytesDave Kleikamp1-1/+4
[ Upstream commit 1cef94c36bd4d79b5ae3a3df99ee0d76d6a4a6dc ] This is the longest boot string that silo supports. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Do not define thread fpregs save area as zero-length array.David S. Miller1-1/+2
[ Upstream commit e2653143d7d79a49f1a961aeae1d82612838b12c ] This breaks the stack end corruption detection facility. What that facility does it write a magic value to "end_of_stack()" and checking to see if it gets overwritten. "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is the beginning of the FPU register save area. So once the user uses the FPU, the magic value is overwritten and the debug checks trigger. Fix this by making the size explicit. Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we are limited to 7 levels of FPU state saves. So each FPU register set is 256 bytes, allocate 256 * 7 for the fpregs area. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix FPU register corruption with AES crypto offload.David S. Miller2-1/+21
[ Upstream commit f4da3628dc7c32a59d1fb7116bb042e6f436d611 ] The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the key material is preloaded into the FPU registers, and then we loop over and over doing the crypt operation, reusing those pre-cooked key registers. There are intervening blkcipher*() calls between the crypt operation calls. And those might perform memcpy() and thus also try to use the FPU. The sparc64 kernel FPU usage mechanism is designed to allow such recursive uses, but with a catch. There has to be a trap between the two FPU using threads of control. The mechanism works by, when the FPU is already in use by the kernel, allocating a slot for FPU saving at trap time. Then if, within the trap handler, we try to use the FPU registers, the pre-trap FPU register state is saved into the slot. Then at trap return time we notice this and restore the pre-trap FPU state. Over the long term there are various more involved ways we can make this work, but for a quick fix let's take advantage of the fact that the situation where this happens is very limited. All sparc64 chips that support the crypto instructiosn also are using the Niagara4 memcpy routine, and that routine only uses the FPU for large copies where we can't get the source aligned properly to a multiple of 8 bytes. We look to see if the FPU is already in use in this context, and if so we use the non-large copy path which only uses integer registers. Furthermore, we also limit this special logic to when we are doing kernel copy, rather than a user copy. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix lockdep warnings on reboot on Ultra-5David S. Miller1-3/+4
[ Upstream commit bdcf81b658ebc4c2640c3c2c55c8b31c601b6996 ] Inconsistently, the raw_* IRQ routines do not interact with and update the irqflags tracing and lockdep state, whereas the raw_* spinlock interfaces do. This causes problems in p1275_cmd_direct() because we disable hardirqs by hand using raw_local_irq_restore() and then do a raw_spin_lock() which triggers a lockdep trace because the CPU's hw IRQ state doesn't match IRQ tracing's internal software copy of that state. The CPU's irqs are disabled, yet current->hardirqs_enabled is true. ==================== reboot: Restarting system ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240() DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) Modules linked in: openpromfs CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W 3.17.0-dirty #145 Call Trace: [000000000045919c] warn_slowpath_common+0x5c/0xa0 [0000000000459210] warn_slowpath_fmt+0x30/0x40 [000000000048f41c] check_flags+0x7c/0x240 [0000000000493280] lock_acquire+0x20/0x1c0 [0000000000832b70] _raw_spin_lock+0x30/0x60 [000000000068f2fc] p1275_cmd_direct+0x1c/0x60 [000000000068ed28] prom_reboot+0x28/0x40 [000000000043610c] machine_restart+0x4c/0x80 [000000000047d2d4] kernel_restart+0x54/0x80 [000000000047d618] SyS_reboot+0x138/0x200 [00000000004060b4] linux_sparc_syscall32+0x34/0x60 ---[ end trace 5c439fe81c05a100 ]--- possible reason: unannotated irqs-off. irq event stamp: 2010267 hardirqs last enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580 hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580 softirqs last enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0 softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40 Resetting ... ==================== Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees all of our changes. Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix reversed start/end in flush_tlb_kernel_range()David S. Miller1-2/+2
[ Upstream commit 473ad7f4fb005d1bb727e4ef27d370d28703a062 ] When we have to split up a flush request into multiple pieces (in order to avoid the firmware range) we don't specify the arguments in the right order for the second piece. Fix the order, or else we get hangs as the code tries to flush "a lot" of entries and we get lockups like this: [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032] [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608 [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000 [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40> [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000 [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006 [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000 [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54 [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0> [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000 [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138 [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000 [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24 [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80> [ 4423.234193] Call Trace: [ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80 [ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80 [ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120 [ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40 [ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60 [ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0 [ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0 [ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0 [ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0 [ 4423.334905] [000000000071aa24] tty_release+0x104/0x500 [ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0 [ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0 [ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c Fixes: 4ca9a23765da ("sparc64: Guard against flushing openfirmware mappings.") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc: Let memset return the address argumentAndreas Larsson1-4/+14
[ Upstream commit 74cad25c076a2f5253312c2fe82d1a4daecc1323 ] This makes memset follow the standard (instead of returning 0 on success). This is needed when certain versions of gcc optimizes around memset calls and assume that the address argument is preserved in %o0. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Move request_irq() from ldc_bind() to ldc_alloc()Sowmini Varadhan4-26/+28
[ Upstream commit c21c4ab0d6921f7160a43216fa6973b5924de561 ] The request_irq() needs to be done from ldc_alloc() to avoid the following (caught by lockdep) [00000000004a0738] __might_sleep+0xf8/0x120 [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0 [00000000004faf80] request_threaded_irq+0x80/0x160 [000000000044f71c] ldc_bind+0x7c/0x220 [0000000000452454] vio_port_up+0x54/0xe0 [00000000101f6778] probe_disk+0x38/0x220 [sunvdc] [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc] [0000000000451a88] vio_device_probe+0x48/0x60 [000000000074c56c] really_probe+0x6c/0x300 [000000000074c83c] driver_probe_device+0x3c/0xa0 [000000000074c92c] __driver_attach+0x8c/0xa0 [000000000074a6ec] bus_for_each_dev+0x6c/0xa0 [000000000074c1dc] driver_attach+0x1c/0x40 [000000000074b0fc] bus_add_driver+0xbc/0x280 Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: find_node adjustmentbob picco1-1/+4
[ Upstream commit 3dee9df54836d5f844f3d58281d3f3e6331b467f ] We have seen an issue with guest boot into LDOM that causes early boot failures because of no matching rules for node identitity of the memory. I analyzed this on my T4 and concluded there might not be a solution. I saw the issue in mainline too when booting into the control/primary domain - with guests configured. Note, this could be a firmware bug on some older machines. I'll provide a full explanation of the issues below. Should we not find a matching BEST latency group for a real address (RA) then we will assume node 0. On the T4-2 here with the information provided I can't see an alternative. Technically the LDOM shown below should match the MBLOCK to the favorable latency group. However other factors must be considered too. Were the memory controllers configured "fine" grained interleave or "coarse" grain interleaved - T4. Also should a "group" MD node be considered a NUMA node? There has to be at least one Machine Description (MD) "group" and hence one NUMA node. The group can have one or more latency groups (lg) - more than one memory controller. The current code chooses the smallest latency as the most favorable per group. The latency and lg information is in MLGROUP below. MBLOCK is the base and size of the RAs for the machine as fetched from OBP /memory "available" property. My machine has one MBLOCK but more would be possible - with holes? For a T4-2 the following information has been gathered: with LDOM guest MEMBLOCK configuration: memory size = 0x27f870000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes MBLOCK[0]: base[20000000] size[280000000] offset[0] (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values) (note: (RA + offset) & mask = val is the formula to detect a match for the memory controller. should there be no match for find_node node, a return value of -1 resulted for the node - BAD) There is one group. It has these forward links MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000] MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000] NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8]) (note: "val" is the best lg's (smallest latency) "match") no LDOM guest - bare metal MEMBLOCK configuration: memory size = 0xfdf2d0000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes MBLOCK[0]: base[20000000] size[fe0000000] offset[0] there are two groups group node[16d5] MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000] MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000] NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8]) group node[171d] MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000] MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000] NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8]) (note: for this two "group" bare metal machine, 1/2 memory is in group one's lg and 1/2 memory is in group two's lg). Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix corrupted thread fault code.David S. Miller2-6/+6
[ Upstream commit 84bd6d8b9c0f06b3f188efb479c77e20f05e9a8a ] Every path that ends up at do_sparc64_fault() must install a valid FAULT_CODE_* bitmask in the per-thread fault code byte. Two paths leading to the label winfix_trampoline (which expects the FAULT_CODE_* mask in register %g4) were not doing so: 1) For pre-hypervisor TLB protection violation traps, if we took the 'winfix_trampoline' path we wouldn't have %g4 initialized with the FAULT_CODE_* value yet. Resulting in using the TLB_TAG_ACCESS register address value instead. 2) In the TSB miss path, when we notice that we are going to use a hugepage mapping, but we haven't allocated the hugepage TSB yet, we still have to take the window fixup case into consideration and in that particular path we leave %g4 not setup properly. Errors on this sort were largely invisible previously, but after commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ("sparc64: sun4v TLB error power off events") we now have a fault_code mask bit (FAULT_CODE_BAD_RA) that triggers due to this bug. FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS (see #1 above) and thus we get seemingly random bus errors triggered for user processes. Fixes: 4ccb9272892c ("sparc64: sun4v TLB error power off events") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: sun4v TLB error power off eventsbob picco4-20/+34
[ Upstream commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ] We've witnessed a few TLB events causing the machine to power off because of prom_halt. In one case it was some nfs related area during rmmod. Another was an mmapper of /dev/mem. A more recent one is an ITLB issue with a bad pagesize which could be a hardware bug. Bugs happen but we should attempt to not power off the machine and/or hang it when possible. This is a DTLB error from an mmapper of /dev/mem: [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1 SUN4V-DTLB: TPC<0xfffff80100903e6c> SUN4V-DTLB: O7[fffff801081979d0] SUN4V-DTLB: O7<0xfffff801081979d0> SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2] . This is recent mainline for ITLB: [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc> [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8] [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8> [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4] . Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call prom_halt() and drop us to OF command prompt "ok". This isn't the case for LDOMs and the machine powers off. For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel(). The logic of this patch was partially inspired by David Miller's feedback. Power off of large sparc64 machines is painful. Plus die_if_kernel provides more context. A reset sequence isn't a brief period on large sparc64 but better than power-off/power-on sequence. Cc: sparclinux@vger.kernel.org Signed-off-by: Bob Picco <bob.picco@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc32: dma_alloc_coherent must honour gfp flagsDaniel Hellstrom1-2/+3
[ Upstream commit d1105287aabe88dbb3af825140badaa05cf0442c ] dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO) but the sparc32 implementations sbus_alloc_coherent() and pci32_alloc_coherent() doesn't take the gfp flags into account. Tested on the SPARC32/LEON GRETH Ethernet driver which fails due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed pages. Signed-off-by: Daniel Hellstrom <daniel@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Fix pcr_ops initialization and usage bugs.David S. Miller3-3/+8
[ Upstream commit 8bccf5b313180faefce38e0d1140f76e0f327d28 ] Christopher reports that perf_event_print_debug() can crash in uniprocessor builds. The crash is due to pcr_ops being NULL. This happens because pcr_arch_init() is only invoked by smp_cpus_done() which only executes in SMP builds. init_hw_perf_events() is closely intertwined with pcr_ops being setup properly, therefore: 1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of from smp_cpus_done(). 2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init(). 3) Move init_hw_perf_events to a later initcall so that it we will be sure to invoke pcr_arch_init() after all cpus are brought up. Finally, guard the one naked sequence of pcr_ops dereferences in __global_pmu_self() with an appropriate NULL check. Reported-by: Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31sparc64: Do not disable interrupts in nmi_cpu_busy()David S. Miller1-1/+0
[ Upstream commit 58556104e9cd0107a7a8d2692cf04ef31669f6e4 ] nmi_cpu_busy() is a SMP function call that just makes sure that all of the cpus are spinning using cpu cycles while the NMI test runs. It does not need to disable IRQs because we just care about NMIs executing which will even with 'normal' IRQs disabled. It is not legal to enable hard IRQs in a SMP cross call, in fact this bug triggers the BUG check in irq_work_run_list(): BUG_ON(!irqs_disabled()); Because now irq_work_run() is invoked from the tail of generic_smp_call_function_single_interrupt(). Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-26arch/sparc/math-emu/math_32.c: drop stray break operatorAndrey Utkin1-1/+1
[ Upstream commit 093758e3daede29cb4ce6aedb111becf9d4bfc57 ] This commit is a guesswork, but it seems to make sense to drop this break, as otherwise the following line is never executed and becomes dead code. And that following line actually saves the result of local calculation by the pointer given in function argument. So the proposed change makes sense if this code in the whole makes sense (but I am unable to analyze it in the whole). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: ldc_connect() should not return EINVAL when handshake is in progress.Sowmini Varadhan1-1/+1
[ Upstream commit 4ec1b01029b4facb651b8ef70bc20a4be4cebc63 ] The LDC handshake could have been asynchronously triggered after ldc_bind() enables the ldc_rx() receive interrupt-handler (and thus intercepts incoming control packets) and before vio_port_up() calls ldc_connect(). If that is the case, ldc_connect() should return 0 and let the state-machine progress. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Guard against flushing openfirmware mappings.David S. Miller2-10/+25
[ Upstream commit 4ca9a23765da3260058db3431faf5b4efd8cf926 ] Based almost entirely upon a patch by Christopher Alexander Tobias Schulze. In commit db64fe02258f1507e13fe5212a989922323685ce ("mm: rewrite vmap layer") lazy VMAP tlb flushing was added to the vmalloc layer. This causes problems on sparc64. Sparc64 has two VMAP mapped regions and they are not contiguous with eachother. First we have the malloc mapping area, then another unrelated region, then the vmalloc region. This "another unrelated region" is where the firmware is mapped. If the lazy TLB flushing logic in the vmalloc code triggers after we've had both a module unload and a vfree or similar, it will pass an address range that goes from somewhere inside the malloc region to somewhere inside the vmalloc region, and thus covering the openfirmware area entirely. The sparc64 kernel learns about openfirmware's dynamic mappings in this region early in the boot, and then services TLB misses in this area. But openfirmware has some locked TLB entries which are not mentioned in those dynamic mappings and we should thus not disturb them. These huge lazy TLB flush ranges causes those openfirmware locked TLB entries to be removed, resulting in all kinds of problems including hard hangs and crashes during reboot/reset. Besides causing problems like this, such huge TLB flush ranges are also incredibly inefficient. A plea has been made with the author of the VMAP lazy TLB flushing code, but for now we'll put a safety guard into our flush_tlb_kernel_range() implementation. Since the implementation has become non-trivial, stop defining it as a macro and instead make it a function in a C source file. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Do not insert non-valid PTEs into the TSB hash table.David S. Miller1-0/+4
[ Upstream commit 18f38132528c3e603c66ea464727b29e9bbcb91b ] The assumption was that update_mmu_cache() (and the equivalent for PMDs) would only be called when the PTE being installed will be accessible by the user. This is not true for code paths originating from remove_migration_pte(). There are dire consequences for placing a non-valid PTE into the TSB. The TLB miss frramework assumes thatwhen a TSB entry matches we can just load it into the TLB and return from the TLB miss trap. So if a non-valid PTE is in there, we will deadlock taking the TLB miss over and over, never satisfying the miss. Just exit early from update_mmu_cache() and friends in this situation. Based upon a report and patch from Christopher Alexander Tobias Schulze. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Add membar to Niagara2 memcpy code.David S. Miller1-0/+1
[ Upstream commit 5aa4ecfd0ddb1e6dcd1c886e6c49677550f581aa ] This is the prevent previous stores from overlapping the block stores done by the memcpy loop. Based upon a glibc patch by Jose E. Marchesi Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.David S. Miller2-3/+17
[ Upstream commit b18eb2d779240631a098626cb6841ee2dd34fda0 ] Access to the TSB hash tables during TLB misses requires that there be an atomic 128-bit quad load available so that we fetch a matching TAG and DATA field at the same time. On cpus prior to UltraSPARC-III only virtual address based quad loads are available. UltraSPARC-III and later provide physical address based variants which are easier to use. When we only have virtual address based quad loads available this means that we have to lock the TSB into the TLB at a fixed virtual address on each cpu when it runs that process. We can't just access the PAGE_OFFSET based aliased mapping of these TSBs because we cannot take a recursive TLB miss inside of the TLB miss handler without risking running out of hardware trap levels (some trap combinations can be deep, such as those generated by register window spill and fill traps). Without huge pages it's working perfectly fine, but when the huge TSB got added another chunk of fixed virtual address space was not allocated for this second TSB mapping. So we were mapping both the 8K and 4MB TSBs to the same exact virtual address, causing multiple TLB matches which gives undefined behavior. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault ↵David S. Miller1-15/+1
addresses. [ Upstream commit e5c460f46ae7ee94831cb55cb980f942aa9e5a85 ] This was found using Dave Jone's trinity tool. When a user process which is 32-bit performs a load or a store, the cpu chops off the top 32-bits of the effective address before translating it. This is because we run 32-bit tasks with the PSTATE_AM (address masking) bit set. We can't run the kernel with that bit set, so when the kernel accesses userspace no address masking occurs. Since a 32-bit process will have no mappings in that region we will properly fault, so we don't try to handle this using access_ok(), which can safely just be a NOP on sparc64. Real faults from 32-bit processes should never generate such addresses so a bug check was added long ago, and it barks in the logs if this happens. But it also barks when a kernel user access causes this condition, and that _can_ happen. For example, if a pointer passed into a system call is "0xfffffffc" and the kernel access 4 bytes offset from that pointer. Just handle such faults normally via the exception entries. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill ↵David S. Miller1-3/+6
pte_ERROR(). [ Upstream commit fe866433f843b080246ce729b5e6b27b5f5d9a58 ] pte_ERROR() is not used anywhere, delete it. For pgd_ERROR() and pmd_ERROR(), output something similar to x86, giving the address of the pgd/pmd as well as it's value. Also provide the caller, since these macros are invoked from pgd_clear_bad() and pmd_clear_bad() which provides little context as to what high level operation was occuring when the BAD state was detected. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Fix top-level fault handling bugs.David S. Miller1-30/+52
[ Upstream commit 70ffc6ebaead783ac8dafb1e87df0039bb043596 ] Make get_user_insn() able to cope with huge PMDs. Next, make do_fault_siginfo() more robust when get_user_insn() can't actually fetch the instruction. In particular, use the MMU announced fault address when that happens, instead of calling compute_effective_address() and computing garbage. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-26sparc64: Handle 32-bit tasks properly in compute_effective_address().David S. Miller1-3/+9
[ Upstream commit d037d16372bbe4d580342bebbb8826821ad9edf0 ] If we have a 32-bit task we must chop off the top 32-bits of the 64-bit value just as the cpu would. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-19sparc64: Make itc_sync_lock rawKirill Tkhai1-3/+3
[ Upstream commit 49b6c01f4c1de3b5e5427ac5aba80f9f6d27837a ] One more place where we must not be able to be preempted or to be interrupted in RT. Always actually disable interrupts during synchronization cycle. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-19sparc64: Fix argument sign extension for compat_sys_futex().David S. Miller1-1/+1
[ Upstream commit aa3449ee9c87d9b7660dd1493248abcc57769e31 ] Only the second argument, 'op', is signed. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-29locking/mutex: Disable optimistic spinning on some architecturesPeter Zijlstra1-0/+1
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream. The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jason Low <jason.low2@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: John David Anglin <dave.anglin@bell.net> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-02hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi1-5/+0
commit c177c81e09e517bbf75b67762cdab1b83aba6976 upstream. Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-23net: filter: fix sparc32 typoAlexei Starovoitov1-1/+1
[ Upstream commit 588f5d629b3369aba88f52217d1c473a28fa7723 ] Fixes: 569810d1e327 ("net: filter: fix typo in sparc BPF JIT") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-23net: filter: fix typo in sparc BPF JITAlexei Starovoitov1-4/+4
[ Upstream commit 569810d1e3278907264f5b115281fca3f0038d53 ] fix typo in sparc codegen for SKF_AD_IFINDEX and SKF_AD_HATYPE classic BPF extensions Fixes: 2809a2087cc4 ("net: filter: Just In Time compiler for sparc") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-05-05sparc64: Make sure %pil interrupts are enabled during hypervisor yield.David S. Miller1-1/+3
[ Upstream commit cb3042d609e30e6144024801c89be3925106752b ] In arch_cpu_idle() we must enable %pil based interrupts before potentially invoking the hypervisor cpu yield call. As per the Hypervisor API documentation for cpu_yield: Interrupts which are blocked by some mechanism other that pstate.ie (for example %pil) are not guaranteed to cause a return from this service. It seems that only first generation Niagara chips are hit by this bug. My best guess is that later chips implement this in hardware and wake up anyways from %pil events, whereas in first generation chips the yield is implemented completely in hypervisor code and requires %pil to be enabled in order to wake properly from this call. Fixes: 87fa05aeb3a5 ("sparc: Use generic idle loop") Reported-by: Fabio M. Di Nitto <fabbione@fabbione.net> Reported-by: Jan Engelhardt <jengelh@inai.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-05-05sparc64: don't treat 64-bit syscall return codes as 32-bitDave Kleikamp1-2/+2
[ Upstream commit 1535bd8adbdedd60a0ee62e28fd5225d66434371 ] When checking a system call return code for an error, linux_sparc_syscall was sign-extending the lower 32-bit value and comparing it to -ERESTART_RESTARTBLOCK. lseek can return valid return codes whose lower 32-bits alone would indicate a failure (such as 4G-1). Use the whole 64-bit value to check for errors. Only the 32-bit path should sign extend the lower 32-bit value. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Bob Picco <bob.picco@oracle.com> Acked-by: Allen Pais <allen.pais@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-05-05sparc32: fix build failure for arch_jump_label_transformPaul Gortmaker1-1/+1
[ Upstream commit 4f6500fff5f7644a03c46728fd7ef0f62fa6940b ] In arch/sparc/Kernel/Makefile, we see: obj-$(CONFIG_SPARC64) += jump_label.o However, the Kconfig selects HAVE_ARCH_JUMP_LABEL unconditionally for all SPARC. This in turn leads to the following failure when doing allmodconfig coverage builds: kernel/built-in.o: In function `__jump_label_update': jump_label.c:(.text+0x8560c): undefined reference to `arch_jump_label_transform' kernel/built-in.o: In function `arch_jump_label_transform_static': (.text+0x85cf4): undefined reference to `arch_jump_label_transform' make: *** [vmlinux] Error 1 Change HAVE_ARCH_JUMP_LABEL to be conditional on SPARC64 so that it matches the Makefile. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-06bpf: do not use reciprocal divideEric Dumazet1-3/+14
[ Upstream commit aee636c4809fa54848ff07a899b326eb1f9987a2 ] At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10mm: fix TLB flush race between migration, and change_protection_rangeRik van Riel1-2/+2
commit 20841405940e7be0617612d521e206e4b6b325db upstream. There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [*] flush TLB reload TLB from new entry read/write new page lose data [*] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: "Sorry I let so much accumulate, I was in Buffalo and wanted a few things to cook in my tree for a while before sending to you. Anyways, it's a lot of little things as usual at this stage in the game" 1) Make bonding MAINTAINERS entry reflect reality, from Andy Gospodarek. 2) Fix accidental sock_put() on timewait mini sockets, from Eric Dumazet. 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6 addresses, from François CACHEREUL. 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan Carpenter. 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric Dumazet. 6) SFC driver bug fixes from Ben Hutchings. 7) Fix TX packet scheduling wedge after channel change in ath9k driver, from Felix Fietkau. 8) Fix user after free in BPF JIT code, from Alexei Starovoitov. 9) Source address selection test is reversed in __ip_route_output_key(), fix from Jiri Benc. 10) VLAN and CAN layer mis-size netlink attributes, from Marc Kleine-Budde. 11) Fix permission checks in sysctls to use current_euid() instead of current_uid(). From Eric W Biederman. 12) IPSEC policies can go away while a timer is still pending for them, add appropriate ref-counting to fix, from Steffen Klassert. 13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth chips, from Nguyen Hong Ky and Simon Horman. 14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai. 15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama Ghorbel. 16) Fix typo in fq_change(), we were assigning "initial quantum" to "quantum". From Eric Dumazet. 17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise FQ packet scheduler does not pace those flows properly. Also from Eric Dumazet. 18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland. 19) l2tp_xmit_skb() can be called from process context, not just softirq context, so we must always make sure to BH disable around it. From Eric Dumazet. 20) On qdisc reset, we forget to purge the RB tree of SKBs in netem packet scheduler. From Stephen Hemminger. 21) Fix info leak in farsync WAN driver ioctl() handler, from Dan Carpenter and Salva Peiró. 22) Fix PHY reset and other issues in dm9000 driver, from Nikita Kiryanov and Michael Abbott. 23) When hardware can do SCTP crc32 checksums, we accidently don't disable the csum offload when IPSEC transformations have been applied. From Fan Du and Vlad Yasevich. 24) Tail loss probing in TCP leaves the socket in the wrong congestion avoidance state. From Yuchung Cheng. 25) In CPSW driver, enable NAPI before interrupts are turned on, from Markus Pargmann. 26) Integer underflow and dual-assignment in YAM hamradio driver, from Dan Carpenter. 27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must unclone it. This fixes various hard to track down crashes in drivers where the SKBs ->gso_segs was changing right from underneath the driver during TX queueing. From Eric Dumazet. 28) Fix the handling of VLAN IDs, and in particular the special IDs 0 and 4095, in the bridging layer. From Toshiaki Makita. 29) Another info leak, this time in wanxl WAN driver, from Salva Peiró. 30) Fix race in socket credential passing, from Daniel Borkmann. 31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly, from Seif Mazareeb. 32) Fix identification of fragmented frames in ipv4/ipv6 UDP Fragmentation Offload output paths, from Jiri Pirko. 33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel Elior. 34) When we removed the explicit neighbour pointer from ipv6 routes a slight regression was introduced for users such as IPVS, xt_TEE, and raw sockets. We mix up the users requested destination address with the routes assigned nexthop/gateway. From Julian Anastasov and Simon Horman. 35) Fix stack overruns in rt6_probe(), the issue is that can end up doing two full packet xmit paths at the same time when emitting neighbour discovery messages. From Hannes Frederic Sowa. 36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from Mariusz Ceier. 37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT sample, from Neal Cardwell. 38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from Steffen Klassert. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits) ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter ax88179_178a: Correct the RX error definition in RX header Revert "bridge: only expire the mdb entry when query is received" tcp: initialize passive-side sk_pacing_rate after 3WHS davinci_emac.c: Fix IFF_ALLMULTI setup mac802154: correct a typo in ieee802154_alloc_device() prototype ipv6: probe routes asynchronous in rt6_probe netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper ipv6: fill rt6i_gateway with nexthop address ipv6: always prefer rt6i_gateway if present bnx2x: Set NETIF_F_HIGHDMA unconditionally bnx2x: Don't pretend during register dump bnx2x: Lock DMAE when used by statistic flow bnx2x: Prevent null pointer dereference on error flow bnx2x: Fix config when SR-IOV and iSCSI are enabled bnx2x: Fix Coalescing configuration bnx2x: Unlock VF-PF channel on MAC/VLAN config error bnx2x: Prevent an illegal pointer dereference during panic bnx2x: Fix Maximum CoS estimation for VFs drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing ...
2013-10-11compiler/gcc4: Add quirk for 'asm goto' miscompilation bugIngo Molnar1-1/+1
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-07net: fix unsafe set_memory_rw from softirqAlexei Starovoitov1-0/+1
on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] *** DEADLOCK *** [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03sparc: fix MSI build failure on Sparc32Thomas Petazzoni2-2/+8
Commit ebd97be635 ('PCI: remove ARCH_SUPPORTS_MSI kconfig option') removes the ARCH_SUPPORTS_MSI Kconfig option that allowed architectures to indicate whether they support PCI MSI or not. Now, PCI MSI support can be compiled in on any architecture thanks to the use of weak functions thanks to 4287d824f265 ('PCI: use weak functions for MSI arch-specific functions'). So, architecture specific code is now responsible to ensure that its PCI MSI code builds in all cases, or be appropriately conditionally compiled. On Sparc, the MSI support is only provided for Sparc64, so the ARCH_SUPPORTS_MSI kconfig option was only selected for SPARC64, and not for the Sparc architecture as a whole. Therefore, removing ARCH_SUPPORTS_MSI broke Sparc32 configurations with CONFIG_PCI_MSI=y, because the Sparc-specific MSI code is not designed to be built on Sparc32. To solve this, this commit ensures that the Sparc MSI code is only built on Sparc64. This is done thanks to a new Kconfig Makefile helper option SPARC64_PCI_MSI, modeled after the existing SPARC64_PCI. The SPARC64_PCI_MSI option is an hidden option that is true when both Sparc64 PCI support is enabled and MSI is enabled. The arch/sparc/kernel/pci_msi.c file is now only built when SPARC64_PCI_MSI is true. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03sparc: remove deprecated IRQF_DISABLEDMichael Opdenacker2-3/+3
This patch proposes to remove the IRQF_DISABLED flag from sparc architecture code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03sparc: fix ldom_reboot buffer overflow harderKees Cook1-3/+2
The length argument to strlcpy was still wrong. It could overflow the end of full_boot_str by 5 bytes. Instead of strcat and strlcpy, just use snprint. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-28sparc64: Fix buggy strlcpy() conversion in ldom_reboot().David S. Miller1-1/+1
Commit 117a0c5fc9c2d06045bd217385b2b39ea426b5a6 ("sparc: kernel: using strlcpy() instead of strcpy()") added a bug to ldom_reboot in arch/sparc/kernel/ds.c - strcpy(full_boot_str + strlen("boot "), boot_command); + strlcpy(full_boot_str + strlen("boot "), boot_command, + sizeof(full_boot_str + strlen("boot "))); That last sizeof() expression evaluates to sizeof(size_t) which is not what was intended. Also even the corrected: sizeof(full_boot_str) + strlen("boot ") is not right as the destination buffer length is just plain "sizeof(full_boot_str)" and that's what the final argument should be. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-13Remove GENERIC_HARDIRQ config optionMartin Schwidefsky1-1/+0
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13arch: mm: pass userspace fault flag to generic fault handlerJohannes Weiner2-5/+13
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>