summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2016-08-16vlan: use a valid default mtu value for vlan over macsecPaolo Abeni1-0/+7
[ Upstream commit 18d3df3eab23796d7f852f9c6bb60962b8372ced ] macsec can't cope with mtu frames which need vlan tag insertion, and vlan device set the default mtu equal to the underlying dev's one. By default vlan over macsec devices use invalid mtu, dropping all the large packets. This patch adds a netif helper to check if an upper vlan device needs mtu reduction. The helper is used during vlan devices initialization to set a valid default and during mtu updating to forbid invalid, too bit, mtu values. The helper currently only check if the lower dev is a macsec device, if we get more users, we need to update only the helper (possibly reserving an additional IFF bit). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10radix-tree: fix radix_tree_iter_retry() for tagged iterators.Andrey Ryabinin1-0/+1
commit 3cb9185c67304b2a7ea9be73e7d13df6fb2793a1 upstream. radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() leading to crash: RIP: radix_tree_next_slot include/linux/radix-tree.h:473 find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 .... Call Trace: pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 do_writepages+0x97/0x100 mm/page-writeback.c:2364 __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 vfs_fsync_range+0x10a/0x250 fs/sync.c:195 vfs_fsync fs/sync.c:209 do_fsync+0x42/0x70 fs/sync.c:219 SYSC_fdatasync fs/sync.c:232 SyS_fdatasync+0x19/0x20 fs/sync.c:230 entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 We must reset iterator's tags to bail out from radix_tree_next_slot() and go to the slow-path in radix_tree_next_chunk(). Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10mm: memcontrol: fix cgroup creation failure after many small jobsJohannes Weiner1-15/+10
commit 73f576c04b9410ed19660f74f97521bee6e1c546 upstream. The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10mm: thp: refix false positive BUG in page_move_anon_rmap()Hugh Dickins1-1/+1
commit 5a49973d7143ebbabd76e1dcd69ee42e349bb7b9 upstream. The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c022dc ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10x86/quirks: Add early quirk to reset Apple AirPort cardLukas Wunner1-0/+1
commit abb2bafd295fe962bbadc329dbfb2146457283ac upstream. The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27vfs: add d_real_inode() helperMiklos Szeredi1-0/+12
commit a118084432d642eeccb961c7c8cc61525a941fcb upstream. Needed by the following fix. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27net_sched: fix mirrored packets checksumWANG Cong1-0/+19
[ Upstream commit 82a31b9231f02d9c1b7b290a46999d517b0d312a ] Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27packet: Use symmetric hash for PACKET_FANOUT_HASH.David S. Miller1-0/+1
[ Upstream commit eb70db8756717b90c01ccc765fdefc4dd969fc74 ] People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27rpc: share one xps between all backchannelsJ. Bruce Fields2-0/+2
commit 39a9beab5acb83176e8b9a4f0778749a09341f1f upstream. The spec allows backchannels for multiple clients to share the same tcp connection. When that happens, we need to use the same xprt for all of them. Similarly, we need the same xps. This fixes list corruption introduced by the multipath code. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27nfsd4/rpc: move backchannel create logic into rpc codeJ. Bruce Fields1-2/+0
commit d50039ea5ee63c589b0434baa5ecf6e5075bb6f9 upstream. Also simplify the logic a bit. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLECatalin Marinas1-0/+3
commit 9bd616e3dbedfc103f158197c8ad93678849b1ed upstream. The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is enabled. Commit c8cc7d4de7a4 ("sched/idle: Reorganize the idle loop") removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the compiler optimising away __this_cpu_read(cpuidle_devices). However, with CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens and the kernel fails to link since cpuidle_devices is not defined. This patch introduces an accessor function for the current CPU cpuidle device (returning NULL when !CONFIG_CPU_IDLE) and uses it in cpuidle_idle_call(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27locking/static_key: Fix concurrent static_key_slow_inc()Paolo Bonzini1-3/+13
commit 4c5ea0a9cd02d6aa8adc86e100b2a4cff8d614ff upstream. The following scenario is possible: CPU 1 CPU 2 static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 0, no increment jump_label_lock() atomic_inc_return() -> key.enabled == 1 now static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 1, inc to 2 return ** static key is wrong! jump_label_update() jump_label_unlock() Testing the static key at the point marked by (**) will follow the wrong path for jumps that have not been patched yet. This can actually happen when creating many KVM virtual machines with userspace LAPIC emulation; just run several copies of the following program: #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/kvm.h> int main(void) { for (;;) { int kvmfd = open("/dev/kvm", O_RDONLY); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); close(ioctl(vmfd, KVM_CREATE_VCPU, 1)); close(vmfd); close(kvmfd); } return 0; } Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call. The static key's purpose is to skip NULL pointer checks and indeed one of the processes eventually dereferences NULL. As explained in the commit that introduced the bug: 706249c222f6 ("locking/static_keys: Rework update logic") jump_label_update() needs key.enabled to be true. The solution adopted here is to temporarily make key.enabled == -1, and use go down the slow path when key.enabled <= 0. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 706249c222f6 ("locking/static_keys: Rework update logic") Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com [ Small stylistic edits to the changelog and the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27USB: EHCI: declare hostpc register as zero-length arrayAlan Stern1-2/+2
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream. The HOSTPC extension registers found in some EHCI implementations form a variable-length array, with one element for each port. Therefore the hostpc field in struct ehci_regs should be declared as a zero-length array, not a single-element array. This fixes a problem reported by UBSAN. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11bpf, perf: delay release of BPF prog after grace periodDaniel Borkmann1-0/+4
[ Upstream commit ceb56070359b7329b5678b5d95a376fcb24767be ] Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Jann Horn <jann@thejh.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11sock_diag: do not broadcast raw socket destructionWillem de Bruijn1-0/+6
[ Upstream commit 9a0fee2b552b1235fb1706ae1fc664ae74573be8 ] Diag intends to broadcast tcp_sk and udp_sk socket destruction. Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not sufficient for this. Raw sockets can have the same type. Add a test for sk->sk_type. Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-11net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUGJason A. Donenfeld1-1/+2
[ Upstream commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 ] The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the usual definition of the dynamic_pr_debug macro, but alter it by adding a call to "net_ratelimit()" in the if statement. This is, in fact, the correct approach. However, while doing this, the author of the commit forgot to surround fmt by pr_fmt, resulting in unprefixed log messages appearing in the console. So, this commit adds back the pr_fmt(fmt) invocation, making net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and DYNAMIC_DEBUG cases, and bringing parity with the behavior of dynamic_pr_debug as well. Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: introduce and use xt_copy_counters_from_userFlorian Westphal1-0/+3
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream. The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: xt_compat_match_from_user doesn't need a retvalFlorian Westphal1-1/+1
commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream. Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: check for bogus target offsetFlorian Westphal1-2/+2
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add compat version of xt_check_entry_offsetsFlorian Westphal1-0/+3
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream. 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24netfilter: x_tables: add and use xt_check_entry_offsetsFlorian Westphal1-0/+4
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream. Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding maskMarc Zyngier1-1/+1
commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream. The INTID mask is wrong, and is made a signed value, which has nteresting effects in the KVM emulation. Let's sanitize it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-08mm: use phys_addr_t for reserve_bootmem_region() argumentsStefan Bader1-1/+1
commit 4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5 upstream. Since commit 92923ca3aace ("mm: meminit: only set page reserved in the memblock region") the reserved bit is set on reserved memblock regions. However start and end address are passed as unsigned long. This is only 32bit on i386, so it can end up marking the wrong pages reserved for ranges at 4GB and above. This was observed on a 32bit Xen dom0 which was booted with initial memory set to a value below 4G but allowing to balloon in memory (dom0_mem=1024M for example). This would define a reserved bootmem region for the additional memory (for example on a 8GB system there was a reverved region covering the 4GB-8GB range). But since the addresses were passed on as unsigned long, this was actually marking all pages from 0 to 4GB as reserved. Fixes: 92923ca3aacef63 ("mm: meminit: only set page reserved in the memblock region") Link: http://lkml.kernel.org/r/1463491221-10573-1-git-send-email-stefan.bader@canonical.com Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01SIGNAL: Move generic copy_siginfo() to signal.hJames Hogan1-0/+15
commit ca9eb49aa9562eaadf3cea071ec7018ad6800425 upstream. The generic copy_siginfo() is currently defined in asm-generic/siginfo.h, after including uapi/asm-generic/siginfo.h which defines the generic struct siginfo. However this makes it awkward for an architecture to use it if it has to define its own struct siginfo (e.g. MIPS and potentially IA64), since it means that asm-generic/siginfo.h can only be included after defining the arch-specific siginfo, which may be problematic if the arch-specific definition needs definitions from uapi/asm-generic/siginfo.h. It is possible to work around this by first including uapi/asm-generic/siginfo.h to get the constants before defining the arch-specific siginfo, and include asm-generic/siginfo.h after. However uapi headers can't be included by other uapi headers, so that first include has to be in an ifdef __kernel__, with the non __kernel__ case including the non-UAPI header instead. Instead of that mess, move the generic copy_siginfo() definition into linux/signal.h, which allows an arch-specific uapi/asm/siginfo.h to include asm-generic/siginfo.h and define the arch-specific siginfo, and for the generic copy_siginfo() to see that arch-specific definition. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Petr Malat <oss@malat.biz> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Christopher Ferris <cferris@google.com> Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-ia64@vger.kernel.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12478/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01Fix OpenSSH pty regression on closeBrian Bloniarz1-1/+1
commit 0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa upstream. OpenSSH expects the (non-blocking) read() of pty master to return EAGAIN only if it has received all of the slave-side output after it has received SIGCHLD. This used to work on pre-3.12 kernels. This fix effectively forces non-blocking read() and poll() to block for parallel i/o to complete for all ttys. It also unwinds these changes: 1) f8747d4a466ab2cafe56112c51b3379f9fdb7a12 tty: Fix pty master read() after slave closes 2) 52bce7f8d4fc633c9a9d0646eef58ba6ae9a3b73 pty, n_tty: Simplify input processing on final close 3) 1a48632ffed61352a7810ce089dc5a8bcd505a60 pty: Fix input race when closing Inspired by analysis and patch from Marc Aurele La France <tsi@tuyoix.net> Reported-by: Volth <openssh@volth.com> Reported-by: Marc Aurele La France <tsi@tuyoix.net> BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=52 BugLink: https://bugzilla.mindrot.org/show_bug.cgi?id=2492 Signed-off-by: Brian Bloniarz <brian.bloniarz@gmail.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01usb: core: hub: hub_port_init lock controller instead of busChris Bainbridge2-2/+2
commit feb26ac31a2a5cb88d86680d9a94916a6343e9e6 upstream. The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [<ffffffff81b9bab7>] schedule+0x37/0x90 [<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0 [<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30 [<ffffffff817dafbe>] usb_start_wait_urb+0xbe/0x150 [<ffffffff817db10c>] usb_control_msg+0xbc/0xf0 [<ffffffff817d07de>] hub_port_init+0x51e/0xb70 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Call Trace: [<ffffffff817fd36d>] xhci_setup_device+0x53d/0xa40 [<ffffffff817fd87e>] xhci_address_device+0xe/0x10 [<ffffffff817d047f>] hub_port_init+0x1bf/0xb70 [<ffffffff811247ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01USB: leave LPM alone if possible when binding/unbinding interface driversAlan Stern1-1/+1
commit 6fb650d43da3e7054984dc548eaa88765a94d49f upstream. When a USB driver is bound to an interface (either through probing or by claiming it) or is unbound from an interface, the USB core always disables Link Power Management during the transition and then re-enables it afterward. The reason is because the driver might want to prevent hub-initiated link power transitions, in which case the HCD would have to recalculate the various LPM parameters. This recalculation takes place when LPM is re-enabled and the new parameters are sent to the device and its parent hub. However, if the driver does not want to prevent hub-initiated link power transitions then none of this work is necessary. The parameters don't need to be recalculated, and LPM doesn't need to be disabled and re-enabled. It turns out that disabling and enabling LPM can be time-consuming, enough so that it interferes with user programs that want to claim and release interfaces rapidly via usbfs. Since the usbfs kernel driver doesn't set the disable_hub_initiated_lpm flag, we can speed things up and get the user programs to work by leaving LPM alone whenever the flag isn't set. And while we're improving the way disable_hub_initiated_lpm gets used, let's also fix its kerneldoc. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Matthew Giassa <matthew@giassa.net> CC: Mathias Nyman <mathias.nyman@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01can: fix handling of unmodifiable configuration optionsOliver Hartkopp1-2/+20
commit bb208f144cf3f59d8f89a09a80efd04389718907 upstream. As described in 'can: m_can: tag current CAN FD controllers as non-ISO' (6cfda7fbebe) it is possible to define fixed configuration options by setting the according bit in 'ctrlmode' and clear it in 'ctrlmode_supported'. This leads to the incovenience that the fixed configuration bits can not be passed by netlink even when they have the correct values (e.g. non-ISO, FD). This patch fixes that issue and not only allows fixed set bit values to be set again but now requires(!) to provide these fixed values at configuration time. A valid CAN FD configuration consists of a nominal/arbitration bittiming, a data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now enforced by a new can_validate() function. This fix additionally removed the inconsistency that was prohibiting the support of 'CANFD-only' controller drivers, like the RCar CAN FD. For this reason a new helper can_set_static_ctrlmode() has been introduced to provide a proper interface to handle static enabled CAN controller options. Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01fscrypto/f2fs: allow fs-specific key prefix for fs encryptionJaegeuk Kim1-0/+1
commit b5a7aef1ef436ec005fef0efe31a676ec5f4ab31 upstream. This patch allows fscrypto to handle a second key prefix given by filesystem. The main reason is to provide backward compatibility, since previously f2fs used "f2fs:" as a crypto prefix instead of "fscrypt:". Later, ext4 should also provide key_prefix() to give "ext4:". One concern decribed by Ted would be kinda double check overhead of prefixes. In x86, for example, validate_user_key consumes 8 ms after boot-up, which turns out derive_key_aes() consumed most of the time to load specific crypto module. After such the cold miss, it shows almost zero latencies, which treats as a negligible overhead. Note that request_key() detects wrong prefix in prior to derive_key_aes() even. Cc: Ted Tso <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-14Merge branch 'for-linus' of ↵Linus Torvalds3-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Overlayfs fixes from Miklos, assorted fixes from me. Stable fodder of varying severity, all sat in -next for a while" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ovl: ignore permissions on underlying lookup vfs: add lookup_hash() helper vfs: rename: check backing inode being equal vfs: add vfs_select_inode() helper get_rock_ridge_filename(): handle malformed NM entries ecryptfs: fix handling of directory opening atomic_open(): fix the handling of create_error fix the copy vs. map logics in blk_rq_map_user_iov() do_splice_to(): cap the size before passing to ->splice_read()
2016-05-14Merge branch 'for-4.6-fixes' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During v4.6-rc1 cgroup namespace support was merged. There is an issue where it's impossible to tell whether a given cgroup mount point is bind mounted or namespaced. Serge has been working on the issue but it took longer than expected to resolve, so the late pull request. Given that it's a completely new feature and the patches don't touch anything else, the risk seems acceptable. However, if this is too late, an alternative is plugging new cgroup ns creation for v4.6 and retrying for v4.7" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix compile warning kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13Merge tag 'regulator-fix-v4.6-rc7' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "A small collection of driver specific fixes for the regulator subsysetem: - Fix handling of probe deferral for GPIO regulators - Fix a typo in the module alias for DA9053 - Fix the definition of BUCK9 in the S2MPS11 driver. This change looks larger than it is because an irregularity in the hardware means that the macro used to define bucks 6-10 needs duplicating and tweaking to have a separate macro for 9 - Fix a series of errors in the definitions of the LDOs the AXP20x regulators, some of which had always been present and some of which were introduced in the merge window" * tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: da9063: Correct module alias prefix to fix module autoloading regulator: axp20x: Fix axp22x ldo_io registration error on cold boot regulator: axp20x: Fix axp22x ldo_io voltage ranges regulator: axp20x: Fix LDO4 linear voltage range regulator: s2mps11: Fix invalid selector mask and voltages for buck9 regulator: gpio: check return value of of_get_named_gpio
2016-05-13Merge remote-tracking branches 'regulator/fix/axp20x', ↵Mark Brown1-0/+2
'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus
2016-05-13mm: thp: calculate the mapcount correctly for THP pages during WP faultsAndrea Arcangeli2-3/+12
This will provide fully accuracy to the mapcount calculation in the write protect faults, so page pinning will not get broken by false positive copy-on-writes. total_mapcount() isn't the right calculation needed in reuse_swap_page(), so this introduces a page_trans_huge_mapcount() that is effectively the full accurate return value for page_mapcount() if dealing with Transparent Hugepages, however we only use the page_trans_huge_mapcount() during COW faults where it strictly needed, due to its higher runtime cost. This also provide at practical zero cost the total_mapcount information which is needed to know if we can still relocate the page anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we can reuse the page no matter if it's a pte or a pmd_trans_huge triggering the fault, but we can only relocate the page anon_vma to the local vma->anon_vma if we're sure it's only this "vma" mapping the whole THP physical range. Kirill A. Shutemov discovered the problem with moving the page anon_vma to the local vma->anon_vma in a previous version of this patch and another problem in the way page_move_anon_rmap() was called. Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a previous version, because reuse_swap_page must be a macro to call page_trans_huge_mapcount from swap.h, so this uses a macro again instead of an inline function. With this change at least it's a less dangerous usage than it was before, because "page" is used only once now, while with the previous code reuse_swap_page(page++) would have called page_mapcount on page+1 and it would have increased page twice instead of just once. Dean Luick noticed an uninitialized variable that could result in a rmap inefficiency for the non-THP case in a previous version. Mike Marciniszyn said: : Our RDMA tests are seeing an issue with memory locking that bisects to : commit 61f5d698cc97 ("mm: re-enable THP") : : The test program registers two rather large MRs (512M) and RDMA : writes data to a passive peer using the first and RDMA reads it back : into the second MR and compares that data. The sizes are chosen randomly : between 0 and 1024 bytes. : : The test will get through a few (<= 4 iterations) and then gets a : compare error. : : Tracing indicates the kernel logical addresses associated with the individual : pages at registration ARE correct , the data in the "RDMA read response only" : packets ARE correct. : : The "corruption" occurs when the packet crosse two pages that are not physically : contiguous. The second page reads back as zero in the program. : : It looks like the user VA at the point of the compare error no longer points to : the same physical address as was registered. : : This patch totally resolves the issue! Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reviewed-by: Dean Luick <dean.luick@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Josh Collier <josh.d.collier@intel.com> Cc: Marc Haber <mh+linux-kernel@zugschlus.de> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-11Merge branch 'ovl-fixes' into for-linusAl Viro53-484/+309
2016-05-11vfs: add lookup_hash() helperMiklos Szeredi1-0/+2
Overlayfs needs lookup without inode_permission() and already has the name hash (in form of dentry->d_name on overlayfs dentry). It also doesn't support filesystems with d_op->d_hash() so basically it only needs the actual hashed lookup from lookup_one_len_unlocked() So add a new helper that does unlocked lookup of a hashed name. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-11vfs: add vfs_select_inode() helperMiklos Szeredi1-0/+12
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+3
Pull networking fixes from David Miller: 1) Check klogctl failure correctly, from Colin Ian King. 2) Prevent OOM when under memory pressure in flowcache, from Steffen Klassert. 3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu. 4) Memory barrier and multicast handling fixes in bnxt_en, from Michael Chan. 5) Endianness bug in mlx5, from Daniel Jurgens. 6) Fix disconnect handling in VSOCK, from Ian Campbell. 7) Fix locking of netdev list walking in get_bridge_ifindices(), from Nikolay Aleksandrov. 8) Bridge multicast MLD parser can look at wrong packet offsets, fix from Linus Lüssing. 9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru. 10) Fix missing setting of encapsulation before inner handling completes in udp_offload code, from Jarno Rajahalme. 11) Missing rollbacks during LAG join and flood configuration failures in mlxsw driver, from Ido Schimmel. 12) Fix error code checks in netxen driver, from Dan Carpenter. 13) Fix key size in new macsec driver, from Sabrina Dubroca. 14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net/mlx5e: make VXLAN support conditional Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue" macsec: key identifier is 128 bits, not 64 Documentation/networking: more accurate LCO explanation macvtap: segmented packet is consumed tools: bpf_jit_disasm: check for klogctl failure qede: uninitialized variable in qede_start_xmit() netxen: netxen_rom_fast_read() doesn't return -1 netxen: reversed condition in netxen_nic_set_link_parameters() netxen: fix error handling in netxen_get_flash_block() mlxsw: spectrum: Add missing rollback in flood configuration mlxsw: spectrum: Fix rollback order in LAG join failure udp_offload: Set encapsulation before inner completes. udp_tunnel: Remove redundant udp_tunnel_gro_complete(). qede: prevent chip hang when increasing channels net: ipv6: tcp reset, icmp need to consider L3 domain bridge: fix igmp / mld query parsing net: bridge: fix old ioctl unlocked net device walk VSOCK: do not disconnect socket when peer has shutdown SEND only net/mlx4_en: Fix endianness bug in IPV6 csum calculation ...
2016-05-09compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()Josh Poimboeuf1-1/+1
gcc support for __builtin_bswap16() was supposedly added for powerpc in gcc 4.6, and was then later added for other architectures in gcc 4.8. However, Stephen Rothwell reported that attempting to use it on powerpc in gcc 4.6 fails with: lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]') lib/vsprintf.c:160:2: error: initializer element is not constant lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]') ... I'm not entirely sure what those errors mean, but I don't see them on gcc 4.8. So let's consider gcc 4.8 to be the official starting point for __builtin_bswap16(). Arnd Bergmann adds: "I found the commit in gcc-4.8 that replaced the powerpc-specific implementation of __builtin_bswap16 with an architecture-independent one. Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to the lhbrx/sthbrx instructions, so it ended up not being a constant, though the intent of the patch was mainly to add support for the builtin to x86: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624 has the patch that went into gcc-4.8 and more information." Fixes: 7322dd755e7d ("byteswap: try to avoid __builtin_constant_p gcc bug") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespacesSerge E. Hallyn1-0/+2
Patch summary: When showing a cgroupfs entry in mountinfo, show the path of the mount root dentry relative to the reader's cgroup namespace root. Short explanation (courtesy of mkerrisk): If we create a new cgroup namespace, then we want both /proc/self/cgroup and /proc/self/mountinfo to show cgroup paths that are correctly virtualized with respect to the cgroup mount point. Previous to this patch, /proc/self/cgroup shows the right info, but /proc/self/mountinfo does not. Long version: When a uid 0 task which is in freezer cgroup /a/b, unshares a new cgroup namespace, and then mounts a new instance of the freezer cgroup, the new mount will be rooted at /a/b. The root dentry field of the mountinfo entry will show '/a/b'. cat > /tmp/do1 << EOF mount -t cgroup -o freezer freezer /mnt grep freezer /proc/self/mountinfo EOF unshare -Gm bash /tmp/do1 > 330 160 0:34 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer > 355 133 0:34 /a/b /mnt rw,relatime - cgroup freezer rw,freezer The task's freezer cgroup entry in /proc/self/cgroup will simply show '/': grep freezer /proc/self/cgroup 9:freezer:/ If instead the same task simply bind mounts the /a/b cgroup directory, the resulting mountinfo entry will again show /a/b for the dentry root. However in this case the task will find its own cgroup at /mnt/a/b, not at /mnt: mount --bind /sys/fs/cgroup/freezer/a/b /mnt 130 25 0:34 /a/b /mnt rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,freezer In other words, there is no way for the task to know, based on what is in mountinfo, which cgroup directory is its own. Example (by mkerrisk): First, a little script to save some typing and verbiage: echo -e "\t/proc/self/cgroup:\t$(cat /proc/self/cgroup | grep freezer)" cat /proc/self/mountinfo | grep freezer | awk '{print "\tmountinfo:\t\t" $4 "\t" $5}' Create cgroup, place this shell into the cgroup, and look at the state of the /proc files: 2653 2653 # Our shell 14254 # cat(1) /proc/self/cgroup: 10:freezer:/a/b mountinfo: / /sys/fs/cgroup/freezer Create a shell in new cgroup and mount namespaces. The act of creating a new cgroup namespace causes the process's current cgroups directories to become its cgroup root directories. (Here, I'm using my own version of the "unshare" utility, which takes the same options as the util-linux version): Look at the state of the /proc files: /proc/self/cgroup: 10:freezer:/ mountinfo: / /sys/fs/cgroup/freezer The third entry in /proc/self/cgroup (the pathname of the cgroup inside the hierarchy) is correctly virtualized w.r.t. the cgroup namespace, which is rooted at /a/b in the outer namespace. However, the info in /proc/self/mountinfo is not for this cgroup namespace, since we are seeing a duplicate of the mount from the old mount namespace, and the info there does not correspond to the new cgroup namespace. However, trying to create a new mount still doesn't show us the right information in mountinfo: # propagating to other mountns /proc/self/cgroup: 7:freezer:/ mountinfo: /a/b /mnt/freezer The act of creating a new cgroup namespace caused the process's current freezer directory, "/a/b", to become its cgroup freezer root directory. In other words, the pathname directory of the directory within the newly mounted cgroup filesystem should be "/", but mountinfo wrongly shows us "/a/b". The consequence of this is that the process in the cgroup namespace cannot correctly construct the pathname of its cgroup root directory from the information in /proc/PID/mountinfo. With this patch, the dentry root field in mountinfo is shown relative to the reader's cgroup namespace. So the same steps as above: /proc/self/cgroup: 10:freezer:/a/b mountinfo: / /sys/fs/cgroup/freezer /proc/self/cgroup: 10:freezer:/ mountinfo: /../.. /sys/fs/cgroup/freezer /proc/self/cgroup: 10:freezer:/ mountinfo: / /mnt/freezer cgroup.clone_children freezer.parent_freezing freezer.state tasks cgroup.procs freezer.self_freezing notify_on_release 3164 2653 # First shell that placed in this cgroup 3164 # Shell started by 'unshare' 14197 # cat(1) Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-07udp_offload: Set encapsulation before inner completes.Jarno Rajahalme1-0/+3
UDP tunnel segmentation code relies on the inner offsets being set for an UDP tunnel GSO packet, but the inner *_complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner *_complete() functions are done. This causes the inner offsets having invalid values after udp_gro_complete() returns, which in turn will make it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06Merge tag 'pm+acpi-4.6-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "Fixes for problems introduced or discovered recently (intel_pstate, sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework, generic device properties framework) and one fix for a hotplug-related deadlock in ACPICA that's been there forever, but is nasty enough. Specifics: - Fix for a recent regression in the intel_pstate driver causing it to fail to restore the HWP (HW-managed P-states) configuration of the boot CPU after suspend-to-RAM (Rafael Wysocki). - Fix for two recent regressions in the intel_pstate driver, one that can trigger a divide by zero if the driver is accessed via sysfs before it manages to take the first sample and one causing it to fail to update a structure field used in a trace point, so the information coming from it is less useful (Rafael Wysocki). - Fix for a problem in the sti-cpufreq driver introduced during the 4.5 cycle that causes it to break CPU PM in multi-platform kernels by registering cpufreq-dt (which subsequently doesn't work) unconditionally and preventing the driver that would actually work from registering (Sudeep Holla). - Stable-candidate fix for an ARM64 cpuidle issue causing idle state usage counters to be incorrectly updated for idle states that were not entered due to errors (James Morse). - Fix for a recently introduced issue in the OPP (Operating Performance Points) framework causing it to print bogus error messages for missing optional regulators (Viresh Kumar). - Fix for a recently introduced issue in the generic device properties framework that may cause it to attempt to dereferece and invalid pointer in some cases (Heikki Krogerus). - Fix for a deadlock in the ACPICA core that may be triggered by device (eg Thunderbolt) hotplug (Prarit Bhargava)" * tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / OPP: Remove useless check ACPICA: Dispatcher: Update thread ID for recursive method calls intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value device property: Avoid potential dereferences of invalid pointers
2016-05-06Merge branches 'acpica-fixes' and 'device-properties-fixes'Rafael J. Wysocki1-1/+1
* acpica-fixes: ACPICA: Dispatcher: Update thread ID for recursive method calls * device-properties-fixes: device property: Avoid potential dereferences of invalid pointers
2016-05-06mm: thp: kvm: fix memory corruption in KVM with THP enabledAndrea Arcangeli1-0/+22
After the THP refcounting change, obtaining a compound pages from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU. A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can map immediately the entire compound page to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users). Ideally instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However it's non trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition of being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as result of memory pressure. Without this fix after a MADV_DONTNEED (like invoked by QEMU during postcopy live migration or balloning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to be exposed not just to KVM. The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c but that currently has all kind of KVM data structures in it, so it's definitely not a cut-and-paste work, so I couldn't do a fix as cleaner as this one for 4.6. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06rapidio/mport_cdev: fix uapi type definitionsAlexandre Bounine1-271/+0
Fix problems in uapi definitions reported by Gabriel Laskar: (see https://lkml.org/lkml/2016/4/5/205 for details) - move public header file rio_mport_cdev.h to include/uapi/linux directory - change types in data structures passed as IOCTL parameters - improve parameter checking in some IOCTL service routines Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Gabriel Laskar <gabriel@lse.epita.fr> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: memcontrol: let v2 cgroups follow changes in system swappinessJohannes Weiner1-0/+4
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We might want to add one later - that's a different discussion - but until we do, the cgroups should always follow the system setting. Otherwise it will be unchangeably set to whatever the ancestor inherited from the system setting at the time of cgroup creation. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+5
Pull networking fixes from David Miller: "Some straggler bug fixes: 1) Batman-adv DAT must consider VLAN IDs when choosing candidate nodes, from Antonio Quartulli. 2) Fix botched reference counting of vlan objects and neigh nodes in batman-adv, from Sven Eckelmann. 3) netem can crash when it sees GSO packets, the fix is to segment then upon ->enqueue. Fix from Neil Horman with help from Eric Dumazet. 4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew Finlay. 5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5, since it can sleep. Fix also from Matthew Finlay. 6) Check mdiobus_scan() return values properly in pxa168_eth and macb drivers. From Sergei Shtylyov. 7) If the netdevice doesn't support checksumming, disable segmentation. From Alexandery Duyck. 8) Fix races between RDS tcp accept and sending, from Sowmini Varadhan. 9) In macb driver, probe MDIO bus before we register the netdev, otherwise we can try to open the device before it is really ready for that. Fix from Florian Fainelli. 10) Netlink attribute size for ILA "tunnels" not calculated properly, fix from Nicolas Dichtel" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: ipv6/ila: fix nlsize calculation for lwtunnel net: macb: Probe MDIO bus before registering netdev RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock. RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock vxlan: Add checksum check to the features check function net: Disable segmentation if checksumming is not supported net: mvneta: Remove superfluous SMP function call macb: fix mdiobus_scan() error check pxa168_eth: fix mdiobus_scan() error check net/mlx5e: Use workqueue for vxlan ops net/mlx5e: Implement a mlx5e workqueue net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue net/mlx5: Unmap only the relevant IO memory mapping netem: Segment GSO packets on enqueue batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node batman-adv: Fix reference counting of vlan object for tt_local_entry batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event batman-adv: fix DAT candidate selection (must use vid)
2016-05-03vxlan: Add checksum check to the features check functionAlexander Duyck1-0/+5
We need to perform an additional check on the inner headers to determine if we can offload the checksum for them. Previously this check didn't occur so we would generate an invalid frame in the case of an IPv6 header encapsulated inside of an IPv4 tunnel. To fix this I added a secondary check to vxlan_features_check so that we can verify that we can offload the inner checksum. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-02Minimal fix-up of bad hashing behavior of hash_64()Linus Torvalds1-2/+18
This is a fairly minimal fixup to the horribly bad behavior of hash_64() with certain input patterns. In particular, because the multiplicative value used for the 64-bit hash was intentionally bit-sparse (so that the multiply could be done with shifts and adds on architectures without hardware multipliers), some bits did not get spread out very much. In particular, certain fairly common bit ranges in the input (roughly bits 12-20: commonly with the most information in them when you hash things like byte offsets in files or memory that have block factors that mean that the low bits are often zero) would not necessarily show up much in the result. There's a bigger patch-series brewing to fix up things more completely, but this is the fairly minimal fix for the 64-bit hashing problem. It simply picks a much better constant multiplier, spreading the bits out a lot better. NOTE! For 32-bit architectures, the bad old hash_64() remains the same for now, since 64-bit multiplies are expensive. The bigger hashing cleanup will replace the 32-bit case with something better. The new constants were picked by George Spelvin who wrote that bigger cleanup series. I just picked out the constants and part of the comment from that series. Cc: stable@vger.kernel.org Cc: George Spelvin <linux@horizon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-3/+12
Pull networking fixes from David Miller: 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips, from Sara Sharon. 2) Fix SKB size checks in batman-adv stack on receive, from Sven Eckelmann. 3) Leak fix on mac80211 interface add error paths, from Johannes Berg. 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver, fix from Stanislaw Gruszka. 5) Fix sign extension problem when computing feature masks in net_gso_ok(), from Marcelo Ricardo Leitner. 6) lan78xx driver doesn't count packets and packet lengths in its statistics properly, fix from Woojung Huh. 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko Manolov. 8) Fix refcount overflows in bpf, from Alexei Starovoitov. 9) Unified dst cache handling introduced a preempt warning in ip_tunnel, fix by resetting rather then setting the cached route. From Paolo Abeni. 10) Listener hash collision test fix in soreuseport, from Craig Gallak * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) gre: do not pull header in ICMP error processing net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case tipc: only process unicast on intended node cxgb3: fix out of bounds read net/smscx5xx: use the device tree for mac address soreuseport: Fix TCP listener hash collision net: l2tp: fix reversed udp6 checksum flags ip_tunnel: fix preempt warning in ip tunnel creation/updating samples/bpf: fix trace_output example bpf: fix check_map_func_compatibility logic bpf: fix refcnt overflow drivers: net: cpsw: use of_phy_connect() in fixed-link case dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive drivers: net: cpsw: don't ignore phy-mode if phy-handle is used drivers: net: cpsw: fix segfault in case of bad phy-handle drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver gre: reject GUE and FOU in collect metadata mode pegasus: fixes reported packet length pegasus: fixes URB buffer allocation size; ...