Age  Commit message  Author  Files  Lines
2018-04-21writeback: safer lock nestingGreg Thelen4-26/+34
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain.

This existing pattern is thus suspicious:

    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    <interrupts mistakenly enabled>
                                    <interrupt>
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    <deadlock>
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute:

    cd /mnt/cgroup/memory
    mkdir a b
    echo 1 > a/memory.move_charge_at_immigrate
    echo 1 > b/memory.move_charge_at_immigrate
    (
      echo $BASHPID > a/cgroup.procs
      while true; do
        dd if=/dev/zero of=/mnt/big bs=1M count=256
      done
    ) &
    while true; do
      sync
    done &
    sleep 1h &
    SLEEP=$!
    while true; do
      echo $SLEEP > a/cgroup.procs
      echo $SLEEP > b/cgroup.procs
    done

The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should, to prevent future surprises.
And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 [gthelen@google.com: v4] Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com [akpm@linux-foundation.org: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Wang Long <wanglong19@meituan.com> Acked-by: Wang Long <wanglong19@meituan.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> [v4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
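The nesting hazard in this commit message can be illustrated outside the kernel. What follows is a userspace sketch, not kernel code: `irq_enabled` and the `sim_*` helpers are hypothetical stand-ins that model only the interrupt-enable flag, with no real locking. It compares an inner lock pair that enables interrupts unconditionally with one that saves and restores the previous state.

```c
#include <stdbool.h>

/* Hypothetical model: one flag standing in for the CPU interrupt-enable state. */
static bool irq_enabled = true;

/* Models spin_lock_irqsave(): remember the current state, then disable. */
static bool sim_lock_irqsave(void)
{
    bool flags = irq_enabled;
    irq_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever state was saved. */
static void sim_unlock_irqrestore(bool flags)
{
    irq_enabled = flags;
}

/* Models spin_lock_irq(): disable unconditionally. */
static void sim_lock_irq(void)
{
    irq_enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void sim_unlock_irq(void)
{
    irq_enabled = true;
}

/* Inner pair uses the _irq variants; returns the flag seen while the
 * outer section is still held. */
static bool irqs_on_after_inner_irq_pair(void)
{
    irq_enabled = true;
    bool outer = sim_lock_irqsave();   /* plays the role of lock_page_memcg() */
    sim_lock_irq();                    /* plays unlocked_inode_to_wb_begin() */
    sim_unlock_irq();                  /* plays unlocked_inode_to_wb_end() */
    bool seen = irq_enabled;           /* outer section not yet released */
    sim_unlock_irqrestore(outer);
    return seen;
}

/* Inner pair saves and restores instead; same measurement. */
static bool irqs_on_after_inner_irqsave_pair(void)
{
    irq_enabled = true;
    bool outer = sim_lock_irqsave();
    bool inner = sim_lock_irqsave();
    sim_unlock_irqrestore(inner);
    bool seen = irq_enabled;
    sim_unlock_irqrestore(outer);
    return seen;
}
```

In this model the first function reports interrupts enabled inside the outer critical section, which is the window the trace above describes, while the save/restore variant keeps them disabled until the outer unlock.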
2018-04-21mm, pagemap: fix swap offset value for PMD migration entryHuang Ying1-1/+5
The swap offset reported by /proc/<pid>/pagemap may not be correct for PMD migration entries. If the addr passed into pagemap_pmd_range() isn't aligned with the PMD start address, the swap offset reported doesn't reflect this. And in the loop that reports information for each sub-page, the swap offset isn't incremented the way the PFN is. This may happen after opening /proc/<pid>/pagemap and seeking to a page whose address doesn't align with a PMD start address. I have verified this with a simple test program. BTW: migration swap entries have PFN information, do we need to restrict whether to show them? [akpm@linux-foundation.org: fix typo, per Huang, Ying] Link: http://lkml.kernel.org/r/20180408033737.10897-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "Jerome Glisse" <jglisse@redhat.com> Cc: Daniel Colascione <dancol@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
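The arithmetic behind this fix can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's code: the `SKETCH_*` constants assume 4 KiB pages and 2 MiB PMDs, and `report_offsets()` is a hypothetical helper. It starts from the entry's base offset, adds the index of the sub-page that `addr` falls on, and then advances by one for each sub-page reported.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry for the sketch: 4 KiB pages, 2 MiB PMD. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PMD_MASK   (~((UINT64_C(1) << SKETCH_PMD_SHIFT) - 1))

/*
 * Hypothetical helper: fill out[0..n-1] with the offsets to report for
 * n consecutive sub-pages starting at addr, given the offset that
 * corresponds to the first sub-page of the PMD.
 */
static void report_offsets(uint64_t pmd_base_offset, uint64_t addr,
                           uint64_t *out, size_t n)
{
    /* Index of the sub-page within the PMD that addr falls on. */
    uint64_t idx = (addr & ~SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT;
    uint64_t offset = pmd_base_offset + idx;

    /* Advance the offset for every sub-page reported. */
    for (size_t i = 0; i < n; i++)
        out[i] = offset++;
}
```

With a base offset of 1000 and an address five pages past a PMD boundary, the first reported offset in this model is 1005 rather than 1000.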
2018-04-21mm: fix do_pages_move status handlingMichal Hocko1-0/+3
Li Wang has reported that LTP move_pages04 test fails with the current tree: LTP move_pages04: TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT The test allocates an array of two pages, one is present while the other is not (resp. backed by zero page) and it expects EFAULT for the second page as the man page suggests. We are reporting EPERM which doesn't make any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter THP migration"). do_pages_move tries to handle as many pages in one batch as possible so we queue all pages with the same node target together and that corresponds to [start, i] range which is then used to update status array. add_page_for_migration will correctly notice the zero (resp. !present) page and returns with EFAULT which gets written to the status. But if this is the last page in the array we do not update start and so the last store_status after the loop will overwrite the range of the last batch with NUMA_NO_NODE (which corresponds to EPERM). Fix this by simply bailing out from the last flush if the pagelist is empty as there is clearly nothing more to do. Link: http://lkml.kernel.org/r/20180418121255.334-1-mhocko@kernel.org Fixes: cf5f16b23ec9 ("mm: unclutter THP migration") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
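The batching behaviour described above can be modelled in a few lines of userspace C. This is a simplified sketch built from the description in the commit message, not the kernel function: `move_pages_model()`, `store_range()`, and the `SKETCH_*` constants are hypothetical, actual page migration is omitted, and a `queued` counter stands in for the page list.

```c
#include <stddef.h>

#define SKETCH_NO_NODE (-1)   /* stands in for NUMA_NO_NODE; -1 also equals -EPERM */
#define SKETCH_EFAULT  14

/* Write value into status[start .. start + count - 1]. */
static void store_range(int *status, size_t start, int value, size_t count)
{
    for (size_t k = 0; k < count; k++)
        status[start + k] = value;
}

/*
 * present[i] != 0 means page i can be queued; node[i] is its target node.
 * skip_empty_flush != 0 selects the fixed behaviour: return before the
 * final flush when nothing is queued.
 */
static void move_pages_model(const int *present, const int *node, size_t n,
                             int *status, int skip_empty_flush)
{
    size_t i, start = 0, queued = 0;
    int current = SKETCH_NO_NODE;

    for (i = 0; i < n; i++) {
        if (current == SKETCH_NO_NODE) {
            /* Start a new batch at this page. */
            current = node[i];
            start = i;
        } else if (node[i] != current) {
            /* Target node changed: flush the batch, start another. */
            store_range(status, start, current, i - start);
            queued = 0;
            start = i;
            current = node[i];
        }
        if (present[i]) {
            queued++;
            continue;
        }
        /* Page cannot be queued: record the error, flush what we have. */
        status[i] = -SKETCH_EFAULT;
        store_range(status, start, current, i - start);
        queued = 0;
        current = SKETCH_NO_NODE;   /* note: start is not updated here */
    }

    if (skip_empty_flush && queued == 0)
        return;
    /* Final flush; with an empty batch this rewrites [start, i). */
    store_range(status, start, current, i - start);
}
```

For a two-page array whose second page is not present, the model without the early return ends with both status entries set to -1, while the variant with the early return leaves the node in the first entry and the fault code in the second.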
2018-04-21fork: unconditionally clear stack on forkKees Cook2-7/+2
One of the classes of kernel stack content leaks[1] is the exposure of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversionDavid Howells1-1/+1
In do_mount() when the MS_* flags are being converted to MNT_* flags, MS_RDONLY got accidentally converted to SB_RDONLY. Undo this change. Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20afs: Fix server record deletionDavid Howells1-1/+8
AFS server records get removed from the net->fs_servers tree when they're deleted, but not from the net->fs_addresses{4,6} lists, which can lead to an oops in afs_find_server() when a server record has been removed, for instance during rmmod. Fix this by deleting the record from the by-address lists before posting it for RCU destruction. The reason this hasn't been noticed before is that the fileserver keeps probing the local cache manager, thereby keeping the service record alive, so the oops would only happen when a fileserver eventually gets bored and stops pinging or if the module gets rmmod'd and a call comes in from the fileserver during the window between the server records being destroyed and the socket being closed. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c ... Workqueue: kafsd afs_process_async_call [kafs] RIP: 0010:afs_find_server+0x271/0x36f [kafs] ... Call Trace: afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs] afs_deliver_to_call+0x1ee/0x5e8 [kafs] afs_process_async_call+0x5b/0xd0 [kafs] process_one_work+0x2c2/0x504 worker_thread+0x1d4/0x2ac kthread+0x11f/0x127 ret_from_fork+0x24/0x30 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds69-349/+786
Pull networking fixes from David Miller: 1) Unbalanced refcounting in TIPC, from Jon Maloy. 2) Only allow TCP_MD5SIG to be set on sockets in close or listen state. Once the connection is established it makes no sense to change this. From Eric Dumazet. 3) Missing attribute validation in neigh_dump_table(), also from Eric Dumazet. 4) Fix address comparisons in SCTP, from Xin Long. 5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller. 6) Fix tunnel refcounting in l2tp, from Guillaume Nault. 7) Fix double list insert in team driver, from Paolo Abeni. 8) af_vsock.ko module was accidentally made unremovable, from Stefan Hajnoczi. 9) Fix reference to freed llc_sap object in llc stack, from Cong Wang. 10) Don't assume netdevice struct is DMA'able memory in virtio_net driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) net/smc: fix shutdown in state SMC_LISTEN bnxt_en: Fix memory fault in bnxt_ethtool_init() virtio_net: sparse annotation fix virtio_net: fix adding vids on big-endian virtio_net: split out ctrl buffer net: hns: Avoid action name truncation docs: ip-sysctl.txt: fix name of some ipv6 variables vmxnet3: fix incorrect dereference when rxvlan is disabled llc: hold llc_sap before release_sock() MAINTAINERS: Direct networking documentation changes to netdev atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit" net: qmi_wwan: add Wistron Neweb D19Q1 net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN" net: stmmac: Disable ACS Feature for GMAC >= 4 net: mvpp2: Fix DMA address mask size net: change the comment of dev_mc_init net: qualcomm: rmnet: Fix warning seen with fill_info tun: fix vlan packet truncation tipc: fix infinite loop when dumping link monitor summary tipc: fix use-after-free in tipc_nametbl_stop ...
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds8-11/+39
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes. Some of that is only a matter with fault injection (broken handling of small allocation failure in various mount-related places), but the last one is a root-triggerable stack overflow, and combined with userns it gets really nasty ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Don't leak MNT_INTERNAL away from internal mounts mm,vmscan: Allow preallocating memory for register_shrinker(). rpc_pipefs: fix double-dput() orangefs_kill_sb(): deal with allocation failures jffs2_kill_sb(): deal with failed allocations hypfs_kill_super(): deal with failed allocations
2018-04-20Merge tag 'ecryptfs-4.17-rc2-fixes' of ↵Linus Torvalds4-21/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Minor cleanups and a bug fix to completely ignore unencrypted filenames in the lower filesystem when filename encryption is enabled at the eCryptfs layer" * tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: don't pass up plaintext names when using filename encryption ecryptfs: fix spelling mistake: "cadidate" -> "candidate" ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
2018-04-20Merge tag 'for_v4.17-rc2' of ↵Linus Torvalds9-40/+63
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs - isofs memory leak fix - two fsnotify fixes of event mask handling - udf fix of UTF-16 handling - couple other smaller cleanups * tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix leak of UTF-16 surrogates into encoded strings fs: ext2: Adding new return type vm_fault_t isofs: fix potential memory leak in mount option parsing MAINTAINERS: add an entry for FSNOTIFY infrastructure fsnotify: fix typo in a comment about mark->g_list fsnotify: fix ignore mask logic in send_to_group() isofs compress: Remove VLA usage fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init fanotify: fix logic of events on child
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds6-38/+92
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - suspend/resume handling fix for Raydium I2C-connected touchscreen from Aaron Ma - protocol fixup for certain BT-connected Wacoms from Aaron Armstrong Skomra - battery level reporting fix on BT-connected mice from Dmitry Torokhov - hidraw race condition fix from Rodrigo Rivas Costa * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: i2c-hid: fix inverted return value from i2c_hid_command() HID: i2c-hid: Fix resume issue on Raydium touchscreen device HID: wacom: bluetooth: send exit report for recent Bluetooth devices HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device HID: input: fix battery level reporting on BT mice
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds5-81/+163
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "Shadow variable API list_head initialization fix from Petr Mladek" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Allow to call a custom callback when freeing shadow variables livepatch: Initialize shadow variables safely by a custom callback
2018-04-20Merge tag 'for-linus-4.17-rc2-tag' of ↵Linus Torvalds4-22/+313
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - some fixes of kmalloc() flags - one fix of the xenbus driver - an update of the pv sound driver interface needed for a driver which will go through the sound tree * tag 'for-linus-4.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: xenbus_dev_frontend: Really return response string xen/sndif: Sync up with the canonical definition in Xen xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe
2018-04-20Merge tag 'mips_fixes_4.17_1' of ↵Linus Torvalds4-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS fixes from James Hogan: - io: Add barriers to read*() & write*() - dts: Fix boston PCI bus DTC warnings (4.17) - memset: Several corner case fixes (one 3.10, others longer) * tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: uaccess: Add micromips clobbers to bzero invocation MIPS: memset.S: Fix clobber of v1 in last_fixup MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup MIPS: memset.S: EVA & fault support for small_memset MIPS: dts: Boston: Fix PCI bus dtc warnings: MIPS: io: Add barrier after register read in readX() MIPS: io: Prevent compiler reordering writeX()
2018-04-20Merge tag 'powerpc-4.17-3' of ↵Linus Torvalds5-4/+20
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix an off-by-one bug in our alternative asm patching which leads to incorrectly patched code. This bug lay dormant for nearly 10 years but we finally hit it due to a recent change. - Fix lockups when running KVM guests on Power8 due to a missing check when a thread that's running KVM comes out of idle. - Fix an out-of-spec behaviour in the XIVE code (P9 interrupt controller). - Fix EEH handling of bridge MMIO windows. - Prevent crashes in our RFI fallback flush handler if firmware didn't tell us the size of the L1 cache (only seen on simulators). Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling. * tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kvm: Fix lockups when running KVM guests on Power8 powerpc/eeh: Fix enabling bridge MMIO windows powerpc/xive: Fix trying to "push" an already active pool VP powerpc/64s: Default l1d_size to 64K in RFI fallback flush powerpc/lib: Fix off-by-one in alternate feature patching
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds30-716/+998
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes and kexec-file-load from Martin Schwidefsky: "After the common code kexec patches went in via Andrew we can now push the architecture parts to implement the kexec-file-load system call. Plus a few more bug fixes and cleanups, this includes an update to the default configurations" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/signal: cleanup uapi struct sigaction s390: rename default_defconfig to debug_defconfig s390: remove gcov defconfig s390: update defconfig s390: add support for IBM z14 Model ZR1 s390: remove couple of duplicate includes s390/boot: remove unused COMPILE_VERSION and ccflags-y s390/nospec: include cpu.h s390/decompressor: Ignore file vmlinux.bin.full s390/kexec_file: add generated files to .gitignore s390/Kconfig: Move kexec config options to "Processor type and features" s390/kexec_file: Add ELF loader s390/kexec_file: Add crash support to image loader s390/kexec_file: Add image loader s390/kexec_file: Add kexec_file_load system call s390/kexec_file: Add purgatory s390/kexec_file: Prepare setup.h for kexec_file_load s390/smsgiucv: disable SMSG on module unload s390/sclp: avoid potential usage of uninitialized value
2018-04-20Don't leak MNT_INTERNAL away from internal mountsAl Viro1-1/+2
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for their copies. As it is, creating a deep stack of bindings of /proc/*/ns/* somewhere in a new namespace and exiting yields a stack overflow. Cc: stable@kernel.org Reported-by: Alexander Aring <aring@mojatatu.com> Bisected-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Tested-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-19net/smc: fix shutdown in state SMC_LISTENUrsula Braun1-6/+4
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket crashes, because commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") releases the internal clcsock in smc_close_active() and sets smc->clcsock to NULL. For SHUT_RD the smc_close_active() call is removed. For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the clcsock is already released. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19bnxt_en: Fix memory fault in bnxt_ethtool_init()Vasundhara Volam2-24/+27
In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type could be greater than the fixed buffer length of 4096 bytes allocated by the driver. This was causing HWRM_NVM_READ to copy more data to the buffer than the allocated size, causing general protection fault. Fix the issue by allocating the exact buffer length returned by HWRM_NVM_FIND_DIR_ENTRY, instead of 4096. Move the kzalloc() call into the bnxt_get_pkgver() function. Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19Merge branch 'virtio-ctrl-buffer-fixes'David S. Miller1-29/+39
Michael S. Tsirkin says: ==================== virtio: ctrl buffer fixes Here are a couple of fixes related to the virtio control buffer. Lightly tested on x86 only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19virtio_net: sparse annotation fixMichael S. Tsirkin1-1/+1
offloads is a buffer in virtio format, should use the __virtio64 tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19virtio_net: fix adding vids on big-endianMichael S. Tsirkin1-3/+3
Programming vids (adding or removing them) still passes guest-endian values in the DMA buffer. That's wrong if guest is big-endian and when virtio 1 is enabled. Note: this is on top of a previous patch: virtio_net: split out ctrl buffer Fixes: 9465a7a6f ("virtio_net: enable v1.0 support") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
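The byte-order issue can be shown with a small userspace sketch. It relies on the fact that virtio 1.0 devices use little-endian fields; `vid_to_le16_bytes()` is a hypothetical helper for illustration and not the kernel's conversion routine. Encoding the value byte by byte gives the same buffer contents on little-endian and big-endian hosts, whereas copying a native 16-bit integer would not.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode a 16-bit VLAN id as little-endian bytes,
 * independent of the host's byte order.
 */
static void vid_to_le16_bytes(uint16_t vid, uint8_t out[2])
{
    out[0] = (uint8_t)(vid & 0xff);   /* low byte first */
    out[1] = (uint8_t)(vid >> 8);     /* high byte second */
}
```

A big-endian guest that copied the native representation of 0x0123 into the buffer would place 0x01 first, which a little-endian device would read as 0x2301.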
2018-04-19virtio_net: split out ctrl bufferMichael S. Tsirkin1-29/+39
When sending control commands, virtio net sets up several buffers for DMA. The buffers are all part of the net device which means it's actually allocated by kvmalloc so it's in theory (on extreme memory pressure) possible to get a vmalloc'ed buffer which on some platforms means we can't DMA there. Fix up by moving the DMA buffers into a separate structure. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: hns: Avoid action name truncationdann frazier1-1/+1
When longer interface names are used, the action names exposed in /proc/interrupts and /proc/irq/* may be truncated. For example, when using the predictable name algorithm in systemd on a HiSilicon D05, I see: ubuntu@d05-3:~$ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //' enahisic2i0-tx0 enahisic2i0-tx1 [...] enahisic2i0-tx8 enahisic2i0-tx9 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 Increase the max ring name length to allow for an interface name of IFNAMSIZ. After this change, I now see: $ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //' enahisic2i0-tx0 enahisic2i0-tx1 enahisic2i0-tx2 [...] enahisic2i0-tx8 enahisic2i0-tx9 enahisic2i0-tx10 enahisic2i0-tx11 enahisic2i0-tx12 enahisic2i0-tx13 enahisic2i0-tx14 enahisic2i0-tx15 Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
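The truncation seen in the output above can be reproduced in userspace with `snprintf()`. This is a sketch under assumptions: the 16-byte and 20-byte buffer sizes are chosen to match the observed output and are not taken from the driver source, and `format_ring_name()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<ifname>-tx<index>" into buf, which holds
 * cap bytes. Returns 1 if the name did not fit and was truncated, else 0.
 */
static int format_ring_name(char *buf, size_t cap, const char *ifname, int index)
{
    /* snprintf returns the length the full string would have needed. */
    int needed = snprintf(buf, cap, "%s-tx%d", ifname, index);

    return needed < 0 || (size_t)needed >= cap;
}
```

The name "enahisic2i0-tx10" is 16 characters, so with its terminating NUL it needs 17 bytes. In a 16-byte buffer the last digit is dropped, which is why rings 10 through 15 all appear under the same name.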
2018-04-19docs: ip-sysctl.txt: fix name of some ipv6 variablesOlivier Gayot1-4/+4
The names of the following proc/sysctl entries were incorrectly documented: /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length /proc/sys/net/ipv6/conf/<interface>/max_hbt_length Their names were set to the name of the symbol in the .data field of the control table instead of their .proc name. Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19vmxnet3: fix incorrect dereference when rxvlan is disabledRonak Doshi2-6/+15
vmxnet3_get_hdr_len() is used to calculate the header length which in turn is used to calculate the gso_size for skb. When rxvlan offload is disabled, vlan tag is present in the header and the function references ip header from sizeof(ethhdr) and leads to incorrect pointer reference. This patch fixes this issue by taking sizeof(vlan_ethhdr) into account if vlan tag is present and correctly references the ip hdr. Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Acked-by: Louis Luo <llouis@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
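The offset calculation at issue can be sketched in userspace C. This is an illustration, not the driver's code: `l3_offset()` and the `SKETCH_*` constants are hypothetical names, and the sketch handles only a single 802.1Q tag. It uses the standard sizes, a 14-byte Ethernet header and a 4-byte VLAN tag identified by ethertype 0x8100.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN    14      /* dst MAC + src MAC + ethertype */
#define SKETCH_VLAN_HLEN   4       /* one 802.1Q tag */
#define SKETCH_ETH_P_8021Q 0x8100  /* ethertype announcing a VLAN tag */

/*
 * Hypothetical helper: return the byte offset of the L3 (IP) header in
 * an Ethernet frame. frame must hold at least 14 bytes.
 */
static size_t l3_offset(const uint8_t *frame)
{
    /* The ethertype is stored big-endian at bytes 12 and 13. */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

    if (ethertype == SKETCH_ETH_P_8021Q)
        return SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN;
    return SKETCH_ETH_HLEN;
}
```

Reading the IP header at offset 14 of a tagged frame would interpret the VLAN tag and the real ethertype as IP header fields, which is the incorrect reference the commit message describes.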
2018-04-19llc: hold llc_sap before release_sock()Cong Wang1-0/+7
syzbot reported we still access llc->sap in llc_backlog_rcv() after it is freed in llc_sap_remove_socket(): Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430 llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785 llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] llc_conn_service net/llc/llc_conn.c:400 [inline] llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75 llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x12f/0x3a0 net/core/sock.c:2335 release_sock+0xa4/0x2b0 net/core/sock.c:2850 llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204 llc->sap is refcount'ed and llc_sap_remove_socket() is paired with llc_sap_add_socket(). This can be amended by holding its refcount before llc_sap_remove_socket() and releasing it after release_sock(). Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19MAINTAINERS: Direct networking documentation changes to netdevJonathan Corbet1-0/+1
Networking docs changes go through the networking tree, so patch the MAINTAINERS file to direct authors to the right place. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"Colin Ian King1-2/+2
Trivial fix to spelling mistake in message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: qmi_wwan: add Wistron Neweb D19Q1Pawel Dembicki1-0/+1
This modem is embedded on dlink dwr-960 router. The oem configuration states:

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1435 ProdID=d191 Rev=ff.ff
S: Manufacturer=Android
S: Product=Android
S: SerialNumber=0123456789ABCDEF
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us

Tested on openwrt distribution Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"Colin Ian King1-1/+1
Trivial fix to spelling mistake Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: stmmac: Disable ACS Feature for GMAC >= 4Jose Abreu3-9/+7
ACS Feature is currently enabled for GMAC >= 4 but the llc_snap status is never checked in descriptor rx_status callback. This will cause stmmac to always strip packets even though the ACS feature is already stripping them. Let's be safe and disable the ACS feature for GMAC >= 4 and always strip the packets for this GMAC version. Fixes: 477286b53f55 ("stmmac: add GMAC4 core support") Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19net: mvpp2: Fix DMA address mask sizeMaxime Chevallier1-5/+7
PPv2 TX/RX descriptors use 40-bit DMA addresses, but 41-bit masks were used (GENMASK_ULL(40, 0)). This commit fixes that by using the correct mask. Fixes: e7c5359f2eed ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
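The off-by-one is easy to check numerically. The sketch below re-creates the semantics of GENMASK_ULL(h, l), which sets bits l through h inclusive, as a userspace macro. `SKETCH_GENMASK_ULL` and `mask_width()` are names made up for this illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Userspace re-creation of GENMASK_ULL(h, l): bits l..h set, inclusive. */
#define SKETCH_GENMASK_ULL(h, l) \
    (((~UINT64_C(0)) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Count the set bits in a mask. */
static int mask_width(uint64_t mask)
{
    int bits = 0;

    while (mask) {
        bits += (int)(mask & 1);
        mask >>= 1;
    }
    return bits;
}
```

Because both bounds are inclusive, a 40-bit mask starting at bit 0 has 39 as its upper bound, and using 40 yields 41 bits.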
2018-04-19net: change the comment of dev_mc_initsunlianwen1-1/+1
The comment on dev_mc_init() is wrong: it uses the name dev_mc_flush instead of dev_mc_init. Signed-off-by: Lianwen Sun <sunlw.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19HID: i2c-hid: fix inverted return value from i2c_hid_command()Jiri Kosina1-1/+1
i2c_hid_command() returns non-zero in error cases (the actual errno). Error handling for the I2C_HID_QUIRK_RESEND_REPORT_DESCR case in i2c_hid_resume() had the check inverted; fix that. Fixes: 3e83eda467 ("HID: i2c-hid: Fix resume issue on Raydium touchscreen device") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-04-19powerpc/kvm: Fix lockups when running KVM guests on Power8Michael Ellerman1-2/+2
When running KVM guests on Power8 we can see a lockup where one CPU stops responding. This often leads to a message such as: watchdog: CPU 136 detected hard LOCKUP on other CPUs 72 Task dump for CPU 72: qemu-system-ppc R running task 10560 20917 20908 0x00040004 And then backtraces on other CPUs, such as: Task dump for CPU 48: ksmd R running task 10032 1519 2 0x00000804 Call Trace: ... --- interrupt: 901 at smp_call_function_many+0x3c8/0x460 LR = smp_call_function_many+0x37c/0x460 pmdp_invalidate+0x100/0x1b0 __split_huge_pmd+0x52c/0xdb0 try_to_unmap_one+0x764/0x8b0 rmap_walk_anon+0x15c/0x370 try_to_unmap+0xb4/0x170 split_huge_page_to_list+0x148/0xa30 try_to_merge_one_page+0xc8/0x990 try_to_merge_with_ksm_page+0x74/0xf0 ksm_scan_thread+0x10ec/0x1ac0 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x78 This is caused by commit 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle"), which added a check in pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and test of kvm_hstate.hwthread_req. The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when entering the guest, so it can then come out to cede with KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after setting hwthread_req to 1, but because hwthread_state is still KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we wake up from idle and won't go to kvm_start_guest. From there the thread will return somewhere garbage and crash. Fix it by skipping the store of hwthread_state, but not the test of hwthread_req, when coming out of idle. It's OK to skip the sync in that case because hwthread_req will have been set on the same thread, so there is no synchronisation required. Fixes: 8c1c7fb0b5ec ("powerpc/64s/idle: avoid sync for KVM state when waking from idle") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19  powerpc/eeh: Fix enabling bridge MMIO windows  Michael Neuling  1  -1/+2

On boot we save the configuration space of PCIe bridges. We do this so that when we get an EEH event and everything gets reset, we can restore them.

Unfortunately we save this state before we've enabled the MMIO space on the bridges. Hence if we have to reset the bridge, MMIO is not enabled when we come back and we end up taking a PE freeze when the driver starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring bridges on EEH. Ideally we'd do this correctly by saving the configuration space writes later, but that will have to come later in a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:

    echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound

On boston, this PHB has a PCIe switch on it. Without this patch, you'll see two EEH events, 1 expected and 1 the failure we are fixing here. The second EEH event causes anything under the PHB to disappear (i.e. the i40e eth).

With this patch, only 1 EEH event occurs and devices properly recover.

Fixes: 652defed4875 ("powerpc/eeh: Check PCIe link after reset")
Cc: stable@vger.kernel.org # v3.11+
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19  net: qualcomm: rmnet: Fix warning seen with fill_info  Subash Abhinov Kasiviswanathan  1  -5/+6

When the last rmnet device attached to a real device is removed, the real device is unregistered from rmnet. As a result, the real device lookup fails resulting in a warning when the fill_info handler is called as part of the rmnet device unregistration.

Fix this by returning the rmnet flags as 0 when no real device is present.

    WARNING: CPU: 0 PID: 1779 at net/core/rtnetlink.c:3254 rtmsg_ifinfo_build_skb+0xca/0x10d
    Modules linked in:
    CPU: 0 PID: 1779 Comm: ip Not tainted 4.16.0-11872-g7ce2367 #1
    Stack:
     7fe655f0 60371ea3 00000000 00000000
     60282bc6 6006b116 7fe65600 60371ee8
     7fe65660 6003a68c 00000000 900000000
    Call Trace:
     [<6006b116>] ? printk+0x0/0x94
     [<6001f375>] show_stack+0xfe/0x158
     [<60371ea3>] ? dump_stack_print_info+0xe8/0xf1
     [<60282bc6>] ? rtmsg_ifinfo_build_skb+0xca/0x10d
     [<6006b116>] ? printk+0x0/0x94
     [<60371ee8>] dump_stack+0x2a/0x2c
     [<6003a68c>] __warn+0x10e/0x13e
     [<6003a82c>] warn_slowpath_null+0x48/0x4f
     [<60282bc6>] rtmsg_ifinfo_build_skb+0xca/0x10d
     [<60282c4d>] rtmsg_ifinfo_event.part.37+0x1e/0x43
     [<60282c2f>] ? rtmsg_ifinfo_event.part.37+0x0/0x43
     [<60282d03>] rtmsg_ifinfo+0x24/0x28
     [<60264e86>] dev_close_many+0xba/0x119
     [<60282cdf>] ? rtmsg_ifinfo+0x0/0x28
     [<6027c225>] ? rtnl_is_locked+0x0/0x1c
     [<6026ca67>] rollback_registered_many+0x1ae/0x4ae
     [<600314be>] ? unblock_signals+0x0/0xae
     [<6026cdc0>] ? unregister_netdevice_queue+0x19/0xec
     [<6026ceec>] unregister_netdevice_many+0x21/0xa1
     [<6027c765>] rtnl_delete_link+0x3e/0x4e
     [<60280ecb>] rtnl_dellink+0x262/0x29c
     [<6027c241>] ? rtnl_get_link+0x0/0x3e
     [<6027f867>] rtnetlink_rcv_msg+0x235/0x274

Fixes: be81a85f5f87 ("net: qualcomm: rmnet: Implement fill_info")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19  MIPS: uaccess: Add micromips clobbers to bzero invocation  Matt Redfearn  1  -2/+9

The micromips implementation of bzero additionally clobbers registers t7 & t8. Specify this in the clobbers list when invoking bzero.

Fixes: 26c5e07d1478 ("MIPS: microMIPS: Optimise 'memset' core library function.")
Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.10+
Patchwork: https://patchwork.linux-mips.org/patch/19110/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-18  MIPS: memset.S: Fix clobber of v1 in last_fixup  Matt Redfearn  1  -1/+1

The label .Llast_fixup\@ is jumped to on page fault within the final byte set loop of memset (on < MIPSR6 architectures). For some reason, in this fault handler, the v1 register is randomly set to a2 & STORMASK. This clobbers v1 for the calling function. This can be observed with the following test code:

    static int __init __attribute__((optimize("O0"))) test_clear_user(void)
    {
        register int t asm("v1");
        char *test;
        int j, k;

        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);

        for (j = 256; j < 512; j++) {
            t = 0xa5a5a5a5;
            if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
                pr_err("clear_user (%px %d) returned %d\n",
                       test + PAGE_SIZE - 256, j, k);
            }
            if (t != 0xa5a5a5a5) {
                pr_err("v1 was clobbered to 0x%x!\n", t);
            }
        }

        return 0;
    }
    late_initcall(test_clear_user);

Which demonstrates that v1 is indeed clobbered (MIPS64):

    Testing clear_user
    v1 was clobbered to 0x1!
    v1 was clobbered to 0x2!
    v1 was clobbered to 0x3!
    v1 was clobbered to 0x4!
    v1 was clobbered to 0x5!
    v1 was clobbered to 0x6!
    v1 was clobbered to 0x7!

Since the number of bytes that could not be set is already contained in a2, the andi placing a value in v1 is not necessary and actively harmful in clobbering v1.

Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19109/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-18  Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client  Linus Torvalds  2  -35/+76

Pull ceph fixes from Ilya Dryomov:
 "A couple of follow-up patches for -rc1 changes in rbd, support for a timeout on waiting for the acquisition of exclusive lock and a fix for uninitialized memory access in CephFS, marked for stable"

* tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client:
  rbd: notrim map option
  rbd: adjust queue limits for "fancy" striping
  rbd: avoid Wreturn-type warnings
  ceph: always update atime/mtime/ctime for new inode
  rbd: support timeout in rbd_wait_state_locked()
  rbd: refactor rbd_wait_state_locked()
2018-04-18  tun: fix vlan packet truncation  Bjørn Mork  1  -6/+1

Bogus trimming in tun_net_xmit() causes truncated vlan packets.

skb->len is correct whether or not skb_vlan_tag_present() is true. There is no more reason to adjust the skb length on xmit in this driver than any other driver. tun_put_user() adds 4 bytes to the total for tagged packets because it transmits the tag inline to userspace. This is similar to a nic transmitting the tag inline on the wire.

Reproduce the bug by sending any tagged packet through back-to-back connected tap interfaces:

    socat TUN,tun-type=tap,iff-up,tun-name=in TUN,tun-type=tap,iff-up,tun-name=out &
    ip link add link in name in.20 type vlan id 20
    ip addr add 10.9.9.9/24 dev in.20
    ip link set in.20 up
    tshark -nxxi in -f arp -c1 2>/dev/null &
    tshark -nxxi out -f arp -c1 2>/dev/null &
    ping -c 1 10.9.9.5 >/dev/null 2>&1

The output from the 'in' and 'out' interfaces is different when the bug is present:

    Capturing on 'in'
    0000 ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14 ......v.v7......
    0010 08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a ..........v.v7..
    0020 0a 09 09 09 00 00 00 00 00 00 0a 09 09 05 ..............

    Capturing on 'out'
    0000 ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14 ......v.v7......
    0010 08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a ..........v.v7..
    0020 0a 09 09 09 00 00 00 00 00 00 ..........

Fixes: aff3d70a07ff ("tun: allow to attach ebpf socket filter")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18  tipc: fix infinite loop when dumping link monitor summary  Tung Nguyen  2  -8/+5

When configuring the number of used bearers to MAX_BEARERS and issuing the command "tipc link monitor summary", the command enters an infinite loop in user space.

This issue happens because function tipc_nl_node_dump_monitor() returns the wrong 'prev_bearer' value when all potential monitors have been scanned.

The correct behavior is to always try to scan all monitors until either the netlink message is full, in which case we return the bearer identity of the affected monitor, or we continue through the whole bearer array until we can return MAX_BEARERS. This solution also caters for the case where there may be gaps in the bearer array.

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
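The resume logic described in the tipc monitor-dump commit above can be sketched in plain C. This is a simplified user-space model, not the kernel's tipc_nl_node_dump_monitor(): the function `dump_monitors`, the `present` array, the `room` counter standing in for netlink message space, and the value chosen for `MAX_BEARERS` are all assumptions made for illustration.

```c
#include <stdbool.h>

#define MAX_BEARERS 3 /* illustrative value for this sketch */

/*
 * Hypothetical model of a resumable dump. Scanning starts at prev_bearer.
 * Each present monitor consumes one unit of room in the message.
 * Returns the bearer index to resume from when the message is full,
 * or MAX_BEARERS once the whole array has been scanned.
 */
static int dump_monitors(const bool present[MAX_BEARERS], int prev_bearer,
			 int room, int *dumped)
{
	int i;

	for (i = prev_bearer; i < MAX_BEARERS; i++) {
		if (!present[i])
			continue;	/* gap in the bearer array: skip it */
		if (room == 0)
			return i;	/* message full: resume at this bearer */
		room--;
		(*dumped)++;
	}
	return MAX_BEARERS;		/* everything scanned: caller stops */
}
```

A caller keeps invoking the function with the returned index until it gets MAX_BEARERS back. Because the terminating value is only returned after the full array has been walked, the caller's loop is guaranteed to end, including when some slots are empty.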
2018-04-18  tipc: fix use-after-free in tipc_nametbl_stop  Jon Maloy  1  -12/+17

When we delete a service item in tipc_nametbl_stop() we loop over all service ranges in the service's RB tree, and for each service range we loop over its pertaining publications while calling tipc_service_remove_publ() for each of them.

However, tipc_service_remove_publ() has the side effect that it also removes the comprising service range item when there are no publications left. This leads to a "use-after-free" access when the inner loop continues to the next iteration, since the range item holding the list we are looping no longer exists.

We fix this by moving the delete of the service range item outside the said function. Instead, we now let the two functions calling it test if the list is empty and perform the removal when that is the case.

Reported-by: syzbot+d64b64afc55660106556@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
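The ownership change described in the tipc_nametbl_stop commit above follows a general pattern: the per-item removal helper must not free the container that the caller is still iterating over. The sketch below models that pattern in user-space C. The types `struct publ` and `struct range` and the functions `range_remove_publ`, `range_purge` and `range_new` are hypothetical stand-ins, not the TIPC data structures.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a publication and the range that holds it. */
struct publ {
	struct publ *next;
};

struct range {
	struct publ *publs;	/* singly linked list of publications */
};

/* Remove one publication. Deliberately does NOT free the range itself. */
static void range_remove_publ(struct range *r)
{
	struct publ *p = r->publs;

	if (!p)
		return;
	r->publs = p->next;
	free(p);
}

/*
 * The caller drains the list, then tests for emptiness and frees the
 * range exactly once, after the loop no longer touches it.
 * Returns the number of publications removed.
 */
static int range_purge(struct range *r)
{
	int removed = 0;

	while (r->publs) {
		range_remove_publ(r);
		removed++;
	}
	free(r);
	return removed;
}

/* Build a range holding npubl publications, or NULL on allocation failure. */
static struct range *range_new(int npubl)
{
	struct range *r = calloc(1, sizeof(*r));
	int i;

	if (!r)
		return NULL;
	for (i = 0; i < npubl; i++) {
		struct publ *p = malloc(sizeof(*p));

		if (!p) {
			range_purge(r);
			return NULL;
		}
		p->next = r->publs;
		r->publs = p;
	}
	return r;
}
```

If `range_remove_publ` freed the range when the last publication went away, the `while (r->publs)` test in `range_purge` would read freed memory, which is the shape of the bug the commit describes.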
2018-04-18  powerpc/xive: Fix trying to "push" an already active pool VP  Benjamin Herrenschmidt  1  -0/+4

When setting up a CPU, we "push" (activate) a pool VP for it.

However it's an error to do so if it already has an active pool VP.

This happens when doing soft CPU hotplug on powernv since we don't tear down the CPU on unplug. The HW flags the error which gets captured by the diagnostics.

Fix this by making sure to "pull" out any already active pool first.

Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-18  udf: Fix leak of UTF-16 surrogates into encoded strings  Jan Kara  1  -0/+6

The OSTA UDF specification does not say whether the CS0 charset, in the case of two-bytes-per-character encoding, should be treated as UTF-16 or UCS-2. The sample code in the standard does not treat UTF-16 surrogates in any special way, but on systems such as Windows which work in UTF-16 internally, filenames would effectively be treated as UTF-16. In Linux it is more difficult to handle characters outside of the Basic Multilingual Plane (beyond 0xffff) as the NLS framework works with 2-byte characters only. For now, just make sure we don't leak UTF-16 surrogates into the resulting string when loading names from the filesystem.

CC: stable@vger.kernel.org # >= v4.6
Reported-by: Mingye Wang <arthur200126@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
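To make the surrogate issue in the udf commit above concrete: UTF-16 reserves the code units 0xD800 through 0xDFFF for surrogate pairs, so a decoder that treats each 16-bit unit as a complete character has to filter that range. The following is a minimal sketch and not the fs/udf code; the helper names and the choice of '?' as a replacement character are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-bit code unit lies in the UTF-16 surrogate range. */
static bool is_utf16_surrogate(uint16_t unit)
{
	return unit >= 0xD800 && unit <= 0xDFFF;
}

/*
 * Hypothetical sanitizer: pass ordinary BMP code units through and
 * replace any surrogate with '?' so that it never reaches the output.
 */
static uint16_t sanitize_cs0_unit(uint16_t unit)
{
	return is_utf16_surrogate(unit) ? (uint16_t)'?' : unit;
}
```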
2018-04-17  KEYS: DNS: limit the length of option strings  Eric Biggers  1  -7/+5

Adding a dns_resolver key whose payload contains a very long option name resulted in that string being printed in full. This hit the WARN_ONCE() in set_precision() during the printk(), because printk() only supports a precision of up to 32767 bytes:

    precision 1000000 too large
    WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0

Fix it by limiting option strings (combined name + value) to a much more reasonable 128 bytes. The exact limit is arbitrary, but currently the only recognized option is formatted as "dnserror=%lu" which fits well within this limit.

Also ratelimit the printks.

Reproducer:

    perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s

This bug was found using syzkaller.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
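The length limit described in the KEYS: DNS commit above amounts to rejecting an option before any code tries to parse or print it. The sketch below shows that check in user-space C. It is not the dns_resolver implementation: the function `parse_option` and the macro name `OPT_MAX_LEN` are hypothetical, and only the 128-byte figure comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

#define OPT_MAX_LEN 128 /* assumed name; the 128-byte limit is from the commit text */

/*
 * Hypothetical option validator. opt points at opt_len bytes holding
 * "name" or "name=value" (not necessarily NUL-terminated).
 * Returns 0 and stores the name length on success, or -1 if the option
 * is empty or longer than OPT_MAX_LEN.
 */
static int parse_option(const char *opt, size_t opt_len, size_t *name_len)
{
	const char *eq;

	if (opt_len == 0 || opt_len > OPT_MAX_LEN)
		return -1;	/* reject before parsing or printing anything */

	eq = memchr(opt, '=', opt_len);
	*name_len = eq ? (size_t)(eq - opt) : opt_len;
	return 0;
}
```

Bounding the length up front means any later diagnostic that prints the option with a precision argument stays far below the precision ceiling mentioned in the commit.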
2018-04-17  sfc: check RSS is active for filter insert  Bert Kenward  1  -2/+2

For some firmware variants - specifically 'capture packed stream' - RSS filters are not valid. We must check if RSS is actually active rather than merely enabled.

Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17  vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi  Toshiaki Makita  2  -3/+6

Syzkaller spotted an old bug which leads to reading skb beyond tail by 4 bytes on vlan tagged packets. This is caused because skb_vlan_tagged_multi() did not check skb_headlen.

    BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline]
    BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
    BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline]
    BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
    BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
    CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:17 [inline]
     dump_stack+0x185/0x1d0 lib/dump_stack.c:53
     kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
     __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
     eth_type_vlan include/linux/if_vlan.h:283 [inline]
     skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
     vlan_features_check include/linux/if_vlan.h:672 [inline]
     dflt_features_check net/core/dev.c:2949 [inline]
     netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
     validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
     __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
     dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
     packet_snd net/packet/af_packet.c:2944 [inline]
     packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
     sock_sendmsg_nosec net/socket.c:630 [inline]
     sock_sendmsg net/socket.c:640 [inline]
     sock_write_iter+0x3b9/0x470 net/socket.c:909
     do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
     do_iter_write+0x30d/0xd40 fs/read_write.c:932
     vfs_writev fs/read_write.c:977 [inline]
     do_writev+0x3c9/0x830 fs/read_write.c:1012
     SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
     SyS_writev+0x56/0x80 fs/read_write.c:1082
     do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    RIP: 0033:0x43ffa9
    RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
    RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
    RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
    RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
    R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000

    Uninit was created at:
     kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
     kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
     kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
     kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
     slab_post_alloc_hook mm/slab.h:445 [inline]
     slab_alloc_node mm/slub.c:2737 [inline]
     __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
     __kmalloc_reserve net/core/skbuff.c:138 [inline]
     __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
     alloc_skb include/linux/skbuff.h:984 [inline]
     alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
     sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
     packet_alloc_skb net/packet/af_packet.c:2803 [inline]
     packet_snd net/packet/af_packet.c:2894 [inline]
     packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
     sock_sendmsg_nosec net/socket.c:630 [inline]
     sock_sendmsg net/socket.c:640 [inline]
     sock_write_iter+0x3b9/0x470 net/socket.c:909
     do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
     do_iter_write+0x30d/0xd40 fs/read_write.c:932
     vfs_writev fs/read_write.c:977 [inline]
     do_writev+0x3c9/0x830 fs/read_write.c:1012
     SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
     SyS_writev+0x56/0x80 fs/read_write.c:1082
     do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.")
Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
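The missing check in the vlan commit above is a plain bounds test: the encapsulated protocol field sits 4 bytes past the end of the basic Ethernet header, so it may only be read when the linear data is long enough to contain it. The sketch below models this on a flat byte buffer rather than an skb. The function `frame_vlan_tagged_multi` and its `headlen` parameter are assumptions for illustration and do not reproduce the kernel helper, which also handles tags stored out of band.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN	14	/* dst MAC + src MAC + ethertype */
#define VLAN_HLEN	4	/* TCI + encapsulated ethertype */
#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

static bool is_vlan_proto(uint16_t proto)
{
	return proto == ETH_P_8021Q || proto == ETH_P_8021AD;
}

/*
 * Simplified model: frame holds headlen readable bytes. Returns true only
 * if the outer ethertype is a VLAN type AND the encapsulated ethertype,
 * 4 bytes further on, is also a VLAN type. Never reads past headlen.
 */
static bool frame_vlan_tagged_multi(const uint8_t *frame, size_t headlen)
{
	uint16_t outer, inner;

	if (headlen < ETH_HLEN)
		return false;
	outer = (uint16_t)((frame[12] << 8) | frame[13]);
	if (!is_vlan_proto(outer))
		return false;

	/* The check the bug lacked: is the inner header really there? */
	if (headlen < ETH_HLEN + VLAN_HLEN)
		return false;
	inner = (uint16_t)((frame[16] << 8) | frame[17]);
	return is_vlan_proto(inner);
}
```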
2018-04-17  MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup  Matt Redfearn  1  -1/+1

The __clear_user function is defined to return the number of bytes that could not be cleared. From the underlying memset / bzero implementation this means setting register a2 to that number on return. Currently if a page fault is triggered within the memset_partial block, the value loaded into a2 on return is meaningless.

The label .Lpartial_fixup\@ is jumped to on page fault. In order to work out how many bytes failed to copy, the exception handler should find how many bytes are left in the partial block (andi a2, STORMASK), add that to the partial block end address (a2), and subtract the faulting address to get the remainder. Currently it incorrectly subtracts the partial block start address (t1), which has additionally been clobbered to generate a jump target in memset_partial. Fix this by adding the block end address instead.

This issue was found with the following test code:

    int j, k;

    for (j = 0; j < 512; j++) {
        if ((k = clear_user(NULL, j)) != j) {
            pr_err("clear_user (NULL %d) returned %d\n", j, k);
        }
    }

Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).

Suggested-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19108/
Signed-off-by: James Hogan <jhogan@kernel.org>
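The remainder arithmetic described in the Lpartial_fixup commit above can be written out in C to check it against concrete numbers. This is only a model of the calculation stated in the commit message, not a translation of memset.S: the function name, the parameter names, and the `STORMASK` value of 7 (store width of 8 bytes minus one, as on a 64-bit build) are assumptions for illustration.

```c
#include <stdint.h>

#define STORMASK 7u /* assumed: 8-byte stores; a 4-byte store width would give 3 */

/*
 * Hypothetical model of the fixup arithmetic:
 *   tail      = bytes left over after the partial block (len & STORMASK)
 *   remainder = tail + block_end - fault_addr
 * i.e. everything from the faulting address to the end of the request.
 */
static uintptr_t bytes_not_cleared(uintptr_t len_reg, uintptr_t block_end,
				   uintptr_t fault_addr)
{
	uintptr_t tail = len_reg & STORMASK;

	return tail + block_end - fault_addr;
}
```

For example, with a length register of 0x2B the tail is 3 bytes; if the partial block ends at 0x1040 and the fault hits at 0x1020, then 3 + 0x20 = 35 bytes were not cleared. Subtracting the block start instead of the faulting address, as the old code did, would not yield a count tied to where the fault occurred.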