summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2016-09-02Merge branch 'overlayfs-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Most of this is regression fixes for posix acl behavior introduced in 4.8-rc1 (these were caught by the pjd-fstest suite). The are also miscellaneous fixes marked as stable material and cleanups. Other than overlayfs code, it touches <linux/fs.h> to add a constant with which to disable posix acl caching. No changes needed to the actual caching code, it automatically does the right thing, although later we may want to optimize this case. I'm now testing overlayfs with the following test suites to catch regressions: - unionmount-testsuite - xfstests - pjd-fstest" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: update doc ovl: listxattr: use strnlen() ovl: Switch to generic_getxattr ovl: copyattr after setting POSIX ACL ovl: Switch to generic_removexattr ovl: Get rid of ovl_xattr_noacl_handlers array ovl: Fix OVL_XATTR_PREFIX ovl: fix spelling mistake: "directries" -> "directories" ovl: don't cache acl on overlay layer ovl: use cached acl on underlying layer ovl: proper cleanup of workdir ovl: remove posix_acl_default from workdir ovl: handle umask and posix_acl_default correctly on creation ovl: don't copy up opaqueness
2016-09-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-11/+18
Merge fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: rapidio/tsi721: fix incorrect detection of address translation condition rapidio/documentation/mport_cdev: add missing parameter description kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd MAINTAINERS: Vladimir has moved mm, mempolicy: task->mempolicy must be NULL before dropping final reference printk/nmi: avoid direct printk()-s from __printk_nmi_flush() treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator lib/test_hash.c: fix warning in preprocessor symbol evaluation lib/test_hash.c: fix warning in two-dimensional array init kconfig: tinyconfig: provide whole choice blocks to avoid warnings kexec: fix double-free when failing to relocate the purgatory mm, oom: prevent premature OOM killer invocation for high order request
2016-09-02mm, mempolicy: task->mempolicy must be NULL before dropping final referenceDavid Rientjes1-0/+4
KASAN allocates memory from the page allocator as part of kmem_cache_free(), and that can reference current->mempolicy through any number of allocation functions. It needs to be NULL'd out before the final reference is dropped to prevent a use-after-free bug: BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140 ... Call Trace: dump_stack kasan_object_err kasan_report_error __asan_report_load2_noabort alloc_pages_current <-- use after free depot_save_stack save_stack kasan_slab_free kmem_cache_free __mpol_put <-- free do_exit This patch sets current->mempolicy to NULL before dropping the final reference. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1608301442180.63329@chino.kir.corp.google.com Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLEJoe Perches1-9/+0
It's been eliminated from the sources, remove it from everywhere else. Link: http://lkml.kernel.org/r/076eff466fd7edb550c25c8b25d76924ca0eba62.1472660229.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02mm, vmscan: only allocate and reclaim from zones with pages managed by the ↵Mel Gorman1-2/+14
buddy allocator Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts of memory when booting a secondary kernel. Srikar Dronamraju reported that multiple nodes may have no memory managed by the buddy allocator but still return true for populated_zone(). Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes") was reported to cause kswapd to spin at 100% CPU usage when fadump was enabled. The old code happened to deal with the situation of a populated node with zero free pages by co-incidence but the current code tries to reclaim populated zones without realising that is impossible. We cannot just convert populated_zone() as many existing users really need to check for present_pages. This patch introduces a managed_zone() helper and uses it in the few cases where it is critical that the check is made for managed pages -- zonelist construction and page reclaim. Link: http://lkml.kernel.org/r/20160831195104.GB8119@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-02Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds1-0/+1
Pull audit fixes from Paul Moore: "Two small patches to fix some bugs with the audit-by-executable functionality we introduced back in v4.3 (both patches are marked for the stable folks)" * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit: audit: fix exe_file access in audit_exe_compare mm: introduce get_task_exe_file
2016-09-02Merge tag 'xfs-iomap-for-linus-4.8-rc5' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs and iomap fixes from Dave Chinner: "Most of these changes are small regression fixes that address problems introduced in the 4.8-rc1 window. The two fixes that aren't (IO completion fix and superblock inprogress check) are fixes for problems introduced some time ago and need to be pushed back to stable kernels. Changes in this update: - iomap FIEMAP_EXTENT_MERGED usage fix - additional mount-time feature restrictions - rmap btree query fixes - freeze/unmount io completion workqueue fix - memory corruption fix for deferred operations handling" * tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: track log done items directly in the deferred pending work item iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems xfs: prevent dropping ioend completions during buftarg wait xfs: fix superblock inprogress check xfs: simple btree query range should look right if LE lookup fails xfs: fix some key handling problems in _btree_simple_query_range xfs: don't log the entire end of the AGF xfs: disallow mounting of realtime + rmap filesystems xfs: don't perform lookups on zero-height btrees
2016-09-01ovl: don't cache acl on overlay layerMiklos Szeredi1-0/+1
Some operations (setxattr/chmod) can make the cached acl stale. We either need to clear overlay's acl cache for the affected inode or prevent acl caching on the overlay altogether. Preventing caching has the following advantages: - no double caching, less memory used - overlay cache doesn't go stale when fs clears it's own cache Possible disadvantage is performance loss. If that becomes a problem get_acl() can be optimized for overlayfs. This patch disables caching by pre setting i_*acl to a value that - has bit 0 set, so is_uncached_acl() will return true - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it The constant -3 was chosen for this purpose. Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-08-31mm: introduce get_task_exe_fileMateusz Guzik1-0/+1
For more convenient access if one has a pointer to the task. As a minor nit take advantage of the fact that only task lock + rcu are needed to safely grab ->exe_file. This saves mm refcount dance. Use the helper in proc_exe_link. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Richard Guy Briggs <rgb@redhat.com> Cc: <stable@vger.kernel.org> # 4.3.x Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-30mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKSJosh Poimboeuf1-1/+1
There are three usercopy warnings which are currently being silenced for gcc 4.6 and newer: 1) "copy_from_user() buffer size is too small" compile warning/error This is a static warning which happens when object size and copy size are both const, and copy size > object size. I didn't see any false positives for this one. So the function warning attribute seems to be working fine here. Note this scenario is always a bug and so I think it should be changed to *always* be an error, regardless of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS. 2) "copy_from_user() buffer size is not provably correct" compile warning This is another static warning which happens when I enable __compiletime_object_size() for new compilers (and CONFIG_DEBUG_STRICT_USER_COPY_CHECKS). It happens when object size is const, but copy size is *not*. In this case there's no way to compare the two at build time, so it gives the warning. (Note the warning is a byproduct of the fact that gcc has no way of knowing whether the overflow function will be called, so the call isn't dead code and the warning attribute is activated.) So this warning seems to only indicate "this is an unusual pattern, maybe you should check it out" rather than "this is a bug". I get 102(!) of these warnings with allyesconfig and the __compiletime_object_size() gcc check removed. I don't know if there are any real bugs hiding in there, but from looking at a small sample, I didn't see any. According to Kees, it does sometimes find real bugs. But the false positive rate seems high. 3) "Buffer overflow detected" runtime warning This is a runtime warning where object size is const, and copy size > object size. All three warnings (both static and runtime) were completely disabled for gcc 4.6 with the following commit: 2fb0815c9ee6 ("gcc4: disable __compiletime_object_size for GCC 4.6+") That commit mistakenly assumed that the false positives were caused by a gcc bug in __compiletime_object_size(). But in fact, __compiletime_object_size() seems to be working fine. The false positives were instead triggered by #2 above. (Though I don't have an explanation for why the warnings supposedly only started showing up in gcc 4.6.) So remove warning #2 to get rid of all the false positives, and re-enable warnings #1 and #3 by reverting the above commit. Furthermore, since #1 is a real bug which is detected at compile time, upgrade it to always be an error. Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer needed. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Nilay Vaish <nilayvaish@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-2/+12
Pull networking fixes from David Miller: 1) Segregate namespaces properly in conntrack dumps, from Liping Zhang. 2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet. 3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz. 4) Fix use-after-free in tcp_xmit_retransmit_queue(). 5) Userspace header fixups (use of __u32, missing includes, etc.) from Mikko Rapeli. 6) Further refinements to fragmentation wrt gso and tunnels, from Shmulik Ladkani. 7) Trigger poll correctly for zero length UDP packets, from Eric Dumazet. 8) TCP window scaling fix, also from Eric Dumazet. 9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets. 10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet. 11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng. 12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet. 13) Add new device ID to alx driver, from Owen Lin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits) Add Killer E2500 device ID in alx driver. net: smc91x: fix SMC accesses Documentation: networking: dsa: Remove platform device TODO net/mlx5: Increase number of ethtool steering priorities net/mlx5: Add error prints when validate ETS failed net/mlx5e: Fix memory leak if refreshing TIRs fails net/mlx5e: Add ethtool counter for TX xmit_more net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ net/mlx5e: Don't wait for SQ completions on close net/mlx5e: Don't post fragmented MPWQE when RQ is disabled net/mlx5e: Don't wait for RQ completions on close net/mlx5e: Limit UMR length to the device's limitation rhashtable: fix a memory leak in alloc_bucket_locks() sfc: fix potential stack corruption from running past stat bitmask team: loadbalance: push lacpdus to exact delivery net: hns: dereference ppe_cb->ppe_common_cb if it is non-null 8139cp: Fix one possible deadloop in cp_rx_poll i40e: Change some init flow for the client Revert "phy: IRQ cannot be shared" net: dsa: bcm_sf2: Fix race condition while unmasking interrupts ...
2016-08-29net: smc91x: fix SMC accessesRussell King1-0/+10
Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines") broke some ARM platforms through several mistakes. Firstly, the access size must correspond to the following rule: (a) at least one of 16-bit or 8-bit access size must be supported (b) 32-bit accesses are optional, and may be enabled in addition to the above. Secondly, it provides no emulation of 16-bit accesses, instead blindly making 16-bit accesses even when the platform specifies that only 8-bit is supported. Reorganise smc91x.h so we can make use of the existing 16-bit access emulation already provided - if 16-bit accesses are supported, use 16-bit accesses directly, otherwise if 8-bit accesses are supported, use the provided 16-bit access emulation. If neither, BUG(). This exactly reflects the driver behaviour prior to the commit being fixed. Since the conversion incorrectly cut down the available access sizes on several platforms, we also need to go through every platform and fix up the overly-restrictive access size: Arnd assumed that if a platform can perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access size needed to be specified - not so, all available access sizes must be specified. This likely fixes some performance regressions in doing this: if a platform does not support 8-bit accesses, 8-bit accesses have been emulated by performing a 16-bit read-modify-write access. Tested on the Intel Assabet/Neponset platform, which supports only 8-bit accesses, which was broken by the original commit. Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-29iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystemsChristoph Hellwig1-1/+7
Filesystems like XFS that use extents should not set the FIEMAP_EXTENT_MERGED flag in the fiemap extent structures. To allow for both behaviors for the upcoming gfs2 usage split the iomap type field into type and flags, and only set FIEMAP_EXTENT_MERGED if the IOMAP_F_MERGED flag is set. The flags field will also come in handy for future features such as shared extents on reflink-enabled file systems. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-29Merge tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-0/+2
Pull drm fixes from Dave Airlie: "A bunch of fixes covering i915, amdgpu, one tegra and some core DRM ones. Nothing too strange at this point" * tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux: (21 commits) drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates. drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION drm/amdgpu: skip TV/CV in display parsing drm/amdgpu: avoid a possible array overflow drm/amdgpu: fix lru size grouping v2 drm/tegra: dsi: Enhance runtime power management drm/i915: Fix botched merge that downgrades CSR versions. drm/i915/skl: Ensure pipes with changed wms get added to the state drm/i915/gen9: Only copy WM results for changed pipes to skl_hw drm/i915/skl: Add support for the SAGV, fix underrun hangs drm/i915/gen6+: Interpret mailbox error flags drm/i915: Reattach comment, complete type specification drm/i915: Unconditionally flush any chipset buffers before execbuf drm/i915/gen9: Drop invalid WARN() during data rate calculation drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2) drm: Reject page_flip for !DRIVER_MODESET drm/amdgpu: fix timeout value check in amd_sched_job_recovery drm/amdgpu: fix sdma_v2_4_ring_test_ib drm/amdgpu: fix amdgpu_move_blit on 32bit systems drm/radeon: fix radeon_move_blit on 32bit systems ...
2016-08-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM fixes from Paolo Bonzini: "ARM: - fixes for ITS init issues, error handling, IRQ leakage, race conditions - an erratum workaround for timers - some removal of misleading use of errors and comments - a fix for GICv3 on 32-bit guests MIPS: - fix for where the guest could wrongly map the first page of physical memory x86: - nested virtualization fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: MIPS: KVM: Check for pfn noslot case kvm: nVMX: fix nested tsc scaling KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC arm64: KVM: report configured SRE value to 32-bit world arm64: KVM: remove misleading comment on pmu status KVM: arm/arm64: timer: Workaround misconfigured timer interrupt arm64: Document workaround for Cortex-A72 erratum #853709 KVM: arm/arm64: Change misleading use of is_error_pfn KVM: arm64: ITS: avoid re-mapping LPIs KVM: arm64: check for ITS device on MSI injection KVM: arm64: ITS: move ITS registration into first VCPU run KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic KVM: arm64: vgic-its: Plug race in vgic_put_irq KVM: arm64: vgic-its: Handle errors from vgic_add_lpi KVM: arm64: ITS: return 1 on successful MSI injection
2016-08-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-2/+8
Merge fixes from Andrew Morton: "11 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: silently skip readahead for DAX inodes dax: fix device-dax region base fs/seq_file: fix out-of-bounds read mm: memcontrol: avoid unused function warning mm: clarify COMPACTION Kconfig text treewide: replace config_enabled() with IS_ENABLED() (2nd round) printk: fix parsing of "brl=" option soft_dirty: fix soft_dirty during THP split sysctl: handle error writing UINT_MAX to u32 fields get_maintainer: quiet noisy implicit -f vcs_file_exists checking byteswap: don't use __builtin_bswap*() with sparse
2016-08-27Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2-4/+12
Pull block fixes from Jens Axboe: "Here's a set of block fixes for the current 4.8-rc release. This contains: - a fix for a secure erase regression, from Adrian. - a fix for an mmc use-after-free bug regression, also from Adrian. - potential zero pointer deference in bdev freezing, from Andrey. - a race fix for blk_set_queue_dying() from Bart. - a set of xen blkfront fixes from Bob Liu. - three small fixes for bcache, from Eric and Kent. - a fix for a potential invalid NVMe state transition, from Gabriel. - blk-mq CPU offline fix, preventing us from issuing and completing a request on the wrong queue. From me. - revert two previous floppy changes, since they caused a user visibile regression. A better fix is in the works. - ensure that we don't send down bios that have more than 256 elements in them. Fixes a crash with bcache, for example. From Ming. - a fix for deferencing an error pointer with cgroup writeback. Fixes a regression. From Vegard" * 'for-linus' of git://git.kernel.dk/linux-block: mmc: fix use-after-free of struct request Revert "floppy: refactor open() flags handling" Revert "floppy: fix open(O_ACCMODE) for ioctl-only open" fs/block_dev: fix potential NULL ptr deref in freeze_bdev() blk-mq: improve warning for running a queue on the wrong CPU blk-mq: don't overwrite rq->mq_ctx block: make sure a big bio is split into at most 256 bvecs nvme: Fix nvme_get/set_features() with a NULL result pointer bdev: fix NULL pointer dereference xen-blkfront: free resources if xlvbd_alloc_gendisk fails xen-blkfront: introduce blkif_set_queue_limits() xen-blkfront: fix places not updated after introducing 64KB page granularity bcache: pr_err: more meaningful error message when nr_stripes is invalid bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two. bcache: register_bcache(): call blkdev_put() when cache_alloc() fails block: Fix race triggered by blk_set_queue_dying() block: Fix secure erase nvme: Prevent controller state invalid transition
2016-08-27Merge tag 'pci-v4.8-fixes-1' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Resource management: - Update "pci=resource_alignment" documentation (Mathias Koehrer) MSI: - Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig) - Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig) Intel VMD host bridge driver: - Fix infinite loop executing irq's (Keith Busch)" * tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: VMD: Fix infinite loop executing irq's PCI: Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() PCI: Use positive flags in pci_alloc_irq_vectors() PCI: Update "pci=resource_alignment" documentation
2016-08-27sysctl: handle error writing UINT_MAX to u32 fieldsSubash Abhinov Kasiviswanathan1-0/+2
We have scripts which write to certain fields on 3.18 kernels but this seems to be failing on 4.4 kernels. An entry which we write to here is xfrm_aevent_rseqth which is u32. echo 4294967295 > /proc/sys/net/core/xfrm_aevent_rseqth Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting to int") prevented writing to sysctl entries when integer overflow occurs. However, this does not apply to unsigned integers. Heinrich suggested that we introduce a new option to handle 64 bit limits and set min as 0 and max as UINT_MAX. This might not work as it leads to issues similar to __do_proc_doulongvec_minmax. Alternatively, we would need to change the datatype of the entry to 64 bit. static int __do_proc_doulongvec_minmax(void *data, struct ctl_table { i = (unsigned long *) data; //This cast is causing to read beyond the size of data (u32) vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64. Introduce a new proc handler proc_douintvec. Individual proc entries will need to be updated to use the new handler. [akpm@linux-foundation.org: coding-style fixes] Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int") Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-27byteswap: don't use __builtin_bswap*() with sparseJohannes Berg1-2/+6
Although sparse declares __builtin_bswap*(), it can't actually do constant folding inside them (yet). As such, things like switch (protocol) { case htons(ETH_P_IP): break; } which we do all over the place cause sparse to warn that it expects a constant instead of a function call. Disable __HAVE_BUILTIN_BSWAP*__ if __CHECKER__ is defined to avoid this. Fixes: 7322dd755e7d ("byteswap: try to avoid __builtin_constant_p gcc bug") Link: http://lkml.kernel.org/r/1470914102-26389-1-git-send-email-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-25Merge tag 'drm/tegra/for-4.8-rc4' of ↵Dave Airlie1-0/+2
git://anongit.freedesktop.org/tegra/linux into drm-fixes drm/tegra: Fixes for v4.8-rc4 This contains one fix for DSI runtime power management support that was introduced in v4.8-rc1. This is slightly more elaborate than I would've wished, but there are a few corner cases that needed fixing. * tag 'drm/tegra/for-4.8-rc4' of git://anongit.freedesktop.org/tegra/linux: drm/tegra: dsi: Enhance runtime power management
2016-08-24drm/tegra: dsi: Enhance runtime power managementThierry Reding1-0/+2
The MIPI DSI output on Tegra SoCs requires some external logic to calibrate the MIPI pads before a video signal can be transmitted. This MIPI calibration logic requires to be powered on while the MIPI pads are being used, which is currently done as part of the DSI driver's probe implementation. This is suboptimal because it will leave the MIPI calibration logic powered up even if the DSI output is never used. On Tegra114 and earlier this behaviour also causes the driver to hang while trying to power up the MIPI calibration logic because the power partition that contains the MIPI calibration logic will be powered on by the display controller at output pipeline configuration time. Thus the power up sequence for the MIPI calibration logic happens before it's power partition is guaranteed to be enabled. Fix this by splitting up the API into a request/free pair of functions that manage the runtime dependency between the DSI and the calibration modules (no registers are accessed) and a set of enable, calibrate and disable functions that program the MIPI calibration logic at points in time where the power partition is really enabled. While at it, make sure that the runtime power management also works in ganged mode, which is currently also broken. Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Tested-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2016-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-2/+2
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter updates for your net tree, they are: 1) Dump only conntrack that belong to this namespace via /proc file. This is some fallout from the conversion to single conntrack table for all netns, patch from Liping Zhang. 2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents module autoloading, also from Liping Zhang. 3) Report overquota event to the right netnamespace, again from Liping. 4) Fix tproxy listener sk refcount that leads to crash, from Eric Dumazet. 5) Fix racy refcounting on object deletion from nfnetlink and rule removal both for nfacct and cttimeout, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two lockless_dereference() related fixes" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/barriers: Suppress sparse warnings in lockless_dereference() Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
2016-08-18locking/barriers: Suppress sparse warnings in lockless_dereference()Johannes Berg1-3/+3
After Peter's commit: 331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type") ... we get a lot of sparse warnings (one for every rcu_dereference, and more) since the expression here is assigning to the wrong address space. Instead of validating that 'p' is a pointer this way, instead make it fail compilation when it's not by using sizeof(*(p)). This will not cause any sparse warnings (tested, likely since the address space is irrelevant for sizeof), and will fail compilation when 'p' isn't a pointer type. Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type") Link: http://lkml.kernel.org/r/1470909022-687-2-git-send-email-johannes@sipsolutions.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-18Merge tag 'kvm-arm-for-v4.8-rc3' of ↵Paolo Bonzini1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.8-rc3 This tag contains the following fixes on top of v4.8-rc1: - ITS init issues - ITS error handling issues - ITS IRQ leakage fix - Plug a couple of ITS race conditions - An erratum workaround for timers - Some removal of misleading use of errors and comments - A fix for GICv3 on 32-bit guests
2016-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds4-85/+42
Pull networking fixes from David Miller: 1) Buffers powersave frame test is reversed in cfg80211, fix from Felix Fietkau. 2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme. 3) Fix some tg3 ethtool logic bugs, and one that would cause no interrupts to be generated when rx-coalescing is set to 0. From Satish Baddipadige and Siva Reddy Kallam. 4) QLCNIC mailbox corruption and napi budget handling fix from Manish Chopra. 5) Fix fib_trie logic when walking the trie during /proc/net/route output than can access a stale node pointer. From David Forster. 6) Several sctp_diag fixes from Phil Sutter. 7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel. 8) Checksum fixup fixes in bpf from Daniel Borkmann. 9) Memork leaks in nfnetlink, from Liping Zhang. 10) Use after free in rxrpc, from David Howells. 11) Use after free in new skb_array code of macvtap driver, from Jason Wang. 12) Calipso resource leak, from Colin Ian King. 13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang. 14) Fix bpf non-linear packet write helpers, from Daniel Borkmann. 15) Fix lockdep splats in macsec, from Sabrina Dubroca. 16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF handling. 17) Various tc-action bug fixes, from CONG Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net_sched: allow flushing tc police actions net_sched: unify the init logic for act_police net_sched: convert tcf_exts from list to pointer array net_sched: move tc offload macros to pkt_cls.h net_sched: fix a typo in tc_for_each_action() net_sched: remove an unnecessary list_del() net_sched: remove the leftover cleanup_a() mlxsw: spectrum: Allow packets to be trapped from any PG mlxsw: spectrum: Unmap 802.1Q FID before destroying it mlxsw: spectrum: Add missing rollbacks in error path mlxsw: reg: Fix missing op field fill-up mlxsw: spectrum: Trap loop-backed packets mlxsw: spectrum: Add missing packet traps mlxsw: spectrum: Mark port as active before registering it mlxsw: spectrum: Create PVID vPort before registering netdevice mlxsw: spectrum: Remove redundant errors from the code mlxsw: spectrum: Don't return upon error in removal path i40e: check for and deal with non-contiguous TCs ixgbe: Re-enable ability to toggle VLAN filtering ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths ...
2016-08-18netfilter: nfnetlink_acct: report overquota to the right netnsLiping Zhang1-2/+2
We should report the over quota message to the right net namespace instead of the init netns. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-16PCI: Use positive flags in pci_alloc_irq_vectors()Christoph Hellwig1-4/+6
Instead of passing negative flags like PCI_IRQ_NOMSI to prevent use of certain interrupt types, pass positive flags like PCI_IRQ_LEGACY, PCI_IRQ_MSI, etc., to specify the acceptable interrupt types. This is based on a number of pending driver conversions that just happend to be a whole more obvious to read this way, and given that we have no users in the tree yet it can still easily be done. I've also added a PCI_IRQ_ALL_TYPES catchall to keep the case of accepting all interrupt types very simple. [bhelgaas: changelog, fix PCI_IRQ_AFFINITY doc typo, remove mention of PCI_IRQ_NOLEGACY] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alexander Gordeev <agordeev@redhat.com>
2016-08-16block: Fix secure eraseAdrian Hunter2-4/+12
Commit 288dab8a35a0 ("block: add a separate operation type for secure erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering all the places REQ_OP_DISCARD was being used to mean either. Fix those. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 288dab8a35a0 ("block: add a separate operation type for secure erase") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-14net: remove type_check from dev_get_nest_level()Sabrina Dubroca1-2/+1
The idea for type_check in dev_get_nest_level() was to count the number of nested devices of the same type (currently, only macvlan or vlan devices). This prevented the false positive lockdep warning on configurations such as: eth0 <--- macvlan0 <--- vlan0 <--- macvlan1 However, this doesn't prevent a warning on a configuration such as: eth0 <--- macvlan0 <--- vlan0 eth1 <--- vlan1 <--- macvlan1 In this case, all the locks end up with a nesting subclass of 1, so lockdep thinks that there is still a deadlock: - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then take (vlan_netdev_xmit_lock_key, 1) - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then take (macvlan_netdev_addr_lock_key, 1) By removing the linktype check in dev_get_nest_level() and always incrementing the nesting depth, lockdep considers this configuration valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+12
Pull KVM fixes from Radim Krčmář: "KVM: - lock kvm_device list to prevent corruption on device creation. PPC: - split debugfs initialization from creation of the xics device to unlock the newly taken kvm lock earlier. s390: - prevent userspace from triggering two WARN_ON_ONCE. MIPS: - fix several issues in the management of TLB faults (Cc: stable)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: MIPS: KVM: Propagate kseg0/mapped tlb fault errors MIPS: KVM: Fix gfn range check in kseg0 tlb faults MIPS: KVM: Add missing gfn range check MIPS: KVM: Fix mapped fault broken commpage handling KVM: Protect device ops->create and list_add with kvm->lock KVM: PPC: Move xics_debugfs_init out of create KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed KVM: s390: set the prefix initially properly
2016-08-13Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+2
Pull block fixes from Jens Axboe: - an NVMe fix from Gabriel, fixing a suspend/resume issue on some setups - addition of a few missing entries in the block queue sysfs documentation, from Joe - a fix for a sparse shadow warning for the bvec iterator, from Johannes - a writeback deadlock involving raid issuing barriers, and not flushing the plug when we wakeup the flusher threads. From Konstantin - a set of patches for the NVMe target/loop/rdma code, from Roland and Sagi * 'for-linus' of git://git.kernel.dk/linux-block: bvec: avoid variable shadowing warning doc: update block/queue-sysfs.txt entries nvme: Suspend all queues before deletion mm, writeback: flush plugged IO in wakeup_flusher_threads() nvme-rdma: Remove unused includes nvme-rdma: start async event handler after reconnecting to a controller nvmet: Fix controller serial number inconsistency nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads nvmet-rdma: Correctly handle RDMA device hot removal nvme-rdma: Make sure to shutdown the controller if we can nvme-loop: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Free the I/O tags when we delete the controller nvme-rdma: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Fix device removal handling nvme-rdma: Queue ns scanning after a sucessful reconnection nvme-rdma: Don't leak uninitialized memory in connect request private data
2016-08-12Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a perf-cgroups fix and an AUX events fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add enable_box for client MSR uncore perf/x86/intel/uncore: Fix uncore num_counters uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions perf/core: Set cgroup in CPU contexts for new cgroup events perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash perf probe ppc64le: Fix probe location when using DWARF perf probe: Add function to post process kernel trace events tools: Sync cpufeatures headers with the kernel toops: Sync tools/include/uapi/linux/bpf.h with the kernel tools: Sync cpufeatures.h and vmx.h with the kernel perf probe: Support signedness casting perf stat: Avoid skew when reading events perf probe: Fix module name matching perf probe: Adjust map->reloc offset when finding kernel symbol from map perf hists: Trim libtraceevent trace_seq buffers perf script: Add 'bpf-output' field to usage message
2016-08-12Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "A fix for an MSI regression" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Make sure PCI MSIs are activated early
2016-08-12Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2-1/+4
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures
2016-08-12KVM: Protect device ops->create and list_add with kvm->lockChristoffer Dall1-0/+6
KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-12KVM: PPC: Move xics_debugfs_init out of createChristoffer Dall1-0/+6
As we are about to hold the kvm->lock during the create operation on KVM devices, we should move the call to xics_debugfs_init into its own function, since holding a mutex over extended amounts of time might not be a good idea. Introduce an init operation on the kvm_device_ops struct which cannot fail and call this, if configured, after the device has been created. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-11bvec: avoid variable shadowing warningJohannes Berg1-1/+2
Due to the (indirect) nesting of min(..., min(...)), sparse will show a variable shadowing warning whenever bvec.h is included. Avoid that by assigning the inner min() to a temporary variable first. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-10perf/core: Set cgroup in CPU contexts for new cgroup eventsDavid Carrillo-Cisneros1-0/+4
There's a perf stat bug easy to observer on a machine with only one cgroup: $ perf stat -e cycles -I 1000 -C 0 -G / # time counts unit events 1.000161699 <not counted> cycles / 2.000355591 <not counted> cycles / 3.000565154 <not counted> cycles / 4.000951350 <not counted> cycles / We'd expect some output there. The underlying problem is that there is an optimization in perf_cgroup_sched_{in,out}() that skips the switch of cgroup events if the old and new cgroups in a task switch are the same. This optimization interacts with the current code in two ways that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a cgroup event matches the current task. These are: 1. On creation of the first cgroup event in a CPU: In current code, cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the aforesaid optimization, perf_cgroup_sched_in will run until the next cgroup switches in that CPU. This may happen late or never happen, depending on system's number of cgroups, CPU load, etc. 2. On deletion of the last cgroup event in a cpuctx: In list_del_event, cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in because cpuctx->cgrp == NULL until a cgroup switch occurs and perf_cgroup_sched_in is executed (updating cpuctx->cgrp). This patch fixes both problems by setting cpuctx->cgrp in list_add_event, mirroring what list_del_event does when removing a cgroup event from CPU context, as introduced in: commit 68cacd29167b ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()") With this patch, cpuctx->cgrp is always set/clear when installing/removing the first/last cgroup event in/from the CPU context. With cpuctx->cgrp correctly set, event_filter_match works as intended when events are sched in/out. After the fix, the output is as expected: $ perf stat -e cycles -I 1000 -a -G / # time counts unit events 1.004699159 627342882 cycles / 2.007397156 615272690 cycles / 3.010019057 616726074 cycles / Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-09Revert "printk: create pr_<level> functions"Linus Torvalds1-33/+15
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142. Geert Uytterhoeven reports: "This change seems to have an (unintendent?) side-effect. Before, pr_*() calls without a trailing newline characters would be printed with a newline character appended, both on the console and in the output of the dmesg command. After this commit, no new line character is appended, and the output of the next pr_*() call of the same type may be appended, like in: - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000 - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM) + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)" Joe Perches says: "No, that is not intentional. The newline handling code inside vprintk_emit is a bit involved and for now I suggest a revert until this has all the same behavior as earlier" Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Requested-by: Joe Perches <joe@perches.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09KVM: arm64: ITS: return 1 on successful MSI injectionAndre Przywara1-0/+1
According to the KVM API documentation a successful MSI injection should return a value > 0 on success. Return possible errors in vgic_its_trigger_msi() and report a successful injection back to userland, while also reporting the case where the MSI could not be delivered due to the guest not having the LPI mapped, for instance. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-09genirq/msi: Make sure PCI MSIs are activated earlyMarc Zyngier1-0/+2
Bharat Kumar Gogada reported issues with the generic MSI code, where the end-point ended up with garbage in its MSI configuration (both for the vector and the message). It turns out that the two MSI paths in the kernel are doing slightly different things: generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI And it turns out that end-points are allowed to latch the content of the MSI configuration registers as soon as MSIs are enabled. In Bharat's case, the end-point ends up using whatever was there already, which is not what you want. In order to make things converge, we introduce a new MSI domain flag (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set, this flag forces the programming of the end-point as soon as the MSIs are allocated. A consequence of this is that we have an extra activate in irq_startup, but that should be without much consequence. tglx: - Several people reported a VMWare regression with PCI/MSI-X passthrough. It turns out that the patch also cures that issue. - We need to have a look at the MSI disable interrupt path, where we write the msg to all zeros without disabling MSI in the PCI device. Is that correct? Fixes: 52f518a3a7c2 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts" Reported-and-tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> Reported-and-tested-by: Foster Snowhill <forst@forstwoof.ru> Reported-by: Matthias Prager <linux@matthiasprager.de> Reported-by: Jason Taylor <jason.taylor@simplivity.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-08-09qed: Add dcbx app support for IEEE Selection Field.Sudarsana Reddy Kalluru1-0/+8
MFW now supports the Selection field for IEEE mode. Add driver changes to use the newer MFW masks to read/write the port-id value. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-09Merge tag 'usercopy-v4.8' of ↵Linus Torvalds3-0/+38
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull usercopy protection from Kees Cook: "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and copy_from_user bounds checking for most architectures on SLAB and SLUB" * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: mm: SLUB hardened usercopy support mm: SLAB hardened usercopy support s390/uaccess: Enable hardened usercopy sparc/uaccess: Enable hardened usercopy powerpc/uaccess: Enable hardened usercopy ia64/uaccess: Enable hardened usercopy arm64/uaccess: Enable hardened usercopy ARM: uaccess: Enable hardened usercopy x86/uaccess: Enable hardened usercopy mm: Hardened usercopy mm: Implement stack frame object validation mm: Add is_migrate_cma_page
2016-08-08bpf: fix checksum fixups on bpf_skb_store_bytesDaniel Borkmann1-19/+33
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way. Where we ran into an issue with bpf_skb_store_bytes() is when we did a single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The underlying issue has been tracked down to a buffer alignment issue. Meaning, that csum_partial() computations via skb_postpull_rcsum() and skb_postpush_rcsum() pair invoked had a wrong result since they operated on an odd address for the hoplimit, while other computations were done on an even address. This mix doesn't work as-is with skb_postpull_rcsum(), skb_postpush_rcsum() pair as it always expects at least half-word alignment of input buffers, which is normally the case. Thus, instead of these helpers using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(), csum_block_add(), respectively. For unaligned offsets, they rotate the sum to align it to a half-word boundary again, otherwise they work the same as csum_sub() and csum_add(). Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers use a 0 constant for offset so that the compiler optimizes the offset & 1 test away and generates the same code as with csum_sub()/_add(). Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08unsafe_[get|put]_user: change interface to use a error target labelLinus Torvalds1-2/+2
When I initially added the unsafe_[get|put]_user() helpers in commit 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses"), I made the mistake of modeling the interface on our traditional __[get|put]_user() functions, which return zero on success, or -EFAULT on failure. That interface is fairly easy to use, but it's actually fairly nasty for good code generation, since it essentially forces the caller to check the error value for each access. In particular, since the error handling is already internally implemented with an exception handler, and we already use "asm goto" for various other things, we could fairly easily make the error cases just jump directly to an error label instead, and avoid the need for explicit checking after each operation. So switch the interface to pass in an error label, rather than checking the error value in the caller. Best do it now before we start growing more users (the signal handling code in particular would be a good place to use the new interface). So rather than if (unsafe_get_user(x, ptr)) ... handle error .. the interface is now unsafe_get_user(x, ptr, label); where an error during the user mode fetch will now just cause a jump to 'label' in the caller. Right now the actual _implementation_ of this all still ends up being a "if (err) goto label", and does not take advantage of any exception label tricks, but for "unsafe_put_user()" in particular it should be fairly straightforward to convert to using the exception table model. Note that "unsafe_get_user()" is much harder to convert to a clever exception table model, because current versions of gcc do not allow the use of "asm goto" (for the exception) with output values (for the actual value to be fetched). But that is hopefully not a limitation in the long term. [ Also note that it might be a good idea to switch unsafe_get_user() to actually _return_ the value it fetches from user space, but this commit only changes the error handling semantics ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-08sctp: Export struct sctp_info to userspacePhil Sutter1-64/+0
This is required to correctly interpret INET_DIAG_INFO messages exported by sctp_diag module. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe3-11/+12
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07block/mm: make bdev_ops->rw_page() take a bool for read/writeJens Axboe4-15/+14
Commit abf545484d31 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>