summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2017-05-10Merge tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds7-18/+36
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix use after free in write error path - Use GFP_NOIO for two allocations in writeback - Fix a hang in OPEN related to server reboot - Check the result of nfs4_pnfs_ds_connect - Fix an rcu lock leak Features: - Removal of the unmaintained and unused OSD pNFS layout - Cleanup and removal of lots of unnecessary dprintk()s - Cleanup and removal of some memory failure paths now that GFP_NOFS is guaranteed to never fail. - Remove the v3-only data server limitation on pNFS/flexfiles Bugfixes: - RPC/RDMA connection handling bugfixes - Copy offload: fixes to ensure the copied data is COMMITed to disk. - Readdir: switch back to using the ->iterate VFS interface - File locking fixes from Ben Coddington - Various use-after-free and deadlock issues in pNFS - Write path bugfixes" * tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits) pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled NFSv4.1: Work around a Linux server bug... NFS append COMMIT after synchronous COPY NFSv4: Fix exclusive create attributes encoding NFSv4: Fix an rcu lock leak nfs: use kmap/kunmap directly NFS: always treat the invocation of nfs_getattr as cache hit when noac is on Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits pNFS: Fix a typo in pnfs_generic_alloc_ds_commits pNFS: Fix a deadlock when coalescing writes and returning the layout pNFS: Don't clear the layout return info if there are segments to return pNFS: Ensure we commit the layout if it has been invalidated pNFS: Don't send COMMITs to the DSes if the server invalidated our layout pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path pNFS: Ensure we check layout validity before marking it for return NFS4.1 handle interrupted slot reuse from ERR_DELAY NFSv4: check return value of xdr_inline_decode nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout() ...
2017-05-10Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds4-11/+93
Pull virtio updates from Michael Tsirkin: "Fixes, cleanups, performance A bunch of changes to virtio, most affecting virtio net. Also ptr_ring batched zeroing - first of batching enhancements that seems ready." * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: s390/virtio: change maintainership tools/virtio: fix spelling mistake: "wakeus" -> "wakeups" virtio_net: tidy a couple debug statements ptr_ring: support testing different batching sizes ringtest: support test specific parameters ptr_ring: batch ring zeroing virtio: virtio_driver doc virtio_net: don't reset twice on XDP on/off virtio_net: fix support for small rings virtio_net: reduce alignment for buffers virtio_net: rework mergeable buffer handling virtio_net: allow specifying context for rx virtio: allow extra context per descriptor tools/virtio: fix build breakage virtio: add context flag to find vqs virtio: wrap find_vqs ringtest: fix an assert statement
2017-05-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-1/+25
Pull more KVM updates from Paolo Bonzini: "ARM: - bugfixes - moved shared 32-bit/64-bit files to virt/kvm/arm - support for saving/restoring virtual ITS state to userspace PPC: - XIVE (eXternal Interrupt Virtualization Engine) support x86: - nVMX improvements, including emulated page modification logging (PML) which brings nice performance improvements on some workloads" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits) KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore KVM: arm/arm64: Don't call map_resources when restoring ITS tables KVM: arm/arm64: Register ITS iodev when setting base address KVM: arm/arm64: Get rid of its->initialized field KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs KVM: arm/arm64: Slightly rework kvm_vgic_addr KVM: arm/arm64: Make vgic_v3_check_base more broadly usable KVM: arm/arm64: Refactor vgic_register_redist_iodevs KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus nVMX: Advertise PML to L1 hypervisor nVMX: Implement emulated Page Modification Logging kvm: x86: Add a hook for arch specific dirty logging emulation kvm: nVMX: Validate CR3 target count on nested VM-entry KVM: set no_llseek in stat_fops_per_vm KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES KVM: arm64: vgic-its: Fix pending table sync KVM: arm64: vgic-its: ITT save and restore KVM: arm64: vgic-its: Device table save/restore ...
2017-05-10Merge tag 'armsoc-tee' of ↵Linus Torvalds1-0/+277
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull TEE driver infrastructure and OP-TEE drivers from Arnd Bergmann: "This introduces a generic TEE framework in the kernel, to handle trusted environemtns (security coprocessor or software implementations such as OP-TEE/TrustZone). I'm sending it separately from the other arm-soc driver changes to give it a little more visibility, once the subsystem is merged, we will likely keep this in the arm₋soc drivers branch or have the maintainers submit pull requests directly, depending on the patch volume. I have reviewed earlier versions in the past, and have reviewed the latest version in person during Linaro Connect BUD17. Here is my overall assessment of the subsystem: - There is clearly demand for this, both for the generic infrastructure and the specific OP-TEE implementation. - The code has gone through a large number of reviews, and the review comments have all been addressed, but the reviews were not coming up with serious issues any more and nobody volunteered to vouch for the quality. - The user space ioctl interface is sufficient to work with the OP-TEE driver, and it should in principle work with other TEE implementations that follow the GlobalPlatform[1] standards, but it might need to be extended in minor ways depending on specific requirements of future TEE implementations - The main downside of the API to me is how the user space is tied to the TEE implementation in hardware or firmware, but uses a generic way to communicate with it. This seems to be an inherent problem with what it is trying to do, and I could not come up with any better solution than what is implemented here. For a detailed history of the patch series, see https://lkml.org/lkml/2017/3/10/1277" * tag 'armsoc-tee' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dt: hikey: Add optee node Documentation: tee subsystem and op-tee driver tee: add OP-TEE driver tee: generic TEE subsystem dt/bindings: add bindings for optee
2017-05-10Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds14-95/+601
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...
2017-05-10Merge tag 'pm-extra-4.12-rc1' of ↵Linus Torvalds2-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These add new CPU IDs to a couple of drivers, fix a possible NULL pointer dereference in the cpuidle core, update DT-related things in the generic power domains framework and finally update the suspend/resume infrastructure to improve the handling of wakeups from suspend-to-idle. Specifics: - Add Intel Gemini Lake CPU IDs to the intel_idle and intel_rapl drivers (David Box). - Add a NULL pointer check to the cpuidle core to prevent it from crashing on platforms with incomplete cpuidle configuration (Fei Li). - Fix DT-related documentation in the generic power domains (genpd) framework and add a MAINTAINERS entry for DT-related material in genpd (Viresh Kumar). - Update the system suspend/resume infrastructure to improve the handling of aborts of suspend transitions in progress in the wakeup framework and rework the suspend-to-idle core loop to make it possible to filter out spurious wakeup events (specifically the ones coming from ACPI) without resuming all the way up to user space every time (Rafael Wysocki)" * tag 'pm-extra-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle PM / wakeup: Integrate mechanism to abort transitions in progress x86/intel_idle: add Gemini Lake support cpuidle: check dev before usage in cpuidle_use_deepest_state() powercap: intel_rapl: Add support for Gemini Lake PM / Domains: Add DT file to MAINTAINERS PM / Domains: Fix DT example
2017-05-10Merge tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds7-13/+38
Pull ceph updates from Ilya Dryomov: "The two main items are support for disabling automatic rbd exclusive lock transfers from myself and the long awaited -ENOSPC handling series from Jeff. The former will allow rbd users to take advantage of exclusive lock's built-in blacklist/break-lock functionality while staying in control of who owns the lock. With the latter in place, we will abort filesystem writes on -ENOSPC instead of having them block indefinitely. Beyond that we've got the usual pile of filesystem fixes from Zheng, some refcount_t conversion patches from Elena and a patch for an ancient open() flags handling bug from Alexander" * tag 'ceph-for-4.12-rc1' of git://github.com/ceph/ceph-client: (31 commits) ceph: fix memory leak in __ceph_setxattr() ceph: fix file open flags on ppc64 ceph: choose readdir frag based on previous readdir reply rbd: exclusive map option rbd: return ResponseMessage result from rbd_handle_request_lock() rbd: kill rbd_is_lock_supported() rbd: support updating the lock cookie without releasing the lock rbd: store lock cookie rbd: ignore unlock errors rbd: fix error handling around rbd_init_disk() rbd: move rbd_unregister_watch() call into rbd_dev_image_release() rbd: move rbd_dev_destroy() call out of rbd_dev_image_release() ceph: when seeing write errors on an inode, switch to sync writes Revert "ceph: SetPageError() for writeback pages if writepages fails" ceph: handle epoch barriers in cap messages libceph: add an epoch_barrier field to struct ceph_osd_client libceph: abort already submitted but abortable requests when map or pool goes full libceph: allow requests to return immediately on full conditions if caller wishes libceph: remove req->r_replay_version ceph: make seeky readdir more efficient ...
2017-05-10Merge tag 'dmaengine-4.12-rc1' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds2-64/+21
Pull dmaengine updates from Vinod Koul: "This time again a smaller update consisting of: - support for TI DA8xx dma controller and updates to the cppi driver - updates on bunch of drivers like xilinx, pl08x, stm32-dma, mv_xor, ioat, dmatest" * tag 'dmaengine-4.12-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (35 commits) dmaengine: pl08x: remove lock documentation dmaengine: pl08x: fix pl08x_dma_chan_state documentation dmaengine: pl08x: Use the BIT() macro consistently dmaengine: pl080: Fix some missing kerneldoc dmaengine: pl080: Cut some unused defines dmaengine: dmatest: Add check for supported buffer count (sg_buffers) dmaengine: dmatest: Select DMA_ENGINE_RAID as its needed for the slave_sg test dmaengine: virt-dma: Convert to use list_for_each_entry_safe() dma-debug: use offset_in_page() macro dmaengine: mv_xor: use offset_in_page() macro dmaengine: dmatest: use offset_in_page() macro dmaengine: sun4i: fix invalid argument dmaengine: ioat: use setup_timer dmaengine: cppi41: Fix an Oops happening in cppi41_dma_probe() dmaengine: pl330: remove pdata based initialization dmaengine: cppi: fix build error due to bad variable dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped dmaengine: cppi41: use managed functions devm_*() dmaengine: cppi41: fix cppi41_dma_tx_status() logic dmaengine: qcom_hidma: pause the channel on shutdown ...
2017-05-10Merge tag 'iommu-updates-v4.12' of ↵Linus Torvalds9-89/+46
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - code optimizations for the Intel VT-d driver - ability to switch off a previously enabled Intel IOMMU - support for 'struct iommu_device' for OMAP, Rockchip and Mediatek IOMMUs - header optimizations for IOMMU core code headers and a few fixes that became necessary in other parts of the kernel because of that - ACPI/IORT updates and fixes - Exynos IOMMU optimizations - updates for the IOMMU dma-api code to bring it closer to use per-cpu iova caches - new command-line option to set default domain type allocated by the iommu core code - another command line option to allow the Intel IOMMU switched off in a tboot environment - ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an IDENTITY domain in conjunction with DMA ops, Support for SMR masking, Support for 16-bit ASIDs (was previously broken) - various other small fixes and improvements * tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits) soc/qbman: Move dma-mapping.h include to qman_priv.h soc/qbman: Fix implicit header dependency now causing build fails iommu: Remove trace-events include from iommu.h iommu: Remove pci.h include from trace/events/iommu.h arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops() ACPI/IORT: Fix CONFIG_IOMMU_API dependency iommu/vt-d: Don't print the failure message when booting non-kdump kernel iommu: Move report_iommu_fault() to iommu.c iommu: Include device.h in iommu.h x86, iommu/vt-d: Add an option to disable Intel IOMMU force on iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed iommu/arm-smmu: Correct sid to mask iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code omap3isp: Remove iommu_group related code iommu/omap: Add iommu-group support iommu/omap: Make use of 'struct iommu_device' iommu/omap: Store iommu_dev pointer in arch_data iommu/omap: Move data structures to omap-iommu.h iommu/omap: Drop legacy-style device support ...
2017-05-10Merge branches 'pm-domains', 'pm-cpuidle', 'pm-sleep' and 'powercap'Rafael J. Wysocki2-6/+26
* pm-domains: PM / Domains: Add DT file to MAINTAINERS PM / Domains: Fix DT example * pm-cpuidle: x86/intel_idle: add Gemini Lake support cpuidle: check dev before usage in cpuidle_use_deepest_state() * pm-sleep: ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle PM / wakeup: Integrate mechanism to abort transitions in progress * powercap: powercap: intel_rapl: Add support for Gemini Lake
2017-05-09Merge tag 'armsoc-drivers' of ↵Linus Torvalds3-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Olof Johansson: "Driver updates for ARM SoCs: Reset subsystem, merged through arm-soc by tradition: - Make bool drivers explicitly non-modular - New support for i.MX7 and Arria10 reset controllers PATA driver for Palmchip BK371 (acked by Tejun) Power domain drivers for i.MX (GPC, GPCv2) - Moved out of mach-imx for GPC - Bunch of tweaks, fixes, etc PMC support for Tegra186 SoC detection support for Renesas RZ/G1H and RZ/G1N Move Tegra flow controller driver from mach directory to drivers/soc - (Power management / CPU power driver) Misc smaller tweaks for other platforms" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits) soc: pm-domain: Fix the mangled urls soc: renesas: rcar-sysc: Add support for R-Car H3 ES2.0 soc: renesas: rcar-sysc: Add support for fixing up power area tables soc: renesas: Register SoC device early soc: imx: gpc: add workaround for i.MX6QP to the GPC PD driver dt-bindings: imx-gpc: add i.MX6 QuadPlus compatible soc: imx: gpc: add defines for domain index soc: imx: Add GPCv2 power gating driver dt-bindings: Add GPCv2 power gating driver ARM/clk: move the ICST library to drivers/clk ARM: plat-versatile: remove stale clock header ARM: keystone: Drop PM domain support for k2g soc: ti: Add ti_sci_pm_domains driver dt-bindings: Add TI SCI PM Domains PM / Domains: Do not check if simple providers have phandle cells PM / Domains: Add generic data pointer to genpd data struct soc/tegra: Add initial flowctrl support for Tegra132/210 soc/tegra: flowctrl: Add basic platform driver soc/tegra: Move Tegra flowctrl driver ARM: tegra: Remove unnecessary inclusion of flowctrl header ...
2017-05-09Merge branch 'work.misc' of ↵Linus Torvalds4-152/+14
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted bits and pieces from various people. No common topic in this pile, sorry" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/affs: add rename exchange fs/affs: add rename2 to prepare multiple methods Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx() fs: don't set *REFERENCED on single use objects fs: compat: Remove warning from COMPATIBLE_IOCTL remove pointless extern of atime_need_update_rcu() fs: completely ignore unknown open flags fs: add a VALID_OPEN_FLAGS fs: remove _submit_bh() fs: constify tree_descr arrays passed to simple_fill_super() fs: drop duplicate header percpu-rwsem.h fs/affs: bugfix: Write files greater than page size on OFS fs/affs: bugfix: enable writes on OFS disks fs/affs: remove node generation check fs/affs: import amigaffs.h fs/affs: bugfix: make symbolic links work again
2017-05-09ptr_ring: batch ring zeroingMichael S. Tsirkin1-9/+54
A known weakness in ptr_ring design is that it does not handle well the situation when ring is almost full: as entries are consumed they are immediately used again by the producer, so consumer and producer are writing to a shared cache line. To fix this, add batching to consume calls: as entries are consumed do not write NULL into the ring until we get a multiple (in current implementation 2x) of cache lines away from the producer. At that point, write them all out. We do the write out in the reverse order to keep producer from sharing cache with consumer for as long as possible. Writeout also triggers when ring wraps around - there's no special reason to do this but it helps keep the code a bit simpler. What should we do if getting away from producer by 2 cache lines would mean we are keeping the ring moe than half empty? Maybe we should reduce the batching in this case, current patch simply reduces the batching. Notes: - it is no longer true that a call to consume guarantees that the following call to produce will succeed. No users seem to assume that. - batching can also in theory reduce the signalling rate: users that would previously send interrups to the producer to wake it up after consuming each entry would now only need to do this once in a batch. Doing this would be easy by returning a flag to the caller. No users seem to do signalling on consume yet so this was not implemented yet. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2017-05-09virtio: virtio_driver docCornelia Huck1-0/+4
Add comments for the virtio_driver members that were not documented. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09Merge tag 'kvm-arm-for-v4.12-round2' of ↵Paolo Bonzini2-0/+25
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of KVM/ARM Changes for v4.12. Changes include: - A fix related to the 32-bit idmap stub - A fix to the bitmask used to deode the operands of an AArch32 CP instruction - We have moved the files shared between arch/arm/kvm and arch/arm64/kvm to virt/kvm/arm - We add support for saving/restoring the virtual ITS state to userspace
2017-05-09KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpusChristoffer Dall1-0/+11
There are occasional needs to use the index of vcpu in the kvm->vcpus array to map something related to a VCPU. For example, unlike the vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all vcpus which is useful when allocating a memory area for each vcpu. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2017-05-09Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09Merge tags 'for-linus' and 'for-next' of ↵Linus Torvalds1-0/+19
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull more rdma updates from Doug Ledford: "As mentioned in my first pull request, this is the subsequent pull requests I had. This is all I have, and in fact this cleans out the RDMA subsystem's entire patchworks queue of kernel changes that are ready to go (well, it did for the weekend anyway, a few new patches are in, but they'll be coming during the -rc cycle). The first tag contains a single patch that would have conflicted if taken from my tree or DaveM's tree as it needed our trees merged to come cleanly. The second tag contains the patch series from Intel plus three other stragllers that came in late last week. I took them because it allowed me to legitimately claim that the RDMA patchworks queue was, for a short time, 100% cleared of all waiting kernel patches, woohoo! :-). I have it under my for-next tag, so it did get 0day and linux- next over the end of last week, and linux-next did show one minor conflict. Summary: 'for-linus' tag: - mlx5/IPoIB fixup patch 'for-next' tag: - the hfi1 15 patch set that landed late - IPoIB get_link_ksettings which landed late because I asked for a respin - one late rxe change - one -rc worthy fix that's in early" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/mlx5: Enable IPoIB acceleration * tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: rxe: expose num_possible_cpus() cnum_comp_vectors IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type IB/hfi1: Clean up on context initialization failure IB/hfi1: Fix an assign/ordering issue with shared context IDs IB/hfi1: Clean up context initialization IB/hfi1: Correctly clear the pkey IB/hfi1: Search shared contexts on the opened device, not all devices IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit IB/hfi1: Use filedata rather than filepointer IB/hfi1: Name function prototype parameters IB/hfi1: Fix a subcontext memory leak IB/hfi1: Return an error on memory allocation failure IB/hfi1: Adjust default eager_buffer_size to 8MB IB/hfi1: Get rid of divide when setting the tx request header IB/hfi1: Fix yield logic in send engine IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line IB/hfi1: Fix checks for Offline transient state IB/ipoib: add get_link_ksettings in ethtool
2017-05-09Merge tag 'pci-v4.12-changes' of ↵Linus Torvalds11-43/+441
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...
2017-05-09Merge tag 'tty-4.12-rc1' of ↵Linus Torvalds5-176/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the "big" TTY/Serial patch updates for 4.12-rc1 Not a lot of new things here, the normal number of serial driver updates and additions, tiny bugs fixed, and some core files split up to make future changes a bit easier for Nicolas's "tiny-tty" work. All of these have been in linux-next for a while" * tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits) serial: small Makefile reordering tty: split job control support into a file of its own tty: move baudrate handling code to a file of its own console: move console_init() out of tty_io.c serial: 8250_early: Add earlycon support for Palmchip UART tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44 vt: make mouse selection of non-ASCII consistent vt: set mouse selection word-chars to gpm's default imx-serial: Reduce RX DMA startup latency when opening for reading serial: omap: suspend device on probe errors serial: omap: fix runtime-pm handling on unbind tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init serial: samsung: Remove useless spinlock serial: samsung: Add missing checks for dma_map_single failure serial: samsung: Use right device for DMA-mapping calls serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off tty: fix comment typo s/repsonsible/responsible/ tty: amba-pl011: Fix spurious TX interrupts serial: xuartps: Enable clocks in the pm disable case also serial: core: Re-use struct uart_port {name} field ...
2017-05-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds24-100/+157
Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
2017-05-09mm: introduce memalloc_noreclaim_{save,restore}Vlastimil Babka1-0/+12
The previous patch ("mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC") has shown that simply setting and clearing PF_MEMALLOC in current->flags can result in wrongly clearing a pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim. Let's introduce helpers that support proper nesting by saving the previous stat of the flag, similar to the existing memalloc_noio_* and memalloc_nofs_* helpers. Convert existing setting/clearing of PF_MEMALLOC within mm to the new helpers. There are no known issues with the converted code, but the change makes it more robust. Link: http://lkml.kernel.org/r/20170405074700.29871-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Boris Brezillon <boris.brezillon@free-electrons.com> Cc: Chris Leech <cleech@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Lee Duncan <lduncan@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09time: delete CURRENT_TIME_SEC and CURRENT_TIMEDeepa Dinamani1-3/+0
All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced by other time functions. These macros are also not y2038 safe. And, all their use cases can be fulfilled by y2038 safe ktime_get_* variants. Link: http://lkml.kernel.org/r/1491613030-11599-12-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flagTetsuo Handa1-3/+2
Commit afddba49d18f ("fs: introduce write_begin, write_end, and perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was checked in pagecache_write_begin(), but that check was removed by 4e02ed4b4a2f ("fs: remove prepare_write/commit_write"). Between these two commits, commit d9414774dc0c ("cifs: Convert cifs to new aops.") added a check in cifs_write_begin(), but that check was soon removed by commit a98ee8c1c707 ("[CIFS] fix regression in cifs_write_begin/cifs_write_end"). Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere. Let's remove this flag. This patch has no functionality changes. Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_decAndi Kleen1-1/+0
pagefault_disabled_dec is frequently used inline, and it has a WARN_ON for underflow that expands to about 6.5k of extra code. The warning doesn't seem to be that useful and worth so much code so remove it. If it was needed could make it depending on some debug kernel option. Saves ~6.5k in my kernel text data bss dec hex filename 9039417 5367568 11116544 25523529 1857549 vmlinux-before-pf 9032805 5367568 11116544 25516917 1855b75 vmlinux-pf Link: http://lkml.kernel.org/r/20170315021431.13107-8-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09kref: remove WARN_ON for NULL release functionsAndi Kleen1-6/+0
The kref functions check for NULL release functions. This WARN_ON seems rather pointless. We will eventually release and then just crash nicely. It is also somewhat expensive because these functions are inlined in a lot of places. Removing the WARN_ONs saves around 2.3k in this kernel (likely more in others with more drivers) text data bss dec hex filename 9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg 9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg Link: http://lkml.kernel.org/r/20170315021431.13107-5-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09include/linux/filter.h: use set_memory.h headerLaura Abbott1-1/+3
set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-11-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09treewide: spelling: correct diffrent[iate] and banlance typosJoe Perches1-1/+1
Add these misspellings to scripts/spelling.txt too Link: http://lkml.kernel.org/r/962aace119675e5fe87be2a88ddac1a5486f8e60.1490931810.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09scripts/spelling.txt: add regsiter -> register spelling mistakeStephen Boyd2-2/+2
This typo is quite common. Fix it and add it to the spelling file so that checkpatch catches it earlier. Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09treewide: use kv[mz]alloc* rather than opencoded variantsMichal Hocko2-6/+9
There are many code paths opencoding kvmalloc. Let's use the helper instead. The main difference to kvmalloc is that those users are usually not considering all the aspects of the memory allocator. E.g. allocation requests <= 32kB (with 4kB pages) are basically never failing and invoke OOM killer to satisfy the allocation. This sounds too disruptive for something that has a reasonable fallback - the vmalloc. On the other hand those requests might fallback to vmalloc even when the memory allocator would succeed after several more reclaim/compaction attempts previously. There is no guarantee something like that happens though. This patch converts many of those places to kv[mz]alloc* helpers because they are more conservative. Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390 Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim Acked-by: David Sterba <dsterba@suse.com> # btrfs Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4 Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5 Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Yishai Hadas <yishaih@mellanox.com> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09mm, vmalloc: properly track vmalloc usersMichal Hocko1-0/+19
__vmalloc_node_flags used to be static inline but this has changed by "mm: introduce kv[mz]alloc helpers" because kvmalloc_node needs to use it as well and the code is outside of the vmalloc proper. I haven't realized that changing this will lead to a subtle bug though. The function is responsible to track the caller as well. This caller is then printed by /proc/vmallocinfo. If __vmalloc_node_flags is not inline then we would get only direct users of __vmalloc_node_flags as callers (e.g. v[mz]alloc) which reduces usefulness of this debugging feature considerably. It simply doesn't help to see that the given range belongs to vmalloc as a caller: 0xffffc90002c79000-0xffffc90002c7d000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N0=3 0xffffc90002c81000-0xffffc90002c85000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c8d000-0xffffc90002c91000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c95000-0xffffc90002c99000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 We really want to catch the _caller_ of the vmalloc function. Fix this issue by making __vmalloc_node_flags static inline again. Link: http://lkml.kernel.org/r/20170502134657.12381-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09mm: introduce kv[mz]alloc helpersMichal Hocko3-2/+15
Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09sysv,ipc: cacheline align kern_ipc_permDavidlohr Bueso2-6/+4
Assign 'struct kern_ipc_perm' its own cacheline to avoid false sharing with sysv ipc calls. While the structure itself is rather read-mostly throughout the lifespan of ipc, the spinlock causes most of the invalidations. One example is commit 31a7c4746e9 ("ipc/sem.c: cacheline align the ipc spinlock for semaphores"). Therefore, extend this to all ipc. The effect of cacheline alignment on sems can be seen in sembench, which deals mostly with semtimedop wait/wakes is seen to improve raw throughput (worker loops) between 8 to 12% on a 24-core x86 with over 4 threads. Link: http://lkml.kernel.org/r/1486673582-6979-4-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09pidns: expose task pid_ns_for_children to userspaceKirill Tkhai1-0/+1
pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. This patch solves the problem, and it exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09ns: allow ns_entries to have custom symlink contentKirill Tkhai1-0/+1
Patch series "Expose task pid_ns_for_children to userspace". pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. If they have a custom pid_ns_for_children before dump, they must have the same ns after restore. Otherwise, restored task bumped into enviroment it does not expect. This patchset solves the problem. It exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] This patch (of 2): Make possible to have link content prefix yyy different from the link name xxx: $ readlink /proc/[pid]/ns/xxx yyy:[4026531838] This will be used in next patch. Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09ia64: reuse append_elf_note() and final_note() functionsHari Bathini2-0/+6
Get rid of multiple definitions of append_elf_note() & final_note() functions. Reuse these functions compiled under CONFIG_CRASH_CORE Also, define Elf_Word and use it instead of generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_COREHari Bathini3-55/+71
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4. Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. This patchset removes dependency with CONFIG_KEXEC for crashkernel parameter and vmcoreinfo related code as it can be reused without kexec support. Also, crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump. The first patch moves crashkernel parameter parsing and vmcoreinfo related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) in powerpc. The next patch reuses crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter. This has the advantage of using all syntaxes crashkernel parameter supports, for fadump as well. The last patch updates fadump kernel documentation about use of crashkernel parameter. This patch (of 5): Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. But currently, code related to vmcoreinfo and parsing of crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above mentioned code under this config, allowing code reuse without dependency on CONFIG_KEXEC. There is no functional change with this patch. Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09cpumask: make "nr_cpumask_bits" unsignedAlexey Dobriyan1-2/+2
Bit searching functions accept "unsigned long" indices but "nr_cpumask_bits" is "int" which is signed, so inevitable sign extensions occur on x86_64. Those MOVSX are #1 MOVSX bloat by number of uses across whole kernel. Change "nr_cpumask_bits" to unsigned, this number can't be negative after all. It allows to do implicit zero-extension on x86_64 without MOVSX. Change signed comparisons into unsigned comparisons where necessary. Other uses looks fine because it is either argument passed to a function or comparison is already unsigned. Net win on allyesconfig type of kernel: ~2.8 KB (!) add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833) function old new delta xen_exit_mmap 691 735 +44 qstat_read 426 440 +14 __cpufreq_cooling_register 1678 1687 +9 trace_rb_cpu_prepare 447 455 +8 vermagic 54 60 +6 nfp_driver_version 54 60 +6 rcu_torture_stats_print 1147 1151 +4 find_next_push_cpu 267 269 +2 xen_irq_resume 961 960 -1 ... init_vp_index 946 906 -40 od_set_powersave_bias 328 281 -47 power_cpu_exit 193 139 -54 arch_show_interrupts 3538 3484 -54 select_idle_sibling 1558 1471 -87 Total: Before=158358910, After=158356077, chg -0.00% Same arguments apply to "nr_cpu_ids" but I haven't yet found enough courage to delve into this issue (and proper fix may require new type "cpu_t" which is whole separate story). Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smpMatthias Kaehlcke1-8/+3
jiffies_64 is defined in kernel/time/timer.c with ____cacheline_aligned_in_smp, however this macro is not part of the declaration of jiffies and jiffies_64 in jiffies.h. As a result clang generates the following warning: kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection] __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; ^ include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp' ^ include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned' __section__(".data..cacheline_aligned"))) ^ include/linux/jiffies.h:77:12: note: previous attribute is here extern u64 __jiffy_data jiffies_64; ^ include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data' Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com> Cc: Grant Grundler <grundler@chromium.org> Cc: Michael Davidson <md@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09mm, compaction: change migrate_async_suitable() to suitable_migration_source()Vlastimil Babka1-0/+5
Preparation for making the decisions more complex and depending on compact_control flags. No functional change. Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09mm, page_alloc: count movable pages when stealing from pageblockVlastimil Babka1-4/+1
When stealing pages from pageblock of a different migratetype, we count how many free pages were stolen, and change the pageblock's migratetype if more than half of the pageblock was free. This might be too conservative, as there might be other pages that are not free, but were allocated with the same migratetype as our allocation requested. While we cannot determine the migratetype of allocated pages precisely (at least without the page_owner functionality enabled), we can count pages that compaction would try to isolate for migration - those are either on LRU or __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be done as part of free page stealing with little additional overhead. The page stealing code is changed so that it considers free pages plus pages of the "good" migratetype for the decision whether to change pageblock's migratetype. The result should be more accurate migratetype of pageblocks wrt the actual pages in the pageblocks, when stealing from semi-occupied pageblocks. This should help the efficiency of page grouping by mobility. In testing based on 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 47%. The number of movable allocations falling back to other pageblocks are increased by 55%, but these events don't cause permanent fragmentation, so the tradeoff should be positive. Later patches also offset the movable fallback increase to some extent. [akpm@linux-foundation.org: merge fix] Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-09NFS append COMMIT after synchronous COPYOlga Kornievskaia1-0/+1
Instead of messing with the commit path which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first space. It saves a round trip, since after the COPY, the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-35/+39
Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
2017-05-08Merge tag 'for-f2fs-4.12' of ↵Linus Torvalds1-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've focused on enhancing performance with regards to block allocation, GC, and discard/in-place-update IO controls. There are a bunch of clean-ups as well as minor bug fixes. Enhancements: - disable heap-based allocation by default - issue small-sized discard commands by default - change the policy of data hotness for logging - distinguish IOs in terms of size and wbc type - start SSR earlier to avoid foreground GC - enhance data structures managing discard commands - enhance in-place update flow - add some more fault injection routines - secure one more xattr entry Bug fixes: - calculate victim cost for GC correctly - remain correct victim segment number for GC - race condition in nid allocator and initializer - stale pointer produced by atomic_writes - fix missing REQ_SYNC for flush commands - handle missing errors in more corner cases" * tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits) f2fs: fix a mount fail for wrong next_scan_nid f2fs: enhance scalability of trace macro f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS f2fs: Make flush bios explicitely sync f2fs: show available_nids in f2fs/status f2fs: flush dirty nats periodically f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard f2fs: allow cpc->reason to indicate more than one reason f2fs: release cp and dnode lock before IPU f2fs: shrink size of struct discard_cmd f2fs: don't hold cmd_lock during waiting discard command f2fs: nullify fio->encrypted_page for each writes f2fs: sanity check segment count f2fs: introduce valid_ipu_blkaddr to clean up f2fs: lookup extent cache first under IPU scenario f2fs: reconstruct code to write a data page f2fs: introduce __wait_discard_cmd f2fs: introduce __issue_discard_cmd f2fs: enable small discard by default f2fs: delay awaking discard thread ...
2017-05-08Merge tag 'fscrypt_for_linus' of ↵Linus Torvalds3-11/+83
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Only bug fixes and cleanups for this merge window" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: correct collision claim for digested names MAINTAINERS: fscrypt: update mailing list, patchwork, and git ext4: clean up ext4_match() and callers f2fs: switch to using fscrypt_match_name() ext4: switch to using fscrypt_match_name() fscrypt: introduce helper function for filename matching fscrypt: avoid collisions when presenting long encrypted filenames f2fs: check entire encrypted bigname when finding a dentry ubifs: check for consistent encryption contexts in ubifs_lookup() f2fs: sync f2fs_lookup() with ext4_lookup() ext4: remove "nokey" check from ext4_lookup() fscrypt: fix context consistency check when key(s) unavailable fscrypt: Remove __packed from fscrypt_policy fscrypt: Move key structure and constants to uapi fscrypt: remove fscrypt_symlink_data_len() fscrypt: remove unnecessary checks for NULL operations
2017-05-08KVM: arm64: vgic-v3: vgic_v3_lpi_sync_pending_statusEric Auger1-0/+2
this new helper synchronizes the irq pending_latch with the LPI pending bit status found in rdist pending table. As the status is consumed, we reset the bit in pending table. As we need the PENDBASER_ADDRESS() in vgic-v3, let's move its definition in the irqchip header. We restore the full length of the field, ie [51:16]. Same for PROPBASER_ADDRESS with full field length of [51:12]. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
2017-05-08KVM: arm64: vgic-its: Interpret MAPD Size field and check related errorsEric Auger1-0/+2
Up to now the MAPD's ITT size field has been ignored. It encodes the number of eventid bit minus 1. It should be used to check the eventid when a MAPTI command is issued on a device. Let's store the number of eventid bits in the its_device and do the check on MAPTI. Also make sure the ITT size field does not exceed the GITS_TYPER IDBITS field. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2017-05-08KVM: arm64: vgic-its: Implement vgic_mmio_uaccess_write_its_iidrEric Auger1-0/+5
The GITS_IIDR revision field is used to encode the migration ABI revision. So we need to restore it to check the table layout is readable by the destination. By writing the IIDR, userspace thus forces the ABI revision to be used and this must be less than or equal to the max revision KVM supports. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
2017-05-08KVM: arm64: vgic-its: Introduce migration ABI infrastructureEric Auger1-0/+5
We plan to support different migration ABIs, ie. characterizing the ITS table layout format in guest RAM. For example, a new ABI will be needed if vLPIs get supported for nested use case. So let's introduce an array of supported ABIs (at the moment a single ABI is supported though). The following characteristics are foreseen to vary with the ABI: size of table entries, save/restore operation, the way abi settings are applied. By default the MAX_ABI_REV is applied on its creation. In subsequent patches we will introduce a way for the userspace to change the ABI in use. The entry sizes now are set according to the ABI version and not hardcoded anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
2017-05-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds3-4/+15
Pull block fixes and updates from Jens Axboe: "Some fixes and followup features/changes that should go in, in this merge window. This contains: - Two fixes for lightnvm from Javier, fixing problems in the new code merge previously in this merge window. - A fix from Jan for the backing device changes, fixing an issue in NFS that causes a failure to mount on certain setups. - A change from Christoph, cleaning up the blk-mq init and exit request paths. - Remove elevator_change(), which is now unused. From Bart. - A fix for queue operation invocation on a dead queue, from Bart. - A series fixing up mtip32xx for blk-mq scheduling, removing a bandaid we previously had in place for this. From me. - A regression fix for this series, fixing a case where we wait on workqueue flushing from an invalid (non-blocking) context. From me. - A fix/optimization from Ming, ensuring that we don't both quiesce and freeze a queue at the same time. - A fix from Peter on lock ordering for CPU hotplug. Not a real problem right now, but will be once the CPU hotplug rework goes in. - A series from Omar, cleaning up out blk-mq debugfs support, and adding support for exporting info from schedulers in debugfs as well. This is really useful in debugging stalls or livelocks. From Omar" * 'for-linus' of git://git.kernel.dk/linux-block: (28 commits) mq-deadline: add debugfs attributes kyber: add debugfs attributes blk-mq-debugfs: allow schedulers to register debugfs attributes blk-mq: untangle debugfs and sysfs blk-mq: move debugfs declarations to a separate header file blk-mq: Do not invoke queue operations on a dead queue blk-mq-debugfs: get rid of a bunch of boilerplate blk-mq-debugfs: rename hw queue directories from <n> to hctx<n> blk-mq-debugfs: don't open code strstrip() blk-mq-debugfs: error on long write to queue "state" file blk-mq-debugfs: clean up flag definitions blk-mq-debugfs: separate flags with | nfs: Fix bdi handling for cloned superblocks block/mq: Cure cpu hotplug lock inversion lightnvm: fix bad back free on error path lightnvm: create cmd before allocating request blk-mq: don't use sync workqueue flushing from drivers mtip32xx: convert internal commands to regular block infrastructure mtip32xx: cleanup internal tag assumptions block: don't call blk_mq_quiesce_queue() after queue is frozen ...