summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-06Merge branch 'linus' into sched/urgent, to resolve conflictsIngo Molnar9071-253256/+389529
Conflicts: arch/arm64/kernel/entry.S arch/x86/Kconfig include/linux/sched/mm.h kernel/fork.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06Merge tag 'media/v4.16-2' of ↵Linus Torvalds953-14035/+21723
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - videobuf2 was moved to a media/common dir, as it is now used by the DVB subsystem too - Digital TV core memory mapped support interface - new sensor driver: ov7740 - several improvements at ddbridge driver - new V4L2 driver: IPU3 CIO2 CSI-2 receiver unit, found on some Intel SoCs - new tuner driver: tda18250 - finally got rid of all LIRC staging drivers - as we don't have old lirc drivers anymore, restruct the lirc device code - add support for UVC metadata - add a new staging driver for NVIDIA Tegra Video Decoder Engine - DVB kAPI headers moved to include/media - synchronize the kAPI and uAPI for the DVB subsystem, removing the gap for non-legacy APIs - reduce the kAPI gap for V4L2 - lots of other driver enhancements, cleanups, etc. * tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (407 commits) media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32 media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32 media: v4l2-compat-ioctl32.c: avoid sizeof(type) media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32 media: v4l2-compat-ioctl32.c: fix the indentation media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF media: v4l2-ioctl.c: don't copy back the result for -ENOTTY media: v4l2-ioctl.c: use check_fmt for enum/g/s/try_fmt media: vivid: fix module load error when enabling fb and no_error_inj=1 media: dvb_demux: improve debug messages media: dvb_demux: Better handle discontinuity errors media: cxusb, dib0700: ignore XC2028_I2C_FLUSH media: ts2020: avoid integer overflows on 32 bit machines media: i2c: ov7740: use gpio/consumer.h instead of gpio.h media: entity: Add a nop variant of media_entity_cleanup ...
2018-02-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds55-644/+919
Pull more rdma updates from Doug Ledford: "Items of note: - two patches fix a regression in the 4.15 kernel. The 4.14 kernel worked fine with NVMe over Fabrics and mlx5 adapters. That broke in 4.15. The fix is here. - one of the patches (the endian notation patch from Lijun) looks like a lot of lines of change, but it's mostly mechanical in nature. It amounts to the biggest chunk of change in it (it's about 2/3rds of the overall pull request). Summary: - Clean up some function signatures in rxe for clarity - Tidy the RDMA netlink header to remove unimplemented constants - bnxt_re driver fixes, one is a regression this window. - Minor hns driver fixes - Various fixes from Dan Carpenter and his tool - Fix IRQ cleanup race in HFI1 - HF1 performance optimizations and a fix to report counters in the right units - Fix for an IPoIB startup sequence race with the external manager - Oops fix for the new kabi path - Endian cleanups for hns - Fix for mlx5 related to the new automatic affinity support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits) net/mlx5: increase async EQ to avoid EQ overrun mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 RDMA/hns: Fix the endian problem for hns IB/uverbs: Use the standard kConfig format for experimental IB: Update references to libibverbs IB/hfi1: Add 16B rcvhdr trace support IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node IB/core: Avoid a potential OOPs for an unused optional parameter IB/core: Map iWarp AH type to undefined in rdma_ah_find_type IB/ipoib: Fix for potential no-carrier state IB/hfi1: Show fault stats in both TX and RX directions IB/hfi1: Remove blind constants from 16B update IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times IB/hfi1: Do not override given pcie_pset value IB/hfi1: Optimize process_receive_ib() IB/hfi1: Remove unnecessary fecn and becn fields IB/hfi1: Look up ibport using a pointer in receive path IB/hfi1: Optimize packet type comparison using 9B and bypass code paths IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet IB/hfi1: Remove dependence on qp->s_hdrwords ...
2018-02-06Merge tag 'libnvdimm-for-4.16' of ↵Linus Torvalds52-525/+1120
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...
2018-02-06Merge tag 'pci-v4.16-changes' of ↵Linus Torvalds248-3756/+4503
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - skip AER driver error recovery callbacks for correctable errors reported via ACPI APEI, as we already do for errors reported via the native path (Tyler Baicar) - fix DPC shared interrupt handling (Alex Williamson) - print full DPC interrupt number (Keith Busch) - enable DPC only if AER is available (Keith Busch) - simplify DPC code (Bjorn Helgaas) - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn Helgaas) - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn Helgaas) - move ASPM internal interfaces out of public header (Bjorn Helgaas) - allow hot-removal of VGA devices (Mika Westerberg) - speed up unplug and shutdown by assuming Thunderbolt controllers don't support Command Completed events (Lukas Wunner) - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling, Jay Cornwall) - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes) - clean up PCI DMA interface usage (Christoph Hellwig) - remove PCI pool API (replaced with DMA pool) (Romain Perier) - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan Kaya) - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring) - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler) - remove warnings on sysfs mmap failure (Bjorn Helgaas) - quiet ROM validation messages (Alex Deucher) - remove redundant memory alloc failure messages (Markus Elfring) - fill in types for compile-time VGA and other I/O port resources (Bjorn Helgaas) - make "pci=pcie_scan_all" work for Root Ports as well as Downstream Ports to help AmigaOne X1000 (Bjorn Helgaas) - add SPDX tags to all PCI files (Bjorn Helgaas) - quirk Marvell 9128 DMA aliases (Alex Williamson) - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas) - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas Cassel) - use DMA API to get MSI address for DesignWare IP (Niklas Cassel) - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I) - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun) - add support for ARTPEC-7 SoC (Niklas Cassel) - add endpoint-mode support for ARTPEC (Niklas Cassel) - add Cadence PCIe host and endpoint controller driver (Cyrille Pitchen) - handle multiple INTx status bits being set in dra7xx (Vignesh R) - translate dra7xx hwirq range to fix INTD handling (Vignesh R) - remove deprecated Exynos PHY initialization code (Jaehoon Chung) - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu) - fix NULL pointer dereference in iProc BCMA driver (Ray Jui) - fix Keystone interrupt-controller-node lookup (Johan Hovold) - constify qcom driver structures (Julia Lawall) - rework Tegra config space mapping to increase space available for endpoints (Vidya Sagar) - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy) - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy) - add support for Global Fabric Manager Server (GFMS) event to Microsemi Switchtec switch driver (Logan Gunthorpe) - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao) * tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller PCI: endpoint: Fix EPF device name to support multi-function devices PCI: endpoint: Add the function number as argument to EPC ops PCI: cadence: Add host driver for Cadence PCIe controller dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller PCI: Add vendor ID for Cadence PCI: Add generic function to probe PCI host controllers PCI: generic: fix missing call of pci_free_resource_list() PCI: OF: Add generic function to parse and allocate PCI resources PCI: Regroup all PCI related entries into drivers/pci/Makefile PCI/DPC: Reformat DPC register definitions PCI/DPC: Add and use DPC Status register field definitions PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error() PCI/DPC: Remove unnecessary RP PIO register structs PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info() PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info() PCI/DPC: Make RP PIO log size check more generic PCI/DPC: Rename local "status" to "dpc_status" PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error() ...
2018-02-06sched/fair: Use a recently used CPU as an idle candidate and the basis for SISMel Gorman3-2/+29
The select_idle_sibling() (SIS) rewrite in commit: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()") ... replaced a domain iteration with a search that broadly speaking does a wrapped walk of the scheduler domain sharing a last-level-cache. While this had a number of improvements, one consequence is that two tasks that share a waker/wakee relationship push each other around a socket. Even though two tasks may be active, all cores are evenly used. This is great from a search perspective and spreads a load across individual cores, but it has adverse consequences for cpufreq. As each CPU has relatively low utilisation, cpufreq may decide the utilisation is too low to used a higher P-state and overall computation throughput suffers. While individual cpufreq and cpuidle drivers may compensate by artifically boosting P-state (at c0) or avoiding lower C-states (during idle), it does not help if hardware-based cpufreq (e.g. HWP) is used. This patch tracks a recently used CPU based on what CPU a task was running on when it last was a waker a CPU it was recently using when a task is a wakee. During SIS, the recently used CPU is used as a target if it's still allowed by the task and is idle. The benefit may be non-obvious so consider an example of two tasks communicating back and forth. Task A may be an application doing IO where task B is a kworker or kthread like journald. Task A may issue IO, wake B and B wakes up A on completion. With the existing scheme this may look like the following (potentially different IDs if SMT is in use but similar principal applies). A (cpu 0) wake B (wakes on cpu 1) B (cpu 1) wake A (wakes on cpu 2) A (cpu 2) wake B (wakes on cpu 3) etc. A careful reader may wonder why CPU 0 was not idle when B wakes A the first time and it's simply due to the fact that A can be rescheduled to another CPU and the pattern is that prev == target when B tries to wakeup A and the information about CPU 0 has been lost. With this patch, the pattern is more likely to be: A (cpu 0) wake B (wakes on cpu 1) B (cpu 1) wake A (wakes on cpu 0) A (cpu 0) wake B (wakes on cpu 1) etc i.e. two communicating casts are more likely to use just two cores instead of all available cores sharing a LLC. The most dramatic speedup was noticed on dbench using the XFS filesystem on UMA as clients interact heavily with workqueues in that configuration. Note that a similar speedup is not observed on ext4 as the wakeup pattern is different: 4.15.0-rc9 4.15.0-rc9 waprev-v1 biasancestor-v1 Hmean 1 287.54 ( 0.00%) 817.01 ( 184.14%) Hmean 2 1268.12 ( 0.00%) 1781.24 ( 40.46%) Hmean 4 1739.68 ( 0.00%) 1594.47 ( -8.35%) Hmean 8 2464.12 ( 0.00%) 2479.56 ( 0.63%) Hmean 64 1455.57 ( 0.00%) 1434.68 ( -1.44%) The results can be less dramatic on NUMA where automatic balancing interferes with the test. It's also known that network benchmarks running on localhost also benefit quite a bit from this patch (roughly 10% on netperf RR for UDP and TCP depending on the machine). Hackbench also seens small improvements (6-11% depending on machine and thread count). The facebook schbench was also tested but in most cases showed little or no different to wakeup latencies. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180130104555.4125-5-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/fair: Do not migrate if the prev_cpu is idleMel Gorman1-1/+7
wake_affine_idle() prefers to move a task to the current CPU if the wakeup is due to an interrupt. The expectation is that the interrupt data is cache hot and relevant to the waking task as well as avoiding a search. However, there is no way to determine if there was cache hot data on the previous CPU that may exceed the interrupt data. Furthermore, round-robin delivery of interrupts can migrate tasks around a socket where each CPU is under-utilised. This can interact badly with cpufreq which makes decisions based on per-cpu data. It has been observed on machines with HWP that p-states are not boosted to their maximum levels even though the workload is latency and throughput sensitive. This patch uses the previous CPU for the task if it's idle and cache-affine with the current CPU even if the current CPU is idle due to the wakup being related to the interrupt. This reduces migrations at the cost of the interrupt data not being cache hot when the task wakes. A variety of workloads were tested on various machines and no adverse impact was noticed that was outside noise. dbench on ext4 on UMA showed roughly 10% reduction in the number of CPU migrations and it is a case where interrupts are frequent for IO competions. In most cases, the difference in performance is quite small but variability is often reduced. For example, this is the result for pgbench running on a UMA machine with different numbers of clients. 4.15.0-rc9 4.15.0-rc9 baseline waprev-v1 Hmean 1 22096.28 ( 0.00%) 22734.86 ( 2.89%) Hmean 4 74633.42 ( 0.00%) 75496.77 ( 1.16%) Hmean 7 115017.50 ( 0.00%) 113030.81 ( -1.73%) Hmean 12 126209.63 ( 0.00%) 126613.40 ( 0.32%) Hmean 16 131886.91 ( 0.00%) 130844.35 ( -0.79%) Stddev 1 636.38 ( 0.00%) 417.11 ( 34.46%) Stddev 4 614.64 ( 0.00%) 583.24 ( 5.11%) Stddev 7 542.46 ( 0.00%) 435.45 ( 19.73%) Stddev 12 173.93 ( 0.00%) 171.50 ( 1.40%) Stddev 16 671.42 ( 0.00%) 680.30 ( -1.32%) CoeffVar 1 2.88 ( 0.00%) 1.83 ( 36.26%) Note that the different in performance is marginal but for low utilisation, there is less variability. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/fair: Restructure wake_affine*() to return a CPU idMel Gorman1-18/+17
This is a preparation patch that has wake_affine*() return a CPU ID instead of a boolean. The intent is to allow the wake_affine() helpers to be avoided if a decision is already made. This patch has no functional change. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180130104555.4125-3-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/fair: Remove unnecessary parameters from wake_affine_idle()Mel Gorman1-5/+3
wake_affine_idle() takes parameters it never uses so clean it up. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180130104555.4125-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/rt: Make update_curr_rt() more accurateWen Yang1-2/+3
rq->clock_task may be updated between the two calls of rq_clock_task() in update_curr_rt(). Calling rq_clock_task() only once makes it more accurate and efficient, taking update_curr() as reference. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: zhong.weidong@zte.com.cn Link: http://lkml.kernel.org/r/1517800721-42092-1-git-send-email-wen.yang99@zte.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/rt: Up the root domain ref count when passing it around via IPIsSteven Rostedt (VMware)3-2/+22
When issuing an IPI RT push, where an IPI is sent to each CPU that has more than one RT task scheduled on it, it references the root domain's rto_mask, that contains all the CPUs within the root domain that has more than one RT task in the runable state. The problem is, after the IPIs are initiated, the rq->lock is released. This means that the root domain that is associated to the run queue could be freed while the IPIs are going around. Add a sched_get_rd() and a sched_put_rd() that will increment and decrement the root domain's ref count respectively. This way when initiating the IPIs, the scheduler will up the root domain's ref count before releasing the rq->lock, ensuring that the root domain does not go away until the IPI round is complete. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()Steven Rostedt (VMware)1-7/+8
When the rto_push_irq_work_func() is called, it looks at the RT overloaded bitmask in the root domain via the runqueue (rq->rd). The problem is that during CPU up and down, nothing here stops rq->rd from changing between taking the rq->rd->rto_lock and releasing it. That means the lock that is released is not the same lock that was taken. Instead of using this_rq()->rd to get the root domain, as the irq work is part of the root domain, we can simply get the root domain from the irq work that is passed to the routine: container_of(work, struct root_domain, rto_push_work) This keeps the root domain consistent. Reported-by: Pavan Kondeti <pkondeti@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/core: Optimize update_stats_*()Peter Zijlstra2-16/+20
These functions are already gated by schedstats_enabled(), there is no point in then issuing another static_branch for every individual update in them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06sched/core: Optimize ttwu_stat()Peter Zijlstra2-8/+10
The whole of ttwu_stat() is guarded by a single schedstat_enabled(), there is absolutely no point in then issuing another static_branch for every single schedstat_inc() in there. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-06Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-11/+24
Pull more xfs updates from Darrick Wong: "As promised, here's a (much smaller) second pull request for the second week of the merge cycle. This time around we have a couple patches shutting off unsupported fs configurations, and a couple of cleanups. Last, we turn off EXPERIMENTAL for the reverse mapping btree, since the primary downstream user of that information (online fsck) is now upstream and I haven't seen any major failures in a few kernel releases. Summary: - Print scrub build status in the xfs build info. - Explicitly call out the remaining two scenarios where we don't support reflink and never have. - Remove EXPERIMENTAL tag from reverse mapping btree!" * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove experimental tag for reverse mapping xfs: don't allow reflink + realtime filesystems xfs: don't allow DAX on reflink filesystems xfs: add scrub to XFS_BUILD_OPTIONS xfs: fix u32 type usage in sb validation function
2018-02-06Merge branch 'overlayfs-linus' of ↵Linus Torvalds15-409/+1905
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This work from Amir adds NFS export capability to overlayfs. NFS exporting an overlay filesystem is a challange because we want to keep track of any copy-up of a file or directory between encoding the file handle and decoding it. This is achieved by indexing copied up objects by lower layer file handle. The index is already used for hard links, this patchset extends the use to NFS file handle decoding" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits) ovl: check ERR_PTR() return value from ovl_encode_fh() ovl: fix regression in fsnotify of overlay merge dir ovl: wire up NFS export operations ovl: lookup indexed ancestor of lower dir ovl: lookup connected ancestor of dir in inode cache ovl: hash non-indexed dir by upper inode for NFS export ovl: decode pure lower dir file handles ovl: decode indexed dir file handles ovl: decode lower file handles of unlinked but open files ovl: decode indexed non-dir file handles ovl: decode lower non-dir file handles ovl: encode lower file handles ovl: copy up before encoding non-connectable dir file handle ovl: encode non-indexed upper file handles ovl: decode connected upper dir file handles ovl: decode pure upper file handles ovl: encode pure upper file handles ovl: document NFS export vfs: factor out helpers d_instantiate_anon() and d_alloc_anon() ovl: store 'has_upper' and 'opaque' as bit flags ...
2018-02-05membarrier/selftest: Test private expedited sync core commandMathieu Desnoyers1-0/+73
Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Elder <paul.elder@pitt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-12-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier/arm64: Provide core serializing commandMathieu Desnoyers2-0/+5
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-11-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier/x86: Provide core serializing commandMathieu Desnoyers4-3/+14
There are two places where core serialization is needed by membarrier: 1) When returning from the membarrier IPI, 2) After scheduler updates curr to a thread with a different mm, before going back to user-space, since the curr->mm is used by membarrier to check whether it needs to send an IPI to that CPU. x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go back to user-space. The IRET instruction is core serializing, but not SYSEXIT. x86-64 uses IRET as return from interrupt, which takes care of the IPI. However, it can return to user-space through either SYSRETL (compat code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing, we rely instead on write_cr3() performed by switch_mm() to provide core serialization after changing the current mm, and deal with the special case of kthread -> uthread (temporarily keeping current mm into active_mm) by adding a sync_core() in that specific case. Use the new sync_core_before_usermode() to guarantee this. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier: Provide core serializing command, *_SYNC_COREMathieu Desnoyers5-18/+106
Provide core serializing membarrier command to support memory reclaim by JIT. Each architecture needs to explicitly opt into that support by documenting in their architecture code how they provide the core serializing instructions required when returning from the membarrier IPI, and after the scheduler has updated the curr->mm pointer (before going back to user-space). They should then select ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on their architecture. Architectures selecting this feature need to either document that they issue core serializing instructions when returning to user-space, or implement their architecture-specific sync_core_before_usermode(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-9-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05lockin/x86: Implement sync_core_before_usermode()Mathieu Desnoyers2-0/+29
Ensure that a core serializing instruction is issued before returning to user-mode. x86 implements return to user-space through sysexit, sysrel, and sysretq, which are not core serializing. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-8-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05locking: Introduce sync_core_before_usermode()Mathieu Desnoyers2-0/+24
Introduce an architecture function that ensures the current CPU issues a core serializing instruction before returning to usermode. This is needed for the membarrier "sync_core" command. Architectures defining the sync_core_before_usermode() static inline need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-7-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier/selftest: Test global expedited commandMathieu Desnoyers1-5/+54
Test the new MEMBARRIER_CMD_GLOBAL_EXPEDITED and MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED commands. Adapt to the MEMBARRIER_CMD_SHARED -> MEMBARRIER_CMD_GLOBAL rename. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Elder <paul.elder@pitt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-6-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier: Provide GLOBAL_EXPEDITED commandMathieu Desnoyers4-18/+153
Allow expedited membarrier to be used for data shared between processes through shared memory. Processes wishing to receive the membarriers register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED. This allows extremely simple kernel-level implementation: we have almost everything we need with the PRIVATE_EXPEDITED barrier code. All we need to do is to add a flag in the mm_struct that will be used to check whether we need to send the IPI to the current thread of each CPU. There is a slight downside to this approach compared to targeting specific shared memory users: when performing a membarrier operation, all registered "global" receivers will get the barrier, even if they don't share a memory mapping with the sender issuing MEMBARRIER_CMD_GLOBAL_EXPEDITED. This registration approach seems to fit the requirement of not disturbing processes that really deeply care about real-time: they simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED" command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward compatibility. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier: Document scheduler barrier requirementsMathieu Desnoyers3-11/+36
Document the membarrier requirement on having a full memory barrier in __schedule() after coming from user-space, before storing to rq->curr. It is provided by smp_mb__after_spinlock() in __schedule(). Document that membarrier requires a full barrier on transition from kernel thread to userspace thread. We currently have an implicit barrier from atomic_dec_and_test() in mmdrop() that ensures this. The x86 switch_mm_irqs_off() full barrier is currently provided by many cpumask update operations as well as write_cr3(). Document that write_cr3() provides this barrier. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-4-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05powerpc, membarrier: Skip memory barrier in switch_mm()Mathieu Desnoyers8-11/+58
Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Threads targeting the same VM but which belong to different thread groups is a tricky case. It has a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates on the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05membarrier/selftest: Test private expedited commandMathieu Desnoyers1-16/+95
Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands. Add checks expecting specific error values on system calls expected to fail. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alice Ferrazzi <alice.ferrazzi@gmail.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Elder <paul.elder@pitt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-2-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-05Merge tag 'rproc-v4.16' of git://github.com/andersson/remoteprocLinus Torvalds11-204/+107
Pull remoteproc updates from Bjorn Andersson: "This contains a few bug fixes and a cleanup up of the resource-table handling in the framework, which removes the need for drivers with no resource table to provide a fake one" * tag 'rproc-v4.16' of git://github.com/andersson/remoteproc: remoteproc: Reset table_ptr on stop remoteproc: Drop dangling find_rsc_table dummies remoteproc: Move resource table load logic to find remoteproc: Don't handle empty resource table remoteproc: Merge rproc_ops and rproc_fw_ops remoteproc: Clone rproc_ops in rproc_alloc() remoteproc: Cache resource table size remoteproc: Remove depricated crash completion virtio_remoteproc: correct put_device virtio_device.dev
2018-02-05Merge tag 'rpmsg-v4.16' of git://github.com/andersson/remoteprocLinus Torvalds4-19/+55
Pull rpmsg updates from Bjorn Andersson: "This fixes a few issues found in the SMD and GLINK drivers and corrects the handling of SMD channels that are found in an (previously) unexpected state" * tag 'rpmsg-v4.16' of git://github.com/andersson/remoteproc: rpmsg: smd: Fix double unlock in __qcom_smd_send() rpmsg: glink: Fix missing mutex_init() in qcom_glink_alloc_channel() rpmsg: smd: Don't hold the tx lock during wait rpmsg: smd: Fail send on a closed channel rpmsg: smd: Wake up all waiters rpmsg: smd: Create device for all channels rpmsg: smd: Perform handshake during open rpmsg: glink: smem: Ensure ordering during tx drivers: rpmsg: remove duplicate includes remoteproc: qcom: Use PTR_ERR_OR_ZERO() in glink prob
2018-02-05Merge tag 'mmc-v4.16-2' of ↵Linus Torvalds3-10/+161
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - renesas_sdhi: Fix build error in case NO_DMA=y - sdhci: Implement a bounce buffer to address throughput regressions * tag 'mmc-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: MMC_SDHI_{SYS,INTERNAL}_DMAC should depend on HAS_DMA mmc: sdhci: Implement an SDHCI-specific bounce buffer
2018-02-05Merge tag 'pwm/for-4.16-rc1' of ↵Linus Torvalds4-1/+30
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The Meson PWM controller driver gains support for the AXG series and a minor bug is fixed for the STMPE driver. To round things off, the class is now set for PWM channels exported via sysfs which allows non-root access, provided that the system has been configured accordingly" * tag 'pwm/for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: meson: Add clock source configuration for Meson-AXG dt-bindings: pwm: Update bindings for the Meson-AXG pwm: stmpe: Fix wrong register offset for hwpwm=2 case pwm: Set class for exported channels in sysfs
2018-02-05net: mediatek: Explicitly include pinctrl headersThierry Reding1-0/+1
The Mediatek ethernet driver fails to build after commit 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") because it relies on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the device.h header implicitly. Include these headers explicitly to avoid the build failure. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-05mmc: meson-gx-mmc: Explicitly include pinctr/consumer.hThierry Reding1-0/+1
The Meson GX MMC driver fails to build after commit 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") because it relies on the pinctrl/consumer.h being pulled in by the device.h header implicitly. Include the header explicitly to avoid the build failure. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-05drm/rockchip: lvds: Explicitly include pinctrl headersThierry Reding1-0/+1
The Rockchip LVDS driver fails to build after commit 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") because it relies on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the device.h header implicitly. Include these headers explicitly to avoid the build failure. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-05pinctrl: files should directly include apis they useStephen Rothwell1-0/+1
Fixes: 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-05net/mlx5: increase async EQ to avoid EQ overrunMax Gurtovoy1-1/+1
Currently the async EQ has 256 entries only. It might not be big enough for the SW to handle all the needed pending events. For example, in case of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target and the target goes down, the FW will raise 1024 "last WQE reached" events and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond it the FW should be able to delay the event and raise it later on using internal backpressure mechanism. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-05mlx5: fix mlx5_get_vector_affinity to start from completion vector 0Sagi Grimberg1-1/+1
The consumers of this routine expects the affinity map of of vector index relative to the first completion vector. The upper layers are not aware of internal/private completion vectors that mlx5 allocates for its own usage. Hence, return the affinity map of vector index relative to the first completion vector. Fixes: 05e0cc84e00c ("net/mlx5: Fix get vector affinity helper function") Reported-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: <stable@vger.kernel.org> # v4.15 Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-05RDMA/hns: Fix the endian problem for hnsoulijun8-320/+363
The hip06 and hip08 run on a little endian ARM, it needs to revise the annotations to indicate that the HW uses little endian data in the various DMA buffers, and flow the necessary swaps throughout. The imm_data use big endian mode. The cpu_to_le32/le32_to_cpu swaps are no-op for this, which makes the only substantive change the handling of imm_data which is now mandatory swapped. This also keep match with the userspace hns driver and resolve the warning by sparse. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-05ovl: check ERR_PTR() return value from ovl_encode_fh()Amir Goldstein1-0/+3
Another fix for an issue reported by 0-day robot. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05ovl: fix regression in fsnotify of overlay merge dirAmir Goldstein1-0/+2
A re-factoring patch in NFS export series has passed the wrong argument to ovl_get_inode() causing a regression in the very recent fix to fsnotify of overlay merge dir. The regression has caused merge directory inodes to be hashed by upper instead of lower real inode, when NFS export and directory indexing is disabled. That caused an inotify watch to become obsolete after directory copy up and drop caches. LTP test inotify07 was improved to catch this regression. The regression also caused multiple redirect dirs to same origin not to be detected on lookup with NFS export disabled. An xfstest was added to cover this case. Fixes: 0aceb53e73be ("ovl: do not pass overlay dentry to ovl_get_inode()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-04Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds38-554/+977
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull spectre/meltdown updates from Thomas Gleixner: "The next round of updates related to melted spectrum: - The initial set of spectre V1 mitigations: - Array index speculation blocker and its usage for syscall, fdtable and the n180211 driver. - Speculation barrier and its usage in user access functions - Make indirect calls in KVM speculation safe - Blacklisting of known to be broken microcodes so IPBP/IBSR are not touched. - The initial IBPB support and its usage in context switch - The exposure of the new speculation MSRs to KVM guests. - A fix for a regression in x86/32 related to the cpu entry area - Proper whitelisting for known to be safe CPUs from the mitigations. - objtool fixes to deal proper with retpolines and alternatives - Exclude __init functions from retpolines which speeds up the boot process. - Removal of the syscall64 fast path and related cleanups and simplifications - Removal of the unpatched paravirt mode which is yet another source of indirect unproteced calls. - A new and undisputed version of the module mismatch warning - A couple of cleanup and correctness fixes all over the place Yet another step towards full mitigation. There are a few things still missing like the RBS underflow mitigation for Skylake and other small details, but that's being worked on. That said, I'm taking a belated christmas vacation for a week and hope that everything is magically solved when I'm back on Feb 12th" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES KVM/x86: Add IBPB support KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL x86/pti: Mark constant arrays as __initconst x86/spectre: Simplify spectre_v2 command line parsing x86/retpoline: Avoid retpolines for built-in __init functions x86/kvm: Update spectre-v1 mitigation KVM: VMX: make MSR bitmaps per-VCPU x86/paravirt: Remove 'noreplace-paravirt' cmdline option x86/speculation: Use Indirect Branch Prediction Barrier in context switch x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable" x86/spectre: Report get_user mitigation for spectre_v1 nl80211: Sanitize array index in parse_txq_params vfs, fdtable: Prevent bounds-check bypass via speculative execution x86/syscall: Sanitize syscall table de-references under speculation x86/get_user: Use pointer masking to limit speculation ...
2018-02-04Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds6-6/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of changes: - a fixup for kexec related to 5-level paging mode. That covers most of the cases except kexec from a 5-level kernel to a 4-level kernel. The latter needs more work and is going to come in 4.17 - two trivial fixes for build warnings triggered by LTO and gcc-8" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/power: Fix swsusp_arch_resume prototype x86/dumpstack: Avoid uninitlized variable x86/kexec: Make kexec (mostly) work in 5-level paging mode
2018-02-04Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds4-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Two small changes: - a fix for a interrupt regression caused by the vector management changes in 4.15 affecting museum pieces which rely on interrupt probing for legacy (e.g. parallel port) devices. One of the startup calls in the autoprobe code was not changed to the new activate_and_startup() function resulting in a warning and as a consequence failing to discover the device interrupt. - a trivial update to the copyright/license header of the STM32 irq chip driver" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Make legacy autoprobing work again irqchip/stm32: Fix copyright
2018-02-04Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-blockLinus Torvalds13-30/+83
Pull more block updates from Jens Axboe: "Most of this is fixes and not new code/features: - skd fix from Arnd, fixing a build error dependent on sla allocator type. - blk-mq scheduler discard merging fixes, one from me and one from Keith. This fixes a segment miscalculation for blk-mq-sched, where we mistakenly think two segments are physically contigious even though the request isn't carrying real data. Also fixes a bio-to-rq merge case. - Don't re-set a bit on the buffer_head flags, if it's already set. This can cause scalability concerns on bigger machines and workloads. From Kemi Wang. - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to distuingish between a local (device related) resource starvation and a global one. The latter might happen without IO being in flight, so it has to be handled a bit differently. From Ming" * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block: block: skd: fix incorrect linux/slab_def.h inclusion buffer: Avoid setting buffer bits that are already set blk-mq-sched: Enable merging discard bio into request blk-mq: fix discard merge with scheduler attached blk-mq: introduce BLK_STS_DEV_RESOURCE
2018-02-04Merge tag 'ntb-4.16' of git://github.com/jonmason/ntbLinus Torvalds14-1954/+3550
Pull NTB updates from Jon Mason: "Bug fixes galore, removal of the ntb atom driver, and updates to the ntb tools and tests to support the multi-port interface" * tag 'ntb-4.16' of git://github.com/jonmason/ntb: (37 commits) NTB: ntb_perf: fix cast to restricted __le32 ntb_perf: Fix an error code in perf_copy_chunk() ntb_hw_switchtec: Make function switchtec_ntb_remove() static NTB: ntb_tool: fix memory leak on 'buf' on error exit path NTB: ntb_perf: fix printing of resource_size_t NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology NTB: ntb_test: Update ntb_perf tests NTB: ntb_test: Update ntb_tool MW tests NTB: ntb_test: Add ntb_tool Message tests NTB: ntb_test: Update ntb_tool Scratchpad tests NTB: ntb_test: Update ntb_tool DB tests NTB: ntb_test: Update ntb_tool link tests NTB: ntb_test: Add ntb_tool port tests NTB: ntb_test: Safely use paths with whitespace NTB: ntb_perf: Add full multi-port NTB API support NTB: ntb_tool: Add full multi-port NTB API support NTB: ntb_pp: Add full multi-port NTB API support NTB: Fix UB/bug in ntb_mw_get_align() NTB: Set dma mask and dma coherent mask to NTB devices NTB: Rename NTB messaging API methods ...
2018-02-04Merge tag 'mailbox-v4.16' of ↵Linus Torvalds3-16/+51
git://git.linaro.org/landing-teams/working/fujitsu/integration Pull mailbox updates from Jassi Brar: "Misc driver changes only: - TI-MsgMgr: Fix print format for a printk - TI-MSgMgr: SPDX license switch for the driver - QCOM-IPC: Convert driver to use regmap - QCOM-IPC: Spawn sibling clock device from mailbox driver" * tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration: dt-bindings: mailbox: qcom: Document the APCS clock binding mailbox: qcom: Create APCS child device for clock controller mailbox: qcom: Convert APCS IPC driver to use regmap mailbox: ti-msgmgr: Use %zu for size_t print format mailbox: ti-msgmgr: Switch to SPDX Licensing
2018-02-04Merge branch 'i2c/for-4.16' of ↵Linus Torvalds45-1012/+1381
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "I2C has the following changes for you: - new flag to mark DMA safe buffers in i2c_msg. Also, some infrastructure around it. And docs. - huge refactoring of the at24 driver led by the new maintainer Bartosz - update I2C bus recovery to send STOP after recovery - conversion from gpio to gpiod for I2C bus recovery - adding a fault-injector to the i2c-gpio driver - lots of small driver improvements, and bigger ones to i2c-sh_mobile" * 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (99 commits) i2c: mv64xxx: Add myself as maintainer for this driver i2c: mv64xxx: Fix clock resource by adding an optional bus clock i2c: mv64xxx: Remove useless test before clk_disable_unprepare i2c: mxs: use true and false for boolean values i2c: meson: update doc description to fix build warnings i2c: meson: add configurable divider factors dt-bindings: i2c: update documentation for the Meson-AXG i2c: imx-lpi2c: add runtime pm support i2c: rcar: fix some trivial typos in comments i2c: davinci: fix the cpufreq transition i2c: rk3x: add proper kerneldoc header i2c: rk3x: account for const type of of_device_id.data i2c: acorn: remove outdated path from file header i2c: acorn: add MODULE_LICENSE tag i2c: rcar: implement bus recovery i2c: send STOP after successful bus recovery i2c: ensure SDA is released in recovery if SDA is controllable i2c: add 'set_sda' to bus_recovery_info i2c: add identifier in declarations for i2c_bus_recovery i2c: make kerneldoc about bus recovery more precise ...
2018-02-04Merge tag 'fscrypt_for_linus' of ↵Linus Torvalds17-500/+500
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Refactor support for encrypted symlinks to move common code to fscrypt" Ted also points out about the merge: "This makes the f2fs symlink code use the fscrypt_encrypt_symlink() from the fscrypt tree. This will end up dropping the kzalloc() -> f2fs_kzalloc() change, which means the fscrypt-specific allocation won't get tested by f2fs's kmalloc error injection system; which is fine" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits) fscrypt: fix build with pre-4.6 gcc versions fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info() fscrypt: document symlink length restriction fscrypt: fix up fscrypt_fname_encrypted_size() for internal use fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names fscrypt: calculate NUL-padding length in one place only fscrypt: move fscrypt_symlink_data to fscrypt_private.h fscrypt: remove fscrypt_fname_usr_to_disk() ubifs: switch to fscrypt_get_symlink() ubifs: switch to fscrypt ->symlink() helper functions ubifs: free the encrypted symlink target f2fs: switch to fscrypt_get_symlink() f2fs: switch to fscrypt ->symlink() helper functions ext4: switch to fscrypt_get_symlink() ext4: switch to fscrypt ->symlink() helper functions fscrypt: new helper function - fscrypt_get_symlink() fscrypt: new helper functions for ->symlink() fscrypt: trim down fscrypt.h includes fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h ...
2018-02-04IB/uverbs: Use the standard kConfig format for experimentalJason Gunthorpe1-1/+1
We really don't want people turning this on just yet, make it very clear with capital letters. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-04IB: Update references to libibverbsJason Gunthorpe3-4/+5
These days the userspace comes from rdma-core, revise references in the kernel to point to the current repository. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>