summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2019-06-26keys: Kill off request_key_async{,_with_auxdata}David Howells1-11/+0
Kill off request_key_async{,_with_auxdata}() as they're not currently used. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26net/mlx5e: Specifying known origin of packets matching the flowJianbo Liu1-0/+1
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ↵Jianbo Liu1-0/+17
ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu1-4/+11
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce vport metadata matching bits and enum constantsJianbo Liu1-7/+49
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26i40e: fix 'Unknown bps' in dmesg for 2.5Gb/5Gb speedsAleksandr Loktionov1-0/+4
This patch fixes 'NIC Link is Up, Unknown bps' message in dmesg for 2.5Gb/5Gb speeds. This problem is fixed by adding constants for VIRTCHNL_LINK_SPEED_2_5GB and VIRTCHNL_LINK_SPEED_5GB cases in the i40e_virtchnl_link_speed() function. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-26i2c: add newly exported functions to the header, tooWolfram Sang1-0/+6
Nobody (including me) noticed that these functions were exported but not added to the header :/ Fixes: 7159dbdae3c5 ("i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-06-26HID: intel-ish-hid: fix wrong driver_data usageHyungwoo Yang1-0/+1
Currently, in suspend() and resume(), ishtp client drivers are using driver_data to get "struct ishtp_cl_device" object which is set by bus driver. It's wrong since the driver_data should not be owned bus. driver_data should be owned by the corresponding ishtp client driver. Due to this, some ishtp client driver like cros_ec_ishtp which uses its driver_data to transfer its data to its child doesn't work correctly. So this patch removes setting driver_data in bus drier and instead of using driver_data to get "struct ishtp_cl_device", since "struct device" is embedded in "struct ishtp_cl_device", we introduce a helper function that returns "struct ishtp_cl_device" from "struct device". Signed-off-by: Hyungwoo Yang <hyungwoo.yang@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2019-06-26Merge tag 'v5.2-rc6' into asoc-5.3Mark Brown883-6225/+1143
Linux 5.2-rc6
2019-06-26pwm: Add support referencing PWMs from ACPINikolaus Voss1-0/+10
In analogy to referencing a GPIO using the "gpios" property from ACPI, support referencing a PWM using the "pwms" property. ACPI entries must look like Package () {"pwms", Package () { <PWM device reference>, <PWM index>, <PWM period> [, <PWM flags>]}} In contrast to the DT implementation, only _one_ PWM entry in the "pwms" property is supported. As a consequence "pwm-names"-property and con_id lookup aren't supported. Support for ACPI is added via the firmware-node framework which is an abstraction layer on top of ACPI/DT. To keep this patch clean, DT and ACPI paths are kept separate. The firmware-node framework could be used to unify both paths in a future patch. To support leds-pwm driver, an additional method devm_fwnode_pwm_get() which supports both ACPI and DT configuration is exported. Signed-off-by: Nikolaus Voss <nikolaus.voss@loewensteinmedical.de> [thierry.reding@gmail.com: fix build failures for !ACPI] Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-06-26cpufreq: Move the IS_ENABLED(CPU_THERMAL) macro into a stubDaniel Lezcano1-0/+6
cpufreq_online() and cpufreq_offline() [un]register the driver as a cooling device. This is done if the driver is flagged as a cooling device in addition with an IS_ENABLED() check to compile out the branching code. Group this test in a stub function added in the cpufreq header instead of having the IS_ENABLED() in the code. Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-26Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull operating performance points (OPP) framework changes for v5.3 from Viresh Kumar: "This pull request contains: - OPP core changes to support a wider range of devices, like IO devices (Rajendra Nayak and Stehpen Boyd). - Fixes around genpd_virt_devs (Viresh Kumar). - Fix for platform with set_opp() callback (Dmitry Osipenko)." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: Don't use IS_ERR on invalid supplies opp: Make dev_pm_opp_set_rate() handle freq = 0 to drop performance votes opp: Don't overwrite rounded clk rate opp: Allocate genpd_virt_devs from dev_pm_opp_attach_genpd() opp: Attach genpds to devices from within OPP core
2019-06-26usb: renesas_usbhs: Add has_new_pipe_configs flagYoshihiro Shimoda1-0/+1
In the future, each struct renesas_usbhs_driver_param is stored on the each platform related source code (e.g. rcar3.c). So, to simplify the source code, this patch adds a new flag has_new_pipe_configs. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-26usb: renesas_usbhs: Remove type member from renesas_usbhs_driver_paramYoshihiro Shimoda1-7/+0
Now no one uses the type member so that this patch removes it. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-26usb: renesas_usbhs: Use a specific flag instead of type for multi_clksYoshihiro Shimoda1-0/+1
To remove the type of renesas_usbhs_driver_param in the future, this patch uses a specific flag "multi_clks". Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-26usb: renesas_usbhs: remove notify_hotplug callbackYoshihiro Shimoda1-25/+1
The notify_hotplug callback was supported in v3.10, but the last user (armadillo800eva) was removed by the commit 1fa59bda21c7 ("ARM: shmobile: Remove legacy board code for Armadillo-800 EVA"). So, this patch removes it. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-26clk: Add devm_clk_bulk_get_optional() functionSylwester Nawrocki1-0/+28
Add managed version of the clk_bulk_get_optional() helper function. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> [sboyd@kernel.org: Mark __devm_clk_bulk_get() static] Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-06-26clk: Add clk_bulk_get_optional() functionSylwester Nawrocki1-0/+19
clk_bulk_get_optional() allows to get a group of clocks where one or more is optional. For a not available clock, e.g. not specifed in the clock consumer node in DT, its respective struct clk pointer will be NULL. This allows for operating on a group of returned clocks (struct clk_bulk_data array) with existing clk_bulk* APIs. Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-06-25linux/dim: Add completions count to dim_sampleYamin Friedman1-3/+25
Added a measurement of completions per/msec to allow for completion based dim algorithms. In order to use dynamic interrupt moderation with RDMA we need to have a different measurment than packets per second. This change is meant to prepare for adding a new DIM method. All drivers that use net_dim and thus do not need a completion count will have the completions set to 0. Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Move implementation to .c filesTal Gilboa2-337/+255
Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c. This is both more structurally appealing and would allow to only expose externally used functions. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Rename externally used net_dim membersTal Gilboa2-26/+25
Removed 'net' prefix from functions and structs used by external drivers. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Rename net_dim_sample() to net_dim_update_sample()Tal Gilboa2-3/+4
In order to avoid confusion between the function and the similarly named struct. In preparation for removing the 'net' prefix from dim members. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Rename externally exposed macrosTal Gilboa2-15/+15
Renamed macros in use by external drivers. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Remove "net" prefix from internal DIM membersTal Gilboa2-87/+86
Only renaming functions and structs which aren't used by an external code. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25linux/dim: Move logic to dim.hTal Gilboa2-146/+155
In preparation for supporting more implementations of the DIM algorithm, I'm moving what would become common logic to a common library. Downstream DIM implementations will use the common lib for their implementation. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25clocksource/drivers/exynos_mct: Increase priority over ARM arch timerMarek Szyprowski1-1/+1
Exynos SoCs based on CA7/CA15 have 2 timer interfaces: custom Exynos MCT (Multi Core Timer) and standard ARM Architected Timers. There are use cases, where both timer interfaces are used simultanously. One of such examples is using Exynos MCT for the main system timer and ARM Architected Timers for the KVM and virtualized guests (KVM requires arch timers). Exynos Multi-Core Timer driver (exynos_mct) must be however started before ARM Architected Timers (arch_timer), because they both share some common hardware blocks (global system counter) and turning on MCT is needed to get ARM Architected Timer working properly. To ensure selecting Exynos MCT as the main system timer, increase MCT timer rating. To ensure proper starting order of both timers during suspend/resume cycle, increase MCT hotplug priority over ARM Archictected Timers. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-06-25pwm: Add consumer device linkFabrice Gasnier1-2/+4
Add a device link between the PWM consumer and the PWM provider. This enforces the PWM user to get suspended before the PWM provider. It allows proper synchronization of suspend/resume sequences: the PWM user is responsible for properly stopping PWM, before the provider gets suspended: see [1]. Add the device link in: - of_pwm_get() - pwm_get() - devm_*pwm_get() variants as it requires a reference to the device for the PWM consumer. [1] https://lkml.org/lkml/2019/2/5/770 Suggested-by: Thierry Reding <thierry.reding@gmail.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-06-25dma-direct: handle DMA_ATTR_NO_KERNEL_MAPPING in common codeChristoph Hellwig1-0/+2
DMA_ATTR_NO_KERNEL_MAPPING is generally implemented by allocating normal cacheable pages or CMA memory, and then returning the page pointer as the opaque handle. Lift that code from the xtensa and generic dma remapping implementations into the generic dma-direct code so that we don't even call arch_dma_alloc for these allocations. Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-25iommu/io-pgtable: Replace IO_PGTABLE_QUIRK_NO_DMA with specific flagWill Deacon1-7/+4
IO_PGTABLE_QUIRK_NO_DMA is a bit of a misnomer, since it's really just an indication of whether or not the page-table walker for the IOMMU is coherent with the CPU caches. Since cache coherency is more than just a quirk, replace the flag with its own field in the io_pgtable_cfg structure. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-25regulator: core: Expose some of core functions needed by couplersDmitry Osipenko1-0/+35
Expose some of internal functions that are required for implementation of customized regulator couplers. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-06-25regulator: core: Introduce API for regulators coupling customizationDmitry Osipenko3-4/+66
Right now regulator core supports only one type of regulators coupling, the "voltage max-spread" which keeps voltages of coupled regulators in a given range from each other. A more sophisticated coupling may be required in practice, one example is the NVIDIA Tegra SoCs which besides the max-spreading have other restrictions that must be adhered. Introduce API that allow platforms to provide their own customized coupling algorithms. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-06-25pinctrl: remove unneeded #ifdef around declarationsMasahiro Yamada4-26/+6
What is the point in surrounding the whole of declarations with ifdef like this? #ifdef CONFIG_FOO int foo(void); #endif If CONFIG_FOO is not defined, all callers of foo() will fail with implicit declaration errors since the top Makefile adds -Werror-implicit-function-declaration to KBUILD_CFLAGS. This breaks the build earlier when you are doing something wrong. That's it. Anyway, it will fail to link since the definition of foo() is not compiled. In summary, these ifdef are unneeded. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-06-25timekeeping: Boot should be boottime for coarse ns accessorJason A. Donenfeld1-1/+1
Somewhere in all the patchsets before, this cleanup got lost. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190624091539.13512-1-Jason@zx2c4.com
2019-06-25dma-mapping: add a dma_alloc_need_uncached helperChristoph Hellwig1-0/+14
Check if we need to allocate uncached memory for a device given the allocation flags. Switch over the uncached segment check to this helper to deal with architectures that do not support the dma_cache_sync operation and thus should not returned cacheable memory for DMA_ATTR_NON_CONSISTENT allocations. Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-25Merge tag 'drm/tegra/for-5.3-rc1' of ↵Dave Airlie1-0/+2
git://anongit.freedesktop.org/tegra/linux into drm-next drm/tegra: Changes for v5.3-rc1 This contains a couple of small improvements and cleanups for the Tegra DRM driver. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190621150753.19550-1-thierry.reding@gmail.com
2019-06-25net/mlx5: Convert mkey_table to XArrayMatthew Wilcox2-16/+2
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso778-4984/+1130
Resolve conflict between d2912cb15bdd ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer and fe03d4745675 ("Update my email address") which updates Jozsef Kadlecsik's email. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-24tpm: Don't duplicate events from the final event log in the TCG2 logMatthew Garrett1-0/+1
After the first call to GetEventLog() on UEFI systems using the TCG2 crypto agile log format, any further log events (other than those triggered by ExitBootServices()) will be logged in both the main log and also in the Final Events Log. While the kernel only calls GetEventLog() immediately before ExitBootServices(), we can't control whether earlier parts of the boot process have done so. This will result in log entries that exist in both logs, and so the current approach of simply appending the Final Event Log to the main log will result in events being duplicated. We can avoid this problem by looking at the size of the Final Event Log just before we call ExitBootServices() and exporting this to the main kernel. The kernel can then skip over all events that occured before ExitBootServices() and only append events that were not also logged to the main log. Signed-off-by: Matthew Garrett <mjg59@google.com> Reported-by: Joe Richey <joerichey@google.com> Suggested-by: Joe Richey <joerichey@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2019-06-24tpm: Reserve the TPM final events tableMatthew Garrett2-9/+102
UEFI systems provide a boot services protocol for obtaining the TPM event log, but this is unusable after ExitBootServices() is called. Unfortunately ExitBootServices() itself triggers additional TPM events that then can't be obtained using this protocol. The platform provides a mechanism for the OS to obtain these events by recording them to a separate UEFI configuration table which the OS can then map. Unfortunately this table isn't self describing in terms of providing its length, so we need to parse the events inside it to figure out how long it is. Since the table isn't mapped at this point, we need to extend the length calculation function to be able to map the event as it goes along. (Fixes by Bartosz Szczepanek <bsz@semihalf.com>) Signed-off-by: Matthew Garrett <mjg59@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Bartosz Szczepanek <bsz@semihalf.com> Tested-by: Bartosz Szczepanek <bsz@semihalf.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2019-06-24tpm: Abstract crypto agile event size calculationsMatthew Garrett1-0/+68
We need to calculate the size of crypto agile events in multiple locations, including in the EFI boot stub. The easiest way to do this is to put it in a header file as an inline and leave a wrapper to ensure we don't end up with multiple copies of it embedded in the existing code. Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Bartosz Szczepanek <bsz@semihalf.com> Tested-by: Bartosz Szczepanek <bsz@semihalf.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
2019-06-24perf/cgroups: Don't rotate events for cgroups unnecessarilyIan Rogers1-0/+5
Currently perf_rotate_context assumes that if the context's nr_events != nr_active a rotation is necessary for perf event multiplexing. With cgroups, nr_events is the total count of events for all cgroups and nr_active will not include events in a cgroup other than the current task's. This makes rotation appear necessary for cgroups when it is not. Add a perf_event_context flag that is set when rotation is necessary. Clear the flag during sched_out and set it when a flexible sched_in fails due to resources. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20190601082722.44543-1-irogers@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24Merge tag 'v5.2-rc6' into perf/core, to refresh branchIngo Molnar273-1234/+391
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24sched/uclamp: Extend sched_setattr() to support utilization clampingPatrick Bellasi1-0/+9
The SCHED_DEADLINE scheduling class provides an advanced and formal model to define tasks requirements that can translate into proper decisions for both task placements and frequencies selections. Other classes have a more simplified model based on the POSIX concept of priorities. Such a simple priority based model however does not allow to exploit most advanced features of the Linux scheduler like, for example, driving frequencies selection via the schedutil cpufreq governor. However, also for non SCHED_DEADLINE tasks, it's still interesting to define tasks properties to support scheduler decisions. Utilization clamping exposes to user-space a new set of per-task attributes the scheduler can use as hints about the expected/required utilization for a task. This allows to implement a "proactive" per-task frequency control policy, a more advanced policy than the current one based just on "passive" measured task utilization. For example, it's possible to boost interactive tasks (e.g. to get better performance) or cap background tasks (e.g. to be more energy/thermal efficient). Introduce a new API to set utilization clamping values for a specified task by extending sched_setattr(), a syscall which already allows to define task specific properties for different scheduling classes. A new pair of attributes allows to specify a minimum and maximum utilization the scheduler can consider for a task. Do that by validating the required clamp values before and then applying the required changes using _the_ same pattern already in use for __setscheduler(). This ensures that the task is re-enqueued with the new clamp values. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190621084217.8167-7-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24sched/uclamp: Add system default clampsPatrick Bellasi2-0/+21
Tasks without a user-defined clamp value are considered not clamped and by default their utilization can have any value in the [0..SCHED_CAPACITY_SCALE] range. Tasks with a user-defined clamp value are allowed to request any value in that range, and the required clamp is unconditionally enforced. However, a "System Management Software" could be interested in limiting the range of clamp values allowed for all tasks. Add a privileged interface to define a system default configuration via: /proc/sys/kernel/sched_uclamp_util_{min,max} which works as an unconditional clamp range restriction for all tasks. With the default configuration, the full SCHED_CAPACITY_SCALE range of values is allowed for each clamp index. Otherwise, the task-specific clamp is capped by the corresponding system default value. Do that by tracking, for each task, the "effective" clamp value and bucket the task has been refcounted in at enqueue time. This allows to lazy aggregate "requested" and "system default" values at enqueue time and simplifies refcounting updates at dequeue time. The cached bucket ids are used to avoid (relatively) more expensive integer divisions every time a task is enqueued. An active flag is used to report when the "effective" value is valid and thus the task is actually refcounted in the corresponding rq's bucket. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24sched/uclamp: Add CPU's clamp buckets refcountingPatrick Bellasi3-6/+73
Utilization clamping allows to clamp the CPU's utilization within a [util_min, util_max] range, depending on the set of RUNNABLE tasks on that CPU. Each task references two "clamp buckets" defining its minimum and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp bucket is active if there is at least one RUNNABLE tasks enqueued on that CPU and refcounting that bucket. When a task is {en,de}queued {on,from} a rq, the set of active clamp buckets on that CPU can change. If the set of active clamp buckets changes for a CPU a new "aggregated" clamp value is computed for that CPU. This is because each clamp bucket enforces a different utilization clamp value. Clamp values are always MAX aggregated for both util_min and util_max. This ensures that no task can affect the performance of other co-scheduled tasks which are more boosted (i.e. with higher util_min clamp) or less capped (i.e. with higher util_max clamp). A task has: task_struct::uclamp[clamp_id]::bucket_id to track the "bucket index" of the CPU's clamp bucket it refcounts while enqueued, for each clamp index (clamp_id). A runqueue has: rq::uclamp[clamp_id]::bucket[bucket_id].tasks to track how many RUNNABLE tasks on that CPU refcount each clamp bucket (bucket_id) of a clamp index (clamp_id). It also has a: rq::uclamp[clamp_id]::bucket[bucket_id].value to track the clamp value of each clamp bucket (bucket_id) of a clamp index (clamp_id). The rq::uclamp::bucket[clamp_id][] array is scanned every time it's needed to find a new MAX aggregated clamp value for a clamp_id. This operation is required only when it's dequeued the last task of a clamp bucket tracking the current MAX aggregated clamp value. In this case, the CPU is either entering IDLE or going to schedule a less boosted or more clamped task. The expected number of different clamp values configured at build time is small enough to fit the full unordered array into a single cache line, for configurations of up to 7 buckets. Add to struct rq the basic data structures required to refcount the number of RUNNABLE tasks for each clamp bucket. Add also the max aggregation required to update the rq's clamp value at each enqueue/dequeue event. Use a simple linear mapping of clamp values into clamp buckets. Pre-compute and cache bucket_id to avoid integer divisions at enqueue/dequeue time. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alessio Balsini <balsini@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24sched/debug: Add a new sched_trace_*() helper functionsQais Yousef1-1/+15
The new functions allow modules to access internal data structures of unexported struct cfs_rq and struct rq to extract important information from the tracepoints to be introduced in later patches. While at it fix alphabetical order of struct declarations in sched.h Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavankumar Kondeti <pkondeti@codeaurora.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de> Link: https://lkml.kernel.org/r/20190604111459.2862-3-qais.yousef@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()Vincent Guittot3-13/+5
The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is unused since commit: 765d0af19f5f ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'") Remove it. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: gregkh@linuxfoundation.org Cc: linux@armlinux.org.uk Cc: quentin.perret@arm.com Cc: rafael@kernel.org Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24Merge tag 'v5.2-rc6' into sched/core, to refresh the branchIngo Molnar266-1215/+297
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24perf/x86: Disable extended registers for non-supported PMUsKan Liang2-0/+9
The perf fuzzer caused Skylake machine to crash: [ 9680.085831] Call Trace: [ 9680.088301] <IRQ> [ 9680.090363] perf_output_sample_regs+0x43/0xa0 [ 9680.094928] perf_output_sample+0x3aa/0x7a0 [ 9680.099181] perf_event_output_forward+0x53/0x80 [ 9680.103917] __perf_event_overflow+0x52/0xf0 [ 9680.108266] ? perf_trace_run_bpf_submit+0xc0/0xc0 [ 9680.113108] perf_swevent_hrtimer+0xe2/0x150 [ 9680.117475] ? check_preempt_wakeup+0x181/0x230 [ 9680.122091] ? check_preempt_curr+0x62/0x90 [ 9680.126361] ? ttwu_do_wakeup+0x19/0x140 [ 9680.130355] ? try_to_wake_up+0x54/0x460 [ 9680.134366] ? reweight_entity+0x15b/0x1a0 [ 9680.138559] ? __queue_work+0x103/0x3f0 [ 9680.142472] ? update_dl_rq_load_avg+0x1cd/0x270 [ 9680.147194] ? timerqueue_del+0x1e/0x40 [ 9680.151092] ? __remove_hrtimer+0x35/0x70 [ 9680.155191] __hrtimer_run_queues+0x100/0x280 [ 9680.159658] hrtimer_interrupt+0x100/0x220 [ 9680.163835] smp_apic_timer_interrupt+0x6a/0x140 [ 9680.168555] apic_timer_interrupt+0xf/0x20 [ 9680.172756] </IRQ> The XMM registers can only be collected by PEBS hardware events on the platforms with PEBS baseline support, e.g. Icelake, not software/probe events. Add capabilities flag PERF_PMU_CAP_EXTENDED_REGS to indicate the PMU which support extended registers. For X86, the extended registers are XMM registers. Add has_extended_regs() to check if extended registers are applied. The generic code define the mask of extended registers as 0 if arch headers haven't overridden it. Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 878068ea270e ("perf/x86: Support outputting XMM registers") Link: https://lkml.kernel.org/r/1559081314-9714-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24regulator: s2mps11: Add support for disabling S2MPS11 regulators in suspendKrzysztof Kozlowski1-0/+5
The driver supported turning off regulators in suspend only for S2MPS14 device. However this makes also sense for S2MPS11 and can reduce the power consumption during suspend to RAM. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>