kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-02-22	netfilter: ip6t_rpfilter: Fix regression with VRF interfaces	Phil Sutter	1	-6/+26
	When calling ip6_route_lookup() for the packet arriving on the VRF interface, the result is always the real (slave) interface. Expect this when validating the result. Fixes: acc641ab95b66 ("netfilter: rpfilter/fib: Populate flowic_l3mdev field") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-22	Merge tag 'm68k-for-v6.3-tag1' of ↵	Linus Torvalds	1	-1/+7
	git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Add seccomp support - defconfig updates - Miscellaneous fixes and improvements * tag 'm68k-for-v6.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: /proc/hardware should depend on PROC_FS selftests/seccomp: Add m68k support m68k: Add kernel seccomp support m68k: Check syscall_trace_enter() return code m68k: defconfig: Update defconfigs for v6.2-rc3 m68k: q40: Do not initialise statics to 0
2023-02-22	Merge tag 'x86_cpu_for_v6.3_rc1' of ↵	Linus Torvalds	2	-0/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Cache the AMD debug registers in per-CPU variables to avoid MSR writes where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accomodate use cases like kexec which rediscover - Other smaller fixes and cleanups * tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd: Cache debug register values in percpu variables KVM: x86: Propagate the AMD Automatic IBRS feature to the guest x86/cpu: Support AMD Automatic IBRS x86/cpu, kvm: Add the SMM_CTL MSR not present feature x86/cpu, kvm: Add the Null Selector Clears Base feature x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code x86/cpu, kvm: Add support for CPUID_80000021_EAX x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h> x86/gsseg: Use the LKGS instruction if available for load_gs_index() x86/gsseg: Move load_gs_index() to its own new header file x86/gsseg: Make asm_load_gs_index() take an u16 x86/opcode: Add the LKGS instruction to x86-opcode-map x86/cpufeature: Add the CPU feature bit for LKGS x86/bugs: Reset speculation control settings on init x86/cpu: Remove redundant extern x86_read_arch_cap_msr()
2023-02-21	Merge tag 'thermal-6.3-rc1' of ↵	Linus Torvalds	2	-2/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "The majority of changes here are related to the general switch-over to using arrays of generic trip point structures registered along with a thermal zone instead of trip point callbacks (this has been done mostly by Daniel Lezcano with some help from yours truly on the Intel drivers front). Apart from that and the related reorganization of code, there are some enhancements of the existing driver and a new Mediatek Low Voltage Thermal Sensor (LVTS) driver. The Intel powerclamp undergoes a major rework so it will use the generic idle_inject facility for CPU idle time injection going forward and it will take additional module parameters for specifying the subset of CPUs to be affected by it (work done by Srinivas Pandruvada). Also included are assorted fixes and a whole bunch of cleanups. Specifics: - Rework a large bunch of drivers to use the generic thermal trip structure and use the opportunity to do more cleanups by removing unused functions from the OF code (Daniel Lezcano) - Remove core header inclusion from drivers (Daniel Lezcano) - Fix some locking issues related to the generic thermal trip rework (Johan Hovold) - Fix a crash when requesting the critical temperature on tegra, which is related to the generic trip point work (Jon Hunter) - Clean up thermal device unregistration code (Viresh Kumar) - Fix and clean up thermal control core initialization error code paths (Daniel Lezcano) - Relocate the trip points handling code into a separate file (Daniel Lezcano) - Make the thermal core fail registration of thermal zones and cooling devices if the thermal class has not been registered (Rafael Wysocki) - Add trip point initialization helper functions for ACPI-defined trip points and modify two thermal drivers to use them (Rafael Wysocki, Daniel Lezcano) - Make the core thermal control code use sysfs_emit_at() instead of scnprintf() where applicable (ye xingchen) - Consolidate code accessing the Intel TCC (Thermal Control Circuitry) MSRs by introducing library functions for that and making the TCC-related code in thermal drivers use them (Zhang Rui) - Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax changes (Zhang Rui) - Address an "unsigned expression compared with zero" warning in the intel_soc_dts_iosf thermal driver (Yang Li) - Update comments regarding two functions in the Intel Menlow thermal driver (Deming Wang) - Use sysfs_emit_at() instead of scnprintf() in the int340x thermal driver (ye xingchen) - Make the intel_pch thermal driver support the Wellsburg PCH (Tim Zimmermann) - Modify the intel_pch and processor_thermal_device_pci thermal drivers use generic trip point tables instead of thermal zone trip point callbacks (Daniel Lezcano) - Add production mode attribute sysfs attribute to the int340x thermal driver (Srinivas Pandruvada) - Rework dynamic trip point updates handling and locking in the int340x thermal driver (Rafael Wysocki) - Make the int340x thermal driver use a generic trip points table instead of thermal zone trip point callbacks (Rafael Wysocki, Daniel Lezcano) - Clean up and improve the int340x thermal driver (Rafael Wysocki) - Simplify and clean up the intel_pch thermal driver (Rafael Wysocki) - Fix the Intel powerclamp thermal driver and make it use the common idle injection framework (Srinivas Pandruvada) - Add two module parameters, cpumask and max_idle, to the Intel powerclamp thermal driver to allow it to affect only a specific subset of CPUs instead of all of them (Srinivas Pandruvada) - Make the Intel quark_dts thermal driver Use generic trip point objects instead of its own trip point representation (Daniel Lezcano) - Add toctree entry for thermal documents and fix two issues in the Intel powerclamp driver documentation (Bagas Sanjaya) - Use strscpy() to instead of strncpy() in the thermal core (Xu Panda) - Fix thermal_sampling_exit() (Vincent Guittot) - Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam Chihi) - Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert Uytterhoeven) - Fix useless call to set_trips() when resuming in the rcar_gen3 thermal control driver and add interrupt support detection at init time to it (Niklas Söderlund) - Fix memory corruption in the hi3660 thermal driver (Yongqin Liu) - Fix include path for libnl3 in pkg-config file for libthermal (Vibhav Pant) - Remove syscfg-based driver for st as the platform is not supported any more (Alain Volmat)" * tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (135 commits) thermal/drivers/st: Remove syscfg based driver thermal: Remove core header inclusion from drivers tools/lib/thermal: Fix include path for libnl3 in pkg-config file. thermal/drivers/hisi: Drop second sensor hi3660 thermal/drivers/rcar_gen3_thermal: Fix device initialization thermal/drivers/rcar_gen3_thermal: Create device local ops struct thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming thermal/drivers/rcar_gen3: Add support for R-Car V4H dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver dt-bindings: thermal: mediatek: Add LVTS thermal controllers thermal/drivers/mediatek: Relocate driver to mediatek folder tools/lib/thermal: Fix thermal_sampling_exit() Documentation: powerclamp: Fix numbered lists formatting Documentation: powerclamp: Escape wildcard in cpumask description Documentation: admin-guide: Add toctree entry for thermal docs thermal: intel: powerclamp: Add two module parameters Documentation: admin-guide: Move intel_powerclamp documentation thermal: core: Use sysfs_emit_at() instead of scnprintf() thermal: intel: powerclamp: Fix duration module parameter ...
2023-02-21	Merge tag 'pm-6.3-rc1' of ↵	Linus Torvalds	3	-9/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add EPP support to the AMD P-state cpufreq driver, add support for new platforms to the Intel RAPL power capping driver, intel_idle and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194, drop the custom cpufreq driver for loongson1 that is not necessary any more (and the corresponding cpufreq platform device), fix assorted issues and clean up code. Specifics: - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes Karny, Arnd Bergmann, Bagas Sanjaya) - Drop the custom cpufreq driver for loongson1 that is not necessary any more and the corresponding cpufreq platform device (Keguang Zhang) - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig entries (Paul E. McKenney) - Enable thermal cooling for Tegra194 (Yi-Wei Wang) - Register module device table and add missing compatibles for cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss) - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu (Christian Marangi) - Make kobj_type structure in the cpufreq core constant (Thomas Weißschuh) - Make cpufreq_unregister_driver() return void (Uwe Kleine-König) - Make the TEO cpuidle governor check CPU utilization in order to refine idle state selection (Kajetan Puchalski) - Make Kconfig select the haltpoll cpuidle governor when the haltpoll cpuidle driver is selected and replace a default_idle() call in that driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing) - Add Emerald Rapids Xeon support to the intel_idle driver (Artem Bityutskiy) - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to avoid randconfig build failures (Arnd Bergmann) - Make kobj_type structures used in the cpuidle sysfs interface constant (Thomas Weißschuh) - Make the cpuidle driver registration code update microsecond values of idle state parameters in accordance with their nanosecond values if they are provided (Rafael Wysocki) - Make the PSCI cpuidle driver prevent topology CPUs from being suspended on PREEMPT_RT (Krzysztof Kozlowski) - Document that pm_runtime_force_suspend() cannot be used with DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald) - Add EXPORT macros for exporting PM functions from drivers (Richard Fitzgerald) - Remove /** from non-kernel-doc comments in hibernation code (Randy Dunlap) - Fix possible name leak in powercap_register_zone() (Yang Yingliang) - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui) - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada) - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui) - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman) - Add missing 'cache-unified' property in the example for kryo OPP bindings (Rob Herring) - Fix error checking in opp_migrate_dentry() (Qi Zheng) - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio) - Modify some power management utilities to use the canonical ftrace path (Ross Zwisler) - Correct spelling problems for Documentation/power/ as reported by codespell (Randy Dunlap)" * tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits) Documentation: amd-pstate: disambiguate user space sections cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables PM: Add EXPORT macros for exporting PM functions cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT MIPS: loongson32: Drop obsolete cpufreq platform device powercap: intel_rapl: Fix handling for large time window cpuidle: driver: Update microsecond values of state parameters as needed cpuidle: sysfs: make kobj_type structures constant cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies PM: EM: fix memory leak with using debugfs_lookup() PM: domains: fix memory leak with using debugfs_lookup() cpufreq: Make kobj_type structure constant cpufreq: davinci: Fix clk use after free cpufreq: amd-pstate: avoid uninitialized variable use cpufreq: Make cpufreq_unregister_driver() return void OPP: fix error checking in opp_migrate_dentry() dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible ...
2023-02-21	Merge tag 'rcu.2023.02.10a' of ↵	Linus Torvalds	6	-17/+16
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	3	-4/+22
	Per-next-PR merge. net/smc/af_smc.c b5dd4d698171 ("net/smc: llc_conf_mutex refactor, replace it with rw_semaphore") e40b801b3603 ("net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()") https://lore.kernel.org/all/20230221124008.6303c330@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	Merge tag 'x86_vdso_for_v6.3_rc1' of ↵	Linus Torvalds	1	-5/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Borislav Petkov: - Add getcpu support for the 32-bit version of the vDSO - Some smaller fixes * tag 'x86_vdso_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Fix -Wmissing-prototypes warnings x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu selftests: Emit a warning if getcpu() is missing on 32bit x86/vdso: Provide getcpu for x86-32. x86/cpu: Provide the full setup for getcpu() on x86-32 x86/vdso: Move VDSO image init to vdso2c generated code
2023-02-21	Merge tag 'x86_alternatives_for_v6.3_rc1' of ↵	Linus Torvalds	1	-3/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm alternatives updates from Borislav Petkov: - Teach the static_call patching infrastructure to handle conditional tall calls properly which can be static calls too - Add proper struct alt_instr.flags which controls different aspects of insn patching behavior * tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/static_call: Add support for Jcc tail-calls x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions x86/alternatives: Introduce int3_emulate_jcc() x86/alternatives: Add alt_instr.flags
2023-02-21	Merge tag 'v6.2' into iommufd.git for-next	Jason Gunthorpe	67	-340/+934
	Resolve conflicts from the signature change in iommu_map: - drivers/infiniband/hw/usnic/usnic_uiom.c Switch iommu_map_atomic() to iommu_map(.., GFP_ATOMIC) - drivers/vfio/vfio_iommu_type1.c Following indenting change for GFP_KERNEL Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-21	sefltests: netdevsim: wait for devlink instance after netns removal	Jiri Pirko	1	-0/+18
	When devlink instance is put into network namespace and that network namespace gets deleted, devlink instance is moved back into init_ns. This is done as a part of cleanup_net() routine. Since cleanup_net() is called asynchronously from workqueue, there is no guarantee that the devlink instance move is done after "ip netns del" returns. So fix this race by making sure that the devlink instance is present before any other operation. Reported-by: Amir Tzin <amirtz@nvidia.com> Fixes: b74c37fd35a2 ("selftests: netdevsim: add tests for devlink reload with resources") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230220132336.198597-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-21	selftest: fib_tests: Always cleanup before exit	Roxana Nicolescu	1	-0/+2
	Usage of `set -e` before executing a command causes immediate exit on failure, without cleanup up the resources allocated at setup. This can affect the next tests that use the same resources, leading to a chain of failures. A simple fix is to always call cleanup function when the script exists. This approach is already used by other existing tests. Fixes: 1056691b2680 ("selftests: fib_tests: Make test results more verbose") Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Link: https://lore.kernel.org/r/20230220110400.26737-2-roxana.nicolescu@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-21	Merge tag 'sched-core-2023-02-20' of ↵	Linus Torvalds	32	-4665/+4484
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Improve the scalability of the CFS bandwidth unthrottling logic with large number of CPUs. - Fix & rework various cpuidle routines, simplify interaction with the generic scheduler code. Add __cpuidle methods as noinstr to objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks. - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query previously issued registrations. - Limit scheduler slice duration to the sysctl_sched_latency period, to improve scheduling granularity with a large number of SCHED_IDLE tasks. - Debuggability enhancement on sys_exit(): warn about disabled IRQs, but also enable them to prevent a cascade of followup problems and repeat warnings. - Fix the rescheduling logic in prio_changed_dl(). - Micro-optimize cpufreq and sched-util methods. - Micro-optimize ttwu_runnable() - Micro-optimize the idle-scanning in update_numa_stats(), select_idle_capacity() and steal_cookie_task(). - Update the RSEQ code & self-tests - Constify various scheduler methods - Remove unused methods - Refine __init tags - Documentation updates - Misc other cleanups, fixes * tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits) sched/rt: pick_next_rt_entity(): check list_entry sched/deadline: Add more reschedule cases to prio_changed_dl() sched/fair: sanitize vruntime of entity being placed sched/fair: Remove capacity inversion detection sched/fair: unlink misfit task from cpu overutilized objtool: mem() are not uaccess safe cpuidle: Fix poll_idle() noinstr annotation sched/clock: Make local_clock() noinstr sched/clock/x86: Mark sched_clock() noinstr x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read() x86/atomics: Always inline arch_atomic64() cpuidle: tracing, preempt: Squash _rcuidle tracing cpuidle: tracing: Warn about !rcu_is_watching() cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG cpuidle: drivers: firmware: psci: Dont instrument suspend code KVM: selftests: Fix build of rseq test exit: Detect and fix irq disabled state in oops cpuidle, arm64: Fix the ARM64 cpuidle logic cpuidle: mvebu: Fix duplicate flags assignment sched/fair: Limit sched slice duration ...
2023-02-21	Merge tag 'for-netdev' of ↵	Jakub Kicinski	118	-631/+2334
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-02-17 We've added 64 non-merge commits during the last 7 day(s) which contain a total of 158 files changed, 4190 insertions(+), 988 deletions(-). The main changes are: 1) Add a rbtree data structure following the "next-gen data structure" precedent set by recently-added linked-list, that is, by using kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky. 2) Add a new benchmark for hashmap lookups to BPF selftests, from Anton Protopopov. 3) Fix bpf_fib_lookup to only return valid neighbors and add an option to skip the neigh table lookup, from Martin KaFai Lau. 4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accouting for container environments, from Yafang Shao. 5) Batch of ice multi-buffer and driver performance fixes, from Alexander Lobakin. 6) Fix a bug in determining whether global subprog's argument is PTR_TO_CTX, which is based on type names which breaks kprobe progs, from Andrii Nakryiko. 7) Prep work for future -mcpu=v4 LLVM option which includes usage of BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier, from Eduard Zingerman. 8) More prep work for later building selftests with Memory Sanitizer in order to detect usages of undefined memory, from Ilya Leoshkevich. 9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer dereference via sendmsg(), from Maciej Fijalkowski. 10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui. 11) Fix BPF memory allocator in combination with BPF hashtab where it could corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao. 12) Fix LoongArch BPF JIT to always use 4 instructions for function address so that instruction sequences don't change between passes, from Hengqi Chen. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits) selftests/bpf: Add bpf_fib_lookup test bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup riscv, bpf: Add bpf trampoline support for RV64 riscv, bpf: Add bpf_arch_text_poke support for RV64 riscv, bpf: Factor out emit_call for kernel and bpf context riscv: Extend patch_text for multiple instructions Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" selftests/bpf: Add global subprog context passing tests selftests/bpf: Convert test_global_funcs test to test_loader framework bpf: Fix global subprog context argument resolution logic LoongArch, bpf: Use 4 instructions for function address in JIT bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state bpf: Disable bh in bpf_test_run for xdp and tc prog xsk: check IFF_UP earlier in Tx path Fix typos in selftest/bpf files selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd() libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd() ... ==================== Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-21	tools/virtio: enable to build with retpoline	Shunsuke Mie	1	-1/+1
	Add build options to bring it close to a linux kernel. It allows for testing that is close to reality. Signed-off-by: Shunsuke Mie <mie@igel.co.jp> Message-Id: <20230202104538.2041879-1-mie@igel.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21	tracing/probe: add a char type to show the character value of traced arguments	Donglin Peng	1	-0/+47
	There are scenes that we want to show the character value of traced arguments other than a decimal or hexadecimal or string value for debug convinience. I add a new type named 'char' to do it and a new test case file named 'kprobe_args_char.tc' to do selftest for char type. For example: The to be traced function is 'void demo_func(char type, char *name);', we can add a kprobe event as follows to show argument values as we want: echo 'p:myprobe demo_func $arg1:char +0($arg2):char[5]' > kprobe_events we will get the following trace log: ... myprobe: (demo_func+0x0/0x29) arg1='A' arg2={'b','p','f','1',''} Link: https://lore.kernel.org/all/20221219110613.367098-1-dolinux.peng@gmail.com/ Signed-off-by: Donglin Peng <dolinux.peng@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21	selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols	Masami Hiramatsu (Google)	1	-1/+1
	Fix kprobe probepoint testcase to ignore __pfx_* prefix symbols. Those are introduced by commit b341b20d648b ("x86: Add prefix symbols for function padding") for identifying PADDING_BYTES of NOPs. Since kprobe events can not probe these prefix symbols, this testcase has to skip those symbols. Link: https://lore.kernel.org/all/167309835609.640500.9664678940260305746.stgit@devnote3/ Fixes: b341b20d648b ("x86: Add prefix symbols for function padding") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-21	selftests/ftrace: Fix eprobe syntax test case to check filter support	Masami Hiramatsu (Google)	1	-1/+3
	Fix eprobe syntax test case to check whether the kernel supports the filter on eprobe for filter syntax test command. Without this fix, this test case will fail if the kernel supports eprobe but doesn't support the filter on eprobe. Link: https://lore.kernel.org/all/167309834742.640500.379128668288448035.stgit@devnote3/ Fixes: 9e14bae7d049 ("selftests/ftrace: Add eprobe syntax error testcase") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org>
2023-02-20	objtool: add UACCESS exceptions for __tsan_volatile_read/write	Arnd Bergmann	1	-0/+2
	A lot of the tsan helpers are already excempt from the UACCESS warnings, but some more functions were added that need the same thing: kernel/kcsan/core.o: warning: objtool: __tsan_volatile_read16+0x0: call to __tsan_unaligned_read16() with UACCESS enabled kernel/kcsan/core.o: warning: objtool: __tsan_volatile_write16+0x0: call to __tsan_unaligned_write16() with UACCESS enabled vmlinux.o: warning: objtool: __tsan_unaligned_volatile_read16+0x4: call to __tsan_unaligned_read16() with UACCESS enabled vmlinux.o: warning: objtool: __tsan_unaligned_volatile_write16+0x4: call to __tsan_unaligned_write16() with UACCESS enabled As Marco points out, these functions don't even call each other explicitly but instead gcc (but not clang) notices the functions being identical and turns one symbol into a direct branch to the other. Link: https://lkml.kernel.org/r/20230215130058.3836177-4-arnd@kernel.org Fixes: 75d75b7a4d54 ("kcsan: Support distinguishing volatile accesses") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-20	Merge tag 'fs.idmapped.v6.3' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-20	ktest: Restore stty setting at first in dodie	Masami Hiramatsu (Google)	1	-5/+5
	The do_send_email() will call die before restoring stty if sendmail setting is not correct or sendmail is not installed. It is safer to restore it in the beginning of dodie(). Link: https://lkml.kernel.org/r/167420617635.2988775.13045295332829029437.stgit@devnote3 Cc: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2023-02-20	ktest.pl: Add RUN_TIMEOUT option with default unlimited	Steven Rostedt	2	-4/+21
	There is a disconnect between the run_command function and the wait_for_input. The wait_for_input has a default timeout of 2 minutes. But if that happens, the run_command loop will exit out to the waitpid() of the executing command. This fails in that it no longer monitors the command, and also, the ssh to the test box can hang when its finished, as it's waiting for the pipe it's writing to to flush, but the loop that reads that pipe has already exited, leaving the command stuck, and the test hangs. Instead, make the default "wait_for_input" of the run_command infinite, and allow the user to override it if they want with a default timeout option "RUN_TIMEOUT". But this fixes the hang that happens when the pipe is full and the ssh session never exits. Cc: stable@vger.kernel.org Fixes: 6e98d1b4415fe ("ktest: Add timeout to ssh command") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2023-02-20	ktest.pl: Give back console on Ctrt^C on monitor	Steven Rostedt	1	-0/+3
	When monitoring the console output, the stdout is being redirected to do so. If Ctrl^C is hit during this mode, the stdout is not back to the console, the user does not see anything they type (no echo). Add "end_monitor" to the SIGINT interrupt handler to give back the console on Ctrl^C. Cc: stable@vger.kernel.org Fixes: 9f2cdcbbb90e7 ("ktest: Give console process a dedicated tty") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2023-02-20	ktest.pl: Fix missing "end_monitor" when machine check fails	Steven Rostedt	1	-1/+2
	In the "reboot" command, it does a check of the machine to see if it is still alive with a simple "ssh echo" command. If it fails, it will assume that a normal "ssh reboot" is not possible and force a power cycle. In this case, the "start_monitor" is executed, but the "end_monitor" is not, and this causes the screen will not be given back to the console. That is, after the test, a "reset" command needs to be performed, as "echo" is turned off. Cc: stable@vger.kernel.org Fixes: 6474ace999edd ("ktest.pl: Powercycle the box on reboot if no connection can be made") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2023-02-20	self-tests: more rps self tests	Paolo Abeni	1	-12/+29
	Explicitly check for child netns and main ns independency Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20	Merge tag 'kvmarm-6.3' of ↵	Paolo Bonzini	19	-10/+1110
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer This also drags in arm64's 'for-next/sme2' branch, because both it and the PSCI relay changes touch the EL2 initialization code.
2023-02-20	selftests/net: Interpret UDP_GRO cmsg data as an int value	Jakub Sitnicki	1	-4/+2
	Data passed to user-space with a (SOL_UDP, UDP_GRO) cmsg carries an int (see udp_cmsg_recv), not a u16 value, as strace confirms: recvmsg(8, {msg_name=..., msg_iov=[{iov_base="\0\0..."..., iov_len=96000}], msg_iovlen=1, msg_control=[{cmsg_len=20, <-- sizeof(cmsghdr) + 4 cmsg_level=SOL_UDP, cmsg_type=0x68}], <-- UDP_GRO msg_controllen=24, msg_flags=0}, 0) = 11200 Interpreting the data as an u16 value won't work on big-endian platforms. Since it is too late to back out of this API decision [1], fix the test. [1]: https://lore.kernel.org/netdev/20230131174601.203127-1-jakub@cloudflare.com/ Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-18	tracing: Always use canonical ftrace path	Ross Zwisler	2	-3/+3
	The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing Many comments and Kconfig help messages in the tracing code still refer to this older debugfs path, so let's update them to avoid confusion. Link: https://lore.kernel.org/linux-trace-kernel/20230215223350.2658616-2-zwisler@google.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-18	selftests/bpf: Add bpf_fib_lookup test	Martin KaFai Lau	2	-0/+209
	This patch tests the bpf_fib_lookup helper when looking up a neigh in NUD_FAILED and NUD_STALE state. It also adds test for the new BPF_FIB_LOOKUP_SKIP_NEIGH flag. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-2-martin.lau@linux.dev
2023-02-18	bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup	Martin KaFai Lau	1	-0/+6
	The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-17	Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"	Martin KaFai Lau	1	-4/+3
	This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9. build bot failed on arch with different cache line size: https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/ Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17	perf tests stat_all_metrics: Change true workload to sleep workload for ↵	Kajol Jain	1	-1/+1
	system wide check Testcase stat_all_metrics.sh fails in powerpc: 98: perf all metrics test : FAILED! Logs with verbose: [command]# ./perf test 98 -vv 98: perf all metrics test : --- start --- test child forked, pid 13262 Testing BRU_STALL_CPI Testing COMPLETION_STALL_CPI ---- Testing TOTAL_LOCAL_NODE_PUMPS_P23 Metric 'TOTAL_LOCAL_NODE_PUMPS_P23' not printed in: Error: Invalid event (hv_24x7/PM_PB_LNS_PUMP23,chip=3/) in per-thread mode, enable system wide with '-a'. Testing TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01 Metric 'TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01' not printed in: Error: Invalid event (hv_24x7/PM_PB_RTY_LNS_PUMP01,chip=3/) in per-thread mode, enable system wide with '-a'. ---- Based on above logs, we could see some of the hv-24x7 metric events fails, and logs suggest to run the metric event with -a option. This change happened after the commit a4b8cfcabb1d90ec ("perf stat: Delay metric parsing"), which delayed the metric parsing phase and now before metric parsing phase perf tool identifies, whether target is system-wide or not. With this change, perf_event_open will fails with workload monitoring for uncore events as expected. The perf all metric test case fails as some of the hv-24x7 metric events may need bigger workload with system wide monitoring to get the data. Fix this issue by changing current system wide check from true workload to sleep 0.01 workload. Result with the patch changes in powerpc: 98: perf all metrics test : Ok Fixes: a4b8cfcabb1d90ec ("perf stat: Delay metric parsing") Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230215093827.124921-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-17	selftests/bpf: Add global subprog context passing tests	Andrii Nakryiko	2	-0/+106
	Add tests validating that it's possible to pass context arguments into global subprogs for various types of programs, including a particularly tricky KPROBE programs (which cover kprobes, uprobes, USDTs, a vast and important class of programs). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230216045954.3002473-4-andrii@kernel.org
2023-02-17	selftests/bpf: Convert test_global_funcs test to test_loader framework	Andrii Nakryiko	18	-123/+174
	Convert 17 test_global_funcs subtests into test_loader framework for easier maintenance and more declarative way to define expected failures/successes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230216045954.3002473-3-andrii@kernel.org
2023-02-17	perf vendor events power10: Add JSON metric events to present CPI stall ↵	Athira Rajeev	2	-5/+5
	cycles in powerpc Power10 Performance Monitoring Unit (PMU) provides events to understand stall cycles of different pipeline stages. These events along with completed instructions provides useful metrics for application tuning. Patch implements the JSON changes to collect counter statistics to present the high level CPI stall breakdown metrics. New metric group is named as "CPI_STALL_RATIO" and this new metric group presents these stall metrics: - DISPATCHED_CPI ( Dispatch stall cycles per insn ) - ISSUE_STALL_CPI ( Issue stall cycles per insn ) - EXECUTION_STALL_CPI ( Execution stall cycles per insn ) - COMPLETION_STALL_CPI ( Completition stall cycles per insn ) To avoid multipling of events, PM_RUN_INST_CMPL event has been modified to use PMC5(performance monitoring counter5) instead of PMC4. This change is needed, since completion stall event is using PMC4. Usage example: ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload> Performance counter stats for 'workload': 63,056,817,982 PM_CMPL_STALL # 0.28 COMPLETION_STALL_CPI 1,743,988,038,896 PM_ISSUE_STALL # 7.73 ISSUE_STALL_CPI 225,597,495,030 PM_RUN_INST_CMPL # 6.18 DISPATCHED_CPI # 37.48 EXECUTION_STALL_CPI 1,393,916,546,654 PM_DISP_STALL_CYC 8,455,376,836,463 PM_EXEC_STALL "--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled in all group for more accuracy. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230216061240.18067-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-17	perf intel-pt: Synthesize cycle events	Steinar H. Gunderson	5	-21/+101
	There is no good reason why we cannot synthesize "cycle" events from Intel PT just as we can synthesize "instruction" events, in particular when CYC packets are available. This enables using PT to getting much more accurate cycle profiles than regular sampling (record -e cycles) when the work last for very short periods (<10 ms). Thus, add support for this, based off of the existing IPC calculation framework. The new option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle and instruction events can be synthesized together, and are by default. The only real caveat is that CYC packets are only emitted whenever some other packet is, which in practice is when a branch instruction is encountered (and not even all branches). Thus, even at no subsampling (e.g. --itrace=y0ns), it is impossible to get more accuracy than a single basic block, and all cycles spent executing that block will get attributed to the branch instruction that ends the packet. Thus, one cannot know whether the cycles came from e.g. a specific load, a mispredicted branch, or something else. When subsampling (which is the default), the cycle events will get smeared out even more, but will still be generally useful to attribute cycle counts to functions. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Steinar H. Gunderson <sesse@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-17	Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net	David S. Miller	3	-6/+177
	Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17	Fix typos in selftest/bpf files	Taichi Nishimura	10	-13/+13
	Run spell checker on files in selftest/bpf and fixed typos. Signed-off-by: Taichi Nishimura <awkrail01@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/bpf/20230216085537.519062-1-awkrail01@gmail.com
2023-02-17	selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()	Ilya Leoshkevich	34	-101/+109
	Use the new type-safe wrappers around bpf_obj_get_info_by_fd(). Fix a prog/map mixup in prog_holds_map(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230214231221.249277-6-iii@linux.ibm.com
2023-02-17	bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()	Ilya Leoshkevich	9	-36/+41
	Use the new type-safe wrappers around bpf_obj_get_info_by_fd(). Split the bpf_obj_get_info_by_fd() call in build_btf_type_table() in two, since knowing the type helps with the Memory Sanitizer. Improve map_parse_fd_and_info() type safety by using struct bpf_map_info * instead of void * for info. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20230214231221.249277-4-iii@linux.ibm.com
2023-02-17	libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()	Ilya Leoshkevich	4	-14/+14
	Use the new type-safe wrappers around bpf_obj_get_info_by_fd(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230214231221.249277-3-iii@linux.ibm.com
2023-02-17	libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()	Ilya Leoshkevich	3	-0/+34
	These are type-safe wrappers around bpf_obj_get_info_by_fd(). They found one problem in selftests, and are also useful for adding Memory Sanitizer annotations. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230214231221.249277-2-iii@linux.ibm.com
2023-02-16	Merge tag 'net-6.2-final' of ↵	Linus Torvalds	2	-2/+177
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Fixes from the main networking tree only, probably because all sub-trees have backed off and haven't submitted their changes. None of the fixes here are particularly scary and no outstanding regressions. In an ideal world the "current release" sections would be empty at this stage but that never happens. Current release - regressions: - fix unwanted sign extension in netdev_stats_to_stats64() Current release - new code bugs: - initialize net->notrefcnt_tracker earlier - devlink: fix netdev notifier chain corruption - nfp: make sure mbox accesses in IPsec code are atomic - ice: fix check for weight and priority of a scheduling node Previous releases - regressions: - ice: xsk: fix cleaning of XDP_TX frame, prevent inf loop - igb: fix I2C bit banging config with external thermal sensor Previous releases - always broken: - sched: tcindex: update imperfect hash filters respecting rcu - mpls: fix stale pointer if allocation fails during device rename - dccp/tcp: avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions - remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues() - af_key: fix heap information leak - ipv6: fix socket connection with DSCP (correct interpretation of the tclass field vs fib rule matching) - tipc: fix kernel warning when sending SYN message - vmxnet3: read RSS information from the correct descriptor (eop)" * tag 'net-6.2-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) devlink: Fix netdev notifier chain corruption igb: conditionalize I2C bit banging on external thermal sensor support net: mpls: fix stale pointer if allocation fails during device rename net/sched: tcindex: search key must be 16 bits tipc: fix kernel warning when sending SYN message igb: Fix PPS input and output using 3rd and 4th SDP net: use a bounce buffer for copying skb->mark ixgbe: add double of VLAN header when computing the max MTU i40e: add double of VLAN header when computing the max MTU ixgbe: allow to increase MTU to 3K with XDP enabled net: stmmac: Restrict warning on disabling DMA store and fwd mode net/sched: act_ctinfo: use percpu stats net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence ice: fix lost multicast packets in promisc mode ice: Fix check for weight and priority of a scheduling node bnxt_en: Fix mqprio and XDP ring checking logic net: Fix unwanted sign extension in netdev_stats_to_stats64() net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path net: openvswitch: fix possible memory leak in ovs_meter_cmd_set() af_key: Fix heap information leak ...
2023-02-16	Merge branch 'topic/apple-gmux' into for-next	Takashi Iwai	97	-396/+1249
	Pull vga_switcheroo fix for Macs Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-16	perf c2c: Add report option to show false sharing in adjacent cachelines	Feng Tang	5	-17/+49
	Many platforms have feature of adjacent cachelines prefetch, when it is enabled, for data in RAM of 2 cachelines (2N and 2N+1) granularity, if one is fetched to cache, the other one could likely be fetched too, which sort of extends the cacheline size to double, thus the false sharing could happens in adjacent cachelines. 0Day has captured performance changed related with this [1], and some commercial software explicitly makes its hot global variables 128 bytes aligned (2 cache lines) to avoid this kind of extended false sharing. So add an option "--double-cl" for 'perf c2c report' to show false sharing in double cache line granularity, which acts just like the cacheline size is doubled. There is no change to c2c record. The hardware events of shared cacheline are still per cacheline, and this option just changes the granularity of how events are grouped and displayed. In the 'perf c2c report' output below (will-it-scale's 'pagefault2' case on old kernel): ---------------------------------------------------------------------- 26 31 2 0 0 0 0xffff888103ec6000 ---------------------------------------------------------------------- 35.48% 50.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff8133148b 1153 66 971 3748 74 [k] get_mem_cgroup_from_mm 6.45% 0.00% 0.00% 0.00% 0.00% 0x10 0 1 0xffffffff813396e4 570 0 1531 879 75 [k] mem_cgroup_charge 25.81% 50.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81331472 949 70 593 3359 74 [k] get_mem_cgroup_from_mm 19.35% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81339686 1352 0 1073 1022 74 [k] mem_cgroup_charge 9.68% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff813396d6 1401 0 863 768 74 [k] mem_cgroup_charge 3.23% 0.00% 0.00% 0.00% 0.00% 0x54 0 1 0xffffffff81333106 618 0 804 11 9 [k] uncharge_batch The offset 0x10 and 0x54 used to displayed in 2 groups, and now they are listed together to give users a hint of extended false sharing. [1]. https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Committer notes: Link: https://lore.kernel.org/r/Y+wvVNWqXb70l4uy@feng-clx Removed -a, leaving just as --double-cl, as this probably is not used so frequently and perhaps will be even auto-detected if we manage to record the MSR where this is configured. Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Joe Mario <jmario@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230214075823.246414-1-feng.tang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-16	selftests: seg6: add selftest for PSP flavor in SRv6 End behavior	Andrea Mayer	2	-0/+870
	This selftest is designed for testing the PSP flavor in SRv6 End behavior. It instantiates a virtual network composed of several nodes: hosts and SRv6 routers. Each node is realized using a network namespace that is properly interconnected to others through veth pairs. The test makes use of the SRv6 End behavior and of the PSP flavor needed for removing the SRH from the IPv6 header at the penultimate node. The correct execution of the behavior is verified through reachability tests carried out between hosts. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16	net/sched: Retire rsvp classifier	Jamal Hadi Salim	1	-203/+0
	The rsvp classifier has served us well for about a quarter of a century but has has not been getting much maintenance attention due to lack of known users. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16	net/sched: Retire tcindex classifier	Jamal Hadi Salim	1	-227/+0
	The tcindex classifier has served us well for about a quarter of a century but has not been getting much TLC due to lack of known users. Most recently it has become easy prey to syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16	net/sched: Retire dsmark qdisc	Jamal Hadi Salim	1	-140/+0
	The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16	net/sched: Retire ATM qdisc	Jamal Hadi Salim	1	-94/+0
	The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>