|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
- Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
which was causing issues with per-CPU task scheduling and reenqueuing
behavior
* tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This contains two cgroup changes. Both are pretty low risk.
- Fix deadlock in cgroup destruction when repeatedly
mounting/unmounting perf_event and net_prio controllers.
The issue occurs because cgroup_destroy_wq has max_active=1, causing
root destruction to wait for CSS offline operations that are queued
behind it.
The fix splits cgroup_destroy_wq into three separate workqueues to
eliminate the blocking.
- Set of->priv to NULL upon file release to make potential bugs
manifest as NULL pointer dereferences rather than use-after-free
errors"
* tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/psi: Set of->priv to NULL upon file release
cgroup: split cgroup_destroy_wq into 3 workqueues
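
The ordering problem described above can be sketched with a toy model. This is not kernel code: run() and the queue and item names are hypothetical, and each queue is assumed to execute only its head item, mimicking max_active=1. Two queues are enough to show the effect, although the actual fix uses three.

```python
from collections import deque


def run(queues, deps):
    """Drain FIFO queues that each execute one item at a time.

    queues: dict mapping a queue name to its work items in FIFO order.
    deps:   dict mapping an item to the set of items it waits for.
    Returns (done, stuck): completion order and items that never finish.
    """
    pending = {name: deque(items) for name, items in queues.items()}
    done = []
    progress = True
    while progress:
        progress = False
        for q in pending.values():
            # Only the head of a queue may run; later items wait behind it.
            if q and deps.get(q[0], set()) <= set(done):
                done.append(q.popleft())
                progress = True
    stuck = [item for q in pending.values() for item in q]
    return done, stuck


# Hypothetical work items: root destruction waits for a CSS offline.
deps = {"root_destroy": {"css_offline"}}
single = {"destroy_wq": ["root_destroy", "css_offline"]}
split = {"wq_a": ["root_destroy"], "wq_b": ["css_offline"]}

print(run(single, deps))  # both items stay stuck
print(run(split, deps))   # css_offline completes, then root_destroy
```

With a single queue the waiting item blocks the item it depends on; once the items sit in separate queues, the dependency can complete first.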
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- KVM mm fixes
- Postcopy fix
|
|
KVM x86 fix for 6.17-rcN
Sync the vTPR from the local APIC to the VMCB even when AVIC is active, to fix
a bug where host updates to the vTPR, e.g. via KVM_SET_LAPIC or emulation of a
guest access, effectively get lost and result in interrupt delivery issues in
the guest.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming
at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
where an xarray's spinlock was nested with a *raw* spinlock
- Permit stage-2 read permission aborts which are possible in the case
of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
|
|
Since the PXP start comes after __xe_exec_queue_init() has completed,
we need to clean up what was done in that function in case of a PXP
start error.
__xe_exec_queue_init calls the submission backend init() function,
so we need to introduce an opposite for that. Unfortunately, while
we already have a fini() function pointer, it performs other
operations in addition to cleaning up what was done by the init().
Therefore, for clarity, the existing fini() has been renamed to
destroy(), while a new fini() has been added to only clean up what was
done by the init(), with the latter being called by the former (via
xe_exec_queue_fini).
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
(cherry picked from commit 626667321deb4c7a294725406faa3dd71c3d445d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On partial failure, some sysfs files created before the failure might
not be removed. Add a common cleanup step to remove them all immediately,
as it should be harmless to attempt to remove non-existing files.
Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Zongyao Bai <zongyao.bai@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1a869168d91f1a1a2b0db22cea0295c67908e5d8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Check in snd_intel_dsp_check_soundwire() that the pointer returned by
ACPI_HANDLE() is not NULL, before passing it on to other functions.
The original code assumed a non-NULL return, but if it was unexpectedly
NULL it would end up passed to acpi_walk_namespace() as the start
point, and would result in
[ 3.219028] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 3.219029] #PF: supervisor read access in kernel mode
[ 3.219030] #PF: error_code(0x0000) - not-present page
[ 3.219031] PGD 0 P4D 0
[ 3.219032] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.219035] CPU: 2 UID: 0 PID: 476 Comm: (udev-worker) Tainted: G S AW E 6.17.0-rc5-test #1 PREEMPT(voluntary)
[ 3.219038] Tainted: [S]=CPU_OUT_OF_SPEC, [A]=OVERRIDDEN_ACPI_TABLE, [W]=WARN, [E]=UNSIGNED_MODULE
[ 3.219040] RIP: 0010:acpi_ns_walk_namespace+0xb5/0x480
This problem was triggered by a buggy DSDT that the kernel couldn't parse.
But it shouldn't be possible to crash the kernel just because of some
bugs in ACPI.
Fixes: 0650857570d1 ("ALSA: hda: add autodetection for SoundWire")
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- fix integer overflow in dm-stripe
- limit tag size in dm-integrity to 255 bytes
- fix 'alignment inconsistency' warning in dm-raid
* tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-raid: don't set io_min and io_opt for raid1
dm-integrity: limit MAX_TAG_SIZE to 255
dm-stripe: fix a possible integer overflow
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in zoned mode, turn assertion to proper code when reserving space in
relocation block group
- fix search key of extended ref (hardlink) when replaying log
- fix initialization of file extent tree on filesystems without
no-holes feature
- add harmless data race annotation to block group comparator
* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: annotate block group access with data_race() when sorting for reclaim
btrfs: initialize inode::file_extent_tree after i_mode has been set
btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
btrfs: fix invalid extref key setup when replaying dentry
|
|
The parameter max_hw_wzeroes_unmap_sectors in queue_limits should be
equal to max_write_zeroes_sectors if it is set to a non-zero value.
However, when the backend bdev is specified, this parameter is
initialized to UINT_MAX during the call to blk_set_stacking_limits(),
while only max_write_zeroes_sectors is adjusted. Therefore, this
discrepancy triggers a value check failure in blk_validate_limits().
Since the drbd driver doesn't yet support unmap write zeroes, fix
this failure by explicitly setting max_hw_wzeroes_unmap_sectors to
zero.
Fixes: 0c40d7cb5ef3 ("block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
These commands
modprobe brd rd_size=1048576
vgcreate vg /dev/ram*
lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.
For chunk-less raid, such as raid1, io_min shouldn't be set to zero,
because it would be raised to 512 and would trigger the warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.17
Pull MD fixes from Yu Kuai:
"For 6.17 on drivers supporting write zeros, raid{0,1,10,5} are broken
and can't be assembled."
* tag 'md-6.17-20250917' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux:
md: init queue_limits->max_hw_wzeroes_unmap_sectors parameter
|
|
The sanity check previously added to uaudio_transfer_buffer_setup()
assumed that the allocated buffer is linear-mapped. But a buffer
allocated via usb_alloc_coherent() isn't always so; rather, it is meant
to be used with the (SG-)DMA API. This led to a false-positive warning,
and the driver failed to work.
Actually uaudio_transfer_buffer_setup() deals only with the DMA-API
addresses for the MEM_XFER_BUF type, while the other callers of
uaudio_iommu_map() deal with pages with physical addresses for the
MEM_EVENT_RING and MEM_XFER_RING types. So this patch splits the
mapping helper function into two different ones, uaudio_iommu_map() for
the DMA pages and uaudio_iommu_map_pa() for the latter, in order to
handle mapping differently for each type. Along with it, the
unnecessary address check that caused the probe error is dropped, too.
Fixes: 3335a1bbd624 ("ALSA: qc_audio_offload: try to reduce address space confusion")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reported-and-tested-by: Luca Weiss <luca.weiss@fairphone.com>
Closes: https://lore.kernel.org/DBR2363A95M1.L9XBNC003490@fairphone.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Adjust register settings for SAR ADC button detection mode
to fix a noise issue in the headset.
Signed-off-by: Jack Yu <jack.yu@realtek.com>
Link: https://patch.msgid.link/766cd1d2dd7a403ba65bb4cc44845f71@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Since commit 7d5e9737efda ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.
On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:
rfkill->type = (unsigned)id->driver_data;
and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.
rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.
Note that this has likely not been caught so far because:
1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored
Fixes: 7d5e9737efda ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Miri Korenblit says:
====================
iwlwifi fix
====================
The fix is for byte count tables in 7000/8000 family devices.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
EXT_GROUP_SCHED
While collecting SCX related fields in struct task_group into struct
scx_task_group, 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct
scx_task_group") forgot to update the tg->scx_weight usage in tg_weight(),
which leads to a build failure when CONFIG_FAIR_GROUP_SCHED is disabled but
CONFIG_EXT_GROUP_SCHED is enabled. Fix it.
Fixes: 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct scx_task_group")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509170230.MwZsJSWa-lkp@intel.com/
Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Tariq Toukan says:
====================
mlx5e misc fixes 2025-09-15
This patchset provides misc bug fixes from the team to the mlx5 Eth
driver.
====================
Link: https://patch.msgid.link/1757939074-617281-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cited commit adds a miss table for switchdev mode, but it uses
the same level as the policy table. As a result, the following error
is hit when running this command:
# ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \
esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \
mode tunnel offload dev enp8s0f0 dir in
Error: mlx5_core: Device failed to offload this state.
The dmesg error is:
mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22
Fix it by adding a new miss level to avoid the error.
Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.
BUG: unable to handle page fault for address: 0000000000001300
at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
Call Trace:
<TASK>
mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
esw_offloads_enable+0x593/0x910 [mlx5_core]
mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
devlink_nl_eswitch_set_doit+0x60/0xd0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x183/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
do_syscall_64+0x53/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference,
preventing the netdevice from being freed while it is in use.
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
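
The check-then-hold pattern described above can be sketched as follows. This is a toy model, not the driver code: the NetDev and CoreDev classes and the uplink_netdev_get()/uplink_netdev_put() helpers are hypothetical stand-ins, and the model ignores the locking a real driver would need around the pointer read.

```python
class NetDev:
    """Toy stand-in for a net_device with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # reference owned by whoever registered the device


class CoreDev:
    """Toy stand-in for the core device that stores the uplink pointer."""

    def __init__(self):
        self.uplink_netdev = None  # cleared when the netdev is unbound


def uplink_netdev_get(mdev):
    """Return the uplink netdev with an extra reference, or None."""
    netdev = mdev.uplink_netdev
    if netdev is None:
        return None       # unbound: the caller must handle this case
    netdev.refs += 1      # models netdev_hold()
    return netdev


def uplink_netdev_put(netdev):
    """Drop the reference taken by uplink_netdev_get()."""
    netdev.refs -= 1
```

A caller would test the return value for None before touching the device and call the put helper when it is done.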
|
|
DPLL maintainers should probably be CCed on driver patches, too.
Remove the *, which makes the pattern only match files directly
under drivers/dpll but not its sub-directories.
Acked-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250915234255.1306612-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I'm trying to generate Rust bindings for netlink using the yaml spec.
It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresponding C define is
CTA_COUNTERS_ORIG.
Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's a member of nfgenmsg struct
with the same name, which is where family value is actually taken from.
> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
> struct sk_buff *skb,
> const struct nlmsghdr *nlh,
> const struct nlattr * const cda[],
> struct netlink_ext_ack *extack)
> {
> int err;
> struct nfgenmsg *nfmsg = nlmsg_data(nlh);
> u_int8_t u3 = nfmsg->nfgen_family;
^^^^^^^^^^^^
Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a526 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
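
This class of mismatch can be caught mechanically. The sketch below is an assumption-laden illustration: undefined_attr_refs() is a hypothetical helper, and the spec is a hand-trimmed dict that only loosely follows the layout of the YAML netlink specs (attribute-sets, operations/list, do/dump, request/reply), not the real conntrack spec.

```python
def undefined_attr_refs(spec):
    """Return (operation, attribute) pairs whose attribute is not defined
    in the attribute set that the operation uses."""
    sets = {
        s["name"]: {a["name"] for a in s["attributes"]}
        for s in spec["attribute-sets"]
    }
    missing = []
    for op in spec["operations"]["list"]:
        defined = sets.get(op.get("attribute-set"), set())
        for mode in ("do", "dump"):
            for direction in ("request", "reply"):
                attrs = op.get(mode, {}).get(direction, {}).get("attributes", [])
                for name in attrs:
                    if name not in defined:
                        missing.append((op["name"], name))
    return missing


# Trimmed, illustrative spec reproducing the two problems described above.
spec = {
    "attribute-sets": [
        {
            "name": "conntrack-attrs",
            "attributes": [{"name": "counters-orig"}, {"name": "counters-reply"}],
        },
    ],
    "operations": {
        "list": [
            {
                "name": "get",
                "attribute-set": "conntrack-attrs",
                "dump": {
                    "request": {"attributes": ["nfgen-family"]},
                    "reply": {"attributes": ["counter-orig", "counter-reply"]},
                },
            },
        ]
    },
}

print(undefined_attr_refs(spec))
```

For this trimmed spec the helper reports "nfgen-family" and both singular "counter-*" names as undefined references.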
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A small set of fixes for crashes in different commands and conditions"
* tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf maps: Ensure kmap is set up for all inserts
perf lock: Provide a host_env for session new
perf subcmd: avoid crash in exclude_cmds when excludes is empty
|
|
If OD is not enabled then restoring cached clock settings doesn't make
sense and actually leads to errors in resume.
Check if enabled before restoring settings.
Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume")
Reported-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com/T/#me6db8ddb192626360c462b7570ed7eba0c6c9733
Suggested-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a4dd33cc6e1baaa81efdbe68227a19f51c50f20)
Cc: stable@vger.kernel.org
|
|
When igc_led_setup() fails, igc_probe() fails and triggers kernel panic
in free_netdev() since unregister_netdev() is not called. [1]
This behavior can be tested using fault-injection framework, especially
the failslab feature. [2]
Since LED support is not mandatory, treat LED setup failures as
non-fatal and continue probe with a warning message, consequently
avoiding the kernel panic.
[1]
kernel BUG at net/core/dev.c:12047!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:free_netdev+0x278/0x2b0
[...]
Call Trace:
<TASK>
igc_probe+0x370/0x910
local_pci_probe+0x3a/0x80
pci_device_probe+0xd1/0x200
[...]
[2]
#!/bin/bash -ex
FAILSLAB_PATH=/sys/kernel/debug/failslab/
DEVICE=0000:00:05.0
START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
| awk '{printf("0x%s", $1)}')
END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))
echo $START_ADDR > $FAILSLAB_PATH/require-start
echo $END_ADDR > $FAILSLAB_PATH/require-end
echo 1 > $FAILSLAB_PATH/times
echo 100 > $FAILSLAB_PATH/probability
echo N > $FAILSLAB_PATH/ignore-gfp-wait
echo $DEVICE > /sys/bus/pci/drivers/igc/bind
Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There's another issue with aci.lock, and the previous patch uncovers it.
aci.lock is destroyed during ixgbe removal while some of the ixgbe
closing routines are still ongoing. These routines use the Admin
Command Interface, which requires taking aci.lock. Since the lock has
already been destroyed, this leads to the call trace below.
[ +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70
[ +0.000002] Call Trace:
[ +0.000003] <TASK>
[ +0.000006] ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe]
[ +0.000049] ? try_to_wake_up+0x29d/0x5d0
[ +0.000009] ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe]
[ +0.000032] ixgbe_disable_rx+0x3d/0x200 [ixgbe]
[ +0.000027] ixgbe_down+0x102/0x3b0 [ixgbe]
[ +0.000031] ixgbe_close_suspend+0x28/0x90 [ixgbe]
[ +0.000028] ixgbe_close+0xfb/0x100 [ixgbe]
[ +0.000025] __dev_close_many+0xae/0x220
[ +0.000005] dev_close_many+0xc2/0x1a0
[ +0.000004] ? kernfs_should_drain_open_files+0x2a/0x40
[ +0.000005] unregister_netdevice_many_notify+0x204/0xb00
[ +0.000006] ? __kernfs_remove.part.0+0x109/0x210
[ +0.000006] ? kobj_kset_leave+0x4b/0x70
[ +0.000008] unregister_netdevice_queue+0xf6/0x130
[ +0.000006] unregister_netdev+0x1c/0x40
[ +0.000005] ixgbe_remove+0x216/0x290 [ixgbe]
[ +0.000021] pci_device_remove+0x42/0xb0
[ +0.000007] device_release_driver_internal+0x19c/0x200
[ +0.000008] driver_detach+0x48/0x90
[ +0.000003] bus_remove_driver+0x6d/0xf0
[ +0.000006] pci_unregister_driver+0x2e/0xb0
[ +0.000005] ixgbe_exit_module+0x1c/0xc80 [ixgbe]
Same as for the previous commit, the issue has been highlighted by the
commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path").
Move destroying aci.lock to the end of ixgbe_remove(), as this
simply fixes the issue.
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently aci.lock is initialized too late. A bunch of ACI callbacks
using the lock are called before it is initialized.
Commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path")
highlights that issue, which results in the call trace below.
[ 4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80
[ 4.098757] Call Trace:
[ 4.098847] <TASK>
[ 4.098922] ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe]
[ 4.099108] ? hrtimer_try_to_cancel+0x18/0x110
[ 4.099277] ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe]
[ 4.099460] ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe]
[ 4.099650] ? usleep_range_state+0x69/0xd0
[ 4.099811] ? usleep_range_state+0x8c/0xd0
[ 4.099964] ixgbe_probe+0x3b0/0x12d0 [ixgbe]
[ 4.100132] local_pci_probe+0x43/0xa0
[ 4.100267] work_for_cpu_fn+0x13/0x20
[ 4.101647] </TASK>
Move aci.lock mutex initialization to ixgbe_sw_init(), before any ACI
command is sent. Along with that, also move the related SWFW semaphore,
in order to reduce the size of ixgbe_probe(); that way all locks are
initialized in ixgbe_sw_init().
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
i40e has a feature which writes the last successfully sent descriptor to
a memory location. The memory barrier in i40e_clean_tx_irq() was used to
avoid forward-reading descriptor fields in case the DD bit was not set.
Having the mentioned feature in place implies that such a situation will
not happen, as we know in advance how many descriptors HW has dealt with.
Besides, this barrier placement was wrong. The idea is to have this
protection *after* reading the DD bit from the HW descriptor, not before.
Digging through git history showed me that the barrier was indeed before
the DD bit check; in any case, the commit introducing i40e_get_head()
should have wiped it out altogether.
Also, there was one commit doing s/read_barrier_depends/smp_rmb when the
get head feature was already in place, but it was only theoretical, based
on ixgbe experiences. ixgbe is different in these terms, as that driver
has to read the DD bit from the HW descriptor.
Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.
It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.
If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't
call ice_put_rx_buf().
Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.
The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented
properly, and that it always points to a buffer with a NULL page. Since
this function doesn't check, it will happily recycle a page over the top
of the next_to_alloc buffer, losing track of the old page.
Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware hands us descriptors with a size of 0 sometimes,
but it happens somewhat regularly with "jumbo frames" used by 9K MTU.
To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().
Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches the next_to_clean element just
past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does
call ice_put_rx_buf() on the last buffer of the frame indicating the end of
packet.
For non-linear (multi-buffer) frames, we need to take care when adjusting
the pagecnt_bias. An XDP program might release fragments from the tail of
the frame, in which case that fragment page is already released. Only
update the pagecnt_bias for the first descriptor and fragments still
remaining post-XDP program. Take care to only access the shared info for
fragmented buffers, as this avoids a significant cache miss.
The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.
Move the increment of the ntc local variable to ensure it's updated *before*
all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic
requires the index of the element just after the current frame.
Now that we use an index pointer in the ring to identify the packet, we no
longer need to track or cache the number of fragments in the rx_ring.
Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Priya Singh <priyax.singh@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
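
The difference between the two iteration schemes can be sketched with plain ring indices. This is a simplified model rather than the driver code: put_by_frag_count() and put_until_ntc() are hypothetical names, and the functions only return which ring slots would be visited.

```python
def put_by_frag_count(first_desc, nr_frags, ring_size):
    """Count-based approach: visit 1 + nr_frags slots from first_desc."""
    return [(first_desc + i) % ring_size for i in range(nr_frags + 1)]


def put_until_ntc(first_desc, next_to_clean, ring_size):
    """Index-based approach: visit every slot from first_desc up to, but
    not including, next_to_clean, wrapping around the ring."""
    visited = []
    idx = first_desc
    while idx != next_to_clean:
        visited.append(idx)
        idx = (idx + 1) % ring_size
    return visited


# Assumed scenario: an 8-entry ring, a frame spanning slots 6, 7 and 0,
# where one descriptor has size 0 and was not counted as a fragment,
# so only one fragment was recorded beyond the first buffer.
print(put_by_frag_count(first_desc=6, nr_frags=1, ring_size=8))
print(put_until_ntc(first_desc=6, next_to_clean=1, ring_size=8))
```

In this scenario the count-based walk covers two slots while the frame occupies three, so one buffer of the frame is never released; the index-based walk covers all three regardless of how many fragments were counted.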
|
|
scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a
CPU is taken by a higher scheduling class to give tasks queued to the
CPU's local DSQ a chance to be migrated somewhere else, instead of
waiting indefinitely for that CPU to become available again.
In doing so, we decided to skip migration-disabled tasks, under the
assumption that they cannot be migrated anyway.
However, when a higher scheduling class preempts a CPU, the running task
is always inserted at the head of the local DSQ as a migration-disabled
task. This means it is always skipped by scx_bpf_reenqueue_local(), and
ends up being confined to the same CPU even if that CPU is heavily
contended by other higher scheduling class tasks.
As an example, let's consider the following scenario:
$ schedtool -a 0,1, -e yes > /dev/null
$ sudo schedtool -F -p 99 -a 0, -e \
stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000
The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task
(SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT
task initially runs on CPU0, it will remain there because it always sees
CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1%
utilization while CPU1 stays idle:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[ 0.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
1067 root RT 0 R 0 99.0 0.2 0:31.16 stress-ng-cpu [run]
975 arighi 20 0 R 0 1.0 0.0 0:26.32 yes
By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled
tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in
this case) via ops.enqueue(), leading to better CPU utilization:
0[||||||||||||||||||||||100.0%] 8[ 0.0%]
1[||||||||||||||||||||||100.0%] 9[ 0.0%]
2[ 0.0%] 10[ 0.0%]
3[ 0.0%] 11[ 0.0%]
4[ 0.0%] 12[ 0.0%]
5[ 0.0%] 13[ 0.0%]
6[ 0.0%] 14[ 0.0%]
7[ 0.0%] 15[ 0.0%]
PID USER PRI NI S CPU CPU%▽MEM% TIME+ Command
577 root RT 0 R 0 100.0 0.2 0:23.17 stress-ng-cpu [run]
555 arighi 20 0 R 1 100.0 0.0 0:28.67 yes
It's debatable whether per-CPU tasks should be re-enqueued as well, but
doing so is probably safer: the scheduler can recognize re-enqueued
tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and
either put them back at the head of the local DSQ or let another task
attempt to take the CPU.
This also prevents giving per-CPU tasks an implicit priority boost,
which would otherwise make them more likely to reclaim CPUs preempted by
higher scheduling classes.
Fixes: 97e13ecb02668 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
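
The behavioural change can be sketched with a toy local DSQ. This is a model under stated assumptions, not the sched_ext implementation: the Task class, the reenqueue_local() helper, and its skip_migration_disabled parameter are hypothetical and only mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    migration_disabled: bool = False


def reenqueue_local(local_dsq, skip_migration_disabled):
    """Split a CPU's local DSQ into (re-enqueued, left on the CPU)."""
    reenqueued, kept = [], []
    for task in local_dsq:
        if skip_migration_disabled and task.migration_disabled:
            kept.append(task)        # stays confined to this CPU
        else:
            reenqueued.append(task)  # the scheduler may pick another CPU
    return reenqueued, kept


# The preempted task sits at the head of the DSQ as migration-disabled.
dsq = [Task("yes", migration_disabled=True), Task("other")]

print(reenqueue_local(dsq, skip_migration_disabled=True))
print(reenqueue_local(dsq, skip_migration_disabled=False))
```

With skipping enabled the preempted task never leaves the contended CPU; without it, every task is handed back to the scheduler, which can then reassess its placement.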
|
|
The parameter max_hw_wzeroes_unmap_sectors in queue_limits should be
equal to max_write_zeroes_sectors if it is set to a non-zero value.
However, the stacked md drivers call md_init_stacking_limits() to
initialize this parameter to UINT_MAX but only adjust
max_write_zeroes_sectors when setting limits. Therefore, this
discrepancy triggers a value check failure in blk_validate_limits().
$ modprobe scsi_debug num_parts=2 dev_size_mb=8 lbprz=1 lbpws=1
$ mdadm --create /dev/md0 --level=0 --raid-device=2 /dev/sda1 /dev/sda2
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Invalid argument
Fix this failure by explicitly setting max_hw_wzeroes_unmap_sectors to
max_write_zeroes_sectors. Since the linear and raid0 drivers support
write zeroes, they can support the unmap write zeroes operation if all of
the backend devices support it. However, the raid1/10/5 drivers don't
support write zeroes, so we have to set it to zero.
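The mismatch and the fix can be sketched with a simplified userspace model. The struct and the `model_*` helpers below are hypothetical and only mirror the rule stated above (the unmap limit must be zero or equal to max_write_zeroes_sectors); the initial value chosen for max_write_zeroes_sectors is an assumption.

```c
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-in for the two queue_limits fields discussed above. */
struct model_limits {
	unsigned int max_write_zeroes_sectors;
	unsigned int max_hw_wzeroes_unmap_sectors;
};

/* The rule from the commit message: zero, or equal to max_write_zeroes_sectors. */
bool model_validate(const struct model_limits *lim)
{
	return lim->max_hw_wzeroes_unmap_sectors == 0 ||
	       lim->max_hw_wzeroes_unmap_sectors == lim->max_write_zeroes_sectors;
}

/* Stacking init: the unmap limit starts at UINT_MAX (the other field is assumed). */
void model_init_stacking(struct model_limits *lim)
{
	lim->max_write_zeroes_sectors = UINT_MAX;
	lim->max_hw_wzeroes_unmap_sectors = UINT_MAX;
}

/* The fix: whenever the write zeroes limit is adjusted, keep the unmap limit in sync. */
void model_set_write_zeroes(struct model_limits *lim, unsigned int sectors)
{
	lim->max_write_zeroes_sectors = sectors;
	lim->max_hw_wzeroes_unmap_sectors = sectors;
}
```

Adjusting only max_write_zeroes_sectors after the stacking init leaves the unmap limit at UINT_MAX and fails the check; passing zero to the combined setter corresponds to the raid1/10/5 case.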
Fixes: 0c40d7cb5ef3 ("block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits")
Reported-by: John Garry <john.g.garry@oracle.com>
Closes: https://lore.kernel.org/linux-block/803a2183-a0bb-4b7a-92f1-afc5097630d2@oracle.com/
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Li Nan <linan122@huawei.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/linux-raid/20250910111107.3247530-2-yi.zhang@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
|
|
The xe_preempt_fence_create() function returns error pointers. It
never returns NULL. Update the error checking to match.
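For readers unfamiliar with the error-pointer convention, a minimal userspace sketch follows. The `model_*` helpers are simplified stand-ins written for this example, not the kernel's ERR_PTR()/IS_ERR() implementation, and `model_fence_create()` is a hypothetical constructor.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static inline void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long model_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True when the pointer value falls in the top MODEL_MAX_ERRNO addresses. */
static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical constructor that, like the function above, never returns NULL. */
void *model_fence_create(bool fail)
{
	static int fence;

	return fail ? model_err_ptr(-ENOMEM) : (void *)&fence;
}

/* Returns 0 on success or a negative errno. */
int model_use_fence(bool fail)
{
	void *f = model_fence_create(fail);

	if (model_is_err(f))		/* a "!f" test would never fire here */
		return (int)model_ptr_err(f);
	return 0;
}
```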
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 75cc23ffe5b422bc3cbd5cf0956b8b86e4b0e162)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add missing mutex unlock before returning from the error path in
cdns_mhdp_atomic_enable().
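The pattern being restored can be sketched in userspace with a pthread mutex. This is an illustrative model only: `model_lock` and `model_atomic_enable()` are hypothetical names, and the return value exists just to make the two paths visible.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success or -EINVAL on the error path; the lock is dropped on both. */
int model_atomic_enable(bool state_missing)
{
	int ret = 0;

	pthread_mutex_lock(&model_lock);

	if (state_missing) {
		ret = -EINVAL;
		goto out;	/* do not return directly while holding the lock */
	}

	/* The normal enable work would go here. */

out:
	pthread_mutex_unlock(&model_lock);
	return ret;
}
```

Routing every exit through a single unlock label keeps later error paths from reintroducing the same leak.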
Fixes: 935a92a1c400 ("drm: bridge: cdns-mhdp8546: Fix possible null pointer dereference")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250904034447.665427-1-xiqi2@huawei.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
Merge series from Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>:
Fix the lpaif_type configuration for the I2S interface.
The proper lpaif interface type is required to allow the DSP to vote for
the appropriate clock setting for the I2S interface. Also add support
for configuring the DAI format on MI2S interfaces to allow setting
the appropriate bit clock and frame clock polarity, ensuring correct
audio data transmission over MI2S.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current
intel-gpio fixes for v6.17-rc7
* Fix a regression to make GpioInt() by index work again
* Ignore spurious wakeups from touchpad on GPD G1619-05
* Accept debounce from GpioIo() resources
|
|
It turns out that the dual screen models use 0x5E for attaching and
detaching the keyboard instead of 0x5F. So, re-add the codes by
reverting commit cf3940ac737d ("platform/x86: asus-wmi: Remove extra
keys from ignore_key_wlan quirk"). For our future reference, add a
comment next to 0x5E indicating that it is used for that purpose.
Fixes: cf3940ac737d ("platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk")
Reported-by: Rahul Chandra <rahul@chandra.net>
Closes: https://lore.kernel.org/all/10020-68c90c80-d-4ac6c580@106290038/
Cc: stable@kernel.org
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20250916072818.196462-1-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Include the ACPI ID AMDI0108, which is used on upcoming AMD platforms, in
the PMF driver's list of supported devices.
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20250915090546.2759130-1-Shyam-sundar.S-k@amd.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The VESA AUX backlight fails to enable luminance-based backlight
manipulation because luminance_set is false by default.
Fix it by using the luminance support control capability.
Fixes: e13af5166a359 ("drm/i915/backlight: Use drm helper to initialize edp backlight")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/20250823121647.275834-1-aaron.ma@canonical.com
(cherry picked from commit 72136efb875d8438c20b9c8ab72945d474933471)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
|
|
Parallel concurrent writes to the same zram index result in leaked
zsmalloc handles. Schematically we can have something like this:
CPU0                              CPU1
zram_slot_lock()
zs_free(handle)
zram_slot_unlock()
                                  zram_slot_lock()
                                  zs_free(handle)
                                  zram_slot_unlock()
compress                          compress
handle = zs_malloc()              handle = zs_malloc()
zram_slot_lock()
zram_set_handle(handle)
zram_slot_unlock()
                                  zram_slot_lock()
                                  zram_set_handle(handle)
                                  zram_slot_unlock()
Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done
too early. In fact, we need to reset zram entry right before we set its
new handle, all under the same slot lock scope.
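A minimal userspace model of the corrected ordering is shown below. It is a sketch under stated assumptions: `struct model_slot`, `model_alloc()`, `model_free()` and the live-handle counter are hypothetical stand-ins for the zram slot, zs_malloc() and zs_free(), and malloc() is assumed to succeed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct model_slot {
	pthread_mutex_t lock;
	void *handle;
};

atomic_int model_live_handles;	/* allocations that have not been freed yet */

void *model_alloc(void)
{
	void *h = malloc(16);

	if (h)
		atomic_fetch_add(&model_live_handles, 1);
	return h;
}

void model_free(void *h)
{
	if (h) {
		atomic_fetch_sub(&model_live_handles, 1);
		free(h);
	}
}

/* Write path: allocate outside the lock, then reset and publish in one critical section. */
void model_slot_write(struct model_slot *slot)
{
	void *new_handle = model_alloc();

	pthread_mutex_lock(&slot->lock);
	model_free(slot->handle);	/* reset the entry ... */
	slot->handle = new_handle;	/* ... right before setting its new handle */
	pthread_mutex_unlock(&slot->lock);
}
```

Because whichever writer takes the lock second frees the handle the first one installed, at most one handle per slot stays live regardless of interleaving.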
Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org
Fixes: 71268035f5d7 ("zram: free slot memory early during write")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/
Tested-by: Changhui Zhong <czhong@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
`netif_rx()` already increments `rx_dropped` core stat when it fails.
The driver was also updating `ndev->stats.rx_dropped` in the same path.
Since both are reported together via `ip -s -s` command, this resulted
in drops being counted twice in user-visible stats.
Keep the driver update on `if (unlikely(!skb))`, but skip it after
`netif_rx()` errors.
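The accounting rule can be modelled in a few lines of userspace C. The struct and the `model_*` functions are hypothetical; they only mirror the split described above between the core counter and the driver's own counter.

```c
#include <stdbool.h>

struct model_stats {
	unsigned long core_rx_dropped;	/* bumped by the core when netif_rx() fails */
	unsigned long drv_rx_dropped;	/* the driver's own rx_dropped counter */
};

/* Stand-in for netif_rx(): it accounts for the drop itself when it fails. */
bool model_netif_rx(struct model_stats *st, bool queue_full)
{
	if (queue_full) {
		st->core_rx_dropped++;
		return false;
	}
	return true;
}

/* Receive one packet, counting each drop exactly once. */
void model_rx_one(struct model_stats *st, bool alloc_failed, bool queue_full)
{
	if (alloc_failed) {
		st->drv_rx_dropped++;	/* the "!skb" case: only the driver sees it */
		return;
	}
	(void)model_netif_rx(st, queue_full);	/* the core already counted any drop */
}

/* What user space sees: both counters are reported together. */
unsigned long model_reported_drops(const struct model_stats *st)
{
	return st->core_rx_dropped + st->drv_rx_dropped;
}
```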
Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: pm: nl: announce deny-join-id0 flag
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't,
which prevented the userspace daemon from complying with RFC8684.
Here are a few fixes related to this 'C' flag (aka 'deny-join-id0'):
- Patch 1: add remote_deny_join_id0 info on passive connections. A fix
for v5.14.
- Patch 2: let the userspace PM daemon know about the deny_join_id0
attribute, so when set, it can avoid creating new subflows to the
initial IP address and port. A fix for v5.19.
- Patch 3: a validation for the previous commit.
- Patch 4: record the deny_join_id0 info when TFO is used. A fix for
v6.2.
- Patch 5: not related to deny-join-id0, but it fixes error messages in
the sockopt selftests to avoid confusion. A fix for v6.5.
====================
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-0-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:
1. Fix diff not printed: The error messages for counter mismatches had
the actual difference ('diff') as argument, but it was missing in the
format string. Displaying it makes the debugging easier.
2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
used 'ret2' (sent bytes) for both the expected value and the difference
calculation. It now correctly uses 'ret' (received bytes), which is the
expected value for bytes_acked.
3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
value in the condition above it.
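To illustrate points 1 and 3, here is a small standalone sketch, not the selftest code itself: `model_rcv_delta_check()` and its message layout are hypothetical, and only the "ret + 1" expectation is taken from the description above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare rcv_delta with the expected value (ret + 1) and, on mismatch,
 * format an error message that includes the difference.
 * buf must have room for at least one byte. Returns the difference.
 */
int64_t model_rcv_delta_check(char *buf, size_t len, uint64_t rcv_delta, uint64_t ret)
{
	uint64_t expected = ret + 1;
	int64_t diff = (int64_t)(rcv_delta - expected);

	buf[0] = '\0';
	if (rcv_delta != expected)
		snprintf(buf, len,
			 "mptcpi_rcv_delta %" PRIu64 ", expect %" PRIu64 ", diff %" PRId64,
			 rcv_delta, expected, diff);
	return diff;
}
```

Computing the diff against the same expected value used in the condition keeps the printed number consistent with the check that triggered the error.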
Fixes: 5dcff89e1455 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When TFO is used, the check to see if the 'C' flag (deny join id0) was
set was bypassed.
This flag can be set when TFO is used, so the check should also be done
when TFO is used.
Note that the set_fully_established label is also used when a 4th ACK is
received. In this case, deny_join_id0 will not be set.
Fixes: dfc8d0603033 ("mptcp: implement delayed seq generation for passive fastopen")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-4-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.
pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't.
The RFC8684 [1] is strict about that:
(...) therefore the receiver MUST NOT try to open any additional
subflows toward this address and port.
So it is important to tell the userspace about that as it is responsible
for the respect of this flag.
When a new connection is created and established, the Netlink events
now contain the existing but not currently used 'flags' attribute. When
MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows
to the initial IP address and port -- information that is also part of
the event -- can be established.
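A userspace path manager could honour the flag along the lines of the sketch below. This is a model only: `MODEL_EV_FLAG_DENY_JOIN_ID0` is a stand-in bit for MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 (the real value comes from the uAPI header), and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit for MPTCP_PM_EV_FLAG_DENY_JOIN_ID0; assumed value, see the uAPI header. */
#define MODEL_EV_FLAG_DENY_JOIN_ID0 (1U << 0)

/* Simplified view of the information carried by a "connection established" event. */
struct model_conn_event {
	uint32_t flags;		/* content of the 'flags' attribute, 0 if absent */
	uint32_t initial_addr;	/* peer's initial IPv4 address, host byte order */
	uint16_t initial_port;	/* peer's initial port */
};

/* Decide whether a new subflow towards (addr, port) may be created. */
bool model_may_create_subflow(const struct model_conn_event *ev,
			      uint32_t addr, uint16_t port)
{
	bool to_initial = addr == ev->initial_addr && port == ev->initial_port;

	/* The peer set the 'C' flag: never join its initial address and port. */
	if (to_initial && (ev->flags & MODEL_EV_FLAG_DENY_JOIN_ID0))
		return false;
	return true;
}
```

Subflows towards other addresses announced by the peer are unaffected by the flag in this model.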
Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1]
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a SYN containing the 'C' flag (deny join id0) was received, this
piece of information was not propagated to the path-manager.
Even if this flag is mainly set on the server side, a client can also
tell the server it cannot try to establish new subflows to the client's
initial IP address and port. The server's PM should then record such
info when received, and before sending events about the new connection.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
selftests: mptcp: avoid spurious errors on TCP disconnect
This series should fix the recent instabilities seen by MPTCP and NIPA
CIs where the 'mptcp_connect.sh' tests fail regularly when running the
'disconnect' subtests with "plain" TCP sockets, e.g.
# INFO: disconnect
# 63 ns1 MPTCP -> ns1 (10.0.1.1:20001 ) MPTCP (duration 996ms) [ OK ]
# 64 ns1 MPTCP -> ns1 (10.0.1.1:20002 ) TCP (duration 851ms) [ OK ]
# 65 ns1 TCP -> ns1 (10.0.1.1:20003 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
# (duration 896ms) [FAIL] file received by server does not match (in, out):
# -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
# Trailing bytes are:
# /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
# Trailing bytes are:
# /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP (duration 987ms) [ OK ]
# 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP (duration 911ms) [ OK ]
# 68 ns1 TCP -> ns1 (dead:beef:1::1:20006) MPTCP (duration 980ms) [ OK ]
# [FAIL] Tests of the full disconnection have failed
These issues started to be visible after some behavioural changes in
TCP, where too quick re-connections after a shutdown() can now be more
easily rejected. Patch 3 modifies the selftests to wait, but this
resolution revealed an issue in MPTCP which is fixed by patch 1 (a fix
for v5.9 kernel).
Patches 2 and 4 improve some errors reported by the selftests, and patch
5 helps with the debugging of such issues.
====================
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-0-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To be able to find which capture files have been produced after several
runs.
This prefix was not printed anywhere before.
While at it, always use the same prefix by taking info from ns1, instead
of "$connector_ns", which is sometimes ns1, sometimes ns2 in the
subtests.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-5-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is better than printing random bytes in the terminal.
Note that Jakub suggested 'hexdump', but Mat found out this tool is not
often installed by default. 'od' can do a similar job, and it is in the
POSIX specs and available in coreutils, so it should be on more systems.
While at it, display a few more bytes, just to fill in the two lines.
There is also no need to display the third line, which only shows the
next number of bytes: 0000040.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-4-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|