| Age | Commit message (Collapse) | Author | Files | Lines |
|
Cross-merge networking fixes after downstream PR (net-7.1-rc3).
Conflicts:
net/ipv4/igmp.c
726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation")
c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()")
https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org
Adjacent changes:
net/psp/psp_main.c
30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()")
c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()")
net/sched/sch_fq_codel.c
f83e07b29246 ("net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()")
3f3aa77ff1c8 ("net/sched: add qstats_cpu_drop_inc() helper")
net/wireless/pmsr.c
0f3c0a197309 ("wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage")
410aa47fd9d3 ("wifi: cfg80211: allow suppressing FTM result reporting for PD requests")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter, IPsec, Bluetooth and WiFi.
Current release - fix to a fix:
- ipmr: add __rcu to netns_ipv4.mrt, make sure we hold the RCU lock
in all relevant places
Current release - new code bugs:
- fixes for the recently added resizable hash tables
- ipv6: make sure we default IPv6 tunnel drivers to =m now that IPv6
itself is built in
- drv: octeontx2-af: fixes for parser/CAM fixes
Previous releases - regressions:
- phy: micrel: fix LAN8814 QSGMII soft reset
- wifi:
- cw1200: revert "Fix locking in error paths"
- ath12k: fix crash on WCN7850, due to adding the same queue
buffer to a list multiple times
Previous releases - always broken:
- number of info leak fixes
- ipv6: implement limits on extension header parsing
- wifi: number of fixes for missing bound checks in the drivers
- Bluetooth: fixes for races and locking issues
- af_unix:
- fix an issue between garbage collection and PEEK
- fix yet another issue with OOB data
- xfrm: esp: avoid in-place decrypt on shared skb frags
- netfilter: replace skb_try_make_writable() by skb_ensure_writable()
- openvswitch: vport: fix race between tunnel creation and linking
leading to invalid memory accesses (type confusion)
- drv: amd-xgbe: fix PTP addend overflow causing frozen clock
Misc:
- sched/isolation: make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN
(for relevant IPVS change)"
* tag 'net-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (190 commits)
net: sparx5: configure serdes for 1000BASE-X in sparx5_port_init()
net: sparx5: fix wrong chip ids for TSN SKUs
net: stmmac: dwmac-nuvoton: fix NULL pointer dereference in nvt_set_phy_intf_sel()
tcp: Fix dst leak in tcp_v6_connect().
ipmr: Call ipmr_fib_lookup() under RCU.
net: phy: broadcom: Save PHY counters during suspend
net/smc: fix missing sk_err when TCP handshake fails
af_unix: Reject SIOCATMARK on non-stream sockets
veth: fix OOB txq access in veth_poll() with asymmetric queue counts
eth: fbnic: fix double-free of PCS on phylink creation failure
net: ethernet: cortina: Drop half-assembled SKB
selftests: mptcp: pm: restrict 'unknown' check to pm_nl_ctl
selftests: mptcp: check output: catch cmd errors
mptcp: pm: prio: skip closed subflows
mptcp: pm: ADD_ADDR rtx: return early if no retrans
mptcp: pm: ADD_ADDR rtx: skip inactive subflows
mptcp: pm: ADD_ADDR rtx: resched blocked ADD_ADDR quicker
mptcp: pm: ADD_ADDR rtx: free sk if last
mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
mptcp: pm: ADD_ADDR rtx: fix potential data-race
...
|
|
Pull smb server fixes from Steve French:
- Fix memory leak in connection free
- Fix inherited ACL ACE validation
- Minor cleanup
- Fix for share config
- Fix durable handle cleanup race
- Fix close_file_table_ids in session teardown
- smbdirect fixes:
- Fix memory region registration
- Two fixes for out-of-tree builds
* tag 'v7.1-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate inherited ACE SID length
ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()
ksmbd: fail share config requests when path allocation fails
ksmbd: close durable scavenger races against m_fp_list lookups
ksmbd: harden file lifetime during session teardown
ksmbd: centralize ksmbd_conn final release to plug transport leak
smb: smbdirect: fix MR registration for coalesced SG lists
smb: smbdirect: introduce and use include/linux/smbdirect.h
smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPL
|
|
This driver supports both SPI and MMIO based register access, but only
the former has devicetree support. While MMIO mode would have worked
with old-style board files, those have never defined such a device
upstream.
Remove the MMIO mode, leaving SPI as the only way to use this driver,
but leave it in two loadable modules. More cleanups can be done by
combining the two into one file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260505180459.1247690-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Representor callbacks can be registered and unregistered while the
E-Switch is already in switchdev mode, and the same E-Switch may also be
reconfigured by devlink, VF changes and SF changes. Serialize these paths
with the per-E-Switch representor mutex instead of relying on ad-hoc bit
state and wait queues.
Take the representor lock around the mode transition, VF/SF representor
changes and representor ops registration. Keep mode_lock and the
representor lock unnested by using the operation flag while the mode lock
is dropped. During mode changes, drop the representor lock around the
auxiliary bus rescan because driver bind/unbind may register or unregister
representor ops.
Split representor ops registration into locked public wrappers and blocked
internal helpers, clear the ops pointer on unregister, and add nested
wrappers for the shared-FDB master IB path that registers peer
representor ops while another E-Switch representor lock is already held.
On unregister, always call __unload_reps_all_vport() before marking reps
unregistered and clearing rep_ops. The per-representor state check makes
this a no-op for types that were not loaded, so unregister no longer has
to infer load state from esw->mode.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260503202726.266415-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Lots of new content in cfg80211/mac80211, notably
- more NAN work, mostly complete now (also hwsim)
- more UHR work (e.g. non-primary channel access),
this will continue for a while
- FTM ranging APIs
* tag 'wireless-next-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (70 commits)
wifi: mac80211: explicitly disable FTM responder on AP stop
wifi: iwlwifi: don't blindly start the responder upon BSS_CHANGED_FTM_RESPONDER
wifi: mac80211_hwsim: claim HT STBC capability
wifi: mac80211_hwsim: enable NAN_DATA interface simulation support
wifi: mac80211_hwsim: Support Tx of multicast data on NAN
wifi: mac80211_hwsim: Do not declare support for NDPE
wifi: mac80211_hwsim: Declare support for secure NAN
wifi: mac80211_hwsim: add NAN data path TX/RX support
wifi: mac80211_hwsim: set HAS_RATE_CONTROL when using NAN
wifi: mac80211_hwsim: implement NAN schedule callbacks
wifi: mac80211_hwsim: add NAN PHY capabilities
wifi: mac80211_hwsim: add NAN_DATA interface limits
wifi: mac80211_hwsim: implement NAN synchronization
wifi: mac80211_hwsim: protect tsf_offset using a spinlock
wifi: mac80211_hwsim: only RX on NAN when active on a slot
wifi: mac80211_hwsim: select NAN TX channel based on current TSF
wifi: mac80211_hwsim: limit TX of frames to the NAN DW
wifi: cfg80211: don't allow NAN DATA on multi radio devices
wifi: mac80211: check AP using NPCA has NPCA capability
wifi: mac80211: don't parse full UHR operation from beacons
...
====================
Link: https://patch.msgid.link/20260506111147.224296-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Fix devm_alloc_workqueue() passing a va_list as a positional arg to
the variadic alloc_workqueue() macro, which garbled wq->name and
skipped lockdep init on the devm path. Fold both noprof entry points
onto a va_list helper.
Also, annotate it using __printf(1, 0)
* tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)
workqueue: fix devm_alloc_workqueue() va_list misuse
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- During v6.19, cgroup task unlink was moved from do_exit() to after the
final task switch to satisfy a controller invariant. That left the kernel
seeing tasks past exit_signals() longer than userspace expected, and
several v7.0 follow-ups tried to bridge the gap by making rmdir wait for
the kernel side. None held up.
The latest is an A-A deadlock when rmdir is invoked by the reaper of
zombies whose pidns teardown the rmdir itself is waiting on, which
points at the synchronizing approach being fundamentally wrong.
Take a different approach: drop the wait, leave rmdir's user-visible
side returning as soon as cgroup.procs is empty, and defer the css
percpu_ref kill that drives ->css_offline() until the cgroup is fully
depopulated.
Tagged for stable. Somewhat invasive but contained. The hope is that
fixing forward sticks. If not, the fallback is to revert the entire
chain and rework on the development branch.
Note that this doesn't plug a pre-existing analogous race in
cgroup_apply_control_disable() (controller disable via
subtree_control). Not a regression. The development branch will do
the more invasive restructuring needed for that.
- Documentation update for cgroup-v1 charge-commit section that still
referenced functions removed when the memcg hugetlb try-commit-cancel
protocol was retired.
* tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: Update charge-commit section
cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix idle CPU selection returning prev_cpu outside the task's cpus_ptr
when the BPF caller's allowed mask was wider. Stable backport.
- Two opposite-direction gaps in scx_task_iter's cgroup-scoped mode
versus the global mode:
- Tasks past exit_signals() are filtered by the cgroup walk but kept
by global. Sub-scheduler enable abort leaked __scx_init_task()
state. Add a CSS_TASK_ITER_WITH_DEAD flag to cgroup's task
iterator (scx_task_iter is its only user) and use it.
- Tasks past sched_ext_dead() are still returned, tripping
WARN_ON_ONCE() in callers or making them touch torn-down state.
Mark and skip under the per-task rq lock.
* tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: idle: Recheck prev_cpu after narrowing allowed mask
sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
cgroup, sched_ext: Include exiting tasks in cgroup iter
|
|
UHR defines bit 8 to mean multi-link power management, add
a definition for it. Also reindent the other definitions to
use tabs, not spaces.
Link: https://patch.msgid.link/20260428110915.c6b6a06016cf.I7ebd97397507d320124547017e21191b55c5d34d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since 802.11bn D1.4 the DBE capabilities are after the
PHY capabilities, not between MAC and PHY, adjust the
code accordingly.
Also add a struct for DBE capabilities and use it for
checking the correct length instead of hard-coding the
lengths.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260428103657.b40af50f182d.I75306a092dc2c8a9eb7276160f0b7144b4846d18@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Currently there is no way to install an LTF key seed that can be
used in non-trigger-based (NTB) and trigger-based (TB) FTM ranging
to protect NDP frames. Without this, drivers cannot enable PHY-layer
security for peer measurement sessions, leaving ranging measurements
vulnerable to eavesdropping and manipulation.
Introduce NL80211_KEY_LTF_SEED attribute and the dedicated extended
feature flag NL80211_EXT_FEATURE_SET_KEY_LTF_SEED to allow drivers
to advertise and install LTF key seeds via nl80211. The key seed
must be configured beforehand to ensure the peer measurement session
is secure. The driver must advertise both NL80211_EXT_FEATURE_SECURE_LTF
and NL80211_EXT_FEATURE_SET_KEY_LTF_SEED for the key seed installation
to be permitted.
The LTF key seed is pairwise key material and must only be used with
pairwise key type. Reject attempts to use it with other key types.
Signed-off-by: Peddolla Harshavardhan Reddy <peddolla.reddy@oss.qualcomm.com>
Link: https://patch.msgid.link/20260420090856.2152905-13-peddolla.reddy@oss.qualcomm.com
[fix policy coding style]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In IGMP, MRC and QQIC fields are not correctly encoded
when generating query packets. Since the receiver of the
query interprets these fields using the IGMPv3 floating-
point decoding logic, any value that exceeds the linear
threshold is incorrectly parsed as an exponential value,
leading to an incorrect interval calculation.
Encode and assign the corresponding protocol fields during
query generation. Introduce the logic to dynamically
calculate the exponent and mantissa using bit-scan (fls).
This ensures MRC and QQIC fields (8-bit) are properly
encoded when transmitting query packets with intervals
that exceed their respective linear threshold value of
128 (for MRT/QQI).
RFC3376: for both MRC and QQIC, values >= 128 represent
the same floating-point encoding as follows:
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1| exp | mant |
+-+-+-+-+-+-+-+-+
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-4-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Get rid of the IGMPV3_MRC macro and use the igmpv3_mrt() API to
calculate the Max Resp Time from the Maximum Response Code.
Similarly, for IGMPV3_QQIC, use the igmpv3_qqi() API to calculate
the Querier's Query Interval from the QQIC field.
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Link: https://patch.msgid.link/20260502131907.987-2-royujjal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred
affinity management"), kthreads default to use the HK_TYPE_DOMAIN
cpumask. IOW, it is no longer affected by the setting of the nohz_full
boot kernel parameter.
That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN
instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread
behavior. Make the change as HK_TYPE_KTHREAD is still being used in
some networking code.
Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset->tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.
Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.
Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent
with waitpid() visibility. Unfortunately, this broke scx_task_iter.
scx_task_iter walks either scx_tasks (global) or a cgroup subtree via
css_task_iter() and the two modes are expected to cover the same set of
tasks. After the above change the cgroup-scoped mode silently skips tasks
past exit_signals() that are still on scx_tasks.
scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting
SCX_TASK_SUB_INIT task can race past the cgroup iter leaking
__scx_init_task() state. Other iterations share the same gap.
Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from
scx_task_iter().
Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter")
Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A chain of commits going back to v7.0 reworked rmdir to satisfy the
controller invariant that a subsystem's ->css_offline() must not run while
tasks are still doing kernel-side work in the cgroup.
[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")
[1] moved task cset unlink from do_exit() to finish_task_switch() so a
task's cset link drops only after the task has fully stopped scheduling.
That made tasks past exit_signals() linger on cset->tasks until their final
context switch, which led to a series of problems as what userspace expected
to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
tried to bridge that divergence: [2] filtered the exiting tasks from
cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
fixed the wait's condition; [5] made nr_dying_subsys_* visible
synchronously.
The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
host PID 1 systemd reaping orphan pids that were re-parented to it during
the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
pids to free, the pids can't free because PID 1 is the reaper and it's stuck
in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
this; the wait itself is the bug.
The css killing side that drove the original reorder, however, can be made
cleanly asynchronous: ->css_offline() is already async, run from
css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to
make that chain start only after all tasks have left the cgroup. rmdir's
user-visible side then returns as soon as cgroup.procs and friends are
empty, while ->css_offline() still runs only after the cgroup is fully
drained.
Verified by the original reproducer (pidns teardown + zombie reaper, runs
under vng) which hangs vanilla and succeeds here, and by per-commit
deterministic repros for [2], [3], [4], [5] with a boot parameter that
widens the post-exit_signals() window so each state is reliably reachable.
Some stress tests on top of that.
cgroup_apply_control_disable() has the same shape of pre-existing race:
when a controller is disabled via subtree_control, kill_css() ran
synchronously while tasks past exit_signals() could still be linked to
the cgroup's csets, and ->css_offline() could fire before they drained.
This patch preserves the existing synchronous behavior at that call site
(kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch
will defer kill_css_finish() there using a per-css trigger.
This seems like the right approach and I don't see problems with it. The
changes are somewhat invasive but not excessively so, so backporting to
-stable should be okay. If something does turn out to be wrong, the fallback
is to revert the entire chain ([1]-[5]) and rework in the development branch
instead.
v2: Pin cgrp across the deferred destroy work with explicit
cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1
wasn't actually broken (ordered cgroup_offline_wq + queue_work order
in cgroup_task_dead() saved it) but the explicit ref removes the
dependency on those non-obvious invariants. Also note the
pre-existing cgroup_apply_control_disable() race in the description;
a follow-up will defer kill_css_finish() there.
Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
Cc: stable@vger.kernel.org # v7.0+
Reported-and-tested-by: Martin Pitt <martin@piware.de>
Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/
Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Add detecting and parsing EEE device capability.
Recently EEE functionality support has been introduced to E610 FW.
Currently ixgbe driver has no possibility to detect whether NVM
loaded on given adapter supports EEE.
There's dedicated device capability element reflecting FW support
for given EEE link speed.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-1-6f27ae1cd073@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add mlx5_dma_pool alloc/free paths, and wire mlx5_frag_buf allocation
and free paths to use them.
mlx5_frag_buf_alloc_node() now selects an mlx5_dma_pool to allocate
fragments from, instead of directly allocating full coherent pages.
mlx5_frag_buf_free() frees from the respective pool.
mlx5_dma_pool_alloc() keeps allocation fast by maintaining pages with
available indexes at the head of the list, so the common allocation path
can take a free index immediately. New backing pages are allocated only
when no free index is available.
mlx5_dma_pool_free() returns released indexes to the pool and frees a
backing page once all of its indexes become free. This avoids keeping
fully free pages for the lifetime of the pool and reduces coherent DMA
memory footprint.
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260429201429.223809-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce mlx5 DMA pool and pool-page data structures, and add the
creation and teardown paths.
Each NUMA node owns a set of mlx5_dma_pool instances, each one with a
different block size. The sizes are defined as all powers of two
starting from MLX5_ADAPTER_PAGE_SHIFT and up to PAGE_SHIFT. Since
mlx5_frag_bufs are used to back objects whose sizes are encoded relative
to MLX5_ADAPTER_PAGE_SHIFT, a smaller block_shift value cannot be used.
Requests larger than PAGE_SIZE continue to be handled as page-sized
fragments, as in the existing frag-buf allocation model.
Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260429201429.223809-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rate_interval_us to tcp_sock_write_tx group
These fields are used in TX path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tp->bytes_acked is touched in TX path only.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These fields are touched in when payload is sent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
segs_in is changed for each incoming packet, including ACK packets.
segs_out is changed for each outgoing packet, including ACK packets.
They belong to tcp_sock_write_txrx group.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These counters are changed whenever sent data is acknowleged.
They do not belong to tcp_sock_write_txrx group, because TCP receivers
do not touch them.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260430100021.211139-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull drm fixes from Dave Airlie:
"Fixes for rc2, the usual amdgpu/xe double header, I think xe had a
couple of weeks combined due to some maintainer access issues,
otherwise there's just a few misc fixes and documentation fixups.
core and helpers:
- calculate framebuffer geometry with format helpers
- fix docs
amdgpu:
- GFX12 fix for CONFIG_DRM_DEBUG_MM configs
- Fix DC analog support
- Userq fixes
- GART placement fix
- Aldebaran SMU fixes
- AMDGPU_INFO_READ_MMR_REG fix
- UVD 3.1 fix
- GC 6 TCC fix
- Fix root reservation in amdgpu_vm_handle_fault()
- RAS fix
- Module reload fix for APUs
- Fix build for CONFIG_DRM_FBDEV_EMULATION=n
- IGT DWB regression fix
- GC 11.5.4 fix
- VCN user fence fixes
- JPEG user fence fixes
- SMU 13.0.6 fix
- VCN 3/4 IB parser fixes
- NV3x+ dGPU vblank fix
- DCE6/8 fixes for LVDS/eDP panels without an EDID
amdkfd:
- Fix for when CONFIG_HSA_AMD is not set
- SVM fixes
xe:
- uapi: Add missing pad and extensions check
- uapi: Reject unsafe PAT indices for CPU cached memory
- Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge
- Xe3p tuning and workaround fixes
- USE drm mm instead of drm SA for CCS read/write
- Fix leaks and null derefs
- Fix Wa_18022495364
appletbdrm:
- allocate protocol buffers with kvzalloc()
dma-buf:
- fix docs
imagination:
- avoid segfault in debugfs
ofdrm:
- put PCI device reference on errors
udl:
- increase USB timeout"
* tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel: (77 commits)
drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR
drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise
drm/xe/xelp: Fix Wa_18022495364
drm/xe/gsc: Fix BO leak on error in query_compatibility_version()
drm/xe/eustall: Fix drm_dev_put called before stream disable in close
drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl()
drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import()
drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure
drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked()
drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked()
drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked
drm/xe/vf: Use drm mm instead of drm sa for CCS read/write
drm/xe: Add memory pool with shadow support
drm/xe/debugfs: Correct printing of register whitelist ranges
drm/xe: Mark ROW_CHICKEN5 as a masked register
drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL
drm/xe/xe3p_lpg: Add missing indirect ring state feature flag
drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138
drm/xe/vm: Add missing pad and extensions check
drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge()
...
|
|
drivers/net/Space.c is the last remnant of the linux-2.4.x driver model
that required each subsystem and device driver init function to be called
from init/main.c explicitly, before the introduction of initcall levels.
In linux-7.0, this was only used for a handful of ISA network drivers,
with the ne2000 driver being the last one.
Fold the code into ne.c directly, with minimal changes to preserve
the existing command line parsing.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429145624.2948432-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following batch contains Netfilter fixes for net:
1) Replace skb_try_make_writable() by skb_ensure_writable() in
nft_fwd_netdev and the flowtable to deal with uncloned packets
having their network header in paged fragments.
2) Drop packet if output device does not exist and ensure sufficient
headroom in nft_fwd_netdev before transmitting the skb.
3) Use the existing dup recursion counter in nft_fwd_netdev for the
neigh_xmit variant, from Weiming Shi.
4) Add .check_hooks interface to x_tables to detach the control plane
hook check based on the match/target configuration. Then, update
nft_compat to use .check_hooks from .validate path, this fixes a
lack of hook validation for several match/targets.
5) Fix incorrect .usersize in xt_CT, from Florian Westphal.
6) Fix a memleak with netdev tables in dormant state,
from Florian Westphal.
7) Several patches to check if the packet is a fragment, then skip
layer 4 inspection, for x_tables and nf_tables; as well as common
nf_socket infrastructure. The xt_hashlimit match drops fragments
to stay consistent with the existing approach when failing to parse
the layer 4 protocol header.
8) Ensure sufficient headroom in the flowtable before transmitting
the skb.
9) Fix the flowtable inline vlan approach for double-tagged vlan:
Reverse the iteration over .encap[] since it represents the
encapsulation as seen from the ingress path. Postpone pushing
layer 2 header so output device is available to calculate needed
headroom. Finally, add and use nf_flow_vlan_push() to fix it.
10) Fix flowtable inline pppoe with GSO packets. Moreover, use
FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware
address since neighbour cache does not exist in pppoe.
11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for
double-tagged vlan in particular this should provide some benefits
in certain scenarios.
More notes regarding 9-11):
- sashiko is also signalling to use it for IPIP headers, but that needs
more adjustments such setting skb->protocol after removing the IPIP
header, will follow up in a separated patch.
- I plan to submit selftests to cover double-tagged-vlan. As for pppoe,
it should be possible but that would mandate a few userspace dependencies.
This has been semi-automatically tested by me and reporters describing
broken double-vlan-tagged and pppoe currently in the flowtable.
* tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header
netfilter: flowtable: fix inline pppoe encapsulation in xmit path
netfilter: flowtable: fix inline vlan encapsulation in xmit path
netfilter: flowtable: ensure sufficient headroom in xmit path
netfilter: xtables: fix L4 header parsing for non-first fragments
netfilter: nf_tables: skip L4 header parsing for non-first fragments
netfilter: nf_socket: skip socket lookup for non-first fragments
netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables
netfilter: xt_CT: fix usersize for v1 and v2 revision
netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate
netfilter: x_tables: add .check_hooks to matches and targets
netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding
netfilter: replace skb_try_make_writable() by skb_ensure_writable()
====================
Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This makes it easier to rebuild cifs.ko and ksmbd.ko against
a running kernel.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- MD pull request via Yu:
- Fix a raid5 UAF on IO across the reshape position
- Avoid failing RAID1/RAID10 devices for invalid IO errors
- Fix RAID10 divide-by-zero when far_copies is zero
- Restore bitmap grow through sysfs
- Use mddev_is_dm() instead of open-coding gendisk checks
- Use ATTRIBUTE_GROUPS() for md default sysfs attributes
- Replace open-coded wait loops with wait_event helpers
- NVMe pull request via Keith:
- Target data transfer size configuation (Aurelien)
- Enable P2P for RDMA (Shivaji Kant)
- TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
- TCP host updates (Alistair, Chaitanya)
- Authentication updates (Alistair, Daniel, Chris Leech)
- Multipath fixes (John Garry)
- New quirks (Alan Cui, Tao Jiang)
- Apple driver fix (Fedor Pchelkin)
- PCI admin doorbell update fix (Keith)
- Properly propagate CDROM read-only state to the block layer
* tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (35 commits)
md: use ATTRIBUTE_GROUPS() for md default sysfs attributes
md: use mddev_is_dm() instead of open-coding gendisk checks
md/raid1: replace wait loop with wait_event_idle() in raid1_write_request()
md/md-bitmap: add a none backend for bitmap grow
md/md-bitmap: split bitmap sysfs groups
md: factor bitmap creation away from sysfs handling
md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr()
md: replace wait loop with wait_event() in md_handle_request()
md/raid10: fix divide-by-zero in setup_geo() with zero far_copies
md/raid1,raid10: don't fail devices for invalid IO errors
MAINTAINERS: Add Xiao Ni as md/raid reviewer
md/raid5: Fix UAF on IO across the reshape position
cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()
nvme-auth: Hash DH shared secret to create session key
nvme-pci: fix missed admin queue sq doorbell write
nvme-auth: Include SC_C in RVAL controller hash
nvme-tcp: teardown circular locking fixes
nvmet-tcp: Don't clear tls_key when freeing sq
Revert "nvmet-tcp: Don't free SQ on authentication success"
nvme: skip trace completion for host path errors
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"20 hotfixes. All are for MM (and for MMish maintainers). 9 are
cc:stable and the remainder are for post-7.0 issues or aren't deemed
suitable for backporting.
There are two DAMON series from SeongJae Park which address races
which could lead to use-after-free errors, and avoid the possibility
of presenting stale parameter values to users"
* tag 'mm-hotfixes-stable-2026-04-30-15-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end()
mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()
MAINTAINERS: remove stale kdump project URL
mm/damon/stat: detect and use fresh enabled value
mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values
mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
selftests/mm: specify requirement for PROC_MEM_ALWAYS_FORCE=y
mm/damon/sysfs-schemes: protect path kfree() with damon_sysfs_lock
mm/damon/sysfs-schemes: protect memcg_path kfree() with damon_sysfs_lock
MAINTAINERS: update Li Wang's email address
MAINTAINERS, mailmap: update email address for Qi Zheng
MAINTAINERS: update Liam's email address
mm/hugetlb_cma: round up per_node before logging it
MAINTAINERS: fix regex pattern in CORE MM category
mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()
mm: start background writeback based on per-wb threshold for strictlimit BDIs
kho: fix error handling in kho_add_subtree()
liveupdate: fix return value on session allocation failure
mailmap: update entry for Dan Carpenter
vmalloc: fix buffer overflow in vrealloc_node_align()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2026-04-29
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Extend query_esw_functions output for multi-function support
net/mlx5: Remove unused host_sf_enable field
net/mlx5: Add function_id_type for enable/disable_hca cmds
mlx5: Rename the vport number enums for host PF and VF
====================
Link: https://patch.msgid.link/20260429212747.224411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Besides an out-of-bound bug, this is about properly supporting Winbond
octal SPI NAND chips which use a specific pattern for stuffing more
address bits in some operations. This uses the spi-mem flag in SPI
NAND that was added to the spi-mem layer just before the merge window
through the spi tree"
* tag 'mtd/fixes-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: spinand: winbond: Fix ODTR write VCR on W35NxxJW
mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW
mtd: spinand: Add support for packed read data ODTR commands
mtd: spi-nor: debugfs: fix out-of-bounds read in spi_nor_params_show()
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Some new content already, notably:
- mac80211: major rework of station bandwidth handling,
fixing issues with lower capability than AP
- general: cleanups for EMLSR spec issues (drafts differed)
- ath9k: GPIO interface improvements
- ath12k: replace dynamic memory allocation in WMI RX path
* tag 'wireless-next-2026-04-30' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (39 commits)
wifi: brcmsmac: phy_lcn: Remove dead code in wlc_lcnphy_radio_2064_channel_tune_4313()
wifi: mac80211: always allow transmitting null-data on TXQs
wifi: mac80211: use kstrtobool_from_user() in debugfs callbacks
wifi: cfg80211: validate cipher suite for NAN Data keys
wifi: nl80211: check link is beaconing for color change
wifi: mac80211: clarify an 802.11 VHT spec reference
wifi: mac80211: fix per-station PHY capability bandwidth
wifi: mac80211: clarify per-STA bandwidth handling
wifi: nl80211: always validate AP operation/PHY regulatory
wifi: cfg80211: provide HT/VHT operation for AP beacon
wifi: nl80211: reject too short HT/VHT/HE/EHT capability/operation
wifi: cfg80211: move AP HT/VHT/... operation to beacon info
wifi: nl80211: reject beacons with bad HE operation
wifi: cfg80211: remove HE/SAE H2E required fields
wifi: mac80211: remove ieee80211_sta_cur_vht_bw()
wifi: mac80211: clean up ieee80211_sta_cap_rx_bw()
wifi: mac80211: clean up initial STA NSS/bandwidth handling
wifi: mac80211: clean up STA NSS handling
wifi: mac80211: simplify ieee80211_sta_rx_bw_to_chan_width()
wifi: nl80211: document channel opmode change channel width
...
====================
Link: https://patch.msgid.link/20260430120304.249081-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-7.1-rc2).
No conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- ipmr: free mr_table after RCU grace period.
Previous releases - regressions:
- core: add net_iov_init() and use it to initialize ->page_type
- sched: taprio: fix NULL pointer dereference in class dump
- netfilter: nf_tables:
- use list_del_rcu for netlink hooks
- fix strict mode inbound policy matching
- tcp: make probe0 timer handle expired user timeout
- vrf: fix a potential NPD when removing a port from a VRF
- eth: ice:
- fix NULL pointer dereference in ice_reset_all_vfs()
- fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
Previous releases - always broken:
- page_pool: fix memory-provider leak in error path
- sched: sch_cake: annotate data-races in cake_dump_stats()
- mptcp: fix scheduling with atomic in timestamp sockopt
- psp: check for device unregister when creating assoc
- tls: fix strparser anchor skb leak on offload RX setup failure
- eth:
- stmmac: prevent NULL deref when RX memory exhausted
- airoha: do not read uninitialized fragment address
- rtl8150: fix use-after-free in rtl8150_start_xmit()
Misc:
- add Ido Schimmel as IPv4/IPv6 maintainer
- add David Heidelberg as NFC subsystem maintainer"
* tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net/sched: cls_flower: revert unintended changes
sfc: fix error code in efx_devlink_info_running_versions()
net: tls: fix strparser anchor skb leak on offload RX setup failure
ice: add dpll peer notification for paired SMA and U.FL pins
ice: fix missing dpll notifications for SW pins
dpll: export __dpll_pin_change_ntf() for use under dpll_lock
ice: fix SMA and U.FL pin state changes affecting paired pin
ice: fix missing SMA pin initialization in DPLL subsystem
ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
ice: fix NULL pointer dereference in ice_reset_all_vfs()
iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
iavf: wait for PF confirmation before removing VLAN filters
iavf: stop removing VLAN filters from PF on interface down
iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
page_pool: fix memory-provider leak in page_pool_create_percpu() error path
bonding: 3ad: implement proper RCU rules for port->aggregator
net: airoha: Do not return err in ndo_stop() callback
hv_sock: fix ARM64 support
MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel
selftests: drv-net: clarify linters and frameworks in README
...
|
|
Add pin-operstate enum and operstate_on_dpll_get callback to report
the actual hardware status of a pin with respect to its parent DPLL
device. Unlike pin-state (which reflects administrative intent set
by the user), operstate reflects what the hardware is actually doing.
Defined operational states:
- active: pin is qualified and actively used by the DPLL
- standby: pin is qualified but not actively used by the DPLL
- no-signal: pin does not have a valid signal
- qual-failed: pin signal failed qualification
The operstate is reported inside the pin-parent-device nested
attribute alongside the existing state and phase-offset attributes.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Petr Oros <poros@redhat.com>
Link: https://patch.msgid.link/20260428154907.2820654-2-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Export __dpll_pin_change_ntf() so that drivers can send pin change
notifications from within pin callbacks, which are already called
under dpll_lock. Using dpll_pin_change_ntf() in that context would
deadlock.
Add lockdep_assert_held() to catch misuse without the lock held.
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a new .check_hooks interface for checking if the match/target is
used from the validate hook according to its configuration.
Move existing conditional hook check based on the match/target
configuration from .checkentry to .check_hooks for the following
matches/targets:
- addrtype
- devgroup
- physdev
- policy
- set
- TCPMSS
- SET
This is a preparation patch to fix nft_compat, not functional changes
are intended.
Based on patch from Florian Westphal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix inverted check of registering the stats for branch tracing
When calling register_stat_tracer() which returns zero on success and
negative on error, the callers were checking the return of zero as an
error and printing a warning message. Because this was just a normal
printk() message and not a WARN(), it wasn't caught in any testing.
Fix the check to print the warning message when an error actually
happens.
- Fix a typo in a comment in tracepoint.h
- Limit the size of event probes to 3K in size
It is possible to create a dynamic event probe via the tracefs system
that is greater than the max size of an event that the ring buffer
can hold. This basically causes the event to become useless.
Limit the size of an event probe to be 3K as that should be large
enough to handle any dynamic events being created, and fits within
the PAGE_SIZE sub-buffers of the ring buffer.
* tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Limit size of event probe to 3K
tracepoint: Fix typo in tracepoint.h comment
tracing: branch: Fix inverted check on stat tracer registration
|
|
Update the query_esw_functions command to support a new response layout
that can report data for multiple network functions. Setting bit 14 of
the op_mod field selects the v1 layout with network_function_params
entries instead of the legacy host_params_context.
The query_host_net_function_v1 read-only capability indicates firmware
support for layout version 1, and query_host_net_function_num_max
advertises the maximum number of network function entries.
Define a new network_function_params layout and a net_function_params
union that groups host_params_context and network_function_params.
Rework the query_esw_functions output to use a flexible array of this
union, and adjust existing driver callers to use it.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428053851.220089-5-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Drop the unused host_sf_enable array from
mlx5_ifc_query_esw_functions_out_bits layout. This field has been
deprecated in firmware and is not referenced by the mlx5 driver, so it
can be safely removed.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428053851.220089-4-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a function_id_type field to the enable_hca and disable_hca command
input layouts in mlx5_ifc.h to allow using vhca_id as the function index
instead of function_id. The new field support by firmware is indicated
by the function_id_type_vhca_id capability bit, which is already exposed
in hca caps.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428053851.220089-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Rename the vport number enums MLX5_VPORT_PF to MLX5_VPORT_HOST_PF and
MLX5_VPORT_FIRST_VF to MLX5_VPORT_FIRST_HOST_VF to indicate that these
vport indices represent the host PF and its VFs. This prepares the code
for upcoming support of an additional PF type.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260428053851.220089-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) IEEE1394 ARP payload contains no target hardware address in the
ARP packet. Apparently, arp_tables was never updated to deal with
IEEE1394 ARP properly. To deal with this, return no match in case
the target hardware address selector is used, either for inverse or
normal match. Moreover, arpt_mangle disallows mangling of the target
hardware and IP address because, it is not worth to adjust the
offset calculation to fix this, we suspect no users of arp_tables
for this family.
2) Use list_del_rcu() to delete device hooks in nf_tables, this hook
list is RCU protected, concurrent netlink dump readers can be
walking on this list, fix it by adding a helper function and use it
for consistency. From Florian Westphal.
3) Add list_splice_rcu(), this is useful for joining the local list of
new device hooks to the RCU protected hook list in chain and
flowtable. Reviewed by Paul E. McKenney.
4) Use list_splice_rcu() to publish the new device hooks in chain and
flowtable to fix concurrent netlink dump traversal.
5) Add a new hook transaction object to track device hook deletions.
The current approach moves device hooks to be deleted around during
the preparation phase, this breaks concurrent RCU reader via netlink
dump. This new hook transaction is combined with NFT_HOOK_REMOVE
flag to annotate hooks for removal in the preparation phase.
6) xt_policy inbound policy check in strict mode can lead to
out-of-bound access of the secpath array due to incorrect.
The iteration over the secpath needs to be reversed in the inbound
to check for the human readable policy, expecting inner in first
position and outer in second position, the secpath from inbound
actually stores outer in first position then in second position.
From Jiexun Wang.
7) Fix possible zero shift in nft_bitwise triggering UBSAN splat,
reject zero shift from control plane, from Kai Ma.
8) Replace simple_strtoul() in the conntrack SIP helper since it relies
on nul-terminated strings. From Florian Westphal.
* tag 'nf-26-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_conntrack_sip: don't use simple_strtoul
netfilter: reject zero shift in nft_bitwise
netfilter: xt_policy: fix strict mode inbound policy matching
netfilter: nf_tables: add hook transactions for device deletions
netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase
rculist: add list_splice_rcu() for private lists
netfilter: nf_tables: use list_del_rcu for netlink hooks
netfilter: arp_tables: fix IEEE1394 ARP payload parsing
====================
Link: https://patch.msgid.link/20260428095840.51961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
"The merge window pulled in the cgroup sub-scheduler infrastructure,
and new AI reviews are accelerating bug reporting and fixing - hence
the larger than usual fixes batch:
- Use-after-frees during scheduler load/unload:
- The disable path could free the BPF scheduler while deferred
irq_work / kthread work was still in flight
- cgroup setter callbacks read the active scheduler outside the
rwsem that synchronizes against teardown
Fix both, and reuse the disable drain in the enable error paths so
the BPF JIT page can't be freed under live callbacks.
- Several BPF op invocations didn't tell the framework which runqueue
was already locked, so helper kfuncs that re-acquire the runqueue
by CPU could deadlock on the held lock
Fix the affected callsites, including recursive parent-into-child
dispatch.
- The hardlockup notifier ran from NMI but eventually took a
non-NMI-safe lock. Bounce it through irq_work.
- A handful of bugs in the new sub-scheduler hierarchy:
- helper kfuncs hard-coded the root instead of resolving the
caller's scheduler
- the enable error path tried to disable per-task state that had
never been initialized, and leaked cpus_read_lock on the way
out
- a sysfs object was leaked on every load/unload
- the dispatch fast-path used the root scheduler instead of the
task's
- a couple of CONFIG #ifdef guards were misclassified
- Verifier-time hardening: BPF programs of unrelated struct_ops types
(e.g. tcp_congestion_ops) could call sched_ext kfuncs - a semantic
bug and, once sub-sched was enabled, a KASAN out-of-bounds read.
Now rejected at load. Plus a few NULL and cross-task argument
checks on sched_ext kfuncs, and a selftest covering the new deny.
- rhashtable (Herbert): restore the insecure_elasticity toggle and
bounce the deferred-resize kick through irq_work to break a
lock-order cycle observable from raw-spinlock callers. sched_ext's
scheduler-instance hash is the first user of both.
- The bypass-mode load balancer used file-scope cpumasks; with
multiple scheduler instances now possible, those raced. Move to
per-instance cpumasks, plus a follow-up to skip tasks whose
recorded CPU is stale relative to the new owning runqueue.
- Smaller fixes:
- a dispatch queue's first-task tracking misbehaved when a parked
iterator cursor sat in the list
- the runqueue's next-class wasn't promoted on local-queue
enqueue, leaving an SCX task behind RT in edge cases
- the reference qmap scheduler stopped erroring on legitimate
cross-scheduler task-storage misses"
* tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (26 commits)
sched_ext: Fix scx_flush_disable_work() UAF race
sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable
sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime
sched_ext: Refuse cross-task select_cpu_from_kfunc calls
sched_ext: Align cgroup #ifdef guards with SUB_SCHED vs GROUP_SCHED
sched_ext: Make bypass LB cpumasks per-scheduler
sched_ext: Pass held rq to SCX_CALL_OP() for core_sched_before
sched_ext: Pass held rq to SCX_CALL_OP() for dump_cpu/dump_task
sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP
sched_ext: Use dsq->first_task instead of list_empty() in dispatch_enqueue() FIFO-tail
sched_ext: Resolve caller's scheduler in scx_bpf_destroy_dsq() / scx_bpf_dsq_nr_queued()
sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters
sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path
sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
sched_ext: Guard scx_dsq_move() against NULL kit->dsq after failed iter_new
sched_ext: Unregister sub_kset on scheduler disable
sched_ext: Defer scx_hardlockup() out of NMI
sched_ext: sync disable_irq_work in bpf_scx_unreg()
sched_ext: Fix local_dsq_post_enq() to use task's scheduler in sub-sched
...
|
|
Change "my" to "may" in the description of subsystem configurations.
Link: https://patch.msgid.link/20260422021819.1788091-1-synte4028@gmail.com
Signed-off-by: Sheng Che Peng <synte4028@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
devm_alloc_workqueue() built a va_list and passed it as a single
positional argument to the variadic alloc_workqueue() macro:
va_start(args, max_active);
wq = alloc_workqueue(fmt, flags, max_active, args);
va_end(args);
C does not allow forwarding a va_list through a ... parameter.
alloc_workqueue() expands to alloc_workqueue_noprof(), which runs
its own va_start() over its ... params, so the inner
vsnprintf(wq->name, sizeof(wq->name), fmt, args) in
__alloc_workqueue() received the outer va_list object as the first
variadic slot rather than the caller's actual format arguments.
Add a new static helper alloc_workqueue_va() that wraps
__alloc_workqueue() and runs wq_init_lockdep() on success, and
fold both alloc_workqueue_noprof() and devm_alloc_workqueue_noprof()
onto it as suggested by Tejun.
The wq_init_lockdep() step is required on the devm path
too, otherwise __flush_workqueue()'s on-stack
COMPLETION_INITIALIZER_ONSTACK_MAP would NULL-deref wq->lockdep_map.
No caller changes are required. devm_alloc_ordered_workqueue() is
a macro forwarding to devm_alloc_workqueue() and inherits the fix.
Two in-tree callers actively trigger the broken path on every probe:
drivers/power/supply/mt6370-charger.c:889
drivers/power/supply/max77705_charger.c:649
both of which use devm_alloc_ordered_workqueue(dev, "%s", 0,
dev_name(dev)).
A standalone reproducer module is available at[1].
Link: https://github.com/leitao/debug/blob/main/workqueue/valist/wq_va_test.c [1]
Fixes: 1dfc9d60a69e ("workqueue: devres: Add device-managed allocate workqueue")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Transition Timeout is not specific to EMLSR, and is used by both EMLSR
and EMLMR mode.
Signed-off-by: Pablo Martin-Gomez <pmartin-gomez@freebox.fr>
Link: https://patch.msgid.link/20260410170429.343617-5-pmartin-gomez@freebox.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|