| Age | Commit message (Collapse) | Author | Files | Lines |
|
All inet6_cork users also use one inet_cork_full.
Reduce number of parameters and increase data locality.
This saves ~275 bytes of code on x86_64.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the following check, to detect bugs sooner for CONFIG_DEBUG_NET=y
builds.
DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130160253.2936789-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Many WWAN modems come with embedded GNSS receiver inside and have a
dedicated port to output geopositioning data. On the one hand, the
GNSS receiver has little in common with WWAN modem and just shares a
host interface and should be exported using the GNSS subsystem. On the
other hand, GNSS receiver is not automatically activated and needs a
generic WWAN control port (AT, MBIM, etc.) to be turned on. And a user
space software needs extra information to find the control port.
Introduce the new type of WWAN port - NMEA. When driver asks to register
a NMEA port, the core allocates common parent WWAN device as usual, but
exports the NMEA port via the GNSS subsystem and acts as a proxy between
the device driver and the GNSS subsystem.
From the WWAN device driver perspective, a NMEA port is registered as a
regular WWAN port without any difference. And the driver interacts only
with the WWAN core. From the user space perspective, the NMEA port is a
GNSS device which parent can be used to enumerate and select the proper
control port for the GNSS receiver management.
CC: Slark Xiao <slark_xiao@163.com>
CC: Muhammad Nuzaihan <zaihan@unrealasia.net>
CC: Qiang Yu <quic_qianyu@quicinc.com>
CC: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
CC: Johan Hovold <johan@kernel.org>
Suggested-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260126062158.308598-6-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Another fairly large set of changes, notably:
- cfg80211/mac80211
- most of EPPKE/802.1X over auth frames support
- additional FTM capabilities
- split up drop reasons better, removing generic RX_DROP
- NAN cleanups/fixes
- ath11k:
- support for Channel Frequency Response measurement
- ath12k:
- support for the QCC2072 chipset
- iwlwifi:
- partial NAN support
- UNII-9 support
- some UHR/802.11bn FW APIs
- remove most of MLO/EHT from iwlmvm
(such devices use iwlmld)
- rtw89:
- preparations for RTL8922DE support
* tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (184 commits)
wifi: iwlegacy: add missing mutex protection in il4965_store_tx_power()
wifi: iwlegacy: add missing mutex protection in il3945_store_measurement()
wifi: mac80211: use u64_stats_t with u64_stats_sync properly
wifi: p54: Fix memory leak in p54_beacon_update()
wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode
wifi: rtw88: sdio: Migrate to use sdio specific shutdown function
wifi: rsi: sdio: Migrate to use sdio specific shutdown function
sdio: Provide a bustype shutdown function
wifi: nl80211/cfg80211: support operating as RSTA in PMSR FTM request
wifi: nl80211/cfg80211: add negotiated burst period to FTM result
wifi: nl80211/cfg80211: clarify periodic FTM parameters for non-EDCA based ranging
wifi: nl80211/cfg80211: add new FTM capabilities
wifi: iwlwifi: rename struct iwl_mcc_allowed_ap_type_cmd::offset_map
wifi: iwlwifi: mvm: Remove link_id from time_events
wifi: iwlwifi: mld: change cluster_id type to u8 array
wifi: iwlwifi: support V13 of iwl_lari_config_change_cmd
wifi: iwlwifi: split bios_value_u32 to separate the header
wifi: iwlwifi: uefi: cache the DSM functions
wifi: iwlwifi: acpi: cache the DSM functions
wifi: iwlwifi: mvm: Cleanup MLO code
...
====================
Link: https://patch.msgid.link/20260129110136.176980-39-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc8).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
2c84959167d64 ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As the stmmac_priv struct is passed to the fix_soc_reset() method which
has the ioaddr, there is no need to pass ioaddr separately. Pass just
the stmmac_priv struct. Fix up the glues that use it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1vkLmM-00000005vE1-0nop@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To prepare sdio drivers to migrate away from struct device_driver::shutdown
(and then eventually remove that callback) create a serdev driver shutdown
callback and migration code to keep the existing behaviour. Note this
introduces a warning for each driver that isn't converted yet to that
callback at register time.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/397f45c2818f6632151f92b70e547262f373c3b6.1768232321.git.u.kleine-koenig@baylibre.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add netns support to loopback and vhost. Keep netns disabled for
virtio-vsock, but add necessary changes to comply with common API
updates.
This is the patch in the series when vhost-vsock namespaces actually
come online.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-3-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add netns logic to vsock core. Additionally, modify transport hook
prototypes to be used by later transport-specific patches (e.g.,
*_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions
(e.g., vsock_find_connected_socket()) to take into account the socket
namespace and the namespace mode before considering a candidate socket a
"match".
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode to
report the mode and /proc/sys/net/vsock/child_ns_mode to set the mode
for new namespaces.
Add netns functionality (initialization, passing to transports, procfs,
etc...) to the af_vsock socket layer. Later patches that add netns
support to transports depend on this patch.
This patch changes the allocation of random ports for connectible vsocks
in order to avoid leaking the random port range starting point to other
namespaces.
dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
modified to take a vsk in order to perform logic on namespace modes. In
future patches, the net will also be used for socket
lookups in these functions.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260121-vsock-vmtest-v16-1-2859a7512097@meta.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix the the buggy conversion of fuse_reverse_inval_entry() introduced
during the creation rework
- Disallow nfs delegation requests for directories by setting
simple_nosetlease()
- Require an opt-in for getting readdir flag bits outside of S_DT_MASK
set in d_type
- Fix scheduling delayed writeback work by only scheduling when the
dirty time expiry interval is non-zero and cancel the delayed work if
the interval is set to zero
- Use rounded_jiffies_interval for dirty time work
- Check the return value of sb_set_blocksize() for romfs
- Wait for batched folios to be stable in __iomap_get_folio()
- Use private naming for fuse hash size
- Fix the stale dentry cleanup to prevent a race that causes a UAF
* tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: document d_dispose_if_unused()
fuse: shrink once after all buckets have been scanned
fuse: clean up fuse_dentry_tree_work()
fuse: add need_resched() before unlocking bucket
fuse: make sure dentry is evicted if stale
fuse: fix race when disposing stale dentries
fuse: use private naming for fuse hash size
writeback: use round_jiffies_relative for dirtytime_work
iomap: wait for batched folios to be stable in __iomap_get_folio
romfs: check sb_set_blocksize() return value
docs: clarify that dirtytime_expire_seconds=0 disables writeback
writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
readdir: require opt-in for d_type flags
vboxsf: don't allow delegations to be set on directories
ceph: don't allow delegations to be set on directories
gfs2: don't allow delegations to be set on directories
9p: don't allow delegations to be set on directories
smb/client: properly disallow delegations on directories
nfs: properly disallow delegation requests on directories
fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
|
|
idev->mr_maxdelay is read and written locklessly,
add READ_ONCE()/WRITE_ONCE() annotations.
While we are at it, make this field an u32.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260122172247.2429403-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NETDEV_RSS_KEY_LEN has been set to 52 bytes in 2014, until now.
Jakub suggested we bump the size to 128 bytes or more.
Some drivers (like idpf) were already working around the core limit.
Since this change might cause some issues in admin scripts,
bump it directly to 256 in one go.
tjbp26:~# cat /proc/sys/net/core/netdev_rss_key | wc -c
768
tjbp26:~# ethtool -x eth1
RX flow hash indirection table for eth1 with 32 RX ring(s):
...
RSS hash key:
fe:16:5b:2f:93:85:c2:c9:c1:ef:bd:60:c6:e0:2b:99:4d:bf:b7:14:c8:1e:8d:cb:31:17:51:da:55:eb:91:d9:9e:f9:89:9b:44:a1:dc:08:72:3a:b3:d6:31:86:9a:fe:02:3a:0d:eb:a1:7c:f5:a3:51:3b:08:56:c9:3f:71:69:01:ba:70:38
RSS hash function:
toeplitz: on
xor: off
crc32: off
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20260122075206.504ec591@kernel.org/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260122190349.2771064-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
clang is not inlining skb_frag_unref() and __skb_frag_unref()
in gro fast path.
It also does not inline gro_try_pull_from_frag0().
Using __always_inline fixes this issue, makes the
kernel faster _and_ smaller.
Also change __skb_frag_ref(), skb_frag_ref() and skb_page_unref()
to let them inlined for the last patch in this series.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 2/6 grow/shrink: 1/2 up/down: 218/-511 (-293)
Function old new delta
gro_pull_from_frag0 - 188 +188
__pfx_gro_pull_from_frag0 - 16 +16
skb_shift 1125 1139 +14
__pfx_skb_frag_unref 16 - -16
__pfx_gro_try_pull_from_frag0 16 - -16
__pfx___skb_frag_unref 16 - -16
__skb_frag_unref 36 - -36
skb_frag_unref 59 - -59
dev_gro_receive 1608 1514 -94
napi_gro_frags 892 771 -121
gro_try_pull_from_frag0 153 - -153
Total: Before=22566192, After=22565899, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260122045720.1221017-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following (anti-)pattern was observed in the code tree:
do {
start = u64_stats_fetch_begin(&pstats->syncp);
memcpy(&temp, &pstats->stats, sizeof(temp));
} while (u64_stats_fetch_retry(&pstats->syncp, start));
On 64bit arches, struct u64_stats_sync is empty and provides no help
against load/store tearing, especially for memcpy(), for which arches may
provide their highly-optimized implements.
In theory the affected code should convert to u64_stats_t, or use
READ_ONCE()/WRITE_ONCE() properly.
However since there are needs to copy chunks of statistics, instead of
writing loops at random places, we provide a safe memcpy() variant for
u64_stats.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260120092137.2161162-2-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/iio driver fixes from Greg KH:
"Here are some small char/misc/iio and some other minor driver
subsystem fixes for 6.19-rc7. Nothing huge here, just some fixes for
reported issues including:
- lots of little iio driver fixes
- comedi driver fixes
- mux driver fix
- w1 driver fixes
- uio driver fix
- slimbus driver fixes
- hwtracing bugfix
- other tiny bugfixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (36 commits)
comedi: dmm32at: serialize use of paged registers
mei: trace: treat reg parameter as string
uio: pci_sva: correct '-ENODEV' check logic
uacce: ensure safe queue release with state management
uacce: implement mremap in uacce_vm_ops to return -EPERM
uacce: fix isolate sysfs check condition
uacce: fix cdev handling in the cleanup path
slimbus: core: clean up of_slim_get_device()
slimbus: core: fix of_slim_get_device() kernel doc
slimbus: core: amend slim_get_device() kernel doc
slimbus: core: fix device reference leak on report present
slimbus: core: fix runtime PM imbalance on report present
slimbus: core: fix OF node leak on registration failure
intel_th: rename error label
intel_th: fix device leak on output open()
comedi: Fix getting range information for subdevices 16 to 255
mux: mmio: Fix IS_ERR() vs NULL check in probe()
interconnect: debugfs: initialize src_node and dst_node to empty strings
iio: dac: ad3552r-hs: fix out-of-bound write in ad3552r_hs_write_data_source
iio: accel: iis328dq: fix gain values
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:
- Always inline I/O and IRQ methods using build_assert!() to avoid
false positive build errors
- Do not free the driver's device private data in I2C shutdown()
avoiding race conditions that can lead to UAF bugs
- Drop the driver's device private data after the driver has been
fully unbound from its device to avoid UAF bugs from &Device<Bound>
scopes, such as IRQ callbacks
* tag 'driver-core-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
rust: driver: drop device private data post unbind
rust: driver: add DriverData type to the DriverLayout trait
rust: driver: add DEVICE_DRIVER_OFFSET to the DriverLayout trait
rust: driver: introduce a DriverLayout trait
rust: auxiliary: add Driver::unbind() callback
rust: i2c: do not drop device private data on shutdown()
rust: irq: always inline functions using build_assert with arguments
rust: io: always inline functions using build_assert with arguments
|
|
Some/most devices implementing gso_partial need to disable the GSO partial
features when the IP ID can't be mangled; to that extend each of them
implements something alike the following[1]:
if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID))
features &= ~NETIF_F_TSO;
in the ndo_features_check() op, which leads to a bit of duplicate code.
Later patch in the series will implement GSO partial support for virtual
devices, and the current status quo will require more duplicate code and
a new indirect call in the TX path for them.
Introduce the mangleid_features mask, allowing the core to disable NIC
features based on/requiring MANGLEID, without any further intervention
from the driver.
The same functionality could be alternatively implemented adding a single
boolean flag to the struct net_device, but would require an additional
checks in ndo_features_check().
Also note that [1] is incorrect if the NIC additionally implements
NETIF_F_GSO_UDP_L4, mangleid_features transparently handle even such a
case.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/5a7cdaeea40b0a29b88e525b6c942d73ed3b8ce7.1769011015.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc7).
Conflicts:
drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
b35a6fd37a00 ("hinic3: Add adaptive IRQ coalescing with DIM")
fb2bb2a1ebf7 ("hinic3: Fix netif_queue_set_napi queue_index input parameter error")
https://lore.kernel.org/fc0a7fdf08789a52653e8ad05281a0a849e79206.1768915707.git.zhuyikai1@h-partners.com
drivers/net/wireless/ath/ath12k/mac.c
drivers/net/wireless/ath/ath12k/wifi7/hw.c
31707572108d ("wifi: ath12k: Fix wrong P2P device link id issue")
c26f294fef2a ("wifi: ath12k: Move ieee80211_ops callback to the arch specific module")
https://lore.kernel.org/20260114123751.6a208818@canb.auug.org.au
Adjacent changes:
drivers/net/wireless/ath/ath12k/mac.c
8b8d6ee53dfd ("wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel")
914c890d3b90 ("wifi: ath12k: Add framework for hardware specific ieee80211_ops registration")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The stmmac implementation used by NXP for the i.MX8MP SoC is subject to
errata ERR050694. According to this errata, when no preamble byte is
transferred before the SFD from the PHY to the MAC, the MAC will discard
the frame.
Setting the PHY_F_KEEP_PREAMBLE_BEFORE_SFD flag instructs PHYs that
support it to keep the preamble byte before the SFD. This ensures that
the MAC successfully receives frames.
As this is an issue in the MAC implementation, only enable the flag for
the i.MX8MP SoC where the errata applies but not for other SoCs using a
working stmmac implementation.
The exact wording of the errata ERR050694 from NXP:
The IEEE 802.3 standard states that, in MII/GMII modes, the byte
preceding the SFD (0xD5), SMD-S (0xE6,0x4C, 0x7F, or 0xB3), or SMD-C
(0x61, 0x52, 0x9E, or 0x2A) byte can be a non-PREAMBLE byte or there can
be no preceding preamble byte. The MAC receiver must successfully
receive a packet without any preamble(0x55) byte preceding the SFD,
SMD-S, or SMD-C byte.
However due to the defect, in configurations where frame preemption is
enabled, when preamble byte does not precede the SFD, SMD-S, or SMD-C
byte, the received packet is discarded by the MAC receiver. This is
because, the start-of-packet detection logic of the MAC receiver
incorrectly checks for a preamble byte.
NXP refers to IEEE 802.3 where in clause 35.2.3.2.2 Receive case (GMII)
they show two tables one where the preamble is preceding the SFD and one
where it is not. The text says:
The operation of 1000 Mb/s PHYs can result in shrinkage of the preamble
between transmission at the source GMII and reception at the destination
GMII. Table 35-3 depicts the case where no preamble bytes are conveyed
across the GMII. This case may not be possible with a specific PHY, but
illustrates the minimum preamble with which MAC shall be able to
operate. Table 35-4 depicts the case where the entire preamble is
conveyed across the GMII.
This workaround was tested on a Verdin iMX8MP by enforcing 10 MBit/s:
ethtool -s end0 speed 10
Without keeping the preamble, no packet were received. With keeping the
preamble, everything worked as expected.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260120203905.23805-4-eichest@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new flag, PHY_F_KEEP_PREAMBLE_BEFORE_SFD, to indicate that the PHY
shall not remove the preamble before the SFD if it supports it. MACs
that do not support receiving frames without a preamble can set this
flag.
Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260120203905.23805-2-eichest@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
Subject: netfilter: updates for net-next
1) Speed up nftables transactions after earlier transaction failed.
Due to a (harmeless) bug we remained in slow paranoia mode until
a successful transaction completes.
2) Allow generic tracker to resolve clashes, this avoids very rare
packet drops. From Yuto Hamaguchi.
3) Increase the cleanup budget to 64 entries in nf_conncount to reap
more entries in one go, from Fernando Fernandez Mancera.
4) Allow icmp trackers to resolve clashes, this avoids very rare
initial packet drop with test cases that have high-frequency pings.
After this all trackers except tcp and sctp allow clash resolution.
5) Disentangle netfilter headers, don't include nftables/xtables headers
in subsystems that are unrelated.
6) Don't rely on implicit includes coming from nf_conntrack_proto_gre.h.
7) Allow nfnetlink_queue nfq instance struct to get accounted via memcg,
from Scott Mitchell.
8) Reject bogus xt target/match data upfront via netlink policiy in
nft_compat interface rather than relying on x_tables API to do it.
9) Fix nf_conncount breakage when trying to limit loopback flows via
prerouting rule, from Fernando Fernandez Mancera.
This is a recent breakage but not seen as urgent enough to rush this
via net tree at this late stage in development cycle.
10) Fix a possible off-by-one when parsing tcp option in xtables tcpmss
match. Also handled via -next due to late stage in development
cycle.
* tag 'nf-next-26-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: xt_tcpmss: check remaining length before reading optlen
netfilter: nf_conncount: fix tracking of connections from localhost
netfilter: nft_compat: add more restrictions on netlink attributes
netfilter: nfnetlink_queue: nfqnl_instance GFP_ATOMIC -> GFP_KERNEL_ACCOUNT allocation
netfilter: nf_conntrack: don't rely on implicit includes
netfilter: don't include xt and nftables.h in unrelated subsystems
netfilter: nf_conntrack: enable icmp clash support
netfilter: nf_conncount: increase the connection clean up limit to 64
netfilter: nf_conntrack: Add allow_clash to generic protocol handler
netfilter: nf_tables: reset table validation state on abort
====================
Link: https://patch.msgid.link/20260120191803.22208-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some drivers of MAC + tightly integrated PCS (example: SJA1105 + XPCS
covered by same reset domain) need to perform resets at runtime.
The reset is triggered by the MAC driver, and it needs to restore its
and the PCS' registers, all invisible to phylink.
However, there is a desire to simplify the API through which the MAC and
the PCS interact, so this becomes challenging.
Phylink holds all the necessary state to help with this operation, and
can offer two helpers which walk the MAC and PCS drivers again through
the callbacks required during a destructive reset operation. The
procedure is as follows:
Before reset, MAC driver calls phylink_replay_link_begin():
- Triggers phylink mac_link_down() and pcs_link_down() methods
After reset, MAC driver calls phylink_replay_link_end():
- Triggers phylink mac_config() -> pcs_config() -> mac_link_up() ->
pcs_link_up() methods.
MAC and PCS registers are restored with no other custom driver code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260119121954.1624535-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Mediatek LynxI PCS is used from the MT7530 DSA driver (where it does
not have an OF presence) and from mtk_eth_soc, where it does
(Documentation/devicetree/bindings/net/pcs/mediatek,sgmiisys.yaml
informs of a combined clock provider + SGMII PCS "SGMIISYS" syscon
block).
Currently, mtk_eth_soc parses the SGMIISYS OF node for the
"mediatek,pnswap" property and sets a bit in the "flags" argument of
mtk_pcs_lynxi_create() if set.
I'd like to deprecate "mediatek,pnswap" in favour of a property which
takes the current phy-mode into consideration. But this is only known at
mtk_pcs_lynxi_config() time, and not known at mtk_pcs_lynxi_create(),
when the SGMIISYS OF node is parsed.
To achieve that, we must pass the OF node of the PCS, if it exists, to
mtk_pcs_lynxi_create(), and let the PCS take a reference on it and
handle property parsing whenever it wants.
Use the fwnode API which is more general than OF (in case we ever need
to describe the PCS using some other format). This API should be NULL
tolerant, so add no particular tests for the mt7530 case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260119091220.1493761-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
clang does not inline this helper in GRO fast path.
We can save space and cpu cycles.
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/2 grow/shrink: 2/0 up/down: 156/-218 (-62)
Function old new delta
tcp6_gro_complete 227 311 +84
tcp4_gro_complete 325 397 +72
__pfx___skb_incr_checksum_unnecessary 32 - -32
__skb_incr_checksum_unnecessary 186 - -186
Total: Before=22592724, After=22592662, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260120164903.1912995-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can change tcp_rsk() and inet_rsk() to propagate their argument
const qualifier thanks to container_of_const().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260120125353.1470456-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'add-devm_clk_bulk_get_optional_enable-helper-and-use-in-axi-ethernet-driver'
Suraj Gupta says:
====================
Add devm_clk_bulk_get_optional_enable() helper and use in AXI Ethernet driver
This patch series introduces a new managed clock framework helper function
and demonstrates its usage in AXI ethernet driver.
Device drivers frequently need to get optional bulk clocks, prepare them,
and enable them during probe, while ensuring automatic cleanup on device
unbind. Currently, this requires three separate operations with manual
cleanup handling.
The new devm_clk_bulk_get_optional_enable() helper combines these
operations into a single managed call, eliminating boilerplate code and
following the established pattern of devm_clk_bulk_get_all_enabled().
====================
Link: https://patch.msgid.link/20260116192725.972966-1-suraj.gupta2@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new managed clock framework helper function that combines getting
optional bulk clocks and enabling them in a single operation.
The devm_clk_bulk_get_optional_enable() function simplifies the common
pattern where drivers need to get optional bulk clocks, prepare and enable
them, and have them automatically disabled/unprepared and freed when the
device is unbound.
This new API follows the established pattern of
devm_clk_bulk_get_all_enabled() and reduces boilerplate code in drivers
that manage multiple optional clocks.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Link: https://patch.msgid.link/20260116192725.972966-2-suraj.gupta2@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
HIPPI has not been relevant for over two decades. It was rapidly
eclipsed by Fibre Channel, and even when it was new, it was
confined to very high-end hardware. The HIPPI code has only
received tree-wide changes and fixes by inspection in the entire
Git history. Remove HIPPI support and the rrunner HIPPI driver,
and move the former maintainer to the CREDITS file. Keep the
include/uapi/linux/if_hippi.h header because it is used by the TUN
code, and to avoid breaking userspace, however unlikely that may be.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260119022451.22344-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing
changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497:
931420a2fc36 ("selftests/net: Add netkit container tests")
ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing")
6be87fbb2776 ("selftests/net: Add env for container based tests")
61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program")
920da3634194 ("netkit: Add xsk support for af_xdp applications")
eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices")
b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
b5c3fa4a0b16 ("netkit: Add single device mode for netkit")
0073d2fd679d ("xsk: Proxy pool management for leased queues")
1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation")
804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues")
0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues")
ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized")
9e2103f36110 ("net: Add lease info to queue-get response")
31127deddef4 ("net: Implement netdev_nl_queue_create_doit")
a5546e18f77c ("net: Add queue-create operation")
The series will conflict with io_uring work, and the code needs more
polish.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
- A patch series from David Hildenbrand which fixes a few things
related to hugetlb PMD sharing
- The remainder are singletons, please see their changelogs for details
* tag 'mm-hotfixes-stable-2026-01-20-13-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: restore per-memcg proactive reclaim with !CONFIG_NUMA
mm/kfence: fix potential deadlock in reboot notifier
Docs/mm/allocation-profiling: describe sysctrl limitations in debug mode
mm: do not copy page tables unnecessarily for VM_UFFD_WP
mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather
mm/rmap: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix two comments related to huge_pmd_unshare()
mm/hugetlb: fix hugetlb_pmd_shared()
mm: remove unnecessary and incorrect mmap lock assert
x86/kfence: avoid writing L1TF-vulnerable PTEs
mm/vma: do not leak memory when .mmap_prepare swaps the file
migrate: correct lock ordering for hugetlb file folios
panic: only warn about deprecated panic_print on write access
fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
mm: take into account mm_cid size for mm_struct static definitions
mm: rename cpu_bitmap field to flexible_array
mm: add missing static initializer for init_mm::mm_cid.lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
- minor fixes for the corner cases of the SWIOTLB pool management
(Robin Murphy)
* tag 'dma-mapping-6.19-2026-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma/pool: Avoid allocating redundant pools
mm_zone: Generalise has_managed_dma()
dma/pool: Improve pool lookup
|
|
Commit ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make
VM_MAYBE_GUARD one") aggregates flags checks in vma_needs_copy(),
including VM_UFFD_WP.
However in doing so, it incorrectly performed this check against src_vma.
This check was done on the assumption that all relevant flags are copied
upon fork.
However the userfaultfd logic is very innovative in that it implements
custom logic on fork in dup_userfaultfd(), including a rather well hidden
case where lacking UFFD_FEATURE_EVENT_FORK causes VM_UFFD_WP to not be
propagated to the destination VMA.
And indeed, vma_needs_copy(), prior to this patch, did check this property
on dst_vma, not src_vma.
Since all the other relevant flags are copied on fork, we can simply fix
this by checking against dst_vma.
While we're here, we fix a comment against VM_COPY_ON_FORK (noting that it
did indeed already reference dst_vma) to make it abundantly clear that we
must check against the destination VMA.
Link: https://lkml.kernel.org/r/20260114110006.1047071-1-lorenzo.stoakes@oracle.com
Fixes: ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Chris Mason <clm@meta.com>
Closes: https://lore.kernel.org/all/20260113231257.3002271-1-clm@meta.com/
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mmu_gather
As reported, ever since commit 1013af4f585f ("mm/hugetlb: fix
huge_pmd_unshare() vs GUP-fast race") we can end up in some situations
where we perform so many IPI broadcasts when unsharing hugetlb PMD page
tables that it severely regresses some workloads.
In particular, when we fork()+exit(), or when we munmap() a large
area backed by many shared PMD tables, we perform one IPI broadcast per
unshared PMD table.
There are two optimizations to be had:
(1) When we process (unshare) multiple such PMD tables, such as during
exit(), it is sufficient to send a single IPI broadcast (as long as
we respect locking rules) instead of one per PMD table.
Locking prevents that any of these PMD tables could get reused before
we drop the lock.
(2) When we are not the last sharer (> 2 users including us), there is
no need to send the IPI broadcast. The shared PMD tables cannot
become exclusive (fully unshared) before an IPI will be broadcasted
by the last sharer.
Concurrent GUP-fast could walk into a PMD table just before we
unshared it. It could then succeed in grabbing a page from the
shared page table even after munmap() etc succeeded (and supressed
an IPI). But there is not difference compared to GUP-fast just
sleeping for a while after grabbing the page and re-enabling IRQs.
Most importantly, GUP-fast will never walk into page tables that are
no-longer shared, because the last sharer will issue an IPI
broadcast.
(if ever required, checking whether the PUD changed in GUP-fast
after grabbing the page like we do in the PTE case could handle
this)
So let's rework PMD sharing TLB flushing + IPI sync to use the mmu_gather
infrastructure so we can implement these optimizations and demystify the
code at least a bit. Extend the mmu_gather infrastructure to be able to
deal with our special hugetlb PMD table sharing implementation.
To make initialization of the mmu_gather easier when working on a single
VMA (in particular, when dealing with hugetlb), provide
tlb_gather_mmu_vma().
We'll consolidate the handling for (full) unsharing of PMD tables in
tlb_unshare_pmd_ptdesc() and tlb_flush_unshared_tables(), and track
in "struct mmu_gather" whether we had (full) unsharing of PMD tables.
Because locking is very special (concurrent unsharing+reuse must be
prevented), we disallow deferring flushing to tlb_finish_mmu() and instead
require an explicit earlier call to tlb_flush_unshared_tables().
From hugetlb code, we call huge_pmd_unshare_flush() where we make sure
that the expected lock protecting us from concurrent unsharing+reuse is
still held.
Check with a VM_WARN_ON_ONCE() in tlb_finish_mmu() that
tlb_flush_unshared_tables() was properly called earlier.
Document it all properly.
Notes about tlb_remove_table_sync_one() interaction with unsharing:
There are two fairly tricky things:
(1) tlb_remove_table_sync_one() is a NOP on architectures without
CONFIG_MMU_GATHER_RCU_TABLE_FREE.
Here, the assumption is that the previous TLB flush would send an
IPI to all relevant CPUs. Careful: some architectures like x86 only
send IPIs to all relevant CPUs when tlb->freed_tables is set.
The relevant architectures should be selecting
MMU_GATHER_RCU_TABLE_FREE, but x86 might not do that in stable
kernels and it might have been problematic before this patch.
Also, the arch flushing behavior (independent of IPIs) is different
when tlb->freed_tables is set. Do we have to enlighten them to also
take care of tlb->unshared_tables? So far we didn't care, so
hopefully we are fine. Of course, we could be setting
tlb->freed_tables as well, but that might then unnecessarily flush
too much, because the semantics of tlb->freed_tables are a bit
fuzzy.
This patch changes nothing in this regard.
(2) tlb_remove_table_sync_one() is not a NOP on architectures with
CONFIG_MMU_GATHER_RCU_TABLE_FREE that actually don't need a sync.
Take x86 as an example: in the common case (!pv, !X86_FEATURE_INVLPGB)
we still issue IPIs during TLB flushes and don't actually need the
second tlb_remove_table_sync_one().
This optimized can be implemented on top of this, by checking e.g., in
tlb_remove_table_sync_one() whether we really need IPIs. But as
described in (1), it really must honor tlb->freed_tables then to
send IPIs to all relevant CPUs.
Notes on TLB flushing changes:
(1) Flushing for non-shared PMD tables
We're converting from flush_hugetlb_tlb_range() to
tlb_remove_huge_tlb_entry(). Given that we properly initialize the
MMU gather in tlb_gather_mmu_vma() to be hugetlb aware, similar to
__unmap_hugepage_range(), that should be fine.
(2) Flushing for shared PMD tables
We're converting from various things (flush_hugetlb_tlb_range(),
tlb_flush_pmd_range(), flush_tlb_range()) to tlb_flush_pmd_range().
tlb_flush_pmd_range() achieves the same that
tlb_remove_huge_tlb_entry() would achieve in these scenarios.
Note that tlb_remove_huge_tlb_entry() also calls
__tlb_remove_tlb_entry(), however that is only implemented on
powerpc, which does not support PMD table sharing.
Similar to (1), tlb_gather_mmu_vma() should make sure that TLB
flushing keeps on working as expected.
Further, note that the ptdesc_pmd_pts_dec() in huge_pmd_share() is not a
concern, as we are holding the i_mmap_lock the whole time, preventing
concurrent unsharing. That ptdesc_pmd_pts_dec() usage will be removed
separately as a cleanup later.
There are plenty more cleanups to be had, but they have to wait until
this is fixed.
[david@kernel.org: fix kerneldoc]
Link: https://lkml.kernel.org/r/f223dd74-331c-412d-93fc-69e360a5006c@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-5-david@kernel.org
Fixes: 1013af4f585f ("mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race")
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reported-by: Uschakow, Stanislav" <suschako@amazon.de>
Closes: https://lore.kernel.org/all/4d3878531c76479d9f8ca9789dc6485d@amazon.de/
Tested-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using
mmu_gather)", v3.
One functional fix, one performance regression fix, and two related
comment fixes.
I cleaned up my prototype I recently shared [1] for the performance fix,
deferring most of the cleanups I had in the prototype to a later point.
While doing that I identified the other things.
The goal of this patch set is to be backported to stable trees "fairly"
easily. At least patch #1 and #4.
Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing
Patch #2 + #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().
The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated
There are plenty of cleanups in the future to be had + one reasonable
optimization on x86. But that's all out of scope for this series.
Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.
This patch (of 4):
We switched from (wrongly) using the page count to an independent shared
count. Now, shared page tables have a refcount of 1 (excluding
speculative references) and instead use ptdesc->pt_share_count to identify
sharing.
We didn't convert hugetlb_pmd_shared(), so right now, we would never
detect a shared PMD table as such, because sharing/unsharing no longer
touches the refcount of a PMD table.
Page migration, like mbind() or migrate_pages() would allow for migrating
folios mapped into such shared PMD tables, even though the folios are not
exclusive. In smaps we would account them as "private" although they are
"shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the
pagemap interface.
Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared().
Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org
Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1]
Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2]
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Tested-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Uschakow, Stanislav" <suschako@amazon.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
several netfilter compilation units rely on implicit includes
coming from nf_conntrack_proto_gre.h.
Clean this up and add the required dependencies where needed.
nf_conntrack.h requires net_generic() helper.
Place various gre/ppp/vlan includes to where they are needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
conntrack, xtables and nftables are distinct subsystems, don't use them
in other subystems.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Add a netdevice notifier in netkit to watch for NETDEV_UNREGISTER events.
If the target device is indeed NETREG_UNREGISTERING and previously leased
a queue to a netkit device, then collect the related netkit devices and
batch-unregister_netdevice_many() them.
If this would not be done, then the netkit device would hold a reference
on the physical device preventing it from going away. However, in case of
both io_uring zero-copy as well as AF_XDP this situation is handled
gracefully and the allocated resources are torn down.
In the case where mentioned infra is used through netkit, the applications
have a reference on netkit, and netkit in turn holds a reference on the
physical device. In order to have netkit release the reference on the
physical device, we need such watcher to then unregister the netkit ones.
This is generally quite similar to the dependency handling in case of
tunnels (e.g. vxlan bound to a underlying netdev) where the tunnel device
gets removed along with the physical device.
# ip a
[...]
4: enp10s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/24 scope global enp10s0f0np0
valid_lft forever preferred_lft forever
[...]
8: nk@NONE: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
[...]
# rmmod mlx5_ib
# rmmod mlx5_core
[ 309.261822] mlx5_core 0000:0a:00.0 mlx5_0: Port: 1 Link DOWN
[ 344.235236] mlx5_core 0000:0a:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 344.246948] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 344.463754] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 344.770155] mlx5_core 0000:0a:00.1: E-Switch: cleanup
[ 345.345709] mlx5_core 0000:0a:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 345.357524] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 350.995989] mlx5_core 0000:0a:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
[ 351.574396] mlx5_core 0000:0a:00.0: E-Switch: cleanup
# ip a
[...]
[ both enp10s0f0np0 and nk gone ]
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260115082603.219152-12-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Above the while() loop in wait_sb_inodes(), we document that we must wait
for all pages under writeback for data integrity. Consequently, if a
mapping, like fuse, traditionally does not have data integrity semantics,
there is no need to wait at all; we can simply skip these inodes.
This restores fuse back to prior behavior where syncs are no-ops. This
fixes a user regression where if a system is running a faulty fuse server
that does not reply to issued write requests, this causes wait_sb_inodes()
to wait forever.
Link: https://lkml.kernel.org/r/20260105211737.4105620-2-joannelkoong@gmail.com
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Tested-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Bernd Schubert <bschubert@ddn.com>
Cc: Bonaccorso Salvatore <carnil@debian.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Both init_mm and efi_mm static definitions need to make room for the 2
mm_cid cpumasks.
This fixes possible out-of-bounds accesses to init_mm and efi_mm.
Add a space between # and define for the mm_alloc_cid() definition to make
it consistent with the coding style used in the rest of this header file.
Link: https://lkml.kernel.org/r/20251224173358.647691-4-mathieu.desnoyers@efficios.com
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Christan König <christian.koenig@amd.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Liam R . Howlett" <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The cpu_bitmap flexible array now contains more than just the cpu_bitmap.
In preparation for changing the static mm_struct definitions to cover for
the additional space required, change the cpu_bitmap type from "unsigned
long" to "char", require an unsigned long alignment of the flexible array,
and rename the field from "cpu_bitmap" to "flexible_array".
Introduce the MM_STRUCT_FLEXIBLE_ARRAY_INIT macro to statically initialize
the flexible array. This covers the init_mm and efi_mm static
definitions.
This is a preparation step for fixing the missing mm_cid size for static
mm_struct definitions.
Link: https://lkml.kernel.org/r/20251224173358.647691-3-mathieu.desnoyers@efficios.com
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Christan König <christian.koenig@amd.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Liam R . Howlett" <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in
order to support reading this link speed, add support for it to ethtool.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, userspace can retrieve the DPLL working mode but cannot
configure it. This prevents changing the device operation, such as
switching from manual to automatic mode and vice versa.
Add a new callback .mode_set() to struct dpll_device_ops. Extend
the netlink policy and device-set command handling to process
the DPLL_A_MODE attribute. Update the netlink YAML specification
to include the mode attribute in the device-set operation.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the DPLL subsystem assumes that the only supported mode is
the one currently active on the device. When dpll_msg_add_mode_supported()
is called, it relies on ops->mode_get() and reports that single mode
to userspace. This prevents users from discovering other modes the device
might be capable of.
Add a new callback .supported_modes_get() to struct dpll_device_ops. This
allows drivers to populate a bitmap indicating all modes supported by
the hardware.
Update dpll_msg_add_mode_supported() to utilize this new callback:
* if ops->supported_modes_get is defined, use it to retrieve the full
bitmap of supported modes.
* if not defined, fall back to the existing behavior: retrieve
the current mode via ops->mode_get and set the corresponding bit
in the bitmap.
Finally, iterate over the bitmap and add a DPLL_A_MODE_SUPPORTED netlink
attribute for every set bit, accurately reporting the device's capabilities
to userspace.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260114122726.120303-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids for 6.19-rc6
Included in here are:
- new usb-serial device ids
- dwc3-apple driver fixes to get things working properly on that
hardware platform
- ohci/uhci platfrom driver module soft-deps with ehci to remove a
runtime warning that sometimes shows up on some platforms.
- quirk for broken devices that can not handle reading the BOS
descriptor from them without going crazy.
- usb-serial driver fixes
- xhci driver fixes
- usb gadget driver fixes
All of these except for the last xhci fix has been in linux-next for a
while. The xhci fix has been reported by others to solve the issue for
them, so should be ok"
* tag 'usb-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: sideband: don't dereference freed ring when removing sideband endpoint
usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail
usb: gadget: uvc: return error from uvcg_queue_init()
usb: gadget: uvc: fix interval_duration calculation
usb: gadget: uvc: fix req_payload_size calculation
usb: dwc3: apple: Ignore USB role switches to the active role
usb: host: xhci-tegra: Use platform_get_irq_optional() for wake IRQs
USB: OHCI/UHCI: Add soft dependencies on ehci_platform
usb: dwc3: apple: Set USB2 PHY mode before dwc3 init
USB: serial: f81232: fix incomplete serial port generation
USB: serial: ftdi_sio: add support for PICAXE AXE027 cable
USB: serial: option: add Telit LE910 MBIM composition
usb: core: add USB_QUIRK_NO_BOS for devices that hang on BOS descriptor
dt-bindings: usb: qcom,dwc3: Correct MSM8994 interrupts
dt-bindings: usb: qcom,dwc3: Correct IPQ5018 interrupts
tcpm: allow looking for role_sw device in the main node
usb: dwc3: Check for USB4 IP_NAME
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc deadline scheduler fixes, mainly for a new category of bugs that
were discovered and fixed recently:
- Fix a race condition in the DL server
- Fix a DL server bug which can result in incorrectly going idle when
there's work available
- Fix DL server bug which triggers a WARN() due to broken
get_prio_dl() logic and subsequent misbehavior
- Fix double update_rq_clock() calls
- Fix setscheduler() assumption about static priorities
- Make sure balancing callbacks are always called
- Plus a handful of preparatory commits for the fixes"
* tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Use ENQUEUE_MOVE to allow priority change
sched: Deadline has dynamic priority
sched: Audit MOVE vs balance_callbacks
sched: Fold rq-pin swizzle into __balance_callbacks()
sched/deadline: Avoid double update_rq_clock()
sched/deadline: Ensure get_prio_dl() is up-to-date
sched/deadline: Fix server stopping with runnable tasks
sched: Provide idle_rq() helper
sched/deadline: Fix potential race in dl_add_task_root_domain()
sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fix from Bjorn Helgaas:
- Add a pci_free_irq_vectors() stub to fix a build issue when
CONFIG_PCI is not set (Boqun Feng)
* tag 'pci-v6.19-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Provide pci_free_irq_vectors() stub
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an error path memory leak in the energy model management
code, fix a kerneldoc comment in it, and fix and revamp the energy
model YNL specification added recently along with the new energy model
management netlink interface (that received feedback after being
added):
- Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)
- Fix stale description of the cost field in struct em_perf_state to
reflect the current code (Yaxiong Tian)
- Fix and revamp the energy model YNL specification added recently
along with the energy model netlink interface (Changwoo Min)"
* tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: EM: Add dump to get-perf-domains in the EM YNL spec
PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
PM: EM: Rename em.yaml to dev-energymodel.yaml
PM: EM: Fix yamllint warnings in the EM YNL spec
PM: EM: Fix memory leak in em_create_pd() error path
PM: EM: Fix incorrect description of the cost field in struct em_perf_state
|
|
Merge fixes related to the energy model management for 6.19-rc6:
- Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout)
- Fix stale description of the cost field in struct em_perf_state to
reflect the current code (Yaxiong Tian)
- Fix and revamp the energy model YNL specification added recently
along with the energy model netlink interface (Changwoo Min)
* pm-em:
PM: EM: Add dump to get-perf-domains in the EM YNL spec
PM: EM: Change cpus' type from string to u64 array in the EM YNL spec
PM: EM: Rename em.yaml to dev-energymodel.yaml
PM: EM: Fix yamllint warnings in the EM YNL spec
PM: EM: Fix memory leak in em_create_pd() error path
PM: EM: Fix incorrect description of the cost field in struct em_perf_state
|
|
No user of PHY fixups unregisters these. IOW: The fixup unregistering
functions are unused and can be removed. Remove also documentation
for these functions. Whilst at it, remove also mentioning of
phy_register_fixup() from the Documentation, as this function has been
static since ea47e70e476f ("net: phy: remove fixup-related definitions
from phy.h which are not used outside phylib").
Fixup unregistering functions were added with f38e7a32ee4f
("phy: add phy fixup unregister functions") in 2016, and last user
was removed with 6782d06a47ad ("net: usb: lan78xx: Remove KSZ9031 PHY
fixup") in 2024.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ff8ac321-435c-48d0-b376-fbca80c0c22e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Vinod Koul says:
====================
phy common properties
Introduce "rx-polarity" and "tx-polarity" device tree properties
with Kunit tests (from Vladimir Oltean).
* tag 'phy_common_properties' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: add phy_get_rx_polarity() and phy_get_tx_polarity()
dt-bindings: phy-common-props: RX and TX lane polarity inversion
dt-bindings: phy-common-props: ensure protocol-names are unique
dt-bindings: phy-common-props: create a reusable "protocol-names" definition
dt-bindings: phy: rename transmit-amplitude.yaml to phy-common-props.yaml
====================
Link: https://patch.msgid.link/aWeXvFcGNK5T6As9@vaman
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|