path: root/include/linux
Age | Commit message | Author | Files | Lines
2024-02-22 | udp: add local "peek offset enabled" flag | Paolo Abeni | 1 | -0/+10
We want to re-organize the struct sock layout. The sk_peek_off field location is problematic, as most protocols want it in the RX read area, while UDP wants it on a cacheline different from sk_receive_queue. Create a local (inside udp_sock) copy of the 'peek offset is enabled' flag and place it inside the same cacheline of reader_queue. Check such flag before reading sk_peek_off. This will save potential false sharing and cache misses in the fast-path. Tested under UDP flood with small packets. The struct sock layout update causes a 4% performance drop, and this patch restores completely the original tput. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
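The idea in the message above can be illustrated with a minimal userspace sketch. The struct and function names below (demo_sock, demo_udp_sock, demo_set_peek_off, demo_peek_offset) are hypothetical stand-ins, not the kernel's definitions; the sketch only assumes that a negative peek offset means "disabled".

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct sock. */
struct demo_sock {
	int sk_peek_off;		/* shared field, possibly on a contended cacheline */
};

/* Hypothetical, simplified stand-in for struct udp_sock. */
struct demo_udp_sock {
	struct demo_sock sk;
	bool peeking_with_offset;	/* local copy of "peek offset is enabled" */
};

/* Set the peek offset; a negative value disables peeking with offset. */
static void demo_set_peek_off(struct demo_udp_sock *up, int val)
{
	up->sk.sk_peek_off = val;
	up->peeking_with_offset = val >= 0;
}

/* Fast path: check the local flag first and read sk_peek_off only if needed. */
static int demo_peek_offset(const struct demo_udp_sock *up)
{
	if (!up->peeking_with_offset)
		return 0;
	return up->sk.sk_peek_off;
}
```

In the common case where peeking with offset is never enabled, the reader touches only the flag that lives next to its own queue, which is the false-sharing saving the commit describes.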
2024-02-22 | net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY | Dimitri Fedrau | 1 | -0/+1
Add a driver for the Marvell 88Q2220. The driver can detect the link, switch between 100BASE-T1 and 1000BASE-T1, and switch between master and slave mode. Autonegotiation is supported. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-6-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 | net: phy: Support 100/1000BT1 linkmode advertisements | Dimitri Fedrau | 1 | -0/+8
Extend helper functions mii_t1_adv_m_mod_linkmode_t and linkmode_adv_to_mii_t1_adv_m_t to support 100BT1 and 1000BT1 linkmode advertisements. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-3-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21 | net: mdio: mdio-bcm-unimac: Manage clock around I/O accesses | Florian Fainelli | 1 | -0/+3
Up until now we have managed not to have the mdio-bcm-unimac manage its clock except during probe and suspend/resume. This works most of the time, except where it does not. With a fully modular build, we can get into a situation whereby the GENET driver is fully registered, and so is the mdio-bcm-unimac driver, however the Ethernet PHY driver is not yet, because it depends on a resource that is not yet available (e.g.: GPIO provider). In that state, the network device is not usable yet, and so to conserve power, the GENET driver will have turned off its "main" clock which feeds its MDIO controller. When the PHY driver finally probes however, we make an access to the PHY registers to e.g.: disable interrupts, and this causes a bus error within the MDIO controller space because the MDIO controller clock(s) are turned off. To remedy that, we manage the clock around all of the I/O accesses to the hardware which are done exclusively during read, write and clock divider configuration. This ensures that the register space is accessible, and this also ensures that there are not unnecessarily elevated reference counts keeping the clocks active when the network device is administratively turned off. It would be the case with the previous way of managing the clock. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
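The "manage the clock around every I/O access" pattern described above can be sketched in plain C. This is a userspace model under stated assumptions: demo_mdio, demo_clk_enable, demo_clk_disable, demo_raw_read and demo_mdio_read are hypothetical names, and a counter stands in for the real clock framework.

```c
#include <stdint.h>

/* Hypothetical stand-in for the MDIO controller; not the real driver state. */
struct demo_mdio {
	int clk_refcount;	/* > 0 means the register space is clocked */
	uint16_t regs[32];
	int bus_errors;		/* accesses attempted while the clock was off */
};

static void demo_clk_enable(struct demo_mdio *m)  { m->clk_refcount++; }
static void demo_clk_disable(struct demo_mdio *m) { m->clk_refcount--; }

/* Raw access: fails (and is counted) if the clock is not running. */
static int demo_raw_read(struct demo_mdio *m, unsigned int reg, uint16_t *val)
{
	if (m->clk_refcount <= 0) {
		m->bus_errors++;
		return -1;
	}
	*val = m->regs[reg % 32];
	return 0;
}

/* Managed access: hold the clock only for the duration of the I/O. */
static int demo_mdio_read(struct demo_mdio *m, unsigned int reg, uint16_t *val)
{
	int ret;

	demo_clk_enable(m);
	ret = demo_raw_read(m, reg, val);
	demo_clk_disable(m);
	return ret;
}
```

Because the reference is dropped right after each access, the count returns to zero between accesses, so nothing keeps the clock running while the network device is administratively down.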
2024-02-21 | net: wan: framer: remove children from struct framer_ops kdoc | Simon Horman | 1 | -1/+0
Remove documentation of non-existent children field from the Kernel doc for struct framer_ops. Introduced by 82c944d05b1a ("net: wan: Add framer framework support") Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 | Merge tag 'wireless-next-2024-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next | David S. Miller | 3 | -6/+120
Kalle Valo says:
====================
wireless-next patches for v6.9
The second "new features" pull request for v6.9. Lots of iwlwifi and stack changes this time. And naturally smaller changes to other drivers. We also twice merged wireless into wireless-next to avoid conflicts between the trees.
Major changes:
stack
* mac80211: negotiated TTLM request support
* SPP A-MSDU support
* mac80211: wider bandwidth OFDMA config support
iwlwifi
* kunit tests
* bump FW API to 89 for AX/BZ/SC devices
* enable SPP A-MSDUs
* support for new devices
ath12k
* refactoring in preparation for Multi-Link Operation (MLO) support
* 1024 Block Ack window size support
* provide firmware wmi logs via a trace event
ath11k
* 36 bit DMA mask support
* support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power (SP) and Very Low Power (VLP)
rtl8xxxu
* TP-Link TL-WN823N V2 support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21 | net: wan: framer: constify of_phandle_args in xlate | Krzysztof Kozlowski | 1 | -7/+7
The xlate callbacks are supposed to translate of_phandle_args to proper provider without modifying the of_phandle_args. Make the argument pointer to const for code safety and readability. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240217100306.86740-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-20 | net: skbuff: add overflow debug check to pull/push helpers | Florian Westphal | 1 | -0/+6
syzbot managed to trigger following splat: BUG: KASAN: use-after-free in __skb_flow_dissect+0x4a3b/0x5e50 Read of size 1 at addr ffff888208a4000e by task a.out/2313 [..] __skb_flow_dissect+0x4a3b/0x5e50 __skb_get_hash+0xb4/0x400 ip_tunnel_xmit+0x77e/0x26f0 ipip_tunnel_xmit+0x298/0x410 .. Analysis shows that the skb has a valid ->head, but bogus ->data pointer. skb->data gets its bogus value via the neigh layer, which does: 1556 __skb_pull(skb, skb_network_offset(skb)); ... and the skb was already dodgy at this point: skb_network_offset(skb) returns a negative value due to an earlier overflow of skb->network_header (u16). __skb_pull thus "adjusts" skb->data by a huge offset, pointing outside skb->head area. Allow debug builds to splat when we try to pull/push more than INT_MAX bytes. After this, the syzkaller reproducer yields a more precise splat before the flow dissector attempts to read off skb->data memory: WARNING: CPU: 5 PID: 2313 at include/linux/skbuff.h:2653 neigh_connected_output+0x28e/0x400 ip_finish_output2+0xb25/0xed0 iptunnel_xmit+0x4ff/0x870 ipgre_xmit+0x78e/0xbb0 Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240216113700.23013-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
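The failure mode above, a negative offset that becomes a huge unsigned length, can be modelled in a few lines. This is a sketch, not the kernel helper: demo_buf, demo_len_is_bogus and demo_pull are hypothetical names, and unlike the patch (which only warns in debug builds) this sketch refuses the operation so the effect is easy to observe.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the head/data/len part of an skb. */
struct demo_buf {
	unsigned char *head;
	unsigned char *data;
	unsigned int len;
};

/* A length above INT_MAX is almost certainly a negative int converted to unsigned. */
static bool demo_len_is_bogus(unsigned int len)
{
	return len > INT_MAX;
}

/* Pull len bytes off the front; returns 0 on success, -1 if the request is refused. */
static int demo_pull(struct demo_buf *b, unsigned int len)
{
	if (demo_len_is_bogus(len) || len > b->len)
		return -1;
	b->data += len;
	b->len -= len;
	return 0;
}
```

Passing `(unsigned int)-14` as the length, for example, trips the check instead of moving `data` far outside the `head` area.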
2024-02-20 | net: add netmem to skb_frag_t | Mina Almasry | 1 | -29/+71
Use struct netmem* instead of page in skb_frag_t. Currently struct netmem* is always a struct page underneath, but the abstraction allows efforts to add support for skb frags not backed by pages. There is unfortunately one instance in kcm where the skb_frag_t is assumed to be exactly a bio_vec. For this case, WARN_ON_ONCE and return an error before doing the cast. Add skb[_frag]_fill_netmem_*() and skb_add_rx_frag_netmem() helpers so that the API can be used to create netmem skbs. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-17 | net: phy: add PHY_EEE_CAP2_FEATURES | Heiner Kallweit | 1 | -0/+2
As a prerequisite for adding EEE CAP2 register support, complement PHY_EEE_CAP1_FEATURES with PHY_EEE_CAP2_FEATURES. For now only 2500baseT and 5000baseT modes are supported. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-17 | net: mdio: add helpers for accessing the EEE CAP2 registers | Heiner Kallweit | 1 | -0/+55
This adds helpers for accessing the EEE CAP2 registers. For now only 2500baseT and 5000baseT modes are supported. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 11 | -18/+75
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15 | Merge tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds | 2 | -8/+8
Pull networking fixes from Jakub Kicinski:
"Including fixes from can, wireless and netfilter.
Current release - regressions:
- af_unix: fix task hung while purging oob_skb in GC
- pds_core: do not try to run health-thread in VF path
Current release - new code bugs:
- sched: act_mirred: don't zero blockid when net device is being deleted
Previous releases - regressions:
- netfilter:
  - nat: restore default DNAT behavior
  - nf_tables: fix bidirectional offload, broken when unidirectional offload support was added
- openvswitch: limit the number of recursions from action sets
- eth: i40e: do not allow untrusted VF to remove administratively set MAC address
Previous releases - always broken:
- tls: fix races and bugs in use of async crypto
- mptcp: prevent data races on some of the main socket fields, fix races in fastopen handling
- dpll: fix possible deadlock during netlink dump operation
- dsa: lan966x: fix crash when adding interface under a lag when some of the ports are disabled
- can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
Misc:
- a handful of fixes and reliability improvements for selftests
- fix sysfs documentation missing net/ in paths
- finish the work of squashing the missing MODULE_DESCRIPTION() warnings in networking"
* tag 'net-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
  net: fill in MODULE_DESCRIPTION()s for missing arcnet
  net: fill in MODULE_DESCRIPTION()s for mdio_devres
  net: fill in MODULE_DESCRIPTION()s for ppp
  net: fill in MODULE_DESCRIPTION()s for fddik/skfp
  net: fill in MODULE_DESCRIPTION()s for plip
  net: fill in MODULE_DESCRIPTION()s for ieee802154/fakelb
  net: fill in MODULE_DESCRIPTION()s for xen-netback
  net: ravb: Count packets instead of descriptors in GbEth RX path
  pppoe: Fix memory leak in pppoe_sendmsg()
  net: sctp: fix skb leak in sctp_inq_free()
  net: bcmasp: Handle RX buffer allocation failure
  net-timestamp: make sk_tskey more predictable in error path
  selftests: tls: increase the wait in poll_partial_rec_async
  ice: Add check for lport extraction to LAG init
  netfilter: nf_tables: fix bidirectional offload regression
  netfilter: nat: restore default DNAT behavior
  netfilter: nft_set_pipapo: fix missing : in kdoc
  igc: Remove temporary workaround
  igb: Fix string truncation warnings in igb_set_fw_version
  can: netlink: Fix TDCO calculation using the old data bittiming
  ...
2024-02-15 | update workarounds for gcc "asm goto" issue | Linus Torvalds | 2 | -4/+12
In commit 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs") I did the gcc workaround unconditionally, because the cause of the bad code generation wasn't entirely clear. In the meantime, Jakub Jelinek debugged the issue, and has come up with a fix in gcc [2], which also got backported to the still maintained branches of gcc-11, gcc-12 and gcc-13. Note that while the fix technically wasn't in the original gcc-14 branch, Jakub says: "while it is true that no GCC 14 snapshots until today (or whenever the fix will be committed) have the fix, for GCC trunk it is up to the distros to use the latest snapshot if they use it at all and would allow better testing of the kernel code without the workaround, so that if there are other issues they won't be discovered years later. Most userland code doesn't actually use asm goto with outputs..." so we will consider gcc-14 to be fixed - if somebody is using gcc snapshots of the gcc-14 before the fix, they should upgrade. Note that while the bug goes back to gcc-11, in practice other gcc changes seem to have effectively hidden it since gcc-12.1 as per a bisect by Jakub. So even a gcc-14 snapshot without the fix likely doesn't show actual problems. Also, make the default 'asm_goto_output()' macro mark the asm as volatile by hand, because of an unrelated gcc issue [1] where it doesn't match the documented behavior ("asm goto is always volatile"). Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2] Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Requested-by: Jakub Jelinek <jakub@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-15 | net: ipv6/addrconf: introduce a regen_min_advance sysctl | Alex Henrie | 1 | -0/+1
In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC does not permit the creation of temporary addresses with lifetimes shorter than that: > When processing a Router Advertisement with a > Prefix Information option carrying a prefix for the purposes of > address autoconfiguration (i.e., the A bit is set), the host MUST > perform the following steps: > 5. A temporary address is created only if this calculated preferred > lifetime is greater than REGEN_ADVANCE time units. However, some users want to change their IPv6 address as frequently as possible regardless of the RFC's arbitrary minimum lifetime. For the benefit of those users, add a regen_min_advance sysctl parameter that can be set to below or above 2 seconds. Link: https://datatracker.ietf.org/doc/html/rfc8981 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 | net: mdio_bus: make mdio_bus_type const | Ricardo B. Marliere | 1 | -1/+1
Since commit d492cc2573a0 ("driver core: device.h: make struct bus_type a const *"), the driver core can properly handle constant struct bus_type, move the mdio_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240213-bus_cleanup-mdio-v1-1-f9e799da7fda@marliere.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-15 | Merge tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux | Linus Torvalds | 1 | -0/+4
Pull MIPS fixes from Thomas Bogendoerfer:
- Fix for broken ipv6 checksums
- Fix handling of exceptions in delay slots
* tag 'mips-fixes_6.8_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  mm/memory: Use exception ip to search exception tables
  MIPS: Clear Cause.BD in instruction_pointer_set
  ptrace: Introduce exception_ip arch hook
  MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
2024-02-14 | net: remove dev_base_lock | Eric Dumazet | 1 | -2/+0
dev_base_lock is not needed anymore, all remaining users also hold RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-14 | net: add netdev_set_operstate() helper | Eric Dumazet | 2 | -1/+3
dev_base_lock is going away, add netdev_set_operstate() helper so that hsr does not have to know core internals. Remove dev_base_lock acquisition from rfc2863_policy() v3: use an "unsigned int" for dev->operstate, so that try_cmpxchg() can work on all arches. ( https://lore.kernel.org/oe-kbuild-all/202402081918.OLyGaea3-lkp@intel.com/ ) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
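A lock-free state update of the kind mentioned in the v3 note can be sketched with C11 atomics. This is a userspace illustration under stated assumptions: demo_dev and demo_set_operstate are hypothetical names, `atomic_compare_exchange_weak()` stands in for the kernel's try_cmpxchg(), and a counter stands in for the state-change notification.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct net_device. */
struct demo_dev {
	_Atomic unsigned int operstate;	/* full-width word, so cmpxchg works everywhere */
	unsigned int notifications;	/* stand-in for the state-change notification */
};

/* Set operstate without a lock; returns true only if the state actually changed. */
static bool demo_set_operstate(struct demo_dev *dev, unsigned int newstate)
{
	unsigned int old = atomic_load(&dev->operstate);

	do {
		if (old == newstate)
			return false;	/* nothing to do, no notification */
		/* on failure, 'old' is reloaded with the current value and we retry */
	} while (!atomic_compare_exchange_weak(&dev->operstate, &old, newstate));

	dev->notifications++;
	return true;
}
```

The compare-and-exchange loop makes the "did it change?" decision and the store one atomic step, so concurrent writers cannot both believe they performed the same transition.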
2024-02-14 | net: convert dev->reg_state to u8 | Eric Dumazet | 1 | -9/+14
Prepares things so that dev->reg_state reads can be lockless, by adding WRITE_ONCE() on write side. READ_ONCE()/WRITE_ONCE() do not support bitfields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
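Why a bitfield gets in the way can be seen from a small userspace approximation. The names below (demo_dev, demo_reg_state, demo_set_reg_state, demo_get_reg_state) are hypothetical, and the volatile casts only approximate what READ_ONCE()/WRITE_ONCE() do; they are not the kernel macros.

```c
#include <stdint.h>

/* Hypothetical registration states; the real enum has different names and members. */
enum demo_reg_state {
	DEMO_UNINITIALIZED = 0,
	DEMO_REGISTERED,
	DEMO_UNREGISTERED,
};

struct demo_dev {
	/* A whole byte rather than a bitfield: its address can be taken,
	 * which a single volatile load or store requires. */
	uint8_t reg_state;
};

/* Writer side: one untorn store through a volatile pointer. */
static void demo_set_reg_state(struct demo_dev *dev, enum demo_reg_state s)
{
	*(volatile uint8_t *)&dev->reg_state = (uint8_t)s;
}

/* Reader side: one untorn load, usable without holding a lock. */
static enum demo_reg_state demo_get_reg_state(const struct demo_dev *dev)
{
	return (enum demo_reg_state)*(const volatile uint8_t *)&dev->reg_state;
}
```

With a bitfield member, `&dev->reg_state` would not compile, and a store would become a read-modify-write of the neighbouring bits as well.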
2024-02-14 | veth: rely on skb_pp_cow_data utility routine | Lorenzo Bianconi | 1 | -0/+2
Rely on skb_pp_cow_data utility routine and remove duplicated code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/029cc14cce41cb242ee7efdcf32acc81f1ce4e9f.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-14 | xdp: add multi-buff support for xdp running in generic mode | Lorenzo Bianconi | 1 | -0/+2
Similar to native xdp, do not always linearize the skb in the netif_receive_generic_xdp routine but create a non-linear xdp_buff to be processed by the eBPF program. This allows adding multi-buffer support for xdp running in generic mode. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/1044d6412b1c3e95b40d34993fd5f37cd2f319fd.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-14 | xdp: rely on skb pointer reference in do_xdp_generic and netif_receive_generic_xdp | Lorenzo Bianconi | 1 | -1/+1
Rely on skb pointer reference instead of the skb pointer in do_xdp_generic and netif_receive_generic_xdp routine signatures. This is a preliminary patch to add multi-buff support for xdp running in generic mode where we will need to reallocate the skb to avoid linearization and we will need to make it visible to do_xdp_generic() caller. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c09415b1f48c8620ef4d76deed35050a7bddf7c2.1707729884.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13 | ptrace: Introduce exception_ip arch hook | Jiaxun Yang | 1 | -0/+4
On architectures with delay slots, the architecture-level instruction pointer (or program counter) in pt_regs may differ from where the exception was triggered. Introduce an exception_ip hook to invoke architecture code and determine the actual instruction pointer of the exception. Link: https://lore.kernel.org/lkml/00d1b813-c55f-4365-8d81-d70258e10b16@app.fastmail.com/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-02-12 | Merge tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds | 1 | -3/+0
Pull vfs fixes from Christian Brauner:
- Fix performance regression introduced by moving the security permission hook out of do_clone_file_range() and into its caller vfs_clone_file_range(). This causes the security hook to be called in situations where it wasn't called before as the fast permission checks were left in do_clone_file_range(). Fix this by merging the two implementations back together and restoring the old ordering: fast permission checks first, expensive ones later.
- Tweak mount_setattr() permission checking so that mount properties on the real rootfs can be changed. When we added mount_setattr() we added additional checks compared to legacy mount(2). If the mount had a parent then verify that the caller and the mount namespace the mount is attached to match and if not make sure that it's an anonymous mount. But the real rootfs falls into neither category. It is not an anonymous mount because it is obviously attached to the initial mount namespace but it also obviously doesn't have a parent mount. So that means legacy mount(2) allows changing mount properties on the real rootfs but mount_setattr(2) blocks this. This causes regressions (See the commit for details). Fix this by relaxing the check. If the mount has a parent or if it isn't a detached mount, verify that the mount namespaces of the caller and the mount are the same. Technically, we could probably write this even simpler and check that the mount namespaces match if it isn't a detached mount. But the slightly longer check makes it clearer what conditions one needs to think about.
* tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: relax mount_setattr() permission checks
  remap_range: merge do_clone_file_range() into vfs_clone_file_range()
2024-02-12 | net-device: move lstats in net_device_read_txrx | Eric Dumazet | 1 | -5/+5
dev->lstats is notably used from loopback ndo_start_xmit() and other virtual drivers. Per cpu stats updates are dirtying per-cpu data, but the pointer itself is read-only. Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12 | tcp: move tp->tcp_usec_ts to tcp_sock_read_txrx group | Eric Dumazet | 1 | -2/+2
tp->tcp_usec_ts is a read mostly field, used in rx and tx fast paths. Fixes: d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12 | tcp: move tp->scaling_ratio to tcp_sock_read_txrx group | Eric Dumazet | 1 | -1/+1
tp->scaling_ratio is a read mostly field, used in rx and tx fast paths. Fixes: d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-11 | Merge tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+3
Pull timer fix from Borislav Petkov:
- Make sure a warning is issued when a hrtimer gets queued after the timers have been migrated on the CPU down path and thus said timer will get ignored
* tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Report offline hrtimer enqueue
2024-02-10 | Merge tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux | Linus Torvalds | 1 | -2/+5
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Update a potentially stale firmware attribute (Maurizio) - Fixes for the recent verbose error logging (Keith, Chaitanya) - Protection information payload size fix for passthrough (Francis) - Fix for a queue freezing issue in virtblk (Yi) - blk-iocost underflow fix (Tejun) - blk-wbt task detection fix (Jan) * tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux: virtio-blk: Ensure no requests in virtqueues before deleting vqs. blk-iocost: Fix an UBSAN shift-out-of-bounds warning nvme: use ns->head->pi_size instead of t10_pi_tuple structure size nvme-core: fix comment to reflect right functions nvme: move passthrough logging attribute to head blk-wbt: Fix detection of dirty-throttled tasks nvme-host: fix the updating of the firmware version
2024-02-10 | net: phy: provide whether link has changed in c37_read_status | Christian Marangi | 1 | -1/+1
Some PHY drivers might require additional register accesses after genphy_c37_read_status() is called. Expand genphy_c37_read_status() to provide a bool indicating whether the link has changed, so that a PHY driver can skip the additional register accesses if nothing has changed. Every user of genphy_c37_read_status() is updated with the new additional bool. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
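The out-parameter pattern described above might look like the following userspace sketch. Everything here is hypothetical (demo_phy, demo_read_status, demo_driver_read_status, and the `hw_link` argument that stands in for a hardware read); it shows the calling convention only, not the real helper's logic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver-visible PHY state. */
struct demo_phy {
	int link;		/* last known link state */
	int extra_reads;	/* counts the "additional" register accesses */
};

/* Generic helper: refresh the link state and report whether it changed. */
static int demo_read_status(struct demo_phy *phy, int hw_link, bool *changed)
{
	*changed = (phy->link != hw_link);
	phy->link = hw_link;
	return 0;
}

/* Driver wrapper: do the extra register accesses only when something changed. */
static int demo_driver_read_status(struct demo_phy *phy, int hw_link)
{
	bool changed;
	int ret = demo_read_status(phy, hw_link, &changed);

	if (ret || !changed)
		return ret;

	phy->extra_reads++;	/* stand-in for vendor-specific register reads */
	return 0;
}
```

Polling an unchanged link therefore costs only the generic read, which is the saving the commit aims for.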
2024-02-10 | net: phy: add devm/of_phy_package_join helper | Christian Marangi | 1 | -0/+6
Add devm/of_phy_package_join helper to join PHYs in a PHY package. These are variants of the manual phy_package_join, with the difference that they use DT nodes to derive the base_addr instead of being passed a hardcoded value. An additional value, "np", is added to phy_package_shared to reference the PHY package node pointer in the PHY driver's probe_once and config_init_once functions, so that they can make use of additional properties defined in the PHY package node in DT. The np value is filled only by of_phy_package_join, and only if a valid PHY package node is found. A valid PHY package node must have the node name set to "ethernet-phy-package". Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-10 | Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client | Linus Torvalds | 2 | -2/+3
Pull ceph fixes from Ilya Dryomov: "Some fscrypt-related fixups (sparse reads are used only for encrypted files) and two cap handling fixes from Xiubo and Rishabh" * tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client: ceph: always check dir caps asynchronously ceph: prevent use-after-free in encode_cap_msg() ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT libceph: just wait for more data to be available on the socket libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*() libceph: fail sparse-read if the data length doesn't match
2024-02-10 | work around gcc bugs with 'asm goto' with outputs | Linus Torvalds | 2 | -2/+21
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional"). Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around. Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case. It looks like there are at least two separate issues that all hit in this area: (a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420 which is easy to work around by just adding the 'volatile' by hand. (b) Internal compiler errors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422 which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround. but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'. but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue. 
Reported-and-tested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 | Merge tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi | Linus Torvalds | 1 | -0/+23
Pull EFI fixes from Ard Biesheuvel:
"The only notable change here is the patch that changes the way we deal with spurious errors from the EFI memory attribute protocol. This will be backported to v6.6, and is intended to ensure that we will not paint ourselves into a corner when we tighten this further in order to comply with MS requirements on signed EFI code. Note that this protocol does not currently exist in x86 production systems in the field, only in Microsoft's fork of OVMF, but it will be mandatory for Windows logo certification for x86 PCs in the future.
- Tighten ELF relocation checks on the RISC-V EFI stub
- Give up if the new EFI memory attributes protocol fails spuriously on x86
- Take care not to place the kernel in the lowest 16 MB of DRAM on x86
- Omit special purpose EFI memory from memblock
- Some fixes for the CXL CPER reporting code
- Make the PE/COFF layout of mixed-mode capable images comply with a strict interpretation of the spec"
* tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
  cxl/trace: Remove unnecessary memcpy's
  cxl/cper: Fix errant CPER prints for CXL events
  efi: Don't add memblocks for soft-reserved memory
  efi: runtime: Fix potential overflow of soft-reserved region size
  efi/libstub: Add one kernel-doc comment
  x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
  x86/efistub: Give up if memory attribute protocol returns an error
  riscv/efistub: Tighten ELF relocation check
  riscv/efistub: Ensure GP-relative addressing is not used
2024-02-09 | wwan: core: Add WWAN fastboot port type | Jinjian Song | 1 | -0/+2
Add a new WWAN port that connects to the device fastboot protocol interface. Signed-off-by: Jinjian Song <jinjian.song@fibocom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 3 | -9/+9
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/stmicro/stmmac/common.h 38cc3c6dcc09 ("net: stmmac: protect updates of 64-bit statistics counters") fd5a6a71313e ("net: stmmac: est: Per Tx-queue error count for HLBF") c5c3e1bfc9e0 ("net: stmmac: Offload queueMaxSDU from tc-taprio") drivers/net/wireless/microchip/wilc1000/netdev.c c9013880284d ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000") 328efda22af8 ("wifi: wilc1000: do not realloc workqueue everytime an interface is added") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08 | wifi: mac80211: adjust EHT capa when lowering bandwidth | Johannes Berg | 1 | -0/+3
If intending to associate with a lower bandwidth, remove capabilities related to 320 MHz from the EHT capabilities element. Also change the EHT MCS-NSS set accordingly: if just reducing 320->160 or similar the format doesn't change, just cut off the last bytes. If changing from higher bandwidth to 20 MHz only EHT STA, adjust the format. Note that this also requires adjusting the caller in mlme.c since the data written can now be shorter than it determined. We need to clean all that up. Since the other callers pass NULL for the conn limit, we don't need to change things there. Link: https://msgid.link/20240129202041.b5f6df108c77.I0d8ea04079c61cb3744cc88625eeaf0d4776dc2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08wifi: mac80211: implement MLO multicast deduplicationJohannes Berg1-0/+5
If the vif is an MLD then it may receive multicast from different links, and should drop those frames according to the SN. Implement that. Link: https://msgid.link/20240129200456.693b77d14b44.I491846f2bea0058c14eab6422962c10bfae9b675@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
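As a rough illustration of dropping duplicates by sequence number, the following user-space sketch keeps the last accepted multicast SN and rejects any frame whose SN is not newer in modulo-4096 arithmetic. All names (sketch_mcast_state, sketch_mcast_is_dup, and so on) are hypothetical, and the "less than or equal" drop rule is an assumption made for illustration, not a copy of the mac80211 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.11 sequence numbers are 12 bits wide, so they wrap at 4096. */
#define SKETCH_SN_MASK   0x0FFFu
#define SKETCH_SN_MODULO 0x1000u

/* Hypothetical per-transmitter state shared by all links of the MLD. */
struct sketch_mcast_state {
	bool seen;        /* true once at least one frame was accepted */
	uint16_t last_sn; /* SN of the last accepted multicast frame */
};

/* Return true if a is equal to or older than b, modulo 4096. */
static bool sketch_sn_less_eq(uint16_t a, uint16_t b)
{
	return a == b ||
	       ((uint16_t)(a - b) & SKETCH_SN_MASK) > (SKETCH_SN_MODULO >> 1);
}

/* Return true if the frame should be dropped as a duplicate. */
static bool sketch_mcast_is_dup(struct sketch_mcast_state *st, uint16_t sn)
{
	sn &= SKETCH_SN_MASK;
	if (st->seen && sketch_sn_less_eq(sn, st->last_sn))
		return true;

	st->seen = true;
	st->last_sn = sn;
	return false;
}
```

Because the same state is consulted no matter which link delivered the frame, a copy arriving later on a second link carries an SN that is no longer newer and is dropped.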
2024-02-08wifi: mac80211: add/use ieee80211_get_sn()Johannes Berg1-1/+6
This will also be useful for MLO duplicate multicast detection, but add it already here and use it in one place that trivially converts. Link: https://msgid.link/20240129200456.f0ff49c80006.I850d2785ab1640e56e262d3ad7343b87f6962552@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
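For readers unfamiliar with the Sequence Control field, here is a minimal user-space sketch of what extracting the sequence number involves. It assumes the 16-bit field has already been converted to host byte order; the names sketch_get_sn and SKETCH_SCTL_* are hypothetical and are not the kernel's definitions.

```c
#include <stdint.h>

/*
 * Sequence Control field layout in an 802.11 header:
 * bits 0-3 hold the fragment number, bits 4-15 the sequence number.
 */
#define SKETCH_SCTL_FRAG 0x000Fu
#define SKETCH_SCTL_SEQ  0xFFF0u

/* Extract the 12-bit sequence number from a host-order seq_ctrl value. */
static inline uint16_t sketch_get_sn(uint16_t seq_ctrl)
{
	return (uint16_t)((seq_ctrl & SKETCH_SCTL_SEQ) >> 4);
}
```

Keeping the mask and shift in one helper means callers cannot forget to discard the fragment bits.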
2024-02-08wifi: mac80211: refactor puncturing bitmap extractionJohannes Berg1-0/+16
Add a new inline helper function to ieee80211.h to extract the disabled subchannels bitmap from an EHT operation element, and use that in mac80211 where we do that. Link: https://msgid.link/20240129194108.d9f50dcec8d0.I8b08cbc2490a734fafcce0fa0fc328211ba6f10b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-02-08Merge tag 'mlx5-updates-2024-02-01' of ↵Jakub Kicinski2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2024-02-01 1) IPSec global stats for xfrm and mlx5 2) XSK memory improvements for non-linear SKBs 3) Software steering debug dump to use seq_file ops 4) Various code clean-ups * tag 'mlx5-updates-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: XDP, Exclude headroom and tailroom from memory calculations net/mlx5e: XSK, Exclude tailroom from non-linear SKBs memory calculations net/mlx5: DR, Change SWS usage to debug fs seq_file interface net/mlx5: Change missing SyncE capability print to debug net/mlx5: Remove initial segmentation duplicate definitions net/mlx5: Return specific error code for timeout on wait_fw_init net/mlx5: SF, Stop waiting for FW as teardown was called net/mlx5: remove fw reporter dump option for non PF net/mlx5: remove fw_fatal reporter dump option for non PF net/mlx5: Rename mlx5_sf_dev_remove Documentation: Fix counter name of mlx5 vnic reporter net/mlx5e: Delete obsolete IPsec code net/mlx5e: Connect mlx5 IPsec statistics with XFRM core xfrm: get global statistics from the offloaded device xfrm: generalize xdo_dev_state_update_curlft to allow statistics update ==================== Link: https://lore.kernel.org/r/20240206005527.1353368-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07net: Do not return value from init_dummy_netdev()Amit Cohen1-1/+1
init_dummy_netdev() always returns zero, and none of its callers check the returned value. Change the function to return void, as the value is not used today. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240205103022.440946-1-amcohen@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07libceph: just wait for more data to be available on the socketXiubo Li1-1/+1
A short read may occur while reading the message footer from the socket. Later, when the socket is ready for another read, the messenger invokes all read_partial_*() handlers, including read_partial_sparse_msg_data(). The expectation is that read_partial_sparse_msg_data() would bail, allowing the messenger to invoke read_partial() for the footer and pick up where it left off. However read_partial_sparse_msg_data() violates that and ends up calling into the state machine in the OSD client. The sparse-read state machine assumes that it's a new op and interprets some piece of the footer as the sparse-read header and returns bogus extents/data length, etc. To determine whether read_partial_sparse_msg_data() should bail, let's reuse cursor->total_resid. Once it reaches zero, all the extents and data have been successfully received in the last read; otherwise the read could have broken out while partially reading any of the extents and data, and osd_sparse_read() can then continue where it left off. [ idryomov: changelog ] Link: https://tracker.ceph.com/issues/63586 Fixes: d396f89db39a ("libceph: add sparse read support to msgr1") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-07libceph: fail sparse-read if the data length doesn't matchXiubo Li1-1/+2
If this happens, it means there is a bug. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-07net: phy: add helper phy_advertise_eee_allHeiner Kallweit1-0/+1
By default, phylib preserves the EEE advertising as it was at the time of PHY probing. The EEE advertising can be changed from user space; in addition, this helper allows drivers in kernel space to set the EEE advertising to all supported modes. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20bfc471-aeeb-4ae4-ba09-7d6d4be6b86a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06blk-wbt: Fix detection of dirty-throttled tasksJan Kara1-2/+5
The detection of dirty-throttled tasks in blk-wbt has been subtly broken since its beginning in 2016. Namely if we are doing cgroup writeback and the throttled task is not in the root cgroup, balance_dirty_pages() will set dirty_sleep for the non-root bdi_writeback structure. However blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback structure. Thus detection of recently throttled tasks is not working in this case (we noticed this when we switched to cgroup v2 and suddenly writeback was slow). Since blk-wbt has no easy way to get to proper bdi_writeback and furthermore its intention has always been to work on the whole device rather than on individual cgroups, just move the dirty_sleep timestamp from bdi_writeback to backing_dev_info. That fixes the checking for recently throttled task and saves memory for everybody as a bonus. CC: stable@vger.kernel.org Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz [axboe: fixup indentation errors] Signed-off-by: Jens Axboe <axboe@kernel.dk>
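The effect of moving the timestamp can be sketched in user space as follows. The struct and function names (sketch_bdi, sketch_wb, sketch_note_throttle, sketch_recent_wait) are hypothetical stand-ins, and the explicit "now" and "window" parameters are assumptions that replace the kernel's jiffies-based timekeeping.

```c
#include <stdbool.h>

/* Per-device structure: the throttle timestamp now lives here. */
struct sketch_bdi {
	unsigned long last_sleep;
};

/* Per-cgroup writeback structure: it only points back at the device. */
struct sketch_wb {
	struct sketch_bdi *bdi;
};

/* Called when a task in any cgroup is throttled on dirty pages. */
static void sketch_note_throttle(struct sketch_wb *wb, unsigned long now)
{
	wb->bdi->last_sleep = now;
}

/* Device-wide check: was any task throttled within the last window? */
static bool sketch_recent_wait(const struct sketch_bdi *bdi,
			       unsigned long now, unsigned long window)
{
	/* Unsigned subtraction stays correct if the counter wraps. */
	return (now - bdi->last_sleep) < window;
}
```

With the timestamp stored once per device, a throttle recorded through a non-root cgroup's structure is visible to a check that only knows about the device, which is the situation the commit message describes as broken before the change.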
2024-02-06remap_range: merge do_clone_file_range() into vfs_clone_file_range()Amir Goldstein1-3/+0
commit dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") moved the permission hooks from do_clone_file_range() out to its caller vfs_clone_file_range(), but left all the fast sanity checks in do_clone_file_range(). This makes the expensive security hooks be called in situations that they would not have been called before (e.g. fs does not support clone). The only reason for the do_clone_file_range() helper was that overlayfs used to be unable to call vfs_clone_file_range() from copy up context with sb_writers lock held. However, since commit c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held"), overlayfs just uses an open coded version of vfs_clone_file_range(). Merge do_clone_file_range() into vfs_clone_file_range(), restoring the original order of checks as it was before the regressing commit and adapt the overlayfs code to call vfs_clone_file_range() before the permission hooks that were added by commit ca7ab482401c ("ovl: add permission hooks outside of do_splice_direct()"). Note that in the merge of do_clone_file_range(), the file_start_write() context was reduced to cover ->remap_file_range() without holding it over the permission hooks, which was the reason for doing the regressing commit in the first place. Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com Fixes: dfad37051ade ("remap_range: move permission hooks out of do_clone_file_range()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-06net: phy: constify phydev->drvRussell King (Oracle)1-1/+1
Device driver structures are shared between all devices that they match, and thus nothing should ever write to the device driver structure through the phydev->drv pointer. Let's make this pointer const to catch code that attempts to do so. Suggested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1rVxXt-002YqY-9G@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
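A small sketch of what the const qualifier buys, using hypothetical stand-in types (sketch_phy_driver, sketch_phy_device) rather than the real phylib structures: reads through the pointer keep working, while a write through it is rejected at compile time.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver structure shared by many devices. */
struct sketch_phy_driver {
	const char *name;
	unsigned int flags;
};

/* Hypothetical stand-in for a device that points at its shared driver. */
struct sketch_phy_device {
	const struct sketch_phy_driver *drv; /* read-only view of the driver */
};

/* Reading through the const pointer is still allowed. */
static const char *sketch_drv_name(const struct sketch_phy_device *phydev)
{
	return phydev->drv ? phydev->drv->name : NULL;
}

/*
 * A write such as
 *     phydev->drv->flags = 0;
 * would now fail to compile, because drv points to a const-qualified
 * structure.
 */
```

The driver object itself does not have to be declared const for this to work; only the view through the device's pointer is read-only.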
2024-02-06hrtimer: Report offline hrtimer enqueueFrederic Weisbecker1-1/+3
The hrtimers migration on CPU-down hotplug process has been moved earlier, before the CPU actually goes to die. This leaves a small window of opportunity to queue an hrtimer in a blind spot, leaving it ignored. For example a practical case has been reported with RCU waking up a SCHED_FIFO task right before the CPUHP_AP_IDLE_DEAD stage, queuing that way a sched/rt timer to the local offline CPU. Make sure such situations never go unnoticed and warn when that happens. Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240129235646.3171983-4-boqun.feng@gmail.com