Age | Commit message (Collapse) | Author | Files | Lines |
|
These functions can return EPROBE_DEFER. Don't warn in such a case.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240930181823.288892-3-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can just use res directly.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20240930181823.288892-2-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch reverts two TOMOYO patches that were merged into Linus' tree
during the v6.12 merge window:
8b985bbfabbe ("tomoyo: allow building as a loadable LSM module")
268225a1de1a ("tomoyo: preparation step for building as a loadable LSM module")
Together these two patches introduced the CONFIG_SECURITY_TOMOYO_LKM
Kconfig build option which enabled a TOMOYO specific dynamic LSM loading
mechanism (see the original commits for more details). Unfortunately,
this approach was widely rejected by the LSM community as well as some
members of the general kernel community. Objections included concerns
over setting a bad precedent regarding individual LSMs managing their
LSM callback registrations as well as general kernel symbol exporting
practices. With little to no support for the CONFIG_SECURITY_TOMOYO_LKM
approach outside of Tetsuo, and multiple objections, we need to revert
these changes.
Link: https://lore.kernel.org/all/0c4b443a-9c72-4800-97e8-a3816b6a9ae2@I-love.SAKURA.ne.jp
Link: https://lore.kernel.org/all/CAHC9VhR=QjdoHG3wJgHFJkKYBg7vkQH2MpffgVzQ0tAByo_wRg@mail.gmail.com
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
alloc_ltalkdev in net/appletalk/dev.c is dead since
commit 00f3696f7555 ("net: appletalk: remove cops support")
Removing it (and it's helper) leaves dev.c and if_ltalk.h empty;
remove them and the Makefile entry.
tun.c was including that if_ltalk.h but actually wanted
the uapi version for LTALK_ALEN, fix up the path.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the Microsoft Azure Cobalt 100 CPU to the list of CPUs suffering
from erratum 3194386 added in commit 75b3c43eab59 ("arm64: errata:
Expand speculative SSBS workaround")
CC: Mark Rutland <mark.rutland@arm.com>
CC: James More <james.morse@arm.com>
CC: Will Deacon <will@kernel.org>
CC: stable@vger.kernel.org # 6.6+
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/20241003225239.321774-1-eahariha@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We received a regression report for System76 Pangolin (pang14) due to
the recent fix for Tuxedo Sirius devices to support the top speaker.
The reason was the conflicting PCI SSID, as often seen.
As a workaround, now the codec SSID is checked and the quirk is
applied conditionally only to Sirius devices.
Fixes: 4178d78cd7a8 ("ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices")
Reported-by: Christian Heusel <christian@heusel.eu>
Reported-by: Jerry <jerryluo225@gmail.com>
Closes: https://lore.kernel.org/c930b6a6-64e5-498f-b65a-1cd5e0a1d733@heusel.eu
Link: https://patch.msgid.link/20241004082602.29016-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add hw monitor volume control for POD HD500X. This is done adding
LINE6_CAP_HWMON_CTL to the capabilities
Signed-off-by: Hans P. Moller <hmoller@uc.cl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20241003232828.5819-1-hmoller@uc.cl
|
|
If get_bpos() fails, it is likely that the corresponding error code should
be returned.
Fixes: a6970bb1dd99 ("ALSA: gus: Convert to the new PCM ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://patch.msgid.link/d9ca841edad697154afa97c73a5d7a14919330d9.1727984008.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Restore pci state on resume (Rodrigo Vivi)
- Fix locking on submission, queue and vm (Matthew Auld, Matthew Brost)
- Fix UAF on queue destruction (Matthew Auld)
- Fix resource release on freq init error path (He Lugang)
- Use rw_semaphore to reduce contention on ASID->VM lookup (Matthew Brost)
- Fix steering for media on Xe2_HPM (Gustavo Sousa)
- Tuning updates to Xe2 (Gustavo Sousa)
- Resume TDR after GT reset to prevent jobs running forever (Matthew Brost)
- Move id allocation to avoid userspace using a guessed number
to trigger UAF (Matthew Auld, Matthew Brost)
- Fix OA stream close preventing pbatch buffers to complete (José)
- Fix NPD when migrating memory on LNL (Zhanjun Dong)
- Fix memory leak when aborting binds (Matthew Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2fiv63yanlal5mpw3mxtotte6yvkvtex74c7mkjxca4bazlyja@o4iejcfragxy
|
|
Allow tracking of packets sent with send_subcrq direct vs
indirect. `ethtool -S <dev>` will now provide a counter
of the number of uses of each xmit method. This metric will
be useful in performance debugging.
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf)
This series contains updates to ice and idpf drivers:
For ice:
Michal corrects setting of dst VSI on LAN filters and adds clearing of
port VLAN configuration during reset.
Gui-Dong Han corrects failures to decrement refcount in some error
paths.
Przemek resolves a memory leak in ice_init_tx_topology().
Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper
value.
Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly
restored after reset.
For idpf:
Ahmed sets uninitialized dyn_ctl_intrvl_s value.
Josh corrects use and reporting of mailbox size.
Larysa corrects order of function calls during de-initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: deinit virtchnl transaction manager after vport and vectors
idpf: use actual mbx receive payload length
idpf: fix VF dynamic interrupt ctl register initialization
ice: fix VLAN replay after reset
ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins
ice: fix memleak in ice_init_tx_topology()
ice: clear port vlan config during reset
ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count()
ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins()
ice: set correct dst VSI in only LAN filters
====================
Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix/improve a couple 'depends on' on the newly added CFI/KASAN
suppport to avoid build errors/warnings
- Fix ARCH_SLAB_MINALIGN multiple definition error for RISC-V under
!CONFIG_MMU
- Clean upcoming (Rust 1.83.0) Clippy warnings
'kernel' crate:
- 'sync' module: fix soundness issue by requiring 'T: Sync' for
'LockedBy::access'; and fix helpers build error under PREEMPT_RT
- Fix trivial sorting issue ('rustfmtcheck') on the v6.12 Rust merge"
* tag 'rust-fixes-6.12' of https://github.com/Rust-for-Linux/linux:
rust: kunit: use C-string literals to clean warning
cfi: encode cfi normalized integers + kasan/gcov bug in Kconfig
rust: KASAN+RETHUNK requires rustc 1.83.0
rust: cfi: fix `patchable-function-entry` starting version
rust: mutex: fix __mutex_init() usage in case of PREEMPT_RT
rust: fix `ARCH_SLAB_MINALIGN` multiple definition error
rust: sync: require `T: Sync` for `LockedBy::access`
rust: kernel: sort Rust modules
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ufs fix from Al Viro:
"Fix ufs_rename() braino introduced this cycle.
The 'folio_release_kmap(dir_folio, new_dir)' in ufs_rename() part of
folio conversion should've been getting a pointer to ufs directory
entry within the page, rather than a pointer to directory struct
inode..."
* tag 'pull-fixes.ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs_rename(): fix bogus argument of folio_release_kmap()
|
|
The code seems to be handling EPROBE_DEFER explicitly and if there's no
error, enables the clock. clk_get_optional exists for that.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240930211628.330703-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Joe Damato says:
====================
gve: Link IRQs, queues, and NAPI instances
This series uses the netdev-genl API to link IRQs and queues to NAPI IDs
so that this information is queryable by user apps. This is particularly
useful for epoll-based busy polling apps which rely on having access to
the NAPI ID.
I've tested these commits on a GCP instance with a GVE NIC configured
and have included test output in the commit messages for each patch
showing how to query the information.
[1]: https://lore.kernel.org/netdev/20240926030025.226221-1-jdamato@fastly.com/
====================
Link: https://patch.msgid.link/20240930210731.1629-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the netdev-genl interface to map NAPI instances to queues so that
this information is accessible to user programs via netlink.
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8313, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8314, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8315, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8316, 'type': 'rx'},
{'id': 4, 'ifindex': 2, 'napi-id': 8317, 'type': 'rx'},
[...]
{'id': 0, 'ifindex': 2, 'napi-id': 8297, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8298, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8299, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8300, 'type': 'tx'},
{'id': 4, 'ifindex': 2, 'napi-id': 8301, 'type': 'tx'},
[...]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://patch.msgid.link/20240930210731.1629-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use netdev-genl interface to map IRQs to NAPI instances so that this
information is accessible by user apps via netlink.
$ cat /proc/interrupts | grep gve | grep -v mgmnt | cut -f1 --delimiter=':'
34
35
36
37
38
39
40
[...]
65
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8288, 'ifindex': 2, 'irq': 65},
[...]
{'id': 8263, 'ifindex': 2, 'irq': 40},
{'id': 8262, 'ifindex': 2, 'irq': 39},
{'id': 8261, 'ifindex': 2, 'irq': 38},
{'id': 8260, 'ifindex': 2, 'irq': 37},
{'id': 8259, 'ifindex': 2, 'irq': 36},
{'id': 8258, 'ifindex': 2, 'irq': 35},
{'id': 8257, 'ifindex': 2, 'irq': 34}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://patch.msgid.link/20240930210731.1629-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename ip_len to payload_len since the length in this case refers only
to the payload, and not the entire IP packet like for IPv4. While we're
at it, just use the variable directly when calling
recv_verify_packet_udp/tcp.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240930162935.980712-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The test runs "devlink reload" explicitly. Instead, it is better to use
devlink_reload() which waits for udev events to be processed. Do not sleep
after reload, as devlink_reload() blocks until all the netdevs are renamed.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/844509e3057b65277a7181a23c95b71ec95e8a56.1727706741.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'rds_ib_dereg_odp_mr' has been unused since the original
commit 2eafa1746f17 ("net/rds: Handle ODP mr
registration/unregistration").
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20240930134358.48647-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
device_table in module_phy_driver macro is defined only when the
driver is built as a module. So a PHY driver imports phy::DeviceId
module in the following way then hits `unused import` warning when
it's compiled as built-in:
use kernel::net::phy::DeviceId;
kernel::module_phy_driver! {
drivers: [PhyQT2025],
device_table: [
DeviceId::new_with_driver::<PhyQT2025>(),
],
Put device_table in a const. It's not included in the kernel image if
unused (when the driver is compiled as built-in), and the compiler
doesn't complain.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20240930134038.1309-1-fujita.tomonori@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix multiple grammatical issues and add a missing period to improve
readability.
Signed-off-by: Leo Stone <leocstone@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240929005001.370991-1-leocstone@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
David Howells says:
====================
rxrpc: Miscellaneous fixes
Here some miscellaneous fixes for AF_RXRPC:
(1) Fix a race in the I/O thread vs UDP socket setup.
(2) Fix an uninitialised variable.
====================
Link: https://patch.msgid.link/20241001132702.3122709-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the uninitialised txb variable in rxrpc_send_data() by moving the code
that loads it above all the jumps to maybe_error, txb being stored back
into call->tx_pending right before the normal return.
Fixes: b0f571ecd794 ("rxrpc: Fix locking in rxrpc's sendmsg")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lists.infradead.org/pipermail/linux-afs/2024-October/008896.html
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://patch.msgid.link/20241001132702.3122709-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In rxrpc_open_socket(), it sets up the socket and then sets up the I/O
thread that will handle it. This is a problem, however, as there's a gap
between the two phases in which a packet may come into rxrpc_encap_rcv()
from the UDP packet but we oops when trying to wake the not-yet created I/O
thread.
As a quick fix, just make rxrpc_encap_rcv() discard the packet if there's
no I/O thread yet.
A better, but more intrusive fix would perhaps be to rearrange things such
that the socket creation is done by the I/O thread.
Fixes: a275da62e8c1 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: yuxuanzhe@outlook.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001132702.3122709-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for link up and link down interrupts in lan887x.
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20241001144421.6661-1-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Guillaume Nault says:
====================
ipv4: Convert ip_route_input_slow() and its callers to dscp_t.
Prepare ip_route_input_slow() and its call chain to future conversion
of ->flowi4_tos.
The ->flowi4_tos field of "struct flowi4" is used in many different
places, which makes it hard to convert it from __u8 to dscp_t.
In order to avoid a big patch updating all its users at once, this
patch series gradually converts some users to dscp_t. Those users now
set ->flowi4_tos from a dscp_t variable that is converted to __u8 using
inet_dscp_to_dsfield().
When all users of ->flowi4_tos will use a dscp_t variable, converting
that field to dscp_t will just be a matter of removing all the
inet_dscp_to_dsfield() conversions.
This series concentrates on ip_route_input_slow() and its direct and
indirect callers.
====================
Link: https://patch.msgid.link/cover.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input_slow(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos.
Only ip_route_input_rcu() actually calls ip_route_input_slow(). Since
it already has a dscp_t variable to pass as parameter, we only need to
remove the inet_dscp_to_dsfield() conversion.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/d6bca5f87eea9e83a3861e6e05594cdd252583c9.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input_rcu(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input_rcu() to consider are:
* ip_route_input_noref(), which already has a dscp_t variable to pass
as parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* inet_rtm_getroute(), which receives a u8 from user space and needs
to convert it with inet_dsfield_to_dscp().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/c4dbb5aa9cbc79c4fcb317abbffa7c7156bc56a7.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input_noref(), instead of a plain
u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input_noref() to consider are:
* arp_process() in net/ipv4/arp.c. This function sets the tos
parameter to 0, which is already a valid dscp_t value, so it
doesn't need to be adjusted for the new prototype.
* ip_route_input(), which already has a dscp_t variable to pass as
parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* ipvlan_l3_rcv(), bpf_lwt_input_reroute(), ip_expire(),
ip_rcv_finish_core(), xfrm4_rcv_encap_finish() and
xfrm4_rcv_encap(), which get the DSCP directly from IPv4 headers
and can simply use the ip4h_dscp() helper.
While there, declare the IPv4 header pointers as const in
ipvlan_l3_rcv() and bpf_lwt_input_reroute().
Also, modify the declaration of ip_route_input_noref() in
include/net/route.h so that it matches the prototype of its
implementation in net/ipv4/route.c.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/a8a747bed452519c4d0cc06af32c7e7795d7b627.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input() to consider are:
* input_action_end_dx4_finish() and input_action_end_dt4() in
net/ipv6/seg6_local.c. These functions set the tos parameter to 0,
which is already a valid dscp_t value, so they don't need to be
adjusted for the new prototype.
* icmp_route_lookup(), which already has a dscp_t variable to pass as
parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(),
which get the DSCP directly from IPv4 headers. Define a helper to
read the .tos field of struct iphdr as dscp_t, so that these
function don't have to do the conversion manually.
While there, declare *iph as const in br_nf_pre_routing_finish(),
declare its local variables in reverse-christmas-tree order and move
the "err = ip_route_input()" assignment out of the conditional to avoid
checkpatch warning.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/e9d40781d64d3d69f4c79ac8a008b8d67a033e8d.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to icmp_route_lookup(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos. Rename that
variable ("tos" -> "dscp") to make the intent clear.
While there, reorganise the function parameters to fill up horizontal
space.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/294fead85c6035bcdc5fcf9a6bb4ce8798c45ba1.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Neal Cardwell says:
====================
tcp: 3 fixes for retrans_stamp and undo logic
Geumhwan Yu <geumhwan.yu@samsung.com> recently reported and diagnosed
a regression in TCP loss recovery undo logic in the case where a TCP
connection enters fast recovery, is unable to retransmit anything due to
TSQ, and then receives an ACK allowing forward progress. The sender should
be able to undo the spurious loss recovery in this case, but was not doing
so. The first patch fixes this regression.
Running our suite of packetdrill tests with the first fix, the tests
highlighted two other small bugs in the way retrans_stamp is updated in
some rare corner cases. The second two patches fix those other two small
bugs.
Thanks to Geumhwan Yu for the bug report!
====================
Link: https://patch.msgid.link/20241001200517.2756803-1-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix tcp_rcv_synrecv_state_fastopen() to not zero retrans_stamp
if retransmits are outstanding.
tcp_fastopen_synack_timer() sets retrans_stamp, so typically we'll
need to zero retrans_stamp here to prevent spurious
retransmits_timed_out(). The logic to zero retrans_stamp is from this
2019 commit:
commit cd736d8b67fb ("tcp: fix retrans timestamp on passive Fast Open")
However, in the corner case where the ACK of our TFO SYNACK carried
some SACK blocks that caused us to enter TCP_CA_Recovery then that
non-zero retrans_stamp corresponds to the active fast recovery, and we
need to leave retrans_stamp with its current non-zero value, for
correct ETIMEDOUT and undo behavior.
Fixes: cd736d8b67fb ("tcp: fix retrans timestamp on passive Fast Open")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001200517.2756803-4-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix tcp_enter_recovery() so that if there are no retransmits out then
we zero retrans_stamp when entering fast recovery. This is necessary
to fix two buggy behaviors.
Currently a non-zero retrans_stamp value can persist across multiple
back-to-back loss recovery episodes. This is because we generally only
clears retrans_stamp if we are completely done with loss recoveries,
and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
behavior causes two bugs:
(1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
immediately by a new CA_Recovery, the retrans_stamp value can persist
and can be a time before this new CA_Recovery episode starts. That
means that timestamp-based undo will be using the wrong retrans_stamp
(a value that is too old) when comparing incoming TS ecr values to
retrans_stamp to see if the current fast recovery episode can be
undone.
(2) If there is a roughly minutes-long sequence of back-to-back fast
recovery episodes, one after another (e.g. in a shallow-buffered or
policed bottleneck), where each fast recovery successfully makes
forward progress and recovers one window of sequence space (but leaves
at least one retransmit in flight at the end of the recovery),
followed by several RTOs, then the ETIMEDOUT check may be using the
wrong retrans_stamp (a value set at the start of the first fast
recovery in the sequence). This can cause a very premature ETIMEDOUT,
killing the connection prematurely.
This commit changes the code to zero retrans_stamp when entering fast
recovery, when this is known to be safe (no retransmits are out in the
network). That ensures that when starting a fast recovery episode, and
it is safe to do so, retrans_stamp is set when we send the fast
retransmit packet. That addresses both bug (1) and bug (2) by ensuring
that (if no retransmits are out when we start a fast recovery) we use
the initial fast retransmit of this fast recovery as the time value
for undo and ETIMEDOUT calculations.
This makes intuitive sense, since the start of a new fast recovery
episode (in a scenario where no lost packets are out in the network)
means that the connection has made forward progress since the last RTO
or fast recovery, and we should thus "restart the clock" used for both
undo and ETIMEDOUT logic.
Note that if when we start fast recovery there *are* retransmits out
in the network, there can still be undesirable (1)/(2) issues. For
example, after this patch we can still have the (1) and (2) problems
in cases like this:
+ round 1: sender sends flight 1
+ round 2: sender receives SACKs and enters fast recovery 1,
retransmits some packets in flight 1 and then sends some new data as
flight 2
+ round 3: sender receives some SACKs for flight 2, notes losses, and
retransmits some packets to fill the holes in flight 2
+ fast recovery has some lost retransmits in flight 1 and continues
for one or more rounds sending retransmits for flight 1 and flight 2
+ fast recovery 1 completes when snd_una reaches high_seq at end of
flight 1
+ there are still holes in the SACK scoreboard in flight 2, so we
enter fast recovery 2, but some retransmits in the flight 2 sequence
range are still in flight (retrans_out > 0), so we can't execute the
new retrans_stamp=0 added here to clear retrans_stamp
It's not yet clear how to fix these remaining (1)/(2) issues in an
efficient way without breaking undo behavior, given that retrans_stamp
is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
expensive) strategy would be to set retrans_stamp to the timestamp of
the earliest outstanding retransmit when entering fast recovery. But
at least this commit makes things better.
Note that this does not change the semantics of retrans_stamp; it
simply makes retrans_stamp accurate in some cases where it was not
before:
(1) Some loss recovery, followed by an immediate entry into a fast
recovery, where there are no retransmits out when entering the fast
recovery.
(2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
and then the ACK that completes the 3-way handshake has SACK blocks
that trigger a fast recovery. In this case when entering fast recovery
we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
and set the retrans_stamp based on the timestamp of the fast recovery.
We introduce a tcp_retrans_stamp_cleanup() helper, because this
two-line sequence already appears in 3 places and is about to appear
in 2 more as a result of this bug fix patch series. Once this bug fix
patches series in the net branch makes it into the net-next branch
we'll update the 3 other call sites to use the new helper.
This is a long-standing issue. The Fixes tag below is chosen to be the
oldest commit at which the patch will apply cleanly, which is from
Linux v3.5 in 2012.
Fixes: 1fbc340514fc ("tcp: early retransmit: tcp_enter_recovery()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001200517.2756803-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the TCP loss recovery undo logic in tcp_packet_delayed() so that
it can trigger undo even if TSQ prevents a fast recovery episode from
reaching tcp_retransmit_skb().
Geumhwan Yu <geumhwan.yu@samsung.com> recently reported that after
this commit from 2019:
commit bc9f38c8328e ("tcp: avoid unconditional congestion window undo
on SYN retransmit")
...and before this fix we could have buggy scenarios like the
following:
+ Due to reordering, a TCP connection receives some SACKs and enters a
spurious fast recovery.
+ TSQ prevents all invocations of tcp_retransmit_skb(), because many
skbs are queued in lower layers of the sending machine's network
stack; thus tp->retrans_stamp remains 0.
+ The connection receives a TCP timestamp ECR value echoing a
timestamp before the fast recovery, indicating that the fast
recovery was spurious.
+ The connection fails to undo the spurious fast recovery because
tp->retrans_stamp is 0, and thus tcp_packet_delayed() returns false,
due to the new logic in the 2019 commit: commit bc9f38c8328e ("tcp:
avoid unconditional congestion window undo on SYN retransmit")
This fix tweaks the logic to be more similar to the
tcp_packet_delayed() logic before bc9f38c8328e, except that we take
care not to be fooled by the FLAG_SYN_ACKED code path zeroing out
tp->retrans_stamp (the bug noted and fixed by Yuchung in
bc9f38c8328e).
Note that this returns the high-level behavior of tcp_packet_delayed()
to again match the comment for the function, which says: "Nothing was
retransmitted or returned timestamp is less than timestamp of the
first retransmission." Note that this comment is in the original
2005-04-16 Linux git commit, so this is evidently long-standing
behavior.
Fixes: bc9f38c8328e ("tcp: avoid unconditional congestion window undo on SYN retransmit")
Reported-by: Geumhwan Yu <geumhwan.yu@samsung.com>
Diagnosed-by: Geumhwan Yu <geumhwan.yu@samsung.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001200517.2756803-2-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Abhishek Chauhan says:
====================
Fix AQR PMA capabilities
Patch 1:-
AQR115c reports incorrect PMA capabilities which includes
10G/5G and also incorrectly disables capabilities like autoneg
and 10Mbps support.
AQR115c as per the Marvell databook supports speeds up to 2.5Gbps
with autonegotiation.
Patch 2:-
Remove the use of phy_set_max_speed in phy driver as the
function is mainly used in MAC driver to set the max
speed.
Instead use get_features to fix up Phy PMA capabilities for
AQR111, AQR111B0, AQR114C and AQCS109
====================
Link: https://patch.msgid.link/20241001224626.2400222-1-quic_abchauha@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the use of phy_set_max_speed in phy driver as the
function is mainly used in MAC driver to set the max
speed.
Instead use get_features to fix up Phy PMA capabilities for
AQR111, AQR111B0, AQR114C and AQCS109
Fixes: 038ba1dc4e54 ("net: phy: aquantia: add AQR111 and AQR111B0 PHY ID")
Fixes: 0974f1f03b07 ("net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109")
Fixes: c278ec644377 ("net: phy: aquantia: add support for AQR114C PHY ID")
Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20241001224626.2400222-3-quic_abchauha@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
AQR115c reports incorrect PMA capabilities which includes
10G/5G and also incorrectly disables capabilities like autoneg
and 10Mbps support.
AQR115c as per the Marvell databook supports speeds up to 2.5Gbps
with autonegotiation.
Fixes: 0ebc581f8a4b ("net: phy: aquantia: add support for aqr115c")
Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/
Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20241001224626.2400222-2-quic_abchauha@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mix netns into all IPv4 FIB hashes to avoid massive collision when
inserting the same address in many netns.
Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241001231438.3855035-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Joe Damato says:
====================
ena: Link IRQs, queues, and NAPI instances
This series uses the netdev-genl API to link IRQs and queues to NAPI IDs
so that this information is queryable by user apps. This is particularly
useful for epoll-based busy polling apps which rely on having access to
the NAPI ID.
I've tested these commits on an EC2 instance with an ENA NIC configured
and have included test output in the commit messages for each patch
showing how to query the information.
I noted in the implementation that the driver requests an IRQ for
management purposes which does not have an associated NAPI. I tried
to take this into account in patch 1, but would appreciate if ENA
maintainers can verify I did this correctly.
v1: https://lore.kernel.org/all/20240930195617.37369-1-jdamato@fastly.com/
====================
Link: https://patch.msgid.link/20241002001331.65444-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link queues to NAPIs using the netdev-genl API so this information is
queryable.
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'rx'},
{'id': 4, 'ifindex': 2, 'napi-id': 8205, 'type': 'rx'},
{'id': 5, 'ifindex': 2, 'napi-id': 8206, 'type': 'rx'},
{'id': 6, 'ifindex': 2, 'napi-id': 8207, 'type': 'rx'},
{'id': 7, 'ifindex': 2, 'napi-id': 8208, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'tx'},
{'id': 4, 'ifindex': 2, 'napi-id': 8205, 'type': 'tx'},
{'id': 5, 'ifindex': 2, 'napi-id': 8206, 'type': 'tx'},
{'id': 6, 'ifindex': 2, 'napi-id': 8207, 'type': 'tx'},
{'id': 7, 'ifindex': 2, 'napi-id': 8208, 'type': 'tx'}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20241002001331.65444-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Link IRQs to NAPI instances with netif_napi_set_irq. This information
can be queried with the netdev-genl API. Note that the ENA device
appears to allocate an IRQ for management purposes which does not have a
NAPI associated with it; this commit takes this into consideration to
accurately construct a map between IRQs and NAPI instances.
Compare the output of /proc/interrupts for my ena device with the output of
netdev-genl after applying this patch:
$ cat /proc/interrupts | grep enp55s0 | cut -f1 --delimiter=':'
94
95
96
97
98
99
100
101
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8208, 'ifindex': 2, 'irq': 101},
{'id': 8207, 'ifindex': 2, 'irq': 100},
{'id': 8206, 'ifindex': 2, 'irq': 99},
{'id': 8205, 'ifindex': 2, 'irq': 98},
{'id': 8204, 'ifindex': 2, 'irq': 97},
{'id': 8203, 'ifindex': 2, 'irq': 96},
{'id': 8202, 'ifindex': 2, 'irq': 95},
{'id': 8201, 'ifindex': 2, 'irq': 94}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: David Arinzon <darinzon@amazon.com>
Link: https://patch.msgid.link/20241002001331.65444-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Brandon reports sporadic, non-sensical spikes in cumulative pressure
time (total=) when reading cpu.pressure at a high rate. This is due to
a race condition between reader aggregation and tasks changing states.
While it affects all states and all resources captured by PSI, in
practice it most likely triggers with CPU pressure, since scheduling
events are so frequent compared to other resource events.
The race context is the live snooping of ongoing stalls during a
pressure read. The read aggregates per-cpu records for stalls that
have concluded, but will also incorporate ad-hoc the duration of any
active state that hasn't been recorded yet. This is important to get
timely measurements of ongoing stalls. Those ad-hoc samples are
calculated on-the-fly up to the current time on that CPU; since the
stall hasn't concluded, it's expected that this is the minimum amount
of stall time that will enter the per-cpu records once it does.
The problem is that the path that concludes the state uses a CPU clock
read that is not synchronized against aggregators; the clock is read
outside of the seqlock protection. This allows aggregators to race and
snoop a stall with a longer duration than will actually be recorded.
With the recorded stall time being less than the last snapshot
remembered by the aggregator, a subsequent sample will underflow and
observe a bogus delta value, resulting in an erratic jump in pressure.
Fix this by moving the clock read of the state change into the seqlock
protection. This ensures no aggregation can snoop live stalls past the
time that's recorded when the state concludes.
Reported-by: Brandon Duffany <brandon@buildbuddy.io>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219194
Link: https://lore.kernel.org/lkml/20240827121851.GB438928@cmpxchg.org/
Fixes: df77430639c9 ("psi: Reduce calls to sched_clock() in psi")
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped. All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes. Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.
Reintroduce from that commit the code that walks the whole memslot when
there are active shadow MMU pages.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Yury reported a crash in the sfc driver originated from
netpoll_send_udp(). The netconsole sends a message and then netpoll
invokes the driver's NAPI function with a budget of zero. It is
dedicated to allow driver to free TX resources, that it may have used
while sending the packet.
In the netpoll case the driver invokes xdp_do_flush() unconditionally,
leading to crash because bpf_net_context was never assigned.
Invoke xdp_do_flush() only if budget is not zero.
Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Reported-by: Yury Vostrikov <mon@unformed.ru>
Closes: https://lore.kernel.org/5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20241002125837.utOcRo6Y@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When configuring the fiber port, the DP83869 PHY driver incorrectly
calls linkmode_set_bit() with a bit mask (1 << 10) rather than a bit
number (10). This corrupts some other memory location -- in case of
arm64 the priv pointer in the same structure.
Since the advertising flags are updated from supported at the end of the
function the incorrect line isn't needed at all and can be removed.
Fixes: a29de52ba2a1 ("net: dp83869: Add ability to advertise Fiber connection")
Signed-off-by: Ingo van Lil <inguin@gmx.de>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241002161807.440378-1-inguin@gmx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jacob Keller says:
====================
packing: various improvements and KUnit tests
This series contains a handful of improvements and fixes for the packing
library, including the addition of KUnit tests.
There are two major changes which might be considered bug fixes:
1) The library is updated to handle arbitrary buffer lengths, fixing
undefined behavior when operating on buffers which are not a multiple
of 4 bytes.
2) The behavior of QUIRK_MSB_ON_THE_RIGHT is fixed to match the intended
behavior when operating on packings that are not byte aligned.
These are not sent to net because no driver currently depends on this
behavior. For (1), the existing users of the packing API all operate on
buffers which are multiples of 4-bytes. For (2), no driver currently uses
QUIRK_MSB_ON_THE_RIGHT. The incorrect behavior was found while writing
KUnit tests.
This series also includes a handful of minor cleanups from Vladimir, as
well as a change to introduce a separated pack() and unpack() API. This API
is not (yet) used by a driver, but is the first step in implementing
pack_fields() and unpack_fields() which will be used in future changes for
the ice driver and changes Vladimir has in progress for other drivers using
the packing API.
This series is part 1 of a 2-part series for implementing use of
lib/packing in the ice driver. The 2nd part includes a new pack_fields()
and unpack_fields() implementation inspired by the ice driver's existing
bit packing code. It is built on top of the split pack() and unpack()
code. Additionally, the KUnit tests are built on top of pack() and
unpack(), based on original selftests written by Vladimir.
Fitting the entire library changes and drivers changes into a single series
exceeded the usual series limits.
v1: https://lore.kernel.org/r/20240930-packing-kunit-tests-and-split-pack-unpack-v1-0-94b1f04aca85@intel.com
====================
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-0-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is an u8, so using GENMASK_ULL() for unsigned long long is
unnecessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-10-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This helps clarify what the 8 is for.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-9-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|