summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-12tools: ynl-gen: resolve enum vs struct name conflictsJakub Kicinski1-5/+20
Ethtool has an attribute set called stringset, from which we'll generate struct ethtool_stringset. Unfortunately, the old ethtool header declares enum ethtool_stringset (the same name), to which compilers object. This seems unavoidable. Check struct names against known constants and append an underscore if conflict is detected. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tools: ynl-gen: don't generate enum types if unnamedJakub Kicinski1-2/+10
If attr set or enum has empty enum name we need to use u32 or int as function arguments and struct members. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12netlink: specs: ethtool: add C render hintsJakub Kicinski1-0/+5
Most of the C enum names are guessed correctly, but there is a handful of corner cases we need to name explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12netlink: specs: support setting prefix-name per attributeJakub Kicinski3-2/+13
Ethtool's PSE PoDL has a attr nest with different prefixes: /* Power Sourcing Equipment */ enum { ETHTOOL_A_PSE_UNSPEC, ETHTOOL_A_PSE_HEADER, /* nest - _A_HEADER_* */ ETHTOOL_A_PODL_PSE_ADMIN_STATE, /* u32 */ ETHTOOL_A_PODL_PSE_ADMIN_CONTROL, /* u32 */ ETHTOOL_A_PODL_PSE_PW_D_STATUS, /* u32 */ Header has a prefix of ETHTOOL_A_PSE_ and other attrs prefix of ETHTOOL_A_PODL_PSE_ we can't cover them uniformly. If PODL was after PSE life would be easy. Now we either need to add prefixes to attr names which is yucky or support setting prefix name per attr. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tools: ynl-gen: record extra args for regenJakub Kicinski2-1/+8
ynl-regen needs to know the arguments used to generate a file. Record excluded ops and, while at it, user headers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12tools: ynl-gen: support excluding tricky opsJakub Kicinski2-5/+17
The ethtool family has a small handful of quite tricky ops and a lot of simple very useful ops. Teach ynl-gen to skip ops so that we can bypass the tricky ones. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mdio: mdio-mux-mmioreg: Use of_property_read_reg() to parse "reg"Rob Herring1-3/+4
Use the recently added of_property_read_reg() helper to get the untranslated "reg" address value. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12dt-bindings: net: drop unneeded quotesKrzysztof Kozlowski10-11/+11
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12Merge branch 'SCM_PIDFD-SCM_PEERPIDFD'David S. Miller16-13/+565
Alexander Mikhalitsyn says: ==================== Add SCM_PIDFD and SO_PEERPIDFD 1. Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, but it contains pidfd instead of plain pid, which allows programmers not to care about PID reuse problem. 2. Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd. This thing is direct analog of SO_PEERCRED which allows to get plain PID. 3. Add SCM_PIDFD / SO_PEERPIDFD kselftest Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this and Luca Boccassi for testing and reviewing this. === Motivation behind this patchset Eric Dumazet raised a question: > It seems that we already can use pidfd_open() (since linux-5.3), and > pass the resulting fd in af_unix SCM_RIGHTS message ? Yes, it's possible, but it means that from the receiver side we need to trust the sent pidfd (in SCM_RIGHTS), or always use combination of SCM_RIGHTS+SCM_CREDENTIALS, then we can extract pidfd from SCM_RIGHTS, then acquire plain pid from pidfd and after compare it with the pid from SCM_CREDENTIALS. A few comments from other folks regarding this. Christian Brauner wrote: >Let me try and provide some of the missing background. >There are a range of use-cases where we would like to authenticate a >client through sockets without being susceptible to PID recycling >attacks. Currently, we can't do this as the race isn't fully fixable. >We can only apply mitigations. >What this patchset will allows us to do is to get a pidfd without the >client having to send us an fd explicitly via SCM_RIGHTS. As that's >already possibly as you correctly point out. >But for protocols like polkit this is quite important. Every message is >standalone and we would need to force a complete protocol change where >we would need to require that every client allocate and send a pidfd via >SCM_RIGHTS. That would also mean patching through all polkit users. >For something like systemd-journald where we provide logging facilities >and want to add metadata to the log we would also immensely benefit from >being able to get a receiver-side controlled pidfd. >With the message type we envisioned we don't need to change the sender >at all and can be safe against pid recycling. >Link: https://gitlab.freedesktop.org/polkit/polkit/-/merge_requests/154 >Link: https://uapi-group.org/kernel-features Lennart Poettering wrote: >So yes, this is of course possible, but it would mean the pidfd would >have to be transported as part of the user protocol, explicitly sent >by the sender. (Moreover, the receiver after receiving the pidfd would >then still have to somehow be able to prove that the pidfd it just >received actually refers to the peer's process and not some random >process. – this part is actually solvable in userspace, but ugly) >The big thing is simply that we want that the pidfd is associated >*implicity* with each AF_UNIX connection, not explicitly. A lot of >userspace already relies on this, both in the authentication area >(polkit) as well as in the logging area (systemd-journald). Right now >using the PID field from SO_PEERCREDS/SCM_CREDENTIALS is racy though >and very hard to get right. Making this available as pidfd too, would >solve this raciness, without otherwise changing semantics of it all: >receivers can still enable the creds stuff as they wish, and the data >is then implicitly appended to the connections/datagrams the sender >initiates. >Or to turn this around: things like polkit are typically used to >authenticate arbitrary dbus methods calls: some service implements a >dbus method call, and when an unprivileged client then issues that >call, it will take the client's info, go to polkit and ask it if this >is ok. If we wanted to send the pidfd as part of the protocol we >basically would have to extend every single method call to contain the >client's pidfd along with it as an additional argument, which would be >a massive undertaking: it would change the prototypes of basically >*all* methods a service defines… And that's just ugly. >Note that Alex' patch set doesn't expose anything that wasn't exposed >before, or attach, propagate what wasn't before. All it does, is make >the field already available anyway (the struct ucred .pid field) >available also in a better way (as a pidfd), to solve a variety of >races, with no effect on the protocol actually spoken within the >AF_UNIX transport. It's a seamless improvement of the status quo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12af_unix: Kconfig: make CONFIG_UNIX boolAlexander Mikhalitsyn1-5/+1
Let's make CONFIG_UNIX a bool instead of a tristate. We've decided to do that during discussion about SCM_PIDFD patchset [1]. [1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/ Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12selftests: net: add SCM_PIDFD / SO_PEERPIDFD testAlexander Mikhalitsyn3-1/+433
Basic test to check consistency between: - SCM_CREDENTIALS and SCM_PIDFD - SO_PEERCRED and SO_PEERPIDFD Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: core: add getsockopt SO_PEERPIDFDAlexander Mikhalitsyn8-0/+55
Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd. This thing is direct analog of SO_PEERCRED which allows to get plain PID. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Stanislav Fomichev <sdf@google.com> Cc: bpf@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Tested-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12scm: add SO_PASSPIDFD and SCM_PIDFDAlexander Mikhalitsyn12-7/+76
Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS, but it contains pidfd instead of plain pid, which allows programmers not to care about PID reuse problem. We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because it depends on a pidfd_prepare() API which is not exported to the kernel modules. Idea comes from UAPI kernel group: https://uapi-group.org/kernel-features/ Big thanks to Christian Brauner and Lennart Poettering for productive discussions about this. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Luca Boccassi <bluca@debian.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12Merge branch 'mlxsw-cleanups'David S. Miller6-81/+89
Petr Machata says: ==================== mlxsw: Cleanups in router code This patchset moves some router-related code from spectrum.c to spectrum_router.c where it should be. It also simplifies handlers of netevent notifications. - Patch #1 caches router pointer in a dedicated variable. This obviates the need to access the same as mlxsw_sp->router, making lines shorter, and permitting a future patch to add code that fits within 80 character limit. - Patch #2 moves IP / IPv6 validation notifier blocks from spectrum.c to spectrum_router, where the handlers are anyway. - In patch #3, pass router pointer to scheduler of deferred work directly, instead of having it deduce it on its own. - This makes the router pointer available in the handler function mlxsw_sp_router_netevent_event(), so in patch #4, use it directly, instead of finding it through mlxsw_sp_port. - In patch #5, extend mlxsw_sp_router_schedule_work() so that the NETEVENT_NEIGH_UPDATE handler can use it directly instead of inlining equivalent code. - In patches #6 and #7, add helpers for two common operations involving a backing netdev of a RIF. This makes it unnecessary for the function mlxsw_sp_rif_dev() to be visible outside of the router module, so in patch #8, hide it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev()Petr Machata2-2/+1
Now that the external users of mlxsw_sp_rif_dev() have been converted in the preceding patches, make the function static. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: Convert does-RIF-have-this-netdev queries to a dedicated helperPetr Machata3-9/+14
In a number of places, a netdevice underlying a RIF is obtained only to compare it to another pointer. In order to clean up the interface between the router and the other modules, add a new helper to specifically answer this question, and convert the relevant uses to this new interface. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: Convert RIF-has-netdevice queries to a dedicated helperPetr Machata4-4/+10
In a number of places, a netdevice underlying a RIF is obtained only to check if it a NULL pointer. In order to clean up the interface between the router and the other modules, add a new helper to specifically answer this question, and convert the relevant uses to this new interface. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Reuse work neighbor initialization in work schedulerPetr Machata1-19/+10
After the struct mlxsw_sp_netevent_work.n field initialization is moved here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost identical to the one in the helper function. Therefore defer to the helper instead of inlining the equivalent. Note that previously, the code took and put a reference of the netdevice. The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for taking the reference. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Use the available router pointer for netevent handlingPetr Machata1-7/+12
This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every time the delay_probe_time changes. mlxsw router currently only maintains one timer, so the last delay_probe_time set wins. Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to the router. This is no longer necessary. But as a side effect, this makes sure that only updates to "interesting netdevices" (ones that have a physical netdevice lower) are projected. Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and punting if there is none. Then just proceed using the router pointer that's already at hand in the helper. Note that previously, the code took and put a reference of the netdevice. Because the mlxsw_sp pointer is now obtained from the notifier block, the port pointer (non-) NULL-ness is all that's relevant, and the reference does not need to be taken anymore. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Pass router to mlxsw_sp_router_schedule_work() directlyPetr Machata1-5/+6
Instead of passing a notifier block and deducing the router pointer from that in the helper, do that in the caller, and pass the result. In the following patches, the pointer will also be made useful in the caller. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: Move here inetaddr validator notifiersPetr Machata4-25/+25
The validation logic is already in the router code. Move there the notifier blocks themselves as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12mlxsw: spectrum_router: mlxsw_sp_router_fini(): Extract a helper variablePetr Machata1-12/+13
Make mlxsw_sp_router_fini() more similar to the _init() function (and more concise) by extracting the `router' handle to a named variable and using that throughout. The availability of a dedicated `router' variable will come in handy in following patches. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: openvswitch: add support for l4 symmetric hashingAaron Conole3-2/+13
Since its introduction, the ovs module execute_hash action allowed hash algorithms other than the skb->l4_hash to be used. However, additional hash algorithms were not implemented. This means flows requiring different hash distributions weren't able to use the kernel datapath. Now, introduce support for symmetric hashing algorithm as an alternative hash supported by the ovs module using the flow dissector. Output of flow using l4_sym hash: recirc_id(0),in_port(3),eth(),eth_type(0x0800), ipv4(dst=64.0.0.0/192.0.0.0,proto=6,frag=no), packets:30473425, bytes:45902883702, used:0.000s, flags:SP., actions:hash(sym_l4(0)),recirc(0xd) Some performance testing with no GRO/GSO, two veths, single flow: hash(l4(0)): 4.35 GBits/s hash(l4_sym(0)): 4.24 GBits/s Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12Merge branch 'taprio-xstats'David S. Miller3-22/+25
Vladimir Oltean says: ==================== Fixes for taprio xstats 1. Taprio classes correspond to TXQs, and thus, class stats are TXQ stats not TC stats. 2. Drivers reporting taprio xstats should report xstats for *this* taprio, not for a previous one. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: enetc: reset taprio stats when taprio is deletedVladimir Oltean1-0/+9
Currently, the window_drop stats persist even if an incorrect Qdisc was removed from the interface and a new one is installed. This is because the enetc driver keeps the state, and that is persistent across multiple Qdiscs. To resolve the issue, clear all win_drop counters from all TX queues when the currently active Qdisc is removed. These counters are zero by default. The counters visible in ethtool -S are also affected, but I don't care very much about preserving those enough to keep them monotonically incrementing. Fixes: 4802fca8d1af ("net: enetc: report statistics counters for taprio") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net/sched: taprio: report class offload stats per TXQ, not per TCVladimir Oltean3-22/+16
The taprio Qdisc creates child classes per netdev TX queue, but taprio_dump_class_stats() currently reports offload statistics per traffic class. Traffic classes are groups of TXQs sharing the same dequeue priority, so this is incorrect and we shouldn't be bundling up the TXQ stats when reporting them, as we currently do in enetc. Modify the API from taprio to drivers such that they report TXQ offload stats and not TC offload stats. There is no change in the UAPI or in the global Qdisc stats. Fixes: 6c1adb650c8d ("net/sched: taprio: add netlink reporting for offload statistics counters") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12nfc: nxp-nci: store __be16 value in __be16 variableSimon Horman1-1/+1
Use a __be16 variable to store the big endian value of header in nxp_nci_i2c_fw_read(). Flagged by Sparse as: .../i2c.c:113:22: warning: cast to restricted __be16 No functional changes intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: mana: Add support for vlan taggingHaiyang Zhang1-2/+17
To support vlan, use MANA_LONG_PKT_FMT if vlan tag is present in TX skb. Then extract the vlan tag from the skb struct, and save it to tx_oob for the NIC to transmit. For vlan tags on the payload, they are accepted by the NIC too. For RX, extract the vlan tag from CQE and put it into skb. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12xfrm: Use xfrm_state selector for BEET inputHerbert Xu1-4/+3
For BEET the inner address and therefore family is stored in the xfrm_state selector. Use that when decapsulating an input packet instead of incorrectly relying on a non-existent tunnel protocol. Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path") Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-06-12sctp: fix an error code in sctp_sf_eat_auth()Dan Carpenter1-1/+1
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition values and returning a kernel error code will cause issues in the caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12sctp: handle invalid error codes without calling BUG()Dan Carpenter1-1/+4
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition values but if the call to sctp_ulpevent_make_authkey() fails, it returns -ENOMEM. This results in calling BUG() inside the sctp_side_effects() function. Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE() instead. This code predates git. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12ipvlan: fix bound dev checking for IPv6 l3s modeHangbin Liu1-0/+4
The commit 59a0b022aa24 ("ipvlan: Make skb->skb_iif track skb->dev for l3s mode") fixed ipvlan bonded dev checking by updating skb skb_iif. This fix works for IPv4, as in raw_v4_input() the dif is from inet_iif(skb), which is skb->skb_iif when there is no route. But for IPv6, the fix is not enough, because in ipv6_raw_deliver() -> raw_v6_match(), the dif is inet6_iif(skb), which is returns IP6CB(skb)->iif instead of skb->skb_iif if it's not a l3_slave. To fix the IPv6 part issue. Let's set IP6CB(skb)->iif to correct ifindex. BTW, ipvlan handles NS/NA specifically. Since it works fine, I will not reset IP6CB(skb)->iif when addr->atype is IPVL_ICMPV6. Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2196710 Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12sfc: Add devlink dev info support for EF10Martin Habets1-0/+9
Reuse the work done for EF100 to add devlink support for EF10. There is no devlink port support for EF10. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net/sched: act_pedit: Use kmemdup() to replace kmalloc + memcpyJiapeng Chong1-3/+1
./net/sched/act_pedit.c:245:21-28: WARNING opportunity for kmemdup. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5478 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12ionic: add support for ethtool extended stat link_down_countNitya Sunkad3-0/+12
Following the example of 'commit 9a0f830f8026 ("ethtool: linkstate: add a statistic for PHY down events")', added support for link down events. Add callback ionic_get_link_ext_stats to ionic_ethtool.c to support link_down_count, a property of netdev that gets reported exclusively on physical link down events. Run ethtool -I <devname> to display the device link down count. Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12wifi: mac80211: fragment per STA profile correctlyBenjamin Berg3-5/+6
When fragmenting the ML per STA profile, the element ID should be IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT. Change the helper function to take the to be used element ID and pass the appropriate value for each of the fragmentation levels. Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-12Merge branch '100GbE' of ↵David S. Miller4-48/+84
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Improve miscellaneous interrupt code Jacob Keller says: This series improves the driver's use of the threaded IRQ and the communication between ice_misc_intr() and the ice_misc_intr_thread_fn() which was previously introduced by commit 1229b33973c7 ("ice: Add low latency Tx timestamp read"). First, a new custom enumerated return value is used instead of a boolean for ice_ptp_process_ts(). This significantly reduces the cognitive burden when reviewing the logic for this function, as the expected action is clear from the return value name. Second, the unconditional loop in ice_misc_intr_thread_fn() is removed, replacing it with a write to the Other Interrupt Cause register. This causes the MAC to trigger the Tx timestamp interrupt again. This makes it possible to safely use the ice_misc_intr_thread_fn() to handle other tasks beyond just the Tx timestamps. It is also easier to reason about since the thread function will exit cleanly if we do something like disable the interrupt and call synchronize_irq(). Third, refactor the handling for external timestamp events to use the miscellaneous thread function. This resolves an issue with the external time stamps getting blocked while processing the periodic work function task. Fourth, a simplification of the ice_misc_intr() function to always return IRQ_WAKE_THREAD, and schedule the ice service task in the ice_misc_intr_thread_fn() instead. Finally, the Other Interrupt Cause is kept disabled over the thread function processing, rather than immediately re-enabled. Special thanks to Michal Schmidt for the careful review of the series and pointing out my misunderstandings of the kernel IRQ code. It has been determined that the race outlined as being fixed in previous series was actually introduced by this series itself, which I've since corrected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: ethtool: correct MAX attribute value for statsJakub Kicinski1-1/+1
When compiling YNL generated code compiler complains about array-initializer-out-of-bounds. Turns out the MAX value for STATS_GRP uses the value for STATS. This may lead to random corruptions in user space (kernel itself doesn't use this value as it never parses stats). Fixes: f09ea6fb1272 ("ethtool: add a new command for reading standard stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12net: wwan: iosm: enable runtime pm support for 7560M Chetan Kumar6-4/+65
Adds runtime pm support for 7560. As part of probe procedure auto suspend is enabled and auto suspend delay is set to 5000 ms for runtime pm use. Later auto flag is set to power manage the device at run time. On successful communication establishment between host and device the device usage counter is dropped and request to put the device into sleep state (suspend). In TX path, the device usage counter is raised and device is moved out of sleep(resume) for data transmission. In RX path, if the device has some data to be sent it request host platform to change the power state by giving PCI PME message. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12dt-bindings: net: xlnx,axi-ethernet: convert bindings document to yamlRadhey Shyam Pandey3-101/+184
Convert the bindings document for Xilinx AXI Ethernet Subsystem from txt to yaml. No changes to existing binding description. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Sarath Babu Naidu Gaddam <sarath.babu.naidu.gaddam@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-12cifs: fix max_credits implementationShyam Prasad N2-4/+30
The current implementation of max_credits on the client does not work because the CreditRequest logic for several commands does not take max_credits into account. Still, we can end up asking the server for more credits, depending on the number of credits in flight. For this, we need to limit the credits while parsing the responses too. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12cifs: fix sockaddr comparison in iface_cmpShyam Prasad N4-37/+88
iface_cmp used to simply do a memcmp of the two provided struct sockaddrs. The comparison needs to do more based on the address family. Similar logic was already present in cifs_match_ipaddr. Doing something similar now. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12smb/client: print "Unknown" instead of bogus link speed valueEnzo Matsumiya1-1/+46
The virtio driver for Linux guests will not set a link speed to its paravirtualized NICs. This will be seen as -1 in the ethernet layer, and when some servers (e.g. samba) fetches it, it's converted to an unsigned value (and multiplied by 1000 * 1000), so in client side we end up with: 1) Speed: 4294967295000000 bps in DebugData. This patch introduces a helper that returns a speed string (in Mbps or Gbps) if interface speed is valid (>= SPEED_10 and <= SPEED_800000), or "Unknown" otherwise. The reason to not change the value in iface->speed is because we don't know the real speed of the HW backing the server NIC, so let's keep considering these as the fastest NICs available. Also print "Capabilities: None" when the interface doesn't support any. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12cifs: print all credit counters in DebugDataShyam Prasad N1-3/+8
Output of /proc/fs/cifs/DebugData shows only the per-connection counter for the number of credits of regular type. i.e. the credits reserved for echo and oplocks are not displayed. There have been situations recently where having this info would have been useful. This change prints the credit counters of all three types: regular, echo, oplocks. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12cifs: fix status checks in cifs_tree_connectShyam Prasad N2-8/+10
The ordering of status checks at the beginning of cifs_tree_connect is wrong. As a result, a tcon which is good may stay marked as needing reconnect infinitely. Fixes: 2f0e4f034220 ("cifs: check only tcon status on tcon related functions") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12smb: remove obsolete comment鑫华1-1/+1
Because do_gettimeofday has been removed and replaced by ktime_get_real_ts64, So just remove the comment as it's not needed now. Signed-off-by: 鑫华 <jixianghua@xfusion.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-12blk-cgroup: Flush stats before releasing blkcg_gqMing Lei1-9/+31
As noted by Michal, the blkg_iostat_set's in the lockless list hold reference to blkg's to protect against their removal. Those blkg's hold reference to blkcg. When a cgroup is being destroyed, cgroup_rstat_flush() is only called at css_release_work_fn() which is called when the blkcg reference count reaches 0. This circular dependency will prevent blkcg and some blkgs from being freed after they are made offline. It is less a problem if the cgroup to be destroyed also has other controllers like memory that will call cgroup_rstat_flush() which will clean up the reference count. If block is the only controller that uses rstat, these offline blkcg and blkgs may never be freed leaking more and more memory over time. To prevent this potential memory leak: - flush blkcg per-cpu stats list in __blkg_release(), when no new stat can be added - add global blkg_stat_lock for covering concurrent parent blkg stat update - don't grab bio->bi_blkg reference when adding the stats into blkcg's per-cpu stat list since all stats are guaranteed to be consumed before releasing blkg instance, and grabbing blkg reference for stats was the most fragile part of original patch Based on Waiman's patch: https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Cc: stable@vger.kernel.org Reported-by: Jay Shin <jaeshin@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: mkoutny@suse.com Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230609234249.1412858-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12Linux 6.4-rc6Linus Torvalds1-1/+1
2023-06-11selftests: net: vxlan: Fix selftest regression after changes in iproute2.Vladimir Nikishkin1-3/+3
The iproute2 output that eventually landed upstream is different than the one used in this test, resulting in failures. Fix by adjusting the test to use iproute2's JSON output, which is more stable than regular output. Fixes: 305c04189997 ("selftests: net: vxlan: Add tests for vxlan nolocalbypass option.") Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-11IB/isert: Fix incorrect release of isert connectionSaravanan Vajravel1-2/+0
The ib_isert module is releasing the isert connection both in isert_wait_conn() handler as well as isert_free_conn() handler. In isert_wait_conn() handler, it is expected to wait for iSCSI session logout operation to complete. It should free the isert connection only in isert_free_conn() handler. When a bunch of iSER target is cleared, this issue can lead to use-after-free memory issue as isert conn is twice released Fixes: b02efbfc9a05 ("iser-target: Fix implicit termination of connections") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/20230606102531.162967-4-saravanan.vajravel@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>