summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-08Merge tag 'batadv-next-pullrequest-20220508' of ↵David S. Miller3-9/+9
git://git.open-mesh.org/linux-merge This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - remove unnecessary type castings, by Yu Zhe Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08Merge branch 'mlxsw-dedicated-router-notification-block'David S. Miller8-122/+259
Ido Schimmel says: ==================== mlxsw: A dedicated notifier block for router code Petr says: Currently all netdevice events are handled in the centralized notifier handler maintained by spectrum.c. Since a number of events are involving router code, spectrum.c needs to dispatch them to spectrum_router.c. The spectrum module therefore needs to know more about the router code than it should have, and there is are several API points through which the two modules communicate. In this patchset, move bulk of the router-related event handling to the router code. Some of the knowledge has to stay: spectrum.c cannot veto events that the router supports, and vice versa. But beyond that, the two can ignore each other's details, which leads to more focused and simpler code. As a side effect, this fixes L3 HW stats support on tunnel netdevices. The patch set progresses as follows: - In patch #1, change spectrum code to not bounce L3 enslavement, which the router code supports. - In patch #2, add a new do-nothing notifier block to the router code. - In patches #3-#6, move router-specific event handling to the router module. In patch #7, clean up a comment. - In patch #8, use the advantage that all router event handling is in the router code and clean up taking router lock. - mlxsw supports L3 HW stats on tunnels as of this patchset. Patches #9 and #10 therefore add a selftest for L3 HW stats support on tunnels. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08selftests: forwarding: Add a tunnel-based test for L3 HW statsPetr Machata2-0/+110
Add a selftest that uses an IPIP topology and tests that L3 HW stats reflect the traffic in the tunnel. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08selftests: lib: Add a generic helper for obtaining HW statsPetr Machata2-12/+15
The function get_l3_stats() from the test hw_stats_l3.sh will be useful for any test that wishes to work with L3 stats. Furthermore, it is easy to generalize to other HW stats suites (for when such are added). Therefore, move the code to lib.sh, rewrite it to have the same interface as the other stats-collecting functions, and generalize to take the name of the HW stats suite to collect as an argument. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum_router: Take router lock in router notifier handlerPetr Machata1-32/+15
For notifications that the router needs to handle, router lock is taken. Further, at least to determine whether an event is related to a tunnel underlay, router lock also needs to be taken. Due to this, the router lock is always taken for each unhandled event, and also for some handled events, even if they are not related to underlay. Thus each event implies at least one router lock, sometimes two. Instead of deferring the locking to the leaf handlers, take the lock in the router notifier handler always. This simplifies thinking about the locking state, and in some cases saves one lock cycle. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Update a commentPetr Machata1-3/+2
The position of netdevice notifier registration no longer depends on the router initialization, because the event handler no longer dispatches to the router code. Update the comment at the registration to that effect. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of tunnel events to router codePetr Machata3-28/+15
The events related to IPIP tunnels are handled by the router code. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of router events to router codePetr Machata3-18/+16
The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU have implications for in-ASIC router interface objects, and as such are handled in the router module. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of HW stats events to router codePetr Machata2-11/+40
L3 HW stats are implemented in mlxsw as RIF counters, and therefore the code resides in spectrum_router. Exclude the offload xstats events from the mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the glue code in the router module. Previously, the order of dispatch was that for events on tunnels, a dedicated handler was called, which however did not handle HW stats events. But there is nothing special about tunnel devices as far as HW stats: there is a RIF associated with the tunnel netdevice, and that RIF is where the counter should be installed. Therefore now, HW stats events are tested first, independent of netdevice type. The upshot is that as of this commit, mlxsw supports L3 HW stats work on GRE tunnels. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Move handling of VRF events to router codePetr Machata3-15/+16
Events involving VRF, as L3 concern, are handled in the router code, by the helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked from the centralized dispatcher in spectrum.c. Instead, move the call to the newly-introduced router-specific notifier handler. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum_router: Add a dedicated notifier blockPetr Machata2-0/+21
Currently all netdevice events are handled in the centralized notifier handler maintained by spectrum.c. Since a number of events are involving router code, spectrum.c needs to dispatch them to spectrum_router.c. The spectrum module therefore needs to know more about the router code than it should have, and there is are several API points through which the two modules communicate. To simplify the notifier handlers, introduce a new notifier into the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08mlxsw: spectrum: Tolerate enslaving of various devices to VRFPetr Machata1-10/+16
Enslaving netdevices to VRF is currently handled through an mlxsw_sp_is_vrf_event() conditional in mlxsw_sp_netdevice_event(). In the following patch sets, VRF enslavement will be handled purely in the router code. Therefore make handlers of NETDEV_PRECHANGEUPPER tolerant of enslaving to VRF, so that they do not bounce the change. For NETDEV_CHANGEUPPER, drop the WARN_ON(1) and bounce from mlxsw_sp_netdevice_port_vlan_event(). This is the only handler that warned and bounces even in the CHANGEUPPER code, other handler quietly do nothing when they encounter an unfamiliar upper. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08Merge branch 'switch-drivers-to-netif_napi_add_weight'David S. Miller42-59/+71
Jakub Kicinski says: ==================== net: switch drivers to netif_napi_add_weight() The minority of drivers pass a custom weight to netif_napi_add(). Switch those away to the new netif_napi_add_weight(). All drivers (which can go thru net-next) calling netif_napi_add() will now be calling it with NAPI_POLL_WEIGHT or 64. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08net: wan: switch to netif_napi_add_weight()Jakub Kicinski4-4/+5
A handful of WAN drivers use custom napi weights, switch them to the new API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08net: virtio: switch to netif_napi_add_weight()Jakub Kicinski1-2/+2
virtio netdev driver uses a custom napi weight, switch to the new API for setting custom weight. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08r8152: switch to netif_napi_add_weight()Jakub Kicinski1-4/+2
r8152 uses a custom napi weight, switch to the new API for setting custom weight. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08eth: switch to netif_napi_add_weight()Jakub Kicinski34-47/+58
Switch all Ethernet drivers which use custom napi weights to the new API. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08caif_virtio: switch to netif_napi_add_weight()Jakub Kicinski1-1/+2
caif_virtio uses a custom napi weight, switch to the new API for setting custom weights. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08um: vector: switch to netif_napi_add_weight()Jakub Kicinski1-1/+2
UM's netdev driver uses a custom napi weight, switch to the new API for setting custom weight. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08Merge tag 'asoc-fix-v5.18-rc4' of ↵Takashi Iwai14-41/+54
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.18 A larger collection of fixes than I'd like, mainly because mixer-test is making it's way into the CI systems and turning up issues on a wider range of systems. The most substantial thing though is a revert and an alternative fix for a dmaengine issue where the fix caused disruption for some other configurations, the core fix is backed out an a driver specific thing done instead.
2022-05-07Merge tag 'gpio-fixes-for-v5.18-rc6' of ↵Linus Torvalds5-16/+6
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix the bounds check for the 'gpio-reserved-ranges' device property in gpiolib-of - drop the assignment of the pwm base number in gpio-mvebu (this was missed by the patch doing it globally for all pwm drivers) - fix the fwnode assignment (use own fwnode, not the parent's one) for the GPIO irqchip in gpio-visconti - update the irq_stat field before checking the trigger field in gpio-pca953x - update GPIO entry in MAINTAINERS * tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) gpio: visconti: Fix fwnode of GPIO IRQ MAINTAINERS: update the GPIO git tree entry gpio: mvebu: drop pwm base assignment gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
2022-05-07Merge tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-blockLinus Torvalds4-18/+51
Pull block fixes from Jens Axboe: "A single revert for a change that isn't needed in 5.18, and a small series for s390/dasd" * tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block: s390/dasd: Use kzalloc instead of kmalloc/memset s390/dasd: Fix read inconsistency for ESE DASD devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: prevent double format of tracks for ESE devices s390/dasd: fix data corruption for ESE devices Revert "block: release rq qos structures for queue without disk"
2022-05-07Merge tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+6
Pull io_uring fix from Jens Axboe: "Just a single file assignment fix this week" * tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block: io_uring: assign non-fixed early for async work
2022-05-07Merge branch 'simplify-migration-of-host-filtered-addresses-in-felix-driver'Jakub Kicinski5-145/+69
Vladimir Oltean says: ==================== Simplify migration of host filtered addresses in Felix driver The purpose of this patch set is to remove the functions dsa_port_walk_fdbs() and dsa_port_walk_mdbs() from the DSA core, which were introduced when the Felix driver gained support for unicast filtering on standalone ports. They get called when changing the tagging protocol back and forth between "ocelot" and "ocelot-8021q". I did not realize we could get away without having them. The patch set was regression-tested using the local_termination.sh selftest using both tagging protocols. ==================== Link: https://lore.kernel.org/r/20220505162213.307684-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07net: dsa: delete dsa_port_walk_{fdbs,mdbs}Vladimir Oltean2-46/+0
All the users of these functions are gone, delete them before they gain new ones. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07net: dsa: felix: perform MDB migration based on ocelot->multicast listVladimir Oltean3-46/+69
The felix driver is the only user of dsa_port_walk_mdbs(), and there isn't even a good reason for it, considering that the host MDB entries are already saved by the ocelot switch lib in the ocelot->multicast list. Rewrite the multicast entry migration procedure around the ocelot->multicast list so we can delete dsa_port_walk_mdbs(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07net: dsa: felix: stop migrating FDBs back and forth on tag proto changeVladimir Oltean1-53/+2
I just realized we don't need to migrate the host-filtered FDB entries when the tagging protocol changes from "ocelot" to "ocelot-8021q". Host-filtered addresses are learned towards the PGID_CPU "multicast" port group, reserved by software, which contains BIT(ocelot->num_phys_ports). That is the "special" port entry in the analyzer block for the CPU port module. In "ocelot" mode, the CPU port module's packets are redirected to the NPI port. In "ocelot-8021q" mode, felix_8021q_cpu_port_init() does something funny anyway, and changes PGID_CPU to stop pointing at the CPU port module and start pointing at the physical port where the DSA master is attached. The fact that we can alter the destination of packets learned towards PGID_CPU without altering the MAC table entries themselves means that it is pointless to walk through the FDB entries, forget that they were learned towards PGID_CPU, and re-learn them towards the "unicast" PGID associated with the physical port connected to the DSA master. We can let the PGID_CPU value change simply alter the destination of the host-filtered unicast packets in one fell swoop. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07net: dsa: felix: use PGID_CPU for FDB entry migration on NPI portVladimir Oltean1-4/+2
ocelot_fdb_add() redirects FDB entries installed on the NPI port towards the special reserved PGID_CPU used for host-filtered addresses. PGID_CPU contains BIT(ocelot->num_phys_ports) in the destination port mask, which is code name for the CPU port module. Whereas felix_migrate_fdbs_to_*_port() uses the ocelot->num_phys_ports PGID directly, and it appears that this works too. Even if this PGID is set to zero, apparently its number is special and packets still reach the CPU port module. Nonetheless, in the end, these addresses end up in the same place regardless of whether they go through an extra indirection layer or not. Use PGID_CPU across to have more uniformity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07mlxbf_gige: increase MDIO polling rate to 5usDavid Thompson1-2/+4
This patch increases the polling rate used by the mlxbf_gige driver on the MDIO bus. The previous polling rate was every 100us, and the new rate is every 5us. With this change the amount of time spent waiting for the MDIO BUSY signal to de-assert drops from ~100us to ~27us for each operation. Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20220505162309.20050-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07net: chelsio: cxgb4: Avoid potential negative array offsetKees Cook1-5/+5
Using min_t(int, ...) as a potential array index implies to the compiler that negative offsets should be allowed. This is not the case, though. Replace "int" with "unsigned int". Fixes the following warning exposed under future CONFIG_FORTIFY_SOURCE improvements: In file included from include/linux/string.h:253, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/rcupdate.h:29, from include/linux/rculist.h:11, from include/linux/pid.h:5, from include/linux/sched.h:14, from include/linux/delay.h:23, from drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:35: drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function 't4_get_raw_vpd_params': include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 29 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2796:9: note: in expansion of macro 'memcpy' 2796 | memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN)); | ^~~~~~ include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [2147483648, 4294967295] [-Warray-bounds] 46 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy' 388 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk' 433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2798:9: note: in expansion of macro 'memcpy' 2798 | memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN)); | ^~~~~~ Additionally remove needless cast from u8[] to char * in last strim() call. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202205031926.FVP7epJM-lkp@intel.com Fixes: fc9279298e3a ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()") Fixes: 24c521f81c30 ("cxgb4: Use pci_vpd_find_id_string() to find VPD ID string") Cc: Raju Rangoju <rajur@chelsio.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220505233101.1224230-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07Merge branch '10GbE' of ↵Jakub Kicinski2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-05-05 This series contains updates to ixgbe and igb drivers. Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the associated struct value for ixgbe. Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for igb. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igb: Convert kmap() to kmap_local_page() ixgbe: Fix module_param allow_unsupported_sfp type ==================== Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07netlink: do not reset transport header in netlink_recvmsg()Eric Dumazet1-1/+0
netlink_recvmsg() does not need to change transport header. If transport header was needed, it should have been reset by the producer (netlink_dump()), not the consumer(s). The following trace probably happened when multiple threads were using MSG_PEEK. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] __sys_recvfrom+0x204/0x2c0 net/socket.c:2097 __do_sys_recvfrom net/socket.c:2115 [inline] __se_sys_recvfrom net/socket.c:2111 [inline] __x64_sys_recvfrom+0x74/0x90 net/socket.c:2111 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0: skb_reset_transport_header include/linux/skbuff.h:2760 [inline] netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978 ____sys_recvmsg+0x162/0x2f0 ___sys_recvmsg net/socket.c:2674 [inline] __sys_recvmsg+0x209/0x3f0 net/socket.c:2704 __do_sys_recvmsg net/socket.c:2714 [inline] __se_sys_recvmsg net/socket.c:2711 [inline] __x64_sys_recvmsg+0x42/0x50 net/socket.c:2711 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff -> 0x0000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700ebd6be-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-07Merge tag 'for-5.18-rc5-tag' of ↵Linus Torvalds4-34/+53
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Regression fixes in zone activation: - move a loop invariant out of the loop to avoid checking space status - properly handle unlimited activation Other fixes: - for subpage, force the free space v2 mount to avoid a warning and make it easy to switch a filesystem on different page size systems - export sysfs status of exclusive operation 'balance paused', so the user space tools can recognize it and allow adding a device with paused balance - fix assertion failure when logging directory key range item" * tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: sysfs: export the balance paused state of exclusive operation btrfs: fix assertion failure when logging directory key range item btrfs: zoned: activate block group properly on unlimited active zone device btrfs: zoned: move non-changing condition check out of the loop btrfs: force v2 space cache usage for subpage mount
2022-05-06Merge tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds5-10/+54
Pull NFS client fixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix a socket leak when setting up an AF_LOCAL RPC client - Ensure that knfsd connects to the gss-proxy daemon on setup Bugfixes: - Fix a refcount leak when migrating a task off an offlined transport - Don't gratuitously invalidate inode attributes on delegation return - Don't leak sockets in xs_local_connect() - Ensure timely close of disconnected AF_LOCAL sockets" * tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: Revert "SUNRPC: attempt AF_LOCAL connect on setup" SUNRPC: Ensure gss-proxy connects on setup SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets SUNRPC: Don't leak sockets in xs_local_connect() NFSv4: Don't invalidate inode attributes on delegation return SUNRPC release the transport of a relocated task with an assigned transport
2022-05-06ipv4: drop dst in multicast routing pathLokesh Dhoundiyal1-0/+1
kmemleak reports the following when routing multicast traffic over an ipsec tunnel. Kmemleak output: unreferenced object 0x8000000044bebb00 (size 256): comm "softirq", pid 0, jiffies 4294985356 (age 126.810s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80 ..............t. 80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f83947e0>] __kmalloc+0x1e8/0x300 [<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58 [<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8 [<00000000824f6cf1>] gre_rcv+0x178/0x540 [<00000000ccd4e162>] gre_rcv+0x7c/0xd8 [<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350 [<000000006a483377>] ip_local_deliver_finish+0x54/0x68 [<00000000d9271b3a>] ip_local_deliver+0x128/0x168 [<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8 [<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0 [<0000000062e9c336>] __do_softirq+0x1fc/0x3e0 [<00000000013d7914>] irq_exit+0xc4/0xe0 [<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108 [<000000000751eb8e>] handle_int+0x16c/0x178 [<000000001668023b>] _raw_spin_unlock_irqrestore+0x1c/0x28 The metadata dst is leaked when ip_route_input_mc() updates the dst for the skb. Commit f38a9eb1f77b ("dst: Metadata destinations") correctly handled dropping the dst in ip_route_input_slow() but missed the multicast case which is handled by ip_route_input_mc(). Drop the dst in ip_route_input_mc() avoiding the leak. Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Lokesh Dhoundiyal <lokesh.dhoundiyal@alliedtelesis.co.nz> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220505020017.3111846-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-69/+190
Pull kvm fixes from Paolo Bonzini: "x86: - Account for family 17h event renumberings in AMD PMU emulation - Remove CPUID leaf 0xA on AMD processors - Fix lockdep issue with locking all vCPUs - Fix loss of A/D bits in SPTEs - Fix syzkaller issue with invalid guest state" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Exit to userspace if vCPU has injected exception and invalid state KVM: SEV: Mark nested locking of vcpu->lock kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits() KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)
2022-05-06Merge tag 'riscv-for-linus-5.18-rc6' of ↵Linus Torvalds1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Palmer Dabbelt: - A fix to relocate the DTB early in boot, in cases where the bootloader doesn't put the DTB in a region that will end up mapped by the kernel. This manifests as a crash early in boot on a handful of configurations. * tag 'riscv-for-linus-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: relocate DTB if it's outside memory region
2022-05-06ice: fix PTP stale Tx timestamps cleanupMichal Michalik1-2/+8
Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: clear stale Tx queue settings before configuringAnatolii Gerasymenko1-18/+50
The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera3-8/+20
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06KVM: VMX: Exit to userspace if vCPU has injected exception and invalid stateSean Christopherson1-1/+1
Exit to userspace with an emulation error if KVM encounters an injected exception with invalid guest state, in addition to the existing check of bailing if there's a pending exception (KVM doesn't support emulating exceptions except when emulating real mode via vm86). In theory, KVM should never get to such a situation as KVM is supposed to exit to userspace before injecting an exception with invalid guest state. But in practice, userspace can intervene and manually inject an exception and/or stuff registers to force invalid guest state while a previously injected exception is awaiting reinjection. Fixes: fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending exception") Reported-by: syzbot+cfafed3bb76d3e37581b@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220502221850.131873-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-06KVM: SEV: Mark nested locking of vcpu->lockPeter Gonda1-4/+38
svm_vm_migrate_from() uses sev_lock_vcpus_for_migration() to lock all source and target vcpu->locks. Unfortunately there is an 8 subclass limit, so a new subclass cannot be used for each vCPU. Instead maintain ownership of the first vcpu's mutex.dep_map using a role specific subclass: source vs target. Release the other vcpu's mutex.dep_maps. Fixes: b56639318bb2b ("KVM: SEV: Add support for SEV intra host migration") Reported-by: John Sperbeck<jsperbeck@google.com> Suggested-by: David Rientjes <rientjes@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda <pgonda@google.com> Message-Id: <20220502165807.529624-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-06Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"Marcin Szycik1-5/+3
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766. Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev mode") was a workaround for lshw tool displaying incorrect descriptions for port representors and PF in switchdev mode. Now the issue has been fixed in the lshw tool itself [1]. Removing the workaround can be considered a regression, as the user might be running older, unpatched lshw version. However, another important change (ice: link representors to PCI device, which improves port representor netdev naming with SET_NETDEV_DEV) also causes the same "regression" as removing the workaround, i.e. unpatched lshw is able to access bus-info information (this time not via ethtool) and the bug can occur. Therefore, the workaround no longer prevents the bug and can be removed. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds6-96/+85
Pull rdma fixes from Jason Gunthorpe: "A few recent regressions in rxe's multicast code, and some old driver bugs: - Error case unwind bug in rxe for rkeys - Dot not call netdev functions under a spinlock in rxe multicast code - Use the proper BH lock type in rxe multicast code - Fix idrma deadlock and crash - Add a missing flush to drain irdma QPs when in error - Fix high userspace latency in irdma during destroy due to synchronize_rcu() - Rare race in siw MPA processing" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/rxe: Change mcg_lock to a _bh lock RDMA/rxe: Do not call dev_mc_add/del() under a spinlock RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Fix possible crash due to NULL netdev in notifier RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/rxe: Recheck the MR in when generating a READ reply RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core() RDMA/rxe: Fix "Replace mr by rkey in responder resources"
2022-05-06Merge tag 'mmc-v5.18-rc4' of ↵Linus Torvalds3-6/+64
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull mmc fixes from Ulf Hansson: "MMC core: - Fix initialization for eMMC's HS200/HS400 mode MMC host: - sdhci-msm: Reset GCC_SDCC_BCR register to prevent timeout issues - sunxi-mmc: Fix DMA descriptors allocated above 32 bits" * tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13
2022-05-06ice: link representors to PCI deviceMichal Swiatkowski1-0/+1
Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06Merge tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drmLinus Torvalds7-21/+9
Pull drm fixes from Dave Airlie: "A pretty quiet week, one fbdev, msm, kconfig, and two amdgpu fixes, about what I'd expect for rc6. fbdev: - hotunplugging fix amdgpu: - Fix a xen dom0 regression on APUs - Fix a potential array overflow if a receiver were to send an erroneous audio channel count msm: - lockdep fix. it6505: - kconfig fix" * tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 drm/bridge: ite-it6505: add missing Kconfig option select fbdev: Make fb_release() return -ENODEV if fbdev was unregistered drm/msm/dp: remove fail safe mode related code
2022-05-06gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)Puyou Lu1-2/+2
When one port's input state get inverted (eg. from low to hight) after pca953x_irq_setup but before setting irq_mask (by some other driver such as "gpio-keys"), the next inversion of this port (eg. from hight to low) will not be triggered any more (because irq_stat is not updated at the first time). Issue should be fixed after this commit. Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability") Signed-off-by: Puyou Lu <puyou.lu@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-06Merge branch 'tso-gso-limit-split'David S. Miller37-84/+137
Jakub Kicinski says: ==================== net: disambiguate the TSO and GSO limits This series separates the device-reported TSO limitations from the user space-controlled GSO limits. It used to be that we only had the former (HW limits) but they were named GSO. This probably lead to confusion and letting user override them. The problem came up in the BIG TCP discussion between Eric and Alex, and seems like something we should address. Targeting net-next because (a) nobody is reporting problems; and (b) there is a tiny but non-zero chance that some actually wants to lift the HW limitations. ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06net: move netif_set_gso_max helpersJakub Kicinski2-21/+21
These are now internal to the core, no need to expose them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>