Age | Commit message | Author | Files | Lines
2024-07-02 | Merge branch 'page_pool-bnxt_en-unlink-old-page-pool-in-queue-api-using-helper' | Paolo Abeni | 3 | -6/+4
David Wei says: ==================== page_pool: bnxt_en: unlink old page pool in queue api using helper 56ef27e3 unexported page_pool_unlink_napi() and renamed it to page_pool_disable_direct_recycling(). This is because there was no in-tree user of page_pool_unlink_napi(). Since then Rx queue API and an implementation in bnxt got merged. In the bnxt implementation, it broadly follows the following steps: allocate new queue memory + page pool, stop old rx queue, swap, then destroy old queue memory + page pool. The existing NAPI instance is re-used so when the old page pool that is no longer used but still linked to this shared NAPI instance is destroyed, it will trigger warnings. In my initial patches I unlinked a page pool from a NAPI instance directly. Instead, export page_pool_disable_direct_recycling() and call that instead to avoid having a driver touch a core struct. ==================== Link: https://patch.msgid.link/20240627030200.3647145-1-dw@davidwei.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | bnxt_en: unlink page pool when stopping Rx queue | David Wei | 1 | -5/+1
Have bnxt call page_pool_disable_direct_recycling() to unlink the old page pool when resetting a queue prior to destroying it, instead of touching a netdev core struct directly. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | page_pool: export page_pool_disable_direct_recycling() | David Wei | 2 | -1/+3
56ef27e3 unexported page_pool_unlink_napi() and renamed it to page_pool_disable_direct_recycling(). This is because there was no in-tree user of page_pool_unlink_napi(). Since then Rx queue API and an implementation in bnxt got merged. In the bnxt implementation, it broadly follows the following steps: allocate new queue memory + page pool, stop old rx queue, swap, then destroy old queue memory + page pool. The existing NAPI instance is re-used so when the old page pool that is no longer used but still linked to this shared NAPI instance is destroyed, it will trigger warnings. In my initial patches I unlinked a page pool from a NAPI instance directly. Instead, export page_pool_disable_direct_recycling() and call that instead to avoid having a driver touch a core struct. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | net: allow skb_datagram_iter to be called from any context | Sagi Grimberg | 1 | -6/+13
We only use the mapping in a single context, so kmap_local is sufficient and cheaper. Make sure to use skb_frag_foreach_page as skb frags may contain compound pages and we need to map page by page. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Link: https://patch.msgid.link/20240626100008.831849-1-sagi@grimberg.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>
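A minimal userspace sketch of the "map page by page" idea mentioned above: a byte range that may span several pages is split into per-page chunks, so each chunk can be mapped and copied on its own. This is only a model, not the kernel's skb_frag_foreach_page; `model_foreach_page`, `model_sum_len`, and `MODEL_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/* Example callback: accumulates the chunk lengths into *(size_t *)ctx. */
static void model_sum_len(size_t page, size_t off, size_t n, void *ctx)
{
	(void)page;
	(void)off;
	*(size_t *)ctx += n;
}

/*
 * Walks [offset, offset + len) and calls fn once per page touched with
 * (page index, offset inside that page, chunk length). No chunk crosses
 * a page boundary. Returns the number of pages visited.
 */
static size_t model_foreach_page(size_t offset, size_t len,
				 void (*fn)(size_t, size_t, size_t, void *),
				 void *ctx)
{
	size_t pages = 0;

	while (len) {
		size_t page = offset / MODEL_PAGE_SIZE;
		size_t off = offset % MODEL_PAGE_SIZE;
		size_t n = MODEL_PAGE_SIZE - off;

		if (n > len)
			n = len;
		assert(off + n <= MODEL_PAGE_SIZE);
		if (fn)
			fn(page, off, n, ctx);
		offset += n;
		len -= n;
		pages++;
	}
	return pages;
}
```

In the real code each chunk would be mapped with a local mapping, copied, and unmapped before moving to the next page; the model only shows the splitting.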
2024-07-02 | Merge branch 'zerocopy-tx-cleanups' | Paolo Abeni | 5 | -35/+39
Pavel Begunkov says: ==================== zerocopy tx cleanups Assorted zerocopy send path cleanups, the main part of which is moving some net stack specific accounting out of io_uring back to net/ in Patch 4. ==================== Link: https://patch.msgid.link/cover.1719190216.git.asml.silence@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | net: limit scope of a skb_zerocopy_iter_stream var | Pavel Begunkov | 1 | -1/+2
skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path, and we can move the local variable in the appropriate block. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | io_uring/net: move charging socket out of zc io_uring | Pavel Begunkov | 4 | -18/+13
Currently, io_uring's io_sg_from_iter() duplicates the part of __zerocopy_sg_from_iter() that charges pages to the socket. The duplicate is easy to miss when changing the code in net/, and the chunk is not straightforward for outside users because it is full of internal implementation details. io_uring is not a good place to keep it, so deduplicate it by moving it out of the callback into __zerocopy_sg_from_iter().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | net: batch zerocopy_fill_skb_from_iter accounting | Pavel Begunkov | 1 | -13/+18
Instead of accounting every page range against the socket separately, do it in batch based on the change in skb->truesize. It's also moved into __zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is simpler and responsible for setting frags but not the accounting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
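The batching described above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel code: `model_sock`, `model_skb`, `model_fill_frags`, and `model_sg_from_iter` are hypothetical stand-ins, and a single `wmem_queued` counter stands in for the socket accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket and skb fields involved. */
struct model_sock { size_t wmem_queued; };
struct model_skb  { size_t truesize; };

/* Sets up fragments only: grows truesize, does no socket accounting. */
static void model_fill_frags(struct model_skb *skb,
			     const size_t *ranges, size_t n)
{
	for (size_t i = 0; i < n; i++)
		skb->truesize += ranges[i];
}

/* Charges the socket once, based on the change in truesize. */
static void model_sg_from_iter(struct model_sock *sk, struct model_skb *skb,
			       const size_t *ranges, size_t n)
{
	size_t orig = skb->truesize;

	model_fill_frags(skb, ranges, n);
	assert(skb->truesize >= orig);
	sk->wmem_queued += skb->truesize - orig;
}
```

The point of the split is that the fill helper never touches the socket, so the caller is the only place where accounting can go wrong.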
2024-07-02 | net: split __zerocopy_sg_from_iter() | Pavel Begunkov | 1 | -9/+13
Split a function out of __zerocopy_sg_from_iter() that only cares about the traditional path with refcounted pages and doesn't need to know about ->sg_from_iter. A preparation patch, we'll improve on the function later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | net: always try to set ubuf in skb_zerocopy_iter_stream | Pavel Begunkov | 1 | -2/+1
skb_zcopy_set() does nothing if there is already a ubuf_info associated with an skb, and since ->link_skb should have set it several lines above the check here essentially does nothing and can be removed. It's also safer this way, because even if the callback is faulty we'll have it set. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-02 | s390: Update defconfigs | Heiko Carstens | 2 | -6/+4
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-07-02 | Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds | 2 | -1/+19
Pull SCSI fixes from James Bottomley: "A couple of error leg problems, one affecting scsi_debug and the other affecting pure SAS (i.e. not SATA) SCSI expanders"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed
  scsi: scsi_debug: Fix create target debugfs failure
2024-07-02 | net: phy: fix potential use of NULL pointer in phy_suspend() | Russell King (Oracle) | 1 | -2/+2
phy_suspend() checks the WoL status, and then dereferences phydrv->flags if (and only if) we decided that WoL has been enabled on either the PHY or the netdev. We then check whether phydrv was NULL, but we've potentially already dereferenced the pointer. If phydrv is NULL, then phy_ethtool_get_wol() will return an error and leave wol.wolopts set to zero. However, if netdev->wol_enabled is true, then we would dereference a NULL pointer. Checking the PHY drivers, the only place that phydev->wol_enabled is checked by them is in their suspend/resume callbacks and nowhere else (which is correct, because phylib only updates this in phy_suspend()). So, move the NULL pointer check earlier to avoid a NULL pointer dereference. Leave the check for phydrv->suspend in place as a driver may populate the .resume method but not the .suspend method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1sN8tn-00GDCZ-Jj@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
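The ordering fix described above can be illustrated with a small userspace model. This is a sketch under assumptions, not the phylib source: the `model_*` types, the `MODEL_ALWAYS_CALL_SUSPEND` flag, and its value are hypothetical. The point is only that the driver pointer is tested before anything reads `drv->flags`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ALWAYS_CALL_SUSPEND 0x1u /* hypothetical flag value */

struct model_phy_driver {
	unsigned int flags;
	int (*suspend)(void *phydev); /* may be NULL even if resume exists */
};

struct model_phy_device {
	const struct model_phy_driver *drv; /* NULL when no driver is bound */
	bool wol_enabled;
};

static int model_phy_suspend(struct model_phy_device *phydev)
{
	const struct model_phy_driver *drv;

	assert(phydev != NULL);
	drv = phydev->drv;

	/* NULL check first, before any dereference of drv. */
	if (!drv)
		return 0;

	/* Safe now: drv is known to be non-NULL. */
	if (phydev->wol_enabled && !(drv->flags & MODEL_ALWAYS_CALL_SUSPEND))
		return -EBUSY;

	/* A driver may implement resume but not suspend. */
	if (!drv->suspend)
		return 0;

	return drv->suspend(phydev);
}
```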
2024-07-02 | e1000e: Fix S0ix residency on corporate systems | Dima Ruinskiy | 1 | -66/+66
On vPro systems, the configuration of the I219-LM to achieve power gating and S0ix residency is split between the driver and the CSME FW. It was discovered that in some scenarios, where the network cable is connected and then disconnected, S0ix residency is not always reached. This was root-caused to a subset of I219-LM register writes that are not performed by the CSME FW. Therefore, the driver should perform these register writes on corporate setups, regardless of the CSME FW state. This was discovered on Meteor Lake systems; however it is likely to appear on other platforms as well. Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218589 Signed-off-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240628201754.2744221-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02 | Merge tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next | Jakub Kicinski | 14 | -230/+294
Marc Kleine-Budde says:
====================
pull-request: can-next 2024-06-29
Geert Uytterhoeven contributes 3 patches with small improvements and cleanups for the rcar_canfd driver.
A patch by Christophe JAILLET constifies the struct m_can_ops in the m_can driver to reduce the code size.
The last 9 patches are by me and work around erratum DS80000789E 6 of mcp2518fd.
* tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
  can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
  can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
  can: mcp251xfd: clarify the meaning of timestamp
  can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
  can: mcp251xfd: update errata references
  can: mcp251xfd: properly indent labels
  can: gs_usb: add VID/PID for Xylanta SAINT3 product family
  can: m_can: Constify struct m_can_ops
  can: rcar_canfd: Remove superfluous parentheses in address calculations
  can: rcar_canfd: Improve printing of global operational state
  can: rcar_canfd: Simplify clock handling
====================
Link: https://patch.msgid.link/20240629114017.1080160-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02 | Merge tag 'linux-can-fixes-for-6.10-20240701' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can | Jakub Kicinski | 1 | -0/+1
Marc Kleine-Budde says:
====================
pull-request: can 2024-07-01
Jimmy Assarsson's patch for the kvaser_usb adds a missing explicit initialization of the struct kvaser_usb_driver_info::family for the kvaser_usb_driver_info_leafimx.
* tag 'linux-can-fixes-for-6.10-20240701' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct
====================
Link: https://patch.msgid.link/20240701080643.1354022-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-01 | Merge tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds | 8 | -25/+170
Pull cxl fixes from Dave Jiang:
- Fix no cxl_nvd during pmem region auto-assemble
- Avoid NULL pointer dereference in region lookup
- Add missing checks to interleave capability
- Add cxl kdoc fix to address document compilation error
* tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: documentation: add missing files to cxl driver-api
  cxl/region: check interleave capability
  cxl/region: Avoid null pointer dereference in region lookup
  cxl/mem: Fix no cxl_nvd during pmem region auto-assembling
2024-07-01 | Merge tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux | Linus Torvalds | 1 | -2/+11
Pull btrfs fix from David Sterba: "A fixup for a recent fix that prevents an infinite loop during block group reclaim. Unfortunately it introduced an unsafe way of updating block group list and could race with relocation. This could be hit on fast devices when relocation/balance does not have enough space"
* tag 'for-6.10-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix adding block group to a reclaim list and the unused list during reclaim
2024-07-01 | Merge tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic | Linus Torvalds | 1 | -0/+6
Pull asm-generic fix from Arnd Bergmann: "This fixes up a last minute build regression from the previous set of bug fixes"
* tag 'asm-generic-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  syscalls: fix sys_fanotify_mark prototype
2024-07-01 | Merge tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc | Linus Torvalds | 27 | -35/+105
Pull SoC fixes from Arnd Bergmann: "A number of devicetree fixes came in for the rockchip platforms, correcting some of the address information, and reverting a change to the MMC controller configuration that caused regressions.
Four drivers have one code change each, addressing minor build issues for the optee firmware driver, the litex SoC platform driver and two reset drivers.
The riscv fixes are also simple, mainly turning off device nodes in the canaan dts files unless they are actually usable on a particular board.
Finally, Drew takes over maintaining the THEAD RISC-V SoC platform"
* tag 'arm-fixes-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  drivers/soc/litex: drop obsolete dependency on COMPILE_TEST
  tee: optee: ffa: Fix missing-field-initializers warning
  arm64: dts: rockchip: Add sound-dai-cells for RK3368
  arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi 4B
  reset: hisilicon: hi6220: add missing MODULE_DESCRIPTION() macro
  reset: gpio: Fix missing gpiolib dependency for GPIO reset controller
  MAINTAINERS: thead: update Maintainer
  arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E
  riscv: dts: starfive: Set EMMC vqmmc maximum voltage to 3.3V on JH7110 boards
  arm64: dts: rockchip: make poweroff(8) work on Radxa ROCK 5A
  Revert "arm64: dts: rockchip: remove redundant cd-gpios from rk3588 sdmmc nodes"
  ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node
  arm64: dts: rockchip: Fix the value of `dlg,jack-det-rate` mismatch on rk3399-gru
  arm64: dts: rockchip: set correct pwm0 pinctrl on rk3588-tiger
  riscv: dts: canaan: Disable I/O devices unless used
  riscv: dts: canaan: Clean up serial aliases
  arm64: dts: rockchip: Rename LED related pinctrl nodes on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix SD NAND and eMMC init on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix rk3308 codec@ff560000 reset-names
  arm64: dts: rockchip: Fix the DCDC_REG2 minimum voltage on Quartz64 Model B
2024-07-01 | Merge tag 'mtd/fixes-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux | Linus Torvalds | 2 | -29/+43
Pull mtd fixes from Miquel Raynal:
- Rockchip NAND controller driver was not checking the timings properly and the introduction of NV-DDR support broke it.
- The core was also misbehaving in some very specific cases: in case of (unlikely) bitflips in the parameter page, the fallback might have failed as well but for software reasons.
- Finally, the chosen ECC configuration was no longer properly propagated to upper layers, mostly failing an info message at probe time.
* tag 'mtd/fixes-for-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: rockchip: ensure NVDDR timings are rejected
  mtd: rawnand: Bypass a couple of sanity checks during NAND identification
  mtd: rawnand: Fix the nand_read_data_op() early check
  mtd: rawnand: Ensure ECC configuration is propagated to upper layers
2024-07-01 | Merge tag 'vfs-6.10-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds | 9 | -106/+52
Pull vfs fixes from Christian Brauner:
"Misc:
- Don't misleadingly warn during filesystem thaw operations. It's possible that a block device which was frozen before it was mounted can cause a failing thaw operation if someone concurrently tried to mount it while that thaw operation was issued and the device had already been temporarily claimed for the mount (The mount will of course be aborted because the device is frozen).
netfs:
- Fix io_uring based write-through. Make sure that the total request length is correctly set.
- Fix partial writes to folio tail.
- Remove some xarray helpers that were intended for bounce buffers which got deferred to a later patch series.
- Make netfs_page_mkwrite() check whether folio->mapping is valid after acquiring the folio lock.
- Make netfs_page_mkwrite() flush conflicting data instead of waiting.
fsnotify:
- Ensure that fsnotify creation events are generated before fsnotify open events when a file is created via ->atomic_open(). The ordering was broken before.
- Ensure that no fsnotify events are generated for O_PATH file descriptors. While no fsnotify open events were generated, fsnotify close events were. Make it consistent and don't produce any"
* tag 'vfs-6.10-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix netfs_page_mkwrite() to flush conflicting data, not wait
  netfs: Fix netfs_page_mkwrite() to check folio->mapping is valid
  netfs: Delete some xarray-wangling functions that aren't used
  netfs: Fix early issue of write op on partial write to folio tail
  netfs: Fix io_uring based write-through
  vfs: generate FS_CREATE before FS_OPEN when ->atomic_open used.
  fsnotify: Do not generate events for O_PATH file descriptors
  fs: don't misleadingly warn during thaw operations
2024-07-01 | btrfs: fix adding block group to a reclaim list and the unused list during reclaim | Naohiro Aota | 1 | -2/+11
There is a potential parallel list adding for retrying in btrfs_reclaim_bgs_work and adding to the unused list. Since the block group is removed from the reclaim list and it is on a relocation work, it can be added into the unused list in parallel. When that happens, adding it to the reclaim list will corrupt the list head and trigger list corruption like below.
Fix it by taking fs_info->unused_bgs_lock.
  [177.504][T2585409] BTRFS error (device nullb1): error relocating chunk 2415919104
  [177.514][T2585409] list_del corruption. next->prev should be ff11000344b119c0, but was ff11000377e87c70. (next=ff110002390cd9c0)
  [177.529][T2585409] ------------[ cut here ]------------
  [177.537][T2585409] kernel BUG at lib/list_debug.c:65!
  [177.545][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
  [177.555][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G W 6.10.0-rc5-kts #1
  [177.568][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
  [177.579][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
  [177.589][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
  [177.624][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
  [177.633][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000
  [177.644][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40
  [177.655][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08
  [177.665][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0
  [177.676][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000
  [177.687][T2585409] FS: 0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
  [177.700][T2585409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [177.709][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0
  [177.720][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
  [177.731][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
  [177.742][T2585409] PKRU: 55555554
  [177.748][T2585409] Call Trace:
  [177.753][T2585409] <TASK>
  [177.759][T2585409] ? __die_body.cold+0x19/0x27
  [177.766][T2585409] ? die+0x2e/0x50
  [177.772][T2585409] ? do_trap+0x1ea/0x2d0
  [177.779][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.788][T2585409] ? do_error_trap+0xa3/0x160
  [177.795][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.805][T2585409] ? handle_invalid_op+0x2c/0x40
  [177.812][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.820][T2585409] ? exc_invalid_op+0x2d/0x40
  [177.827][T2585409] ? asm_exc_invalid_op+0x1a/0x20
  [177.834][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.843][T2585409] btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]
There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is safe, AFAICS. Since the block group was in the unused list, the used bytes should be 0 when it was added to the unused list. Then, it checks block_group->{used,reserved,pinned} are still 0 under the block_group->lock. So, they should be still eligible for the unused list, not the reclaim list.
The reason it is safe there is that we're holding space_info->groups_sem in write mode. That means no other task can allocate from the block group, so while we are at btrfs_delete_unused_bgs() it's not possible for other tasks to allocate and deallocate extents from the block group, so it can't be added to the unused list or the reclaim list by anyone else.
The bug can be reproduced by btrfs/166 after a few rounds. In practice this can be hit when relocation cannot find more chunk space and ends with ENOSPC.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-01 | selftests/bpf: Delete extra blank lines in test_sockmap | Zhu Jun | 1 | -1/+0
Delete extra blank lines inside of test_selftest(). Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240627031905.7133-1-zhujun2@cmss.chinamobile.com
2024-07-01 | riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline | Pu Lehui | 1 | -14/+29
We use bpf_prog_pack to aggregate bpf programs into huge pages to relieve iTLB pressure on the system. The same can be applied to the bpf trampoline, as Song has already implemented in core and x86 [0]. This patch uses bpf_prog_pack for the RV64 bpf trampoline. Since Song and Puranjay have done a lot of work on bpf_prog_pack for RV64, implementing this is straightforward.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv
Link: https://lore.kernel.org/all/20231206224054.492250-1-song@kernel.org [0]
Link: https://lore.kernel.org/bpf/20240622030437.3973492-4-pulehui@huaweicloud.com
2024-07-01 | riscv, bpf: Fix out-of-bounds issue when preparing trampoline image | Pu Lehui | 1 | -5/+13
We get the size of the trampoline image during the dry run phase and allocate memory based on that size. The allocated image will then be populated with instructions during the real patch phase. But after commit 26ef208c209a ("bpf: Use arch_bpf_trampoline_size"), the `im` argument is inconsistent in the dry run and real patch phase. This may cause emit_imm in RV64 to generate a different number of instructions when generating the 'im' address, potentially causing out-of-bounds issues. Let's emit the maximum number of instructions for the "im" address during dry run to fix this problem. Fixes: 26ef208c209a ("bpf: Use arch_bpf_trampoline_size") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240622030437.3973492-3-pulehui@huaweicloud.com
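Why sizing with the worst case avoids the overflow can be shown with a toy model. This is a sketch under assumptions, not the RV64 JIT: the cost function, the constant `MODEL_MAX_IMM_INSNS`, the fixed count of 10, and all `model_*` names are hypothetical. The idea is that an address-dependent instruction count makes the dry-run size unreliable unless the dry run always assumes the maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worst case of the toy cost model below for a 64-bit value. */
#define MODEL_MAX_IMM_INSNS 6

/* Toy cost model: one instruction, plus one per extra 12-bit group. */
static int model_imm_insns(uint64_t val)
{
	int n = 1;

	while (val >>= 12)
		n++;
	return n;
}

/*
 * Instruction count of a modelled trampoline. In the dry run the image
 * address is not known yet, so the maximum is assumed; the real pass
 * uses the count for the actual address.
 */
static int model_trampoline_insns(uint64_t im_addr, bool dry_run)
{
	int fixed = 10; /* part that does not depend on the address */
	int imm = dry_run ? MODEL_MAX_IMM_INSNS : model_imm_insns(im_addr);

	assert(imm <= MODEL_MAX_IMM_INSNS);
	return fixed + imm;
}
```

With this sizing rule, the buffer allocated from the dry run is never smaller than what the real pass emits, whatever address the image ends up at.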
2024-07-01 | bpf: Use precise image size for struct_ops trampoline | Pu Lehui | 1 | -1/+1
For trampoline using bpf_prog_pack, we need to generate a rw_image buffer with size of (image_end - image). For regular trampoline, we use the precise image size generated by arch_bpf_trampoline_size to allocate rw_image. But for struct_ops trampoline, we allocate rw_image directly using close to PAGE_SIZE size. We do not need to allocate for that much, as the patch size is usually much smaller than PAGE_SIZE. Let's use precise image size for it too. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> #riscv Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20240622030437.3973492-2-pulehui@huaweicloud.com
2024-07-01 | libbpf: Fix error handling in btf__distill_base() | Alan Maguire | 1 | -1/+1
Coverity points out that after calling btf__new_empty_split() the wrong value is checked for error. Fixes: 58e185a0dc35 ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240629100058.2866763-1-alan.maguire@oracle.com
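The class of bug described above, testing the wrong variable after an allocation, can be sketched as follows. This is a userspace model and not the libbpf source: `model_btf`, `model_new_empty`, and `model_distill` are hypothetical, and the `fail` parameter exists only to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_btf { struct model_btf *base; };

/* Returns a new object, or NULL with errno set (fail forces an error). */
static struct model_btf *model_new_empty(struct model_btf *base, int fail)
{
	struct model_btf *btf;

	if (fail) {
		errno = ENOMEM;
		return NULL;
	}
	btf = calloc(1, sizeof(*btf));
	if (btf)
		btf->base = base;
	return btf;
}

static int model_distill(int fail_split, struct model_btf **out_base,
			 struct model_btf **out_split)
{
	struct model_btf *new_base, *new_split;

	assert(out_base && out_split);

	new_base = model_new_empty(NULL, 0);
	if (!new_base)
		return -ENOMEM;

	new_split = model_new_empty(new_base, fail_split);
	/* Check the value just returned (new_split), not new_base. */
	if (!new_split) {
		free(new_base);
		return -ENOMEM;
	}

	*out_base = new_base;
	*out_split = new_split;
	return 0;
}
```

Had the second check tested `new_base` again, a failed split allocation would have gone unnoticed and a NULL pointer would have been handed back to the caller.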
2024-07-01 | selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc | Lorenzo Bianconi | 3 | -0/+325
Introduce e2e selftest for bpf_xdp_flow_lookup kfunc through xdp_flowtable utility. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/b74393fb4539aecbbd5ac7883605f86a95fb0b6b.1719698275.git.lorenzo@kernel.org
2024-07-01 | netfilter: Add bpf_xdp_flow_lookup kfunc | Lorenzo Bianconi | 4 | -1/+137
Introduce bpf_xdp_flow_lookup kfunc in order to perform the lookup of a given flowtable entry based on a fib tuple of incoming traffic. bpf_xdp_flow_lookup can be used as building block to offload in xdp the processing of sw flowtable when hw flowtable is not available. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/bpf/55d38a4e5856f6d1509d823ff4e98aaa6d356097.1719698275.git.lorenzo@kernel.org
2024-07-01 | netfilter: nf_tables: Add flowtable map for xdp offload | Florian Westphal | 4 | -2/+154
This adds a small internal mapping table so that a new bpf (xdp) kfunc can perform lookups in a flowtable. As-is, xdp program has access to the device pointer, but no way to do a lookup in a flowtable -- there is no way to obtain the needed struct without questionable stunts. This allows to obtain an nf_flowtable pointer given a net_device structure. In order to keep backward compatibility, the infrastructure allows the user to add a given device to multiple flowtables, but it will always return the first added mapping performing the lookup since it assumes the right configuration is 1:1 mapping between flowtables and net_devices. Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/bpf/9f20e2c36f494b3bf177328718367f636bb0b2ab.1719698275.git.lorenzo@kernel.org
2024-07-01 | syscalls: fix sys_fanotify_mark prototype | Arnd Bergmann | 1 | -0/+6
My earlier fix missed an incorrect function prototype that shows up on native 32-bit builds:
  In file included from fs/notify/fanotify/fanotify_user.c:14:
  include/linux/syscalls.h:248:25: error: conflicting types for 'sys_fanotify_mark'; have 'long int(int, unsigned int, u32, u32, int, const char *)' {aka 'long int(int, unsigned int, unsigned int, unsigned int, int, const char *)'}
   1924 | SYSCALL32_DEFINE6(fanotify_mark,
        | ^~~~~~~~~~~~~~~~~
  include/linux/syscalls.h:862:17: note: previous declaration of 'sys_fanotify_mark' with type 'long int(int, unsigned int, u64, int, const char *)' {aka 'long int(int, unsigned int, long long unsigned int, int, const char *)'}
On x86 and powerpc, the prototype is also wrong but hidden in an #ifdef, so it never caused problems.
Add another alternative declaration that matches the conditional function definition.
Fixes: 403f17a33073 ("parisc: use generic sys_fanotify_mark implementation")
Cc: stable@vger.kernel.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-01 | net: ethtool: Fix the panic caused by dev being null when dumping coalesce | Heng Qi | 1 | -2/+3
syzbot reported a general protection fault caused by a null pointer dereference in coalesce_fill_reply(). The issue occurs when req_base->dev is null, leading to an invalid memory access. This panic occurs if dumping coalesce when no device name is specified. Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management") Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3 Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01 | Merge tag 'v6.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes | Arnd Bergmann | 13 | -12/+34
Apart from the regular dts fixes for wrong addresses, missing or wrong properties, this reverts the previous move away from cd-gpios to the mmc-controller's internal card-detect.
With this change applied, it was reported that boards could not detect cards anymore, so this got reverted of course.
* tag 'v6.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Add sound-dai-cells for RK3368
  arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi 4B
  arm64: dts: rockchip: fix PMIC interrupt pin on ROCK Pi E
  arm64: dts: rockchip: make poweroff(8) work on Radxa ROCK 5A
  Revert "arm64: dts: rockchip: remove redundant cd-gpios from rk3588 sdmmc nodes"
  ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node
  arm64: dts: rockchip: Fix the value of `dlg,jack-det-rate` mismatch on rk3399-gru
  arm64: dts: rockchip: set correct pwm0 pinctrl on rk3588-tiger
  arm64: dts: rockchip: Rename LED related pinctrl nodes on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix SD NAND and eMMC init on rk3308-rock-pi-s
  arm64: dts: rockchip: Fix rk3308 codec@ff560000 reset-names
  arm64: dts: rockchip: Fix the DCDC_REG2 minimum voltage on Quartz64 Model B
Link: https://lore.kernel.org/r/10237789.nnTZe4vzsl@diego
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-01 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue into main | David S. Miller | 9 | -64/+165
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice)
This series contains updates to MAINTAINERS file and ice driver.
Jesse replaces himself with Przemek in the maintainers file.
Karthik Sundaravel adds support for VF get/set MAC address via devlink.
Eric checks for errors from ice_vsi_rebuild() during queue reconfiguration.
Paul adjusts FW API version check for E830 devices.
Piotr adds differentiation of unload type when shutting down AdminQ.
Przemek changes ice_adapter initialization to occur once per physical card.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge tag 'for-net-2024-06-28' of ↵David S. Miller13-71/+131
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth into main bluetooth pull request for net: - Ignore too large handle values in BIG - L2CAP: sync sock recv cb and release - hci_bcm4377: Fix msgid release - ISO: Check socket flag instead of hcon - hci_event: Fix setting of unicast qos interval - hci: disallow setting handle bigger than HCI_CONN_HANDLE_MAX - Add quirk to ignore reserved PHY bits in LE Extended Adv Report - hci_core: cancel all works upon hci_unregister_dev - btintel_pcie: Fix REVERSE_INULL issue reported by coverity - qca: Fix BT enable failure again for QCA6390 after warm reboot Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge branch 'bnxt_en-ptp' into mainDavid S. Miller4-83/+252
Michael Chan says: ==================== bnxt_en: PTP updates for net-next The first 5 patches implement the PTP feature on the new BCM5760X chips. The main new hardware feature is the new TX timestamp completion which enables the driver to retrieve the TX timestamp in NAPI without deferring to the PTP worker. The last 5 patches increase the number of TX PTP packets in-flight from 1 to 4 on the older BCM5750X chips. On these older chips, we need to call firmware in the PTP worker to retrieve the timestamp. We use an array to keep track of the in-flight TX PTP packets. v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts. Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Remove atomic operations on ptp->tx_availPavan Chebbi3-22/+31
Now that we require the spinlock to protect ptp->txts_prod, change ptp->tx_avail to non-atomic and protect it under the same spinlock. Add a new helper function bnxt_ptp_get_txts_prod() to decrement ptp->tx_avail under spinlock and return the producer. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Increase the max total outstanding PTP TX packets to 4Pavan Chebbi4-30/+68
Start accepting up to 4 TX TS requests on BCM5750X (P5) chips. These PTP TX packets will be queued in the ptp->txts_req[] array waiting for the TX timestamp to complete. The entries in the array will be managed by a producer and consumer index. The producer index is updated under spinlock since multiple TX rings can try to send PTP packets at the same time. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
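The producer/consumer bookkeeping described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver's code: the struct, the function names, and the use of a C11 atomic_flag as a stand-in for the kernel spinlock are all hypothetical, and the sketch takes the lock on the consumer side as well, which the commit message does not specify.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TXTS_SLOTS 4	/* hypothetical: mirrors "up to 4" requests in flight */

/* Hypothetical ring of pending TX timestamp requests. */
struct txts_ring {
	atomic_flag lock;		/* stand-in for a kernel spinlock */
	uint16_t prod;			/* next slot to hand out */
	uint16_t cons;			/* oldest slot still pending */
	uint16_t avail;			/* free slots, protected by lock */
	uint32_t req[TXTS_SLOTS];	/* per-slot request id */
};

static inline void ring_lock(struct txts_ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static inline void ring_unlock(struct txts_ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

void txts_ring_init(struct txts_ring *r)
{
	atomic_flag_clear(&r->lock);
	r->prod = 0;
	r->cons = 0;
	r->avail = TXTS_SLOTS;
}

/* Reserve a slot for a new request; returns the slot index or -1 if full. */
int txts_ring_reserve(struct txts_ring *r, uint32_t id)
{
	int slot = -1;

	ring_lock(r);
	if (r->avail) {
		slot = r->prod;
		r->req[slot] = id;
		r->prod = (r->prod + 1) % TXTS_SLOTS;
		r->avail--;
	}
	ring_unlock(r);
	return slot;
}

/* Complete the oldest pending request; returns false if none is pending. */
bool txts_ring_complete(struct txts_ring *r, uint32_t *id)
{
	bool ok = false;

	ring_lock(r);
	if (r->avail < TXTS_SLOTS) {
		*id = r->req[r->cons];
		r->cons = (r->cons + 1) % TXTS_SLOTS;
		r->avail++;
		ok = true;
	}
	ring_unlock(r);
	return ok;
}
```

Several senders may call txts_ring_reserve() concurrently; because the index update happens under the lock, no two of them can receive the same slot.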
2024-07-01bnxt_en: Let bnxt_stamp_tx_skb() return error codePavan Chebbi1-5/+11
Change the function bnxt_stamp_tx_skb() to return 0 for success or -EAGAIN if the timestamp is still pending in firmware. The calling PTP aux worker will reschedule based on the return code. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
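A minimal userspace sketch of the return-code convention described above, assuming a fake firmware that becomes ready after a fixed number of polls. The names fake_fw, stamp_tx() and ptp_worker(), and the worker's "0 means run again, negative means stop" convention, are hypothetical choices for this illustration and are not taken from the driver.

```c
#include <errno.h>

/* Hypothetical firmware model: ready after polls_until_ready queries. */
struct fake_fw {
	int polls_until_ready;
};

/* Returns 0 and fills *ts on success, -EAGAIN while still pending. */
int stamp_tx(struct fake_fw *fw, unsigned long long *ts)
{
	if (fw->polls_until_ready > 0) {
		fw->polls_until_ready--;
		return -EAGAIN;
	}
	*ts = 123456789ULL;	/* placeholder timestamp value */
	return 0;
}

/*
 * Worker body: returns 0 to ask for an immediate rerun while the
 * timestamp is pending, or -1 when no further run is needed.
 */
long ptp_worker(struct fake_fw *fw, unsigned long long *ts)
{
	int rc = stamp_tx(fw, ts);

	if (rc == -EAGAIN)
		return 0;
	return -1;
}
```

The point of returning an error code instead of void is that the caller, not the stamping function, decides whether another pass is worthwhile.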
2024-07-01bnxt_en: Remove an impossible condition check for PTP TX pending SKBPavan Chebbi3-13/+4
In the current 5750X PTP code paths, there is always at most one TX SKB requested for timestamp and we won't accept another one until we have retrieved the timestamp or it has timed out. Remove the unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change the function to void. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Refactor all PTP TX timestamp fields into a structPavan Chebbi3-28/+40
On the older 5750X (P5) chips, we currently support only 1 TX PTP packet in-flight waiting for the timestamp. Refactor the data structures to prepare to support up to 4 TX PTP packets. Combine all fields required for PTP TX timestamp query into one structure. An array of this structure will be added in follow-on patches to support multiple outstanding TX timestamps. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add BCM5760X specific PHC registers mappingPavan Chebbi3-5/+25
BCM5760X firmware will advertise direct 64-bit PHC registers access for the driver from BAR0. Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's response and PHC register mapping for 5760X chips. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add TX timestamp completion logicMichael Chan4-11/+64
The new BCM5760X chips will return the timestamp of TX packets in a new completion. Add logic in __bnxt_poll_work() to handle this completion type to retrieve the timestamp. This feature eliminates the limit on the number of in-flight PTP TX packets. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Allow some TX packets to be unprocessed in NAPIMichael Chan1-6/+15
The driver's current logic will always free all the TX SKBs up to txr->tx_hw_cons within NAPI. In the next patches, we'll be adding logic to handle TX timestamp completion and we may need to hold some remaining TX SKBs if we don't have the timestamp completions yet. Modify __bnxt_poll_work_done() to clear each event bit separately to allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or not. bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX SKBs are held waiting for TX timestamps. Note that legacy chips will never hold any SKBs this way. The SKB is always deferred to the PTP worker slow path to retrieve the timestamp from firmware. On the new P7 chips, the timestamp is returned by the hardware directly and we can retrieve it directly from NAPI. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
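The per-bit event clearing described above can be illustrated with a small userspace sketch. Everything here is hypothetical (the EV_RX/EV_TX flags, struct napi_ctx, and the function names); it only shows the idea that the TX handler itself decides whether its event bit is cleared.

```c
#define EV_RX (1u << 0)	/* hypothetical RX-work-pending flag */
#define EV_TX (1u << 1)	/* hypothetical TX-completion-pending flag */

/* Hypothetical per-NAPI state. */
struct napi_ctx {
	unsigned int events;	/* pending event bits */
	int tx_held;		/* packets still waiting for a timestamp */
};

/* TX handler: clears EV_TX only when no packet is being held back. */
void tx_int(struct napi_ctx *c)
{
	/* ... completed packets that need no timestamp are freed here ... */
	if (c->tx_held == 0)
		c->events &= ~EV_TX;
}

/* RX handler: always finishes its work in this sketch. */
void rx_done(struct napi_ctx *c)
{
	c->events &= ~EV_RX;
}

/* Each event bit is handled and cleared on its own, not all at once. */
void poll_work_done(struct napi_ctx *c)
{
	if (c->events & EV_TX)
		tx_int(c);
	if (c->events & EV_RX)
		rx_done(c);
}
```

If the TX bit stays set, the next poll pass revisits the held packets, while RX processing is unaffected.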
2024-07-01bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bdMichael Chan2-2/+7
Remove the unused is_gso field and add the is_ts_pkt field to struct bnxt_sw_tx_bd. This field will mark the TX BD that has requested HW TX timestamp. The field needs to be cleared if the timestamp packet is later aborted. This field will be useful when processing the new TX timestamp completion from the hardware in the next patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01bnxt_en: Add new TX timestamp completion definitionsMichael Chan1-0/+26
The new BCM5760X chips will generate this new TX timestamp completion when a TX packet's timestamp has been taken right before transmission. The driver logic to retrieve the timestamp will be added in the next few patches. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAMNithin Dabilpuram6-3/+102
Octeontx2 hardware uses the Near Data Cache (NDC) block to cache contexts so that access to LLC/DRAM can be avoided. The HRM recommends syncing the NDC contents before releasing/resetting LF resources. Hence implement NDC_SYNC mailbox and sync contexts during driver teardown. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01net: tn40xx: add initial ethtool_ops supportFUJITA Tomonori1-0/+14
Call phylink_ethtool_ksettings_get() for get_link_ksettings method and ethtool_op_get_link() for get_link method. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-01Merge tag 'nf-next-24-06-28' of ↵David S. Miller10-278/+459
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: Patches #1 to #11 shrink memory consumption for transaction objects: struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */ struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */ struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */ struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */ struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */ struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */ struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */ struct nft_trans_elem can now be allocated from kmalloc-96 instead of kmalloc-128 slab. Series from Florian Westphal. For the record, I have mangled patch #1 to add nft_trans_container_*() and use it for every transaction object. I have also added BUILD_BUG_ON to ensure struct nft_trans always comes at the beginning of the container transaction object. And a few minor cleanups; any new bugs are my own. Patch #12 simplifies the check for SCTP GSO in IPVS, from Ismael Luceno. Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang. Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting default conntrack timeouts via nfnetlink_cttimeout API, from Lin Ma. Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use larger secctx names than the existing 256 bytes length. Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving nfnetlink_queue, from Florian Westphal. Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>