summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-04-20Merge tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds3-11/+3
Pull xtensa fixes from Max Filippov: - fix patching CPU selection in patch_text - fix potential deadlock in ISS platform serial driver - fix potential register clobbering in coprocessor exception handler * tag 'xtensa-20220416' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: fix a7 clobbering in coprocessor context load/store arch: xtensa: platforms: Fix deadlock in rs_close() xtensa: patch_text: Fixup last cpu should be master
2022-04-20Merge tag 'erofs-for-5.18-rc4-fixes' of ↵Linus Torvalds3-11/+8
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "One patch to fix a use-after-free race related to the on-stack z_erofs_decompressqueue, which happens very rarely but needs to be fixed properly soon. The other patch fixes some sysfs Sphinx warnings" * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors erofs: fix use-after-free of on-stack io[]
2022-04-20Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"Linus Torvalds1-4/+5
This reverts commit 5a519c8fe4d620912385f94372fc8472fa98c662. It turns out that making the pipe almost arbitrarily large has some rather unexpected downsides. The kernel test robot reports a kernel warning that is due to pipe->max_usage now growing to the point where the iter_file_splice_write() buffer allocation can no longer be satisfied as a slab allocation, and the int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); code sequence there will now always fail as a result. That code could be modified to use kvcalloc() too, but I feel very uncomfortable making those kinds of changes for a very niche use case that really should have other options than make these kinds of fundamental changes to pipe behavior. Maybe the CRIU process dumping should be multi-threaded, and use multiple pipes and multiple cores, rather than try to use one larger pipe to minimize splice() calls. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/ Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-20x86: __memcpy_flushcache: fix wrong alignment if size > 2^32Mikulas Patocka1-1/+1
The first "if" condition in __memcpy_flushcache is supposed to align the "dest" variable to 8 bytes and copy data up to this alignment. However, this condition may misbehave if "size" is greater than 4GiB. The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both arguments to unsigned int and selects the smaller one. However, the cast truncates high bits in "size" and it results in misbehavior. For example: suppose that size == 0x100000001, dest == 0x200000002 min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0xe) == 0x1; ... dest += 0x1; so we copy just one byte "and" dest remains unaligned. This patch fixes the bug by replacing unsigned with size_t. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-20f2fs: fix wrong condition check when failing metapage readJaegeuk Kim1-3/+3
This patch fixes wrong initialization. Fixes: 50c63009f6ab ("f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20f2fs: keep io_flags to avoid IO split due to different op_flags in two fio ↵Jaegeuk Kim1-12/+21
holders Let's attach io_flags to bio only, so that we can merge IOs given original io_flags only. Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20f2fs: remove obsolete whint_modeJaegeuk Kim4-205/+1
This patch removes obsolete whint_mode. Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packetsIdo Schimmel1-0/+17
The test verifies that packets are correctly flooded by the bridge and the VXLAN device by matching on the encapsulated packets at the other end. However, if packets other than those generated by the test also ingress the bridge (e.g., MLD packets), they will be flooded as well and interfere with the expected count. Make the test more robust by making sure that only the packets generated by the test can ingress the bridge. Drop all the rest using tc filters on the egress of 'br0' and 'h1'. In the software data path, the problem can be solved by matching on the inner destination MAC or dropping unwanted packets at the egress of the VXLAN device, but this is not currently supported by mlxsw. Fixes: d01724dd2a66 ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packetsIdo Schimmel1-0/+17
The test verifies that packets are correctly flooded by the bridge and the VXLAN device by matching on the encapsulated packets at the other end. However, if packets other than those generated by the test also ingress the bridge (e.g., MLD packets), they will be flooded as well and interfere with the expected count. Make the test more robust by making sure that only the packets generated by the test can ingress the bridge. Drop all the rest using tc filters on the egress of 'br0' and 'h1'. In the software data path, the problem can be solved by matching on the inner destination MAC or dropping unwanted packets at the egress of the VXLAN device, but this is not currently supported by mlxsw. Fixes: 94d302deae25 ("selftests: mlxsw: Add a test for VxLAN flooding") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20Merge branch 'mlxsw-line-card-status-tracking'David S. Miller7-18/+513
Ido Schimmel says: ==================== mlxsw: Line cards status tracking When a line card is provisioned, netdevs corresponding to the ports found on the line card are registered. User space can then perform various logical configurations (e.g., splitting, setting MTU) on these netdevs. However, since the line card is not present / powered on (i.e., it is not in 'active' state), user space cannot access the various components found on the line card. For example, user space cannot read the temperature of gearboxes or transceiver modules found on the line card via hwmon / thermal. Similarly, it cannot dump the EEPROM contents of these transceiver modules. The above is only possible when the line card becomes active. This patchset solves the problem by tracking the status of each line card and invoking callbacks from interested parties when a line card becomes active / inactive. Patchset overview: Patch #1 adds the infrastructure in the line cards core that allows users to registers a set of callbacks that are invoked when a line card becomes active / inactive. To avoid races, if a line card is already active during registration, the got_active() callback is invoked. Patches #2-#3 are preparations. Patch #4 changes the port module core to register a set of callbacks with the line cards core. See detailed description with examples in the commit message. Patches #5-#6 do the same with regards to thermal / hwmon support, so that user space will be able to monitor the temperature of various components on the line card when it becomes active. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_hwmon: Add interfaces for line card initialization and ↵Vadim Pasternak1-0/+84
de-initialization Add callback functions for line card 'hwmon' initialization and de-initialization. Each line card is associated with the relevant 'hwmon' device, which may contain thermal attributes for the cages and gearboxes found on this line card. The line card 'hwmon' initialization / de-initialization APIs are to be called when line card is set to active / inactive state by got_active() / got_inactive() callbacks from line card state machine. For example cage temperature for module #9 located at line card #7 will be exposed by utility 'sensors' like: linecard#07 front panel 009: +32.0C (crit = +70.0C, emerg = +80.0C) And temperature for gearbox #3 located at line card #5 will be exposed like: linecard#05 gearbox 003: +41.0C (highest = +41.0C) Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_thermal: Add interfaces for line card initialization and ↵Vadim Pasternak1-0/+74
de-initialization Add callback functions for line card thermal area initialization and de-initialization. Each line card is associated with the relevant thermal area, which may contain thermal zones for cages and gearboxes found on this line card. The line card thermal initialization / de-initialization APIs are to be called when line card is set to active / inactive state by got_active() / got_inactive() callbacks from line card state machine. For example thermal zone for module #9 located at line card #7 will have type: mlxsw-lc7-module9. And thermal zone for gearbox #2 located at line card #5 will have type: mlxsw-lc5-gearbox2. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_env: Add interfaces for line card initialization and ↵Vadim Pasternak1-1/+165
de-initialization Netdevs for ports found on line cards are registered upon provisioning. However, user space is not allowed to access the transceiver modules found on a line card until the line card becomes active. Therefore, register event operations with the line card core to get notifications whenever a line card becomes active or inactive. When user space tries to dump the EEPROM of a transceiver module or reset it and the corresponding line card is inactive, emit an error message: ethtool -m enp1s0nl7p9 netlink error: mlxsw_core: Cannot read EEPROM of module on an inactive line card netlink error: Input/output error When user space tries to set the power mode policy of such a transceiver, cache the configuration and apply it when the line card becomes active. This is consistent with other port configuration (e.g., MTU setting) that user space is able to perform while the line card is provisioned, but inactive. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_env: Split module power mode setting to a separate functionVadim Pasternak1-14/+27
Move the code that applies the module power mode to the device to a separate function. This function will be invoked by the next patch to set the power mode on transceiver modules found on a line card when the line card becomes active. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core: Add bus argument to environment init APIVadim Pasternak3-3/+9
Pass bus argument to mlxsw_env_init(). The purpose is to get access to device handle, which is to be provided to error message in case of line card activation failure. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20mlxsw: core_linecards: Introduce ops for linecards status change trackingJiri Pirko2-0/+154
Introduce an infrastructure allowing users to register a set of operations which are to be called whenever a line card gets active/inactive. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20ALSA: usb-audio: Clear MIDI port active flag after drainingTakashi Iwai1-0/+1
When a rawmidi output stream is closed, it calls the drain at first, then does trigger-off only when the drain returns -ERESTARTSYS as a fallback. It implies that each driver should turn off the stream properly after the drain. Meanwhile, USB-audio MIDI interface didn't change the port->active flag after the drain. This may leave the output work picking up the port that is closed right now, which eventually leads to a use-after-free for the already released rawmidi object. This patch fixes the bug by properly clearing the port->active flag after the output drain. Reported-by: syzbot+70e777a39907d6d5fd0a@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/00000000000011555605dceaff03@google.com Link: https://lore.kernel.org/r/20220420130247.22062-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-04-20dt-bindings: dmaengine: qcom: gpi: Add minItems for interruptsVinod Koul1-0/+1
Add the minItems for interrupts property as well. In the absence of this, we get warning if interrupts are less than 13 arch/arm64/boot/dts/qcom/qrb5165-rb5.dtb: dma-controller@800000: interrupts: [[0, 588, 4], [0, 589, 4], [0, 590, 4], [0, 591, 4], [0, 592, 4], [0, 593, 4], [0, 594, 4], [0, 595, 4], [0, 596, 4], [0, 597, 4]] is too short Signed-off-by: Vinod Koul <vkoul@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220414064235.1182195-1-vkoul@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20nfc: MAINTAINERS: add Bug entryKrzysztof Kozlowski1-0/+1
Add a Bug section, indicating preferred mailing method for bug reports, to NFC Subsystem entry. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20dmaengine: idxd: skip clearing device context when device is read-onlyDave Jiang1-0/+3
If the device shows up as read-only configuration, skip the clearing of the state as the context must be preserved for device re-enable after being disabled. Fixes: 0dcfe41e9a4c ("dmanegine: idxd: cleanup all device related bits after disabling device") Reported-by: Tony Zhu <tony.zhu@intel.com> Tested-by: Tony Zhu <tony.zhu@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971479479.2200566.13980022473526292759.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20dmaengine: idxd: add RO check for wq max_transfer_size writeDave Jiang1-0/+3
Block wq_max_transfer_size_store() when the device is configured as read-only and not configurable. Fixes: d7aad5550eca ("dmaengine: idxd: add support for configurable max wq xfer size") Reported-by: Bernice Zhang <bernice.zhang@intel.com> Tested-by: Bernice Zhang <bernice.zhang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971488154.2200913.10706665404118545941.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20dmaengine: idxd: add RO check for wq max_batch_size writeDave Jiang1-0/+3
Block wq_max_batch_size_store() when the device is configured as read-only and not configurable. Fixes: e7184b159dd3 ("dmaengine: idxd: add support for configurable max wq batch size") Reported-by: Bernice Zhang <bernice.zhang@intel.com> Tested-by: Bernice Zhang <bernice.zhang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/164971493551.2201159.1942042593642155209.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20dmaengine: idxd: fix retry value to be constant for duration of function callDave Jiang1-2/+2
When retries is compared to wq->enqcmds_retries each loop of idxd_enqcmds(), wq->enqcmds_retries can potentially changed by user. Assign the value of retries to wq->enqcmds_retries during initialization so it is the original value set when entering the function. Fixes: 7930d8553575 ("dmaengine: idxd: add knob for enqcmds retries") Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/165031760154.3658664.1983547716619266558.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20dmaengine: idxd: match type for retries var in idxd_enqcmds()Dave Jiang1-1/+2
wq->enqcmds_retries is defined as unsigned int. However, retries on the stack is defined as int. Change retries to unsigned int to compare the same type. Fixes: 7930d8553575 ("dmaengine: idxd: add knob for enqcmds retries") Suggested-by: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/165031747059.3658198.6035308204505664375.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20dmaengine: dw-edma: Fix inconsistent indentingJiapeng Chong1-9/+10
Eliminate the follow smatch warning: drivers/dma/dw-edma/dw-edma-v0-core.c:419 dw_edma_v0_core_start() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220413023442.18856-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-04-20Merge tag 'linux-can-next-for-5.19-20220419' of ↵David S. Miller30-47/+3382
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-04-19 this is a pull request of 17 patches for net-next/master. The first 2 patches are by me and target the CAN driver infrastructure. One patch renames a function in the rx_offload helper the other one updates the CAN bitrate calculation to prefer small bit rate pre-scalers over larger ones, which is encouraged by the CAN in Automation. Kris Bahnsen contributes a patch to fix the links to Technologic Systems web resources in the sja1000 driver. Christophe Leroy's patch prepares the mpc5xxx_can driver for upcoming powerpc header cleanup. Minghao Chi's patch converts the flexcan driver to use pm_runtime_resume_and_get(). The next 2 patches target the Xilinx CAN driver. Lukas Bulwahn's patch fixes an entry in the MAINTAINERS file. A patch by me marks the bit timing constants as const. Wolfram Sang's patch documents r8a77961 support on the renesas,rcar-canfd bindings document. The next 2 patches are by me and add support for the mcp251863 chip to the mcp251xfd driver. The last 7 patches are by Pavel Pisa, Martin Jerabek et al. and add the ctucanfd driver for the CTU CAN FD IP Core. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: stmmac: Use readl_poll_timeout_atomic() in atomic stateKevin Hao1-2/+2
The init_systime() may be invoked in atomic state. We have observed the following call trace when running "phc_ctl /dev/ptp0 set" on a Intel Agilex board. BUG: sleeping function called from invalid context at drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:74 in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 381, name: phc_ctl preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: [<ffff80000892ef78>] stmmac_set_time+0x34/0x8c CPU: 2 PID: 381 Comm: phc_ctl Not tainted 5.18.0-rc2-next-20220414-yocto-standard+ #567 Hardware name: SoCFPGA Agilex SoCDK (DT) Call trace: dump_backtrace.part.0+0xc4/0xd0 show_stack+0x24/0x40 dump_stack_lvl+0x7c/0xa0 dump_stack+0x18/0x34 __might_resched+0x154/0x1c0 __might_sleep+0x58/0x90 init_systime+0x78/0x120 stmmac_set_time+0x64/0x8c ptp_clock_settime+0x60/0x9c pc_clock_settime+0x6c/0xc0 __arm64_sys_clock_settime+0x88/0xf0 invoke_syscall+0x5c/0x130 el0_svc_common.constprop.0+0x4c/0x100 do_el0_svc+0x7c/0xa0 el0_svc+0x58/0xcc el0t_64_sync_handler+0xa4/0x130 el0t_64_sync+0x18c/0x190 So we should use readl_poll_timeout_atomic() here instead of readl_poll_timeout(). Also adjust the delay time to 10us to fix a "__bad_udelay" build error reported by "kernel test robot <lkp@intel.com>". I have tested this on Intel Agilex and NXP S32G boards, there is no delay needed at all. So the 10us delay should be long enough for most cases. Fixes: ff8ed737860e ("net: stmmac: use readl_poll_timeout() function in init_systime()") Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20Merge branch 'net-sched-flower-num-vlan-tags'David S. Miller4-33/+88
Boris Sukholitko says: ==================== net/sched: flower: match on the number of vlan tags Our customers in the fiber telecom world have network configurations where they would like to control their traffic according to the number of tags appearing in the packet. For example, TR247 GPON conformance test suite specification mostly talks about untagged, single, double tagged packets and gives lax guidelines on the vlan protocol vs. number of vlan tags. This is different from the common IT networks where 802.1Q and 802.1ad protocols are usually describe single and double tagged packet. GPON configurations that we work with have arbitrary mix the above protocols and number of vlan tags in the packet. The following patch series implement number of vlans flower filter. They add num_of_vlans flower filter as an alternative to vlan ethtype protocol matching. The end result is that the following command becomes possible: tc filter add dev eth1 ingress flower \ num_of_vlans 1 vlan_prio 5 action drop Also, from our logs, we have redirect rules such that: tc filter add dev $GPON ingress flower num_of_vlans $N \ action mirred egress redirect dev $DEV where N can range from 0 to 3 and $DEV is the function of $N. Also there are rules setting skb mark based on the number of vlans: tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \ $P action skbedit mark $M More about the patch series: - patches 1-2 remove duplicate code by introducing is_key_vlan helper. - patch 3, 4 implement num_of_vlans in the dissector and in the flower. - patch 5 uses the num_of_vlans filter to allow further matching on vlan attributes. Complementary iproute2 patches are being sent separately. Thanks, Boris. - v4: rebased to the latest net-next - v3: - more example commands in patch 3 description (request by Jamal) - patch 5 description made clearer (thanks to Jiri) - v2: - add suitable subject prefixes - more evolved patch 5 description ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net/sched: flower: Consider the number of tags for vlan filtersBoris Sukholitko1-8/+16
Before this patch the existence of vlan filters was conditional on the vlan protocol being matched in the tc rule. For example, the following rule: tc filter add dev eth1 ingress flower vlan_prio 5 was illegal because vlan protocol (e.g. 802.1q) does not appear in the rule. Remove the above restriction by looking at the num_of_vlans filter to allow further matching on vlan attributes. The following rule becomes legal as a result of this commit: tc filter add dev eth1 ingress flower num_of_vlans 1 vlan_prio 5 because having num_of_vlans==1 implies that the packet is single tagged. Change is_vlan_key helper to look at the number of vlans in addition to the vlan ethertype. The outcome of this change is that outer (e.g. vlan_prio) and inner (e.g. cvlan_prio) tag vlan filters require the number of vlan tags to be greater then 0 and 1 accordingly. As a result of is_vlan_key change, the ethertype may be set to 0 when matching on the number of vlans. Update fl_set_key_vlan to avoid setting key, mask vlan_tpid for the 0 ethertype. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net/sched: flower: Add number of vlan tags filterBoris Sukholitko2-0/+16
These are bookkeeping parts of the new num_of_vlans filter. Defines, dump, load and set are being done here. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20flow_dissector: Add number of vlan tags dissectorBoris Sukholitko2-0/+29
Our customers in the fiber telecom world have network configurations where they would like to control their traffic according to the number of tags appearing in the packet. For example, TR247 GPON conformance test suite specification mostly talks about untagged, single, double tagged packets and gives lax guidelines on the vlan protocol vs. number of vlan tags. This is different from the common IT networks where 802.1Q and 802.1ad protocols are usually describe single and double tagged packet. GPON configurations that we work with have arbitrary mix the above protocols and number of vlan tags in the packet. The goal is to make the following TC commands possible: tc filter add dev eth1 ingress flower \ num_of_vlans 1 vlan_prio 5 action drop From our logs, we have redirect rules such that: tc filter add dev $GPON ingress flower num_of_vlans $N \ action mirred egress redirect dev $DEV where N can range from 0 to 3 and $DEV is the function of $N. Also there are rules setting skb mark based on the number of vlans: tc filter add dev $GPON ingress flower num_of_vlans $N vlan_prio \ $P action skbedit mark $M This new dissector allows extracting the number of vlan tags existing in the packet. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net/sched: flower: Reduce identation after is_key_vlan refactoringBoris Sukholitko1-17/+17
Whitespace only. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net/sched: flower: Helper function for vlan ethtype checksBoris Sukholitko1-15/+17
There are somewhat repetitive ethertype checks in fl_set_key. Refactor them into is_vlan_key helper function. To make the changes clearer, avoid touching identation levels. This is the job for the next patch in the series. Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20ar5523: Use kzalloc instead of kmalloc/memsetHaowen Bai1-2/+1
Use kzalloc rather than duplicating its implementation, which makes code simple and easy to understand. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: dsa: realtek: remove realtek,rtl8367s stringLuiz Angelo Daros de Luca2-5/+0
There is no need to add new compatible strings for each new supported chip version. The compatible string is used only to select the subdriver (rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the chip model by itself, ignoring which compatible string was used. Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/ Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20dt-bindings: net: dsa: realtek: cleanup compatible stringsLuiz Angelo Daros de Luca1-21/+14
Compatible strings are used to help the driver find the chip ID/version register for each chip family. After that, the driver can setup the switch accordingly. Keep only the first supported model for each family as a compatible string and reference other chip models in the description. The removed compatible strings have never been used in a released kernel. CC: devicetree@vger.kernel.org Link: https://lore.kernel.org/netdev/20220414014055.m4wbmr7tdz6hsa3m@bang-olufsen.dk/ Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20Merge branch 'hns3-next'David S. Miller12-46/+116
Guangbin Huang says: ==================== net: hns3: updates for -next This series includes some updates for the HNS3 ethernet driver. Change logs: V1 -> V2: - Fix failed to apply to net-next problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: remove unnecessary line wrap for hns3_set_tunableHao Chen1-6/+3
Remove unnecessary line wrap for hns3_set_tunable to improve function readability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: replace magic value by HCLGE_RING_REG_OFFSETPeng Li1-1/+1
Magic values are not recommended. Signed-off-by: Peng Li<lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: fix the wrong words in commentsPeng Li3-3/+3
This patch fixes wrong words in comments. Signed-off-by: Peng Li<lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: update the comment of function hclgevf_get_mbx_respPeng Li1-2/+4
The param of function hclgevf_get_mbx_resp has been changed but the comments not upodated. This patch updates it. Signed-off-by: Peng Li<lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: add log for setting tx spare buf sizeHao Chen1-0/+6
For the active tx spare buffer size maybe changed according to the page size, so add log to notice it. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: add failure logs in hclge_set_vport_mtuJie Wang1-0/+3
Currently, There is a low probability that pf mtu configuration fails, but the information in logs is insufficient for problem locating when the VF mtu value is illegally modified. So record the vf index and vf mtu value at the failure scenario. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: refine the definition for struct hclge_pf_to_vf_msgJian Shen2-5/+14
The struct hclge_pf_to_vf_msg is used for mailbox message from PF to VF, including both response and request. But its definition can only indicate respone, which makes the message data copy in function hclge_send_mbx_msg() unreadable. So refine it by edding a general message definition into it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: refactor hns3_set_ringparam()Hao Chen2-22/+49
Use struct hns3_ring_param to replace variable new/old_xxx and add hns3_is_ringparam_changed() to judge them if is changed to improve code readability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: hns3: add ethtool parameter check for CQE/EQE modeYufeng Mo5-7/+33
For DEVICE_VERSION_V2, the hardware does not support the CQE mode. So add capability bit for coalesce CQE mode and add parameter check for it in ethtool. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20Merge branch 'atlantic-xdp-multi-buffer'David S. Miller12-45/+670
[PATCH net-next v5 0/3] net: atlantic: Add XDP support @ 2022-04-17 10:12 Taehee Yoo 2022-04-17 10:12 ` [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane Taehee Yoo ` (2 more replies) 0 siblings, 3 replies; 4+ messages in thread From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw) To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk, john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf Cc: ap420073 This patchset is to make atlantic to support multi-buffer XDP. The first patch implement control plane of xdp. The aq_xdp(), callback of .xdp_bpf is added. The second patch implements data plane of xdp. XDP_TX, XDP_DROP, and XDP_PASS is supported. __aq_ring_xdp_clean() is added to receive and execute xdp program. aq_nic_xmit_xdpf() is added to send packet by XDP. The third patch implements callback of .ndo_xdp_xmit. aq_xdp_xmit() is added to send redirected packets and it internally calls aq_nic_xmit_xdpf(). Memory model is MEM_TYPE_PAGE_SHARED. Order-2 page allocation is used when XDP is enabled. LRO will be disabled if XDP program doesn't supports multi buffer. AQC chip supports 32 multi-queues and 8 vectors(irq). There are two options. 1. under 8 cores and maximum 4 tx queues per core. 2. under 4 cores and maximum 8 tx queues per core. Like other drivers, these tx queues can be used only for XDP_TX, XDP_REDIRECT queue. If so, no tx_lock is needed. But this patchset doesn't use this strategy because getting hardware tx queue index cost is too high. So, tx_lock is used in the aq_nic_xmit_xdpf(). single-core, single queue, 80% cpu utilization. 32.30% [kernel] [k] aq_get_rxpages_xdp 10.44% [kernel] [k] aq_hw_read_reg <---------- here 9.86% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx 5.51% [kernel] [k] aq_ring_rx_clean single-core, 8 queues, 100% cpu utilization, half PPS. 52.03% [kernel] [k] aq_hw_read_reg <---------- here 18.24% [kernel] [k] aq_get_rxpages_xdp 4.30% [kernel] [k] hw_atl_b0_hw_ring_rx_receive 4.24% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx 2.79% [kernel] [k] aq_ring_rx_clean Performance result(64 Byte) 1. XDP_TX a. xdp_geieric, single core - 2.5Mpps, 100% cpu b. xdp_driver, single core - 4.5Mpps, 80% cpu c. xdp_generic, 8 core(hyper thread) - 6.3Mpps, 40% cpu d. xdp_driver, 8 core(hyper thread) - 6.3Mpps, 30% cpu 2. XDP_REDIRECT a. xdp_generic, single core - 2.3Mpps b. xdp_driver, single core - 4.5Mpps v5: - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0 - Use 2K frame size instead of 3K - Use order-2 page allocation instead of order-0 - Rename aq_get_rxpage() to aq_alloc_rxpages() - Add missing PageFree stats for ethtool - Remove aq_unset_rxpage_xdp(), introduced by v2 patch due to change of memory model - Fix wrong last parameter value of xdp_prepare_buff() - Add aq_get_rxpages_xdp() to increase page reference count v4: - Fix compile warning v3: - Change wrong PPS performance result 40% -> 80% in single core(Intel i3-12100) - Separate aq_nic_map_xdp() from aq_nic_map_skb() - Drop multi buffer packets if single buffer XDP is attached - Disable LRO when single buffer XDP is attached - Use xdp_get_{frame/buff}_len() v2: - Do not use inline in C file Taehee Yoo (3): net: atlantic: Implement xdp control plane net: atlantic: Implement xdp data plane net: atlantic: Implement .ndo_xdp_xmit handler .../net/ethernet/aquantia/atlantic/aq_cfg.h | 1 + .../ethernet/aquantia/atlantic/aq_ethtool.c | 9 + .../net/ethernet/aquantia/atlantic/aq_main.c | 87 ++++ .../net/ethernet/aquantia/atlantic/aq_main.h | 2 + .../net/ethernet/aquantia/atlantic/aq_nic.c | 136 ++++++ .../net/ethernet/aquantia/atlantic/aq_nic.h | 5 + .../net/ethernet/aquantia/atlantic/aq_ring.c | 409 ++++++++++++++++-- .../net/ethernet/aquantia/atlantic/aq_ring.h | 21 +- .../net/ethernet/aquantia/atlantic/aq_vec.c | 23 +- .../net/ethernet/aquantia/atlantic/aq_vec.h | 6 + .../aquantia/atlantic/hw_atl/hw_atl_a0.c | 6 +- .../aquantia/atlantic/hw_atl/hw_atl_b0.c | 10 +- 12 files changed, 670 insertions(+), 45 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 4+ messages in thread * [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane 2022-04-17 10:12 [PATCH net-next v5 0/3] net: atlantic: Add XDP support Taehee Yoo @ 2022-04-17 10:12 ` Taehee Yoo 2022-04-17 10:12 ` [PATCH net-next v5 2/3] net: atlantic: Implement xdp data plane Taehee Yoo 2022-04-17 10:12 ` [PATCH net-next v5 3/3] net: atlantic: Implement .ndo_xdp_xmit handler Taehee Yoo 2 siblings, 0 replies; 4+ messages in thread From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw) To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk, john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf Cc: ap420073 aq_xdp() is a xdp setup callback function for Atlantic driver. When XDP is attached or detached, the device will be restarted because it uses different headroom, tailroom, and page order value. If XDP enabled, it switches default page order value from 0 to 2. Because the default maximum frame size is still 2K and it needs additional area for headroom and tailroom. The total size(headroom + frame size + tailroom) is 2624. So, 1472Bytes will be always wasted for every frame. But when order-2 is used, these pages can be used 6 times with flip strategy. It means only about 106Bytes per frame will be wasted. Also, It supports xdp fragment feature. MTU can be 16K if xdp prog supports xdp fragment. If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS. And a static key is added and It will be used to call the xdp_clean handler in ->poll(). data plane implementation will be contained the followed patch. Signed-off-by: Taehee Yoo <ap420073@gmail.com> --- v5: - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0 - Use 2K frame size instead of 3K - Use order-2 page allocation instead of order-0 - Rename aq_get_rxpage() to aq_alloc_rxpages() v4: - No changed v3: - Disable LRO when single buffer XDP is attached v2: - No changed
2022-04-20net: atlantic: Implement .ndo_xdp_xmit handlerTaehee Yoo3-0/+26
aq_xdp_xmit() is the callback function of .ndo_xdp_xmit. It internally calls aq_nic_xmit_xdpf() to send packet. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: atlantic: Implement xdp data planeTaehee Yoo6-9/+468
It supports XDP_PASS, XDP_DROP and multi buffer. The new function aq_nic_xmit_xdpf() is used to send packet with xdp_frame and internally it calls aq_nic_map_xdp(). AQC chip supports 32 multi-queues and 8 vectors(irq). there are two option 1. under 8 cores and 4 tx queues per core. 2. under 4 cores and 8 tx queues per core. Like ixgbe, these tx queues can be used only for XDP_TX, XDP_REDIRECT queue. If so, no tx_lock is needed. But this patchset doesn't use this strategy because getting hardware tx queue index cost is too high. So, tx_lock is used in the aq_nic_xmit_xdpf(). single-core, single queue, 80% cpu utilization. 30.75% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx 10.35% [kernel] [k] aq_hw_read_reg <---------- here 4.38% [kernel] [k] get_page_from_freelist single-core, 8 queues, 100% cpu utilization, half PPS. 45.56% [kernel] [k] aq_hw_read_reg <---------- here 17.58% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx 4.72% [kernel] [k] hw_atl_b0_hw_ring_rx_receive The new function __aq_ring_xdp_clean() is a xdp rx handler and this is called only when XDP is attached. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: atlantic: Implement xdp control planeTaehee Yoo10-37/+177
aq_xdp() is a xdp setup callback function for Atlantic driver. When XDP is attached or detached, the device will be restarted because it uses different headroom, tailroom, and page order value. If XDP enabled, it switches default page order value from 0 to 2. Because the default maximum frame size is still 2K and it needs additional area for headroom and tailroom. The total size(headroom + frame size + tailroom) is 2624. So, 1472Bytes will be always wasted for every frame. But when order-2 is used, these pages can be used 6 times with flip strategy. It means only about 106Bytes per frame will be wasted. Also, It supports xdp fragment feature. MTU can be 16K if xdp prog supports xdp fragment. If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS. And a static key is added and It will be used to call the xdp_clean handler in ->poll(). data plane implementation will be contained the followed patch. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>