Age | Commit message | Author | Files | Lines
2021-03-13 | mptcp: add rm_list in mptcp_options_received | Geliang Tang | 3 | -10/+18
This patch changed the member rm_id in struct mptcp_options_received into a list of the address ids being removed, and renamed it to rm_list. In mptcp_parse_option, it parsed the RM_ADDR suboption and filled the ids into the rm_list in struct mptcp_options_received. In mptcp_incoming_options, it passed this rm_list to the function mptcp_pm_rm_addr_received. It also changed the parameter type of mptcp_pm_rm_addr_received.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | mptcp: add rm_list_tx in mptcp_pm_data | Geliang Tang | 3 | -10/+18
This patch added a new member rm_list_tx to struct mptcp_pm_data as the list of address ids being removed in the outgoing direction. It initialized the nr field to zero in mptcp_pm_data_init. In mptcp_pm_remove_anno_addr, it put the single address id into a removing list and passed it to mptcp_pm_remove_addr. In mptcp_pm_remove_addr, it saved the input rm_list to rm_list_tx in struct mptcp_pm_data.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | mptcp: add rm_list in mptcp_out_options | Geliang Tang | 4 | -13/+48
This patch defined a new struct mptcp_rm_list: the ids field is an array of the address ids being removed, and the nr field is the number of valid ids in the array. The array size is defined by a new macro, MPTCP_RM_IDS_MAX. It changed the member rm_id of struct mptcp_out_options to rm_list. In mptcp_established_options_rm_addr, it invoked mptcp_pm_rm_addr_signal to get the rm_list, calculated the padded RM_ADDR suboption length according to the number of addresses in it, and saved the ids array in the rm_list member of struct mptcp_out_options. In mptcp_write_options, it iterated over each address id in the rm_list member of struct mptcp_out_options, set the invalid ones to TCPOPT_NOP, then filled them into the RM_ADDR suboption. It also changed TCPOLEN_MPTCP_RM_ADDR_BASE from 4 to 3.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
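The list structure and the padded-length calculation described in this commit can be sketched in user-space C. The struct, its field names, and the two macro names come from the commit message; the value 8 for MPTCP_RM_IDS_MAX, the helper name rm_addr_padded_len, and rounding up to a 4-byte boundary are assumptions made for illustration and are not taken from the patch itself.

```c
#include <stdint.h>

#define MPTCP_RM_IDS_MAX 8            /* assumed value; the commit only names the macro */
#define TCPOLEN_MPTCP_RM_ADDR_BASE 3  /* from the commit: kind, length and subtype bytes */

struct mptcp_rm_list {
	uint8_t ids[MPTCP_RM_IDS_MAX]; /* address ids being removed */
	uint8_t nr;                    /* number of valid entries in ids[] */
};

/*
 * Hypothetical helper: length in bytes of an RM_ADDR suboption carrying
 * list->nr ids, rounded up to a multiple of 4 (assumed padding rule).
 * Returns -1 when the list is empty or holds too many ids.
 */
int rm_addr_padded_len(const struct mptcp_rm_list *list)
{
	if (list->nr == 0 || list->nr > MPTCP_RM_IDS_MAX)
		return -1;
	return (TCPOLEN_MPTCP_RM_ADDR_BASE + list->nr + 3) & ~3;
}
```

With a 3-byte base, a single id fits in one 32-bit word, and two to five ids need two words, with the unused trailing bytes filled by TCPOPT_NOP as the commit describes.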
2021-03-13 | Merge branch 'resil-nhgroups-netdevsim-selftests' | David S. Miller | 5 | -10/+2059
Petr Machata says:

====================
net: Resilient NH groups: netdevsim, selftests

Support for resilient next-hop groups was added in a previous patch set. Resilient next hop groups add a layer of indirection between the SKB hash and the next hop. Thus the hash is used to reference a hash table bucket, which is then used to reference a particular next hop. This allows the system more flexibility when assigning SKB hash space to next hops. Previously, each next hop had to be assigned a continuous range of SKB hash space. With a hash table as an intermediate layer, it is possible to reassign next hops with a hash table bucket granularity. In turn, this mends issues with traffic flow redirection resulting from next hop removal or adjustments in next-hop weights.

This patch set introduces mock offloading of resilient next hop groups by the netdevsim driver, and a suite of selftests.

- Patch #1 adds a netdevsim-specific lock to protect next-hop hashtable. Previously, netdevsim relied on RTNL to maintain mutual exclusion. Patch #2 extracts a helper to make the following patches clearer.
- Patch #3 implements the support for offloading of resilient next-hop groups.
- Patch #4 introduces a new debugfs interface to set activity on a selected next-hop bucket. This simulates how HW can periodically report bucket activity, and buckets thus marked are expected to be exempt from migration to new next hops when the group changes.
- Patches #5 and #6 clean up the fib_nexthop selftests.
- Patches #7, #8 and #9 add tests for resilient next hop groups. Patch #7 adds resilient-hashing counterparts to fib_nexthops.sh. Patch #8 adds a new traffic test for resilient next-hop groups. Patch #9 adds a new traffic test for tunneling.
- Patch #10 actually leverages the netdevsim offload to implement a suite of algorithmic tests that verify how and when buckets are migrated under various simulated workload scenarios.

The overall plan is to contribute approximately the following patchsets:

1) Nexthop policy refactoring (already pushed)
2) Preparations for resilient next hop groups (already pushed)
3) Implementation of resilient next hop group (already pushed)
4) Netdevsim offload plus a suite of selftests (this patchset)
5) Preparations for mlxsw offload of resilient next-hop groups
6) mlxsw offload including selftests

Interested parties can look at the complete code at [2].

[1] https://tools.ietf.org/html/rfc2992
[2] https://github.com/idosch/linux/commits/submit/res_integ_v1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
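The indirection described in the cover letter (SKB hash to bucket, bucket to next hop) can be illustrated with a small user-space model. This is only a sketch: the type and function names, the table size of 8, and the use of a modulo to pick the bucket are assumptions for illustration, not the kernel's implementation.

```c
#include <stdint.h>

#define NUM_BUCKETS 8 /* hypothetical table size */

struct res_group {
	uint32_t bucket_nh[NUM_BUCKETS]; /* bucket index -> next-hop id */
};

/* The SKB hash selects a bucket; the bucket names the next hop. */
uint32_t res_group_select(const struct res_group *g, uint32_t skb_hash)
{
	return g->bucket_nh[skb_hash % NUM_BUCKETS];
}

/*
 * Reassign a single bucket to another next hop. Flows whose hash maps
 * to any other bucket keep their current next hop.
 */
void res_group_migrate(struct res_group *g, unsigned int bucket, uint32_t nh_id)
{
	if (bucket < NUM_BUCKETS)
		g->bucket_nh[bucket] = nh_id;
}
```

Because reassignment happens per bucket rather than per contiguous hash range, removing or re-weighting one next hop only disturbs the flows that hash into the buckets being moved.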
2021-03-13 | selftests: netdevsim: Add test for resilient nexthop groups offload API | Ido Schimmel | 1 | -0/+620
Test various aspects of the resilient nexthop group offload API on top of the netdevsim implementation. Both good and bad flows are tested. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | selftests: forwarding: Add resilient multipath tunneling nexthop test | Ido Schimmel | 1 | -0/+361
Add a resilient nexthop objects version of gre_multipath_nh.sh. Test that both IPv4 and IPv6 overlays work with resilient nexthop groups where the nexthops are two GRE tunnels. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | selftests: forwarding: Add resilient hashing test | Ido Schimmel | 1 | -0/+400
Verify that IPv4 and IPv6 multipath forwarding works correctly with resilient nexthop groups and with different weights. Test that when the idle timer is not zero, the resilient groups are not rebalanced - because the nexthop buckets are considered active - and the initial weights (1:1) are used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | selftests: fib_nexthops: Test resilient nexthop groups | Ido Schimmel | 1 | -0/+517
Add test cases for resilient nexthop groups. Exhaustive forwarding tests are added separately under net/forwarding/.

Examples:

# ./fib_nexthops.sh -t basic_res

Basic resilient nexthop group functional tests
----------------------------------------------
TEST: Add a nexthop group with default parameters [ OK ]
TEST: Get a nexthop group with default parameters [ OK ]
TEST: Get a nexthop group with non-default parameters [ OK ]
TEST: Add a nexthop group with 0 buckets [ OK ]
TEST: Replace nexthop group parameters [ OK ]
TEST: Get a nexthop group after replacing parameters [ OK ]
TEST: Replace idle timer [ OK ]
TEST: Get a nexthop group after replacing idle timer [ OK ]
TEST: Replace unbalanced timer [ OK ]
TEST: Get a nexthop group after replacing unbalanced timer [ OK ]
TEST: Replace with no parameters [ OK ]
TEST: Get a nexthop group after replacing no parameters [ OK ]
TEST: Replace nexthop group type - implicit [ OK ]
TEST: Replace nexthop group type - explicit [ OK ]
TEST: Replace number of nexthop buckets [ OK ]
TEST: Get a nexthop group after replacing with invalid parameters [ OK ]
TEST: Dump all nexthop buckets [ OK ]
TEST: Dump all nexthop buckets in a group [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop device [ OK ]
TEST: Dump all nexthop buckets with a specific nexthop identifier [ OK ]
TEST: Dump all nexthop buckets in a non-existent group [ OK ]
TEST: Dump all nexthop buckets in a non-resilient group [ OK ]
TEST: Dump all nexthop buckets using a non-existent device [ OK ]
TEST: Dump all nexthop buckets with invalid 'groups' keyword [ OK ]
TEST: Dump all nexthop buckets with invalid 'fdb' keyword [ OK ]
TEST: Get a valid nexthop bucket [ OK ]
TEST: Get a nexthop bucket with valid group, but invalid index [ OK ]
TEST: Get a nexthop bucket from a non-resilient group [ OK ]
TEST: Get a nexthop bucket from a non-existent group [ OK ]

Tests passed: 29
Tests failed: 0

# ./fib_nexthops.sh -t ipv4_large_res_grp

IPv4 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets [ OK ]

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv6_large_res_grp

IPv6 large resilient group (128k buckets)
-----------------------------------------
TEST: Dump large (x131072) nexthop buckets [ OK ]

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv4_res_torture

IPv4 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv4 resilient nexthop group torture test [ OK ]

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv6_res_torture

IPv6 runtime resilient nexthop group torture
--------------------------------------------
TEST: IPv6 resilient nexthop group torture test [ OK ]

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv4_res_grp_fcnal

IPv4 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted [ OK ]
TEST: Nexthop buckets updated when entry is deleted [ OK ]
TEST: Nexthop group updated after replace [ OK ]
TEST: Nexthop buckets updated after replace [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop group updated after replace - nECMP [ OK ]
TEST: Nexthop buckets updated after replace - nECMP [ OK ]

Tests passed: 8
Tests failed: 0

# ./fib_nexthops.sh -t ipv6_res_grp_fcnal

IPv6 resilient groups functional
--------------------------------
TEST: Nexthop group updated when entry is deleted [ OK ]
TEST: Nexthop buckets updated when entry is deleted [ OK ]
TEST: Nexthop group updated after replace [ OK ]
TEST: Nexthop buckets updated after replace [ OK ]
TEST: Nexthop group updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ]
TEST: Nexthop group updated after replace - nECMP [ OK ]
TEST: Nexthop buckets updated after replace - nECMP [ OK ]

Tests passed: 8
Tests failed: 0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Co-developed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | selftests: fib_nexthops: List each test case in a different line | Ido Schimmel | 1 | -4/+26
The lines with the IPv4 and IPv6 test cases are already very long and more test cases will be added in subsequent patches. List each test case in a different line to make it easier to extend the test with more test cases. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | selftests: fib_nexthops: Declutter test output | Ido Schimmel | 1 | -0/+2
Before:

# ./fib_nexthops.sh -t ipv4_torture

IPv4 runtime torture
--------------------
TEST: IPv4 torture test [ OK ]
./fib_nexthops.sh: line 213: 19376 Killed ipv4_del_add_loop1
./fib_nexthops.sh: line 213: 19377 Killed ipv4_grp_replace_loop
./fib_nexthops.sh: line 213: 19378 Killed ip netns exec me ping -f 172.16.101.1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19380 Killed ip netns exec me ping -f 172.16.101.2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 19381 Killed ip netns exec me mausezahn veth1 -B 172.16.101.2 -A 172.16.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv6_torture

IPv6 runtime torture
--------------------
TEST: IPv6 torture test [ OK ]
./fib_nexthops.sh: line 213: 24453 Killed ipv6_del_add_loop1
./fib_nexthops.sh: line 213: 24454 Killed ipv6_grp_replace_loop
./fib_nexthops.sh: line 213: 24456 Killed ip netns exec me ping -f 2001:db8:101::1 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24457 Killed ip netns exec me ping -f 2001:db8:101::2 > /dev/null 2>&1
./fib_nexthops.sh: line 213: 24458 Killed ip netns exec me mausezahn -6 veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1

Tests passed: 1
Tests failed: 0

After:

# ./fib_nexthops.sh -t ipv4_torture

IPv4 runtime torture
--------------------
TEST: IPv4 torture test [ OK ]

Tests passed: 1
Tests failed: 0

# ./fib_nexthops.sh -t ipv6_torture

IPv6 runtime torture
--------------------
TEST: IPv6 torture test [ OK ]

Tests passed: 1
Tests failed: 0

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | netdevsim: Allow reporting activity on nexthop buckets | Ido Schimmel | 1 | -0/+61
A key component of the resilient hashing algorithm is the hash buckets' activity. If a bucket is active, it will not be populated with a new nexthop in order not to break existing flows. Therefore, in order to easily and thoroughly test the algorithm, we need to be in full control over the reported activity.

Add a debugfs interface that allows user space to have netdevsim report a nexthop bucket within a resilient nexthop group as active. For example:

# echo 10 23 > /sys/kernel/debug/netdevsim/netdevsim10/fib/nexthop_bucket_activity

Will mark bucket 23 in nexthop group 10 as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
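The rule stated in this commit, that an active bucket is not repopulated with a new nexthop, can be modeled as follows. This is a deliberately simplified sketch: the struct, the field names, and the function name are hypothetical, and timers or other conditions that a real implementation may apply are left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct nh_bucket {
	uint32_t nh_id; /* next hop currently assigned to this bucket */
	bool active;    /* set when activity was reported for the bucket */
};

/*
 * Try to move a bucket to a new next hop. An active bucket keeps its
 * next hop so that existing flows are not broken. Returns true if the
 * bucket was migrated.
 */
bool bucket_try_migrate(struct nh_bucket *b, uint32_t new_nh_id)
{
	if (b->active)
		return false;
	b->nh_id = new_nh_id;
	return true;
}
```

In this model, writing a group and bucket index to the debugfs file corresponds to setting the active flag on the selected bucket before a group change is attempted.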
2021-03-13 | netdevsim: Add support for resilient nexthop groups | Ido Schimmel | 1 | -0/+55
Allow resilient nexthop groups to be programmed and account their occupancy according to their number of buckets. The nexthop group itself as well as its buckets are marked with hardware flags (i.e., 'RTNH_F_TRAP').

Replacement of a single nexthop bucket can be made to fail using the following debugfs knob:

# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
Y

Replacement of a resilient nexthop group can be made to fail using the following debugfs knob:

# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
N
# echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
# cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
Y

This enables testing of various error paths.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | netdevsim: Create a helper for setting nexthop hardware flags | Ido Schimmel | 1 | -3/+10
Instead of calling nexthop_set_hw_flags(), call a helper. It will be used to also set nexthop bucket flags in a subsequent patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | netdevsim: fib: Introduce a lock to guard nexthop hashtable | Petr Machata | 1 | -3/+7
Currently netdevsim relies on RTNL to maintain exclusivity in accessing the nexthop hash table. However, bucket notification may be called without RTNL having been held. Instead, introduce a custom lock to guard the table. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | Merge branch 'ptp-warnings' | David S. Miller | 5 | -25/+32
Lee Jones says: ==================== Rid W=1 warnings from PTP This set is part of a larger effort attempting to clean-up W=1 kernel builds, which are currently overwhelmingly riddled with niggly little warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | ptp: ptp_p: Demote non-conformant kernel-doc headers and supply a param description | Lee Jones | 1 | -3/+6
Fixes the following W=1 kernel build warning(s): drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'control' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'event' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'addend' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'accum' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'test' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_compare' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rsystime_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'systime_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'trgt_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'asms_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'amms_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_control' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ch_event' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning:
Function parameter or member 'tx_snap_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'tx_snap_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'rx_snap_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'src_uuid_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_status' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_lo' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'can_snap_hi' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_sel' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'ts_st' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve1' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set_en' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'stl_max_set' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'reserve2' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:78: warning: Function parameter or member 'srst' not described in 'pch_ts_regs' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'regs' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'ptp_clock' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'caps' not described in 'pch_dev' 
drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts0_enabled' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'exts1_enabled' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_base' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'mem_size' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'irq' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'pdev' not described in 'pch_dev' drivers/ptp/ptp_pch.c:121: warning: Function parameter or member 'register_lock' not described in 'pch_dev' drivers/ptp/ptp_pch.c:128: warning: Function parameter or member 'station' not described in 'pch_params' drivers/ptp/ptp_pch.c:291: warning: Function parameter or member 'pdev' not described in 'pch_set_station_address' Cc: Richard Cochran <richardcochran@gmail.com> Cc: LAPIS SEMICONDUCTOR <tshimizu818@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | ptp: ptp_clockmatrix: Demote non-kernel-doc header to standard comment | Lee Jones | 1 | -2/+2
Fixes the following W=1 kernel build warning(s): drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds drivers/ptp/ptp_clockmatrix.c:1408: warning: Cannot understand * @brief Maximum absolute value for write phase offset in picoseconds Cc: Richard Cochran <richardcochran@gmail.com> Cc: IDT-support-1588@lm.renesas.com Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | ptp_pch: Move 'pch_*()' prototypes to shared header | Lee Jones | 4 | -8/+24
Fixes the following W=1 kernel build warning(s): drivers/ptp/ptp_pch.c:193:6: warning: no previous prototype for ‘pch_ch_control_write’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:201:5: warning: no previous prototype for ‘pch_ch_event_read’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:212:6: warning: no previous prototype for ‘pch_ch_event_write’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:220:5: warning: no previous prototype for ‘pch_src_uuid_lo_read’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:231:5: warning: no previous prototype for ‘pch_src_uuid_hi_read’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:242:5: warning: no previous prototype for ‘pch_rx_snap_read’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:259:5: warning: no previous prototype for ‘pch_tx_snap_read’ [-Wmissing-prototypes] drivers/ptp/ptp_pch.c:300:5: warning: no previous prototype for ‘pch_set_station_address’ [-Wmissing-prototypes] Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT) Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Flavio Suligoi <f.suligoi@asem.it> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | ptp_pch: Remove unused function 'pch_ch_control_read()' | Lee Jones | 2 | -12/+0
Fixes the following W=1 kernel build warning(s): drivers/ptp/ptp_pch.c:182:5: warning: no previous prototype for ‘pch_ch_control_read’ [-Wmissing-prototypes] Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT) Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Flavio Suligoi <f.suligoi@asem.it> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: dsa: bcm_sf2: setup BCM4908 internal crossbar | Rafał Miłecki | 3 | -0/+53
On some SoCs (e.g. BCM4908, BCM631[345]8) SF2 has an integrated crossbar. It allows connecting its selected external ports to internal ports. It's used by vendors to handle custom Ethernet setups.

BCM4908 has the following 3x2 crossbar. On Asus GT-AC5300 rgmii is used for connecting an external BCM53134S switch. GPHY4 is usually used for the WAN port. More fancy devices use SerDes for 2.5 Gbps Ethernet.

              ┌──────────┐
SerDes ─── 0 ─┤          │
              │   3x2    ├─ 0 ─── switch port 7
 GPHY4 ─── 1 ─┤          │
              │ crossbar ├─ 1 ─── runner (accelerator)
 rgmii ─── 2 ─┤          │
              └──────────┘

Use setup data based on DT info to configure BCM4908's switch port 7. Right now only GPHY and rgmii variants are supported. Handling SerDes can be implemented later.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: dsa: bcm_sf2: store PHY interface/mode in port structure | Rafał Miłecki | 2 | -4/+13
It's needed later for proper switch / crossbar setup. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: ipv4: route.c: Fix indentation of multi line comment. | Shubhankar Kuranagatti | 1 | -48/+49
All comment lines inside the comment block have been aligned. Every line of comment starts with a * (uniformity in code). Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: broadcom: bcm4908_enet: support TX interrupt | Rafał Miłecki | 1 | -35/+103
It appears that each DMA channel has its own interrupt and both rings can be configured (the same way) to handle interrupts.

1. Make ring interrupts code generic (make it operate on given ring)
2. Move napi to ring (so each has its own)
3. Make IRQ handler generic (match ring against received IRQ number)
4. Add (optional) support for TX interrupt

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
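Step 3 above, a generic IRQ handler that matches the ring against the received IRQ number, can be sketched in user-space C. All names here are hypothetical stand-ins, the pending counter stands in for scheduling the ring's own NAPI context, and none of this is the driver's actual code.

```c
#include <stddef.h>

struct dma_ring {
	int irq;              /* IRQ number assigned to this ring */
	unsigned int pending; /* stand-in for scheduling the ring's NAPI */
};

struct enet {
	struct dma_ring rx_ring;
	struct dma_ring tx_ring;
};

/* Find the ring that owns the given IRQ number, or NULL if none does. */
struct dma_ring *enet_ring_for_irq(struct enet *e, int irq)
{
	if (irq == e->rx_ring.irq)
		return &e->rx_ring;
	if (irq == e->tx_ring.irq)
		return &e->tx_ring;
	return NULL;
}

/* One handler serves both rings. Returns 1 if handled, 0 otherwise. */
int enet_irq_handler(int irq, struct enet *e)
{
	struct dma_ring *ring = enet_ring_for_irq(e, irq);

	if (!ring)
		return 0;
	ring->pending++;
	return 1;
}
```

Keeping the per-ring state inside the ring structure is what lets a single handler body work for RX and for the optional TX interrupt alike.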
2021-03-13 | dt-bindings: net: bcm4908-enet: add optional TX interrupt | Rafał Miłecki | 1 | -4/+13
I discovered that hardware actually supports two interrupts, one per DMA channel (RX and TX). Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | Merge branch 'macb-fixed-link-fixes' | David S. Miller | 2 | -0/+44
Robert Hancock says: ==================== macb SGMII fixed-link fixes Some fixes to the macb driver for use in SGMII mode with a fixed-link (such as for chip-to-chip connectivity). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: macb: Disable PCS auto-negotiation for SGMII fixed-link mode | Robert Hancock | 2 | -0/+30
When using a fixed-link configuration in SGMII mode, it's not really sensible to have auto-negotiation enabled since the link settings are fixed by definition. In other configurations, such as an SGMII connection to a PHY, it should generally be enabled. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net: macb: poll for fixed link state in SGMII mode | Robert Hancock | 1 | -0/+14
When using a fixed-link configuration with GEM in SGMII mode, such as for a chip-to-chip interconnect, the link state was always showing as established regardless of the actual connectivity state. We can monitor the pcs_link_state bit in the Network Status register to determine whether the PCS link state is actually up. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | Merge tag 'mlx5-updates-2021-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 12 | -55/+107
Saeed Mahameed says:

====================
mlx5-updates-2021-03-12

1) TC support for ICMP parameters
2) TC connection tracking with mirroring
3) A round of trivial fixups and cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 | net/mlx5e: Allow to match on ICMP parameters | Maor Dickman | 2 | -0/+49
Support matching on ICMPv4/6 type and code parameters using misc3 section of match parameters. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: CT: Add support for mirroring | Paul Blakey | 2 | -9/+14
Add support for mirroring before the CT action by splitting the pre ct rule. Mirror outputs are done first on the tc chain,prio table rule (the fwd rule), which will then forward to a per port fwd table. On this fwd table, we insert the original pre ct rule that forwards to ct/ct nat table.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: Display the command index in command mailbox dump | Alaa Hleihel | 1 | -14/+18
Multiple commands can be printed at the same time which can lead to wrong order of their lines in dmesg output. As a result, it's hard to match data dumps to the correct command or which command was fully dumped at some point. Fix this by displaying the corresponding command index, and also indicate when a command was fully dumped. Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5e: allocate 'indirection_rqt' buffer dynamically | Arnd Bergmann | 1 | -3/+13
Increasing the size of the indirection_rqt array from 128 to 256 bytes pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function over the warning limit when building with clang and CONFIG_KASAN: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=] Using dynamic allocation here is safe because the caller does the same, and it reduces the stack usage of the function to just a few bytes. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
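The general technique in this commit, moving a large local array off the stack into a dynamically allocated buffer, can be sketched in user-space C. The function name, the element count, and the checksum are hypothetical and exist only to make the example checkable; calloc() and free() stand in for whatever kernel allocator the patch actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

#define INDIR_RQT_SIZE 256 /* hypothetical number of table entries */

/*
 * Fill a table in a heap buffer instead of a large on-stack array, so
 * the function's stack frame stays small. Returns 0 on success and
 * stores a simple sum of the entries, or -1 if allocation fails.
 */
int fill_rqt_rqns_sum(uint32_t base_rqn, uint32_t *sum_out)
{
	uint32_t *indirection_rqt;
	uint32_t sum = 0;
	size_t i;

	indirection_rqt = calloc(INDIR_RQT_SIZE, sizeof(*indirection_rqt));
	if (!indirection_rqt)
		return -1;

	for (i = 0; i < INDIR_RQT_SIZE; i++)
		indirection_rqt[i] = base_rqn + (uint32_t)i;
	for (i = 0; i < INDIR_RQT_SIZE; i++)
		sum += indirection_rqt[i];

	free(indirection_rqt);
	*sum_out = sum;
	return 0;
}
```

The cost of this change is a new failure path: unlike a stack array, the allocation can fail, so the function has to return an error that its caller handles.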
2021-03-13 | net/mlx5e: Dump ICOSQ WQE descriptor on CQE with error events | Tariq Toukan | 1 | -0/+1
Dump the ICOSQ's WQE descriptor when a completion with error is received. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath | Maxim Mikityanskiy | 1 | -1/+1
Commit e20f0dbf204f ("net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e. In the same time frame, commit 5af75c747e2a ("net/mlx5e: Enhanced TX MPWQE for SKBs") added one more usage of prefetchw. When these two changes were merged, this new occurrence of prefetchw wasn't replaced with net_prefetchw. This commit fixes this last occurrence of prefetchw in mlx5e_tx_mpwqe_session_start, making the same change that was done in mlx5e_xdp_mpwqe_session_start. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5e: Remove redundant newline in NL_SET_ERR_MSG_MOD | Roi Dayan | 1 | -2/+2
Fix the following coccicheck warnings: drivers/net/ethernet/mellanox/mlx5/core/devlink.c:145:29-66: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD drivers/net/ethernet/mellanox/mlx5/core/devlink.c:140:29-77: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: Read congestion counters from all ports when lag is active | Mark Zhang | 1 | -1/+1
Read congestion counters from all ports in any lag mode rather than only in RoCE lag mode (e.g., VF lag). Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: remove unneeded semicolon | Jiapeng Chong | 1 | -1/+1
Fix the following coccicheck warnings: ./drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c:495:2-3: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: use kvfree() for memory allocated with kvzalloc() | Junlin Yang | 1 | -5/+5
The memory is allocated with kvzalloc(), so the corresponding release function should be kvfree(), not kfree().

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Signed-off-by: Junlin Yang <yangjunlin@yulong.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: DR, Add missing vhca_id consume from STEv1 | Yevgeny Kliteynik | 1 | -0/+1
The field source_eswitch_owner_vhca_id was not consumed in the same way as in STEv0. Added the missing set. Fixes: 10b694186410 ("net/mlx5: DR, Add HW STEv1 match logic") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: DR, Remove unneeded rx_decap_l3 function for STEv1 | Yevgeny Kliteynik | 1 | -18/+0
Remove the dr_ste_v1_set_rx_decap_l3 function that was replaced by another function - fixing a rebase error. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-13 | net/mlx5: DR, Fixed typo in STE v0 | Yevgeny Kliteynik | 1 | -1/+1
"reforamt" -> "reformat" Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 | docs: networking: phy: Improve placement of parenthesis | Jonathan Neuschäfer | 1 | -2/+2
"either" is outside the parentheses, so the matching "or" should be too. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | Merge branch 'tcp-delayed-completions' | David S. Miller | 3 | -19/+13
Eric Dumazet says:

====================
tcp: better deal with delayed TX completions

Jakub and Neil reported an increase of RTO timers whenever TX completions are delayed a bit more (by increasing NIC TX coalescing parameters).

While the problems have been there forever, the second patch might introduce some regressions, so I prefer not to backport them to stable releases before things settle.

Many thanks to the FB team for their help and tests.

A few packetdrill tests need to be changed to reflect the improvements brought by this series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | tcp: remove obsolete check in __tcp_retransmit_skb() | Eric Dumazet | 1 | -8/+0
TSQ provides a nice way to avoid bufferbloat on an individual socket, including retransmit packets.

We can get rid of the old heuristic:

	/* Do not sent more than we queued. 1/4 is reserved for possible
	 * copying overhead: fragmentation, tunneling, mangling etc.
	 */
	if (refcount_read(&sk->sk_wmem_alloc) >
	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
		  sk->sk_sndbuf))
		return -EAGAIN;

This heuristic was giving false positives according to Jakub, whenever TX completions are delayed above RTT. (ACK packets are processed by the TCP stack before clones are orphaned/freed.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
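The removed check can be restated as a standalone predicate, which makes it easier to see why delayed TX completions trip it. The following is a userspace sketch, not kernel code: the function name and the plain integer parameters are assumptions standing in for the sk->sk_wmem_alloc, sk->sk_wmem_queued and sk->sk_sndbuf fields quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical restatement of the old heuristic. Returns true when the
 * old code would have refused the retransmit with -EAGAIN, i.e. when the
 * memory still charged to in-flight skbs exceeds the queued bytes plus a
 * 25% allowance, capped at the send buffer size.
 */
bool old_rtx_check_blocks(uint32_t wmem_alloc, uint32_t wmem_queued,
			  uint32_t sndbuf)
{
	uint32_t limit = wmem_queued + (wmem_queued >> 2);

	if (limit > sndbuf)
		limit = sndbuf;

	return wmem_alloc > limit;
}
```

For instance, with 2000 bytes queued and a large send buffer, the limit is 2500. If clones of already-acknowledged packets are still waiting for their TX completion, the allocated count can stay above that limit and the retransmit is refused even though nothing is wrong, which matches the false positives described in the commit message.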
2021-03-12 | tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack() | Eric Dumazet | 1 | -6/+4
Jakub reported that data included in a Fastopen SYN that had to be retransmitted would have to wait for an RTO if TX completions are slow, even with the prior fix.

This is because tcp_rcv_fastopen_synack() does not use standard rtx logic, meaning the TSQ handler exits early in tcp_tsq_write() because tp->lost_out == tp->retrans_out.

Let's make tcp_rcv_fastopen_synack() use standard rtx logic, by using tcp_mark_skb_lost() on the skb that needs to be sent again.

Note that this raised a warning in tcp_fastretrans_alert() during my tests, since we consider that data not being acknowledged by the receiver does not mean the packet was lost on the network.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | tcp: plug skb_still_in_host_queue() to TSQ | Eric Dumazet | 2 | -5/+9
Jakub and Neil reported an increase of RTO timers whenever TX completions are delayed a bit more (by increasing NIC TX coalescing parameters).

The main issue is that the TCP stack has logic preventing a packet from being retransmitted if the prior clone has not yet been orphaned or freed.

This logic came with commit 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues").

Thankfully, in the case where skb_still_in_host_queue() detects that the initial clone is still in flight, it can use TSQ logic that will eventually retry later, at the moment the clone is freed or orphaned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neil Spring <ntspring@fb.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | isdn: remove extra spaces in the header file | Tong Zhang | 1 | -7/+7
Fix some coding style issues in the ISDN header.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | tipc: clean up warnings detected by sparse | Hoang Huu Le | 3 | -22/+58
This patch fixes the following warnings from sparse:

net/tipc/monitor.c:263:35: warning: incorrect type in assignment (different base types)
net/tipc/monitor.c:263:35: expected unsigned int
net/tipc/monitor.c:263:35: got restricted __be32 [usertype]
[...]
net/tipc/node.c:374:13: warning: context imbalance in 'tipc_node_read_lock' - wrong count at exit
net/tipc/node.c:379:13: warning: context imbalance in 'tipc_node_read_unlock' - unexpected unlock
net/tipc/node.c:384:13: warning: context imbalance in 'tipc_node_write_lock' - wrong count at exit
net/tipc/node.c:389:13: warning: context imbalance in 'tipc_node_write_unlock_fast' - unexpected unlock
net/tipc/node.c:404:17: warning: context imbalance in 'tipc_node_write_unlock' - unexpected unlock
[...]
net/tipc/crypto.c:1201:9: warning: incorrect type in initializer (different address spaces)
net/tipc/crypto.c:1201:9: expected struct tipc_aead [noderef] __rcu *__tmp
net/tipc/crypto.c:1201:9: got struct tipc_aead *
[...]

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 | tipc: convert dest node's address to network order | Hoang Le | 1 | -1/+1
(struct tipc_link_info)->dest is in network order (__be32), so we must convert the value to network order before assigning.

The problem detected by sparse:

net/tipc/netlink_compat.c:699:24: warning: incorrect type in assignment (different base types)
net/tipc/netlink_compat.c:699:24: expected restricted __be32 [usertype] dest
net/tipc/netlink_compat.c:699:24: got int

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
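The effect of the conversion can be illustrated in userspace with htonl() from <arpa/inet.h>. This is a sketch only: struct link_info_sketch and fill_dest() are hypothetical stand-ins for struct tipc_link_info and the netlink_compat code, neither of which is reproduced here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a structure whose field is big-endian. */
struct link_info_sketch {
	uint32_t dest; /* carried in network byte order */
};

/*
 * Store a host-order node address into the network-order field.
 * Assigning host_addr directly would leave the bytes reversed on
 * little-endian machines; htonl() makes the layout identical on
 * every host.
 */
void fill_dest(struct link_info_sketch *info, uint32_t host_addr)
{
	info->dest = htonl(host_addr);
}
```

After fill_dest(&info, 0x01001001), the four bytes of info.dest in memory are 01 00 10 01 regardless of host endianness, and ntohl(info.dest) recovers the original value.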
2021-03-12 | Merge branch 'mlxsw-Implement-sampling-using-mirroring' | David S. Miller | 8 | -22/+181
Ido Schimmel says:

====================
mlxsw: Implement sampling using mirroring

So far, sampling was implemented using a dedicated sampling mechanism that is available on all Spectrum ASICs. Spectrum-2 and later ASICs support sampling by mirroring packets to the CPU port with probability. This method has a couple of advantages compared to the legacy method:

* Extra metadata per-packet: Egress port, egress traffic class, traffic class occupancy and end-to-end latency
* Ability to sample packets on egress / per-flow as opposed to only ingress

This series should not result in any user-visible changes and its aim is to convert Spectrum-2 and later ASICs to perform sampling by mirroring to the CPU port with probability. Future submissions will expose the additional metadata and enable sampling using more triggers (e.g., egress).

Series overview:

Patches #1-#3 extend the SPAN (mirroring) module to accept new parameters required for sampling. See individual commit messages for detailed explanation.

Patches #4-#5 split sampling support between Spectrum-1 and later ASICs while still using the legacy method for all ASIC generations.

Patch #6 converts Spectrum-2 and later ASICs to perform sampling by mirroring to the CPU port with probability.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>