summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-08-21ice: improve print for VF's when adding/deleting MAC filtersBrett Creeley1-2/+2
When we fail to add/delete MAC filters in the VF, the print doesn't distinguish between the two. Fix that by printing whether or not we failed to add/delete the MAC filter respectively. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Change type for queue countsPawel Kaminski1-12/+13
These queue variables are being assigned values that are type u16. Change the local variables to match these types. Since these represent queue counts, they should never be negative. Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Move VF resources definition to SR-IOV specific fileAkeem G Abodunrin2-10/+13
In order to use some of the VF resources definition in the SR-IOV specific virtchnl header file, this patch moves applicable code to ice_virtchnl_pf.h file accordingly... and they should have been defined in the destination file originally. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Increase size of Mailbox receive queue for many VFsBrett Creeley2-3/+4
Currently we use the ICE_MBXQ_LEN for both the Mailbox send and receive queues that are used to communicate with VFs. This is fine for the send queue because the PF driver will lock the queue for every single send, but for the Mailbox receive queue every VF is posting to its Mailbox send queue and the hardware is then handing the message to the PF on its Mailbox receive queue. This becomes a problem with many VFs because it seems to overburden the Mailbox receive queue on the PF. Fix this by increasing the Mailbox receive queue for the PF to 512 entries. The number 512 was determined based on the number of VFs supported by the device. We can have a total of 256 VFs so in the worst case this allows the VFs to put 2 messages in the PFs Mailbox receive queue at the same time. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Reduce wait times during VF bringup/resetBrett Creeley2-11/+19
Currently there are a couple places where the VF is waiting too long when checking the status of registers. This is causing the AVF driver to spin for longer than necessary in the __IAVF_STARTUP state. Sometimes it causes the AVF to go into the __IAVF_COMM_FAILED, which may retrigger the __IAVF_STARTUP state. Try to reduce the chance of this happening by removing unnecessary wait times in VF bringup/resets. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: update GLINT_DYN_CTL and GLINT_VECT2FUNC register accessPaul Greenwalt2-11/+16
Register access for GLINT_DYN_CTL and GLINT_VECT2FUNC should be within the PF space and not the absolute device space. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Do not always bring up PF VSI in ice_ena_vsi()Tony Nguyen1-2/+0
During rebuild ice_ena_vsi() is called to recover the VSI state. This function assumes the PF VSI is always to be enabled, however, it's possible that during reset/rebuild the interface can be brought down. If this occurs, we can attempt to bring up the PF VSI on a downed interface which can lead to various crashes. If the interface is not running, do not bring up the associated VSI. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: allow empty Rx descriptorsMitch Williams1-3/+12
In some circumstances, the hardware will hand us a receive descriptor which has no data attached, but is otherwise valid. The receive code was improperly ignoring these descriptors, which result in an infinite loop. To fix this, change the receive code to process all descriptors, regardless of the size of the associated data. Add checks to the memory-handling functions to allow for zero size. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Fix kernel hang with DCB reset in CEE modeUsha Ketineni1-61/+88
This patch fixes the set local MIB AQ call failures in the DCB rebuild path by setting the defaults for the ETS recommended DCB configuration. Also, willing bits for the DCB configuration needs to be set correctly. Resets works fine in IEEE mode as the ETS recommended DCB configuration is populated but not in CEE mode. Without this patch, PFR causes the kernel hang in CEE mode. Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-21ice: Set WB_ON_ITR when we don't re-enable interruptsBrett Creeley3-0/+70
Currently when busy polling is enabled we aren't setting/enabling WB_ON_ITR in the driver. This doesn't break the driver, but it does cause issues. If we don't enable WB_ON_ITR mode we will still get write-backs from hardware during polling when a cache line has been filled, but if a cache line is not filled we will not get the write-back because WB_ON_ITR is not set. Fix this by enabling WB_ON_ITR in the driver when interrupts are disabled. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: fix set pause param autoneg checkPaul Greenwalt1-1/+26
When ETHTOOL_GLINKSETTINGS is defined get pause param pause->autoneg reports SW configured setting, however when not defined get pause param pause->autoneg reports the link status. Set pause param needs to compare pause->autoneg with the same source as get pause param to block the user from changing autoneg with the set pause param option, or the user may be incorrectly blocked from changing Rx|Tx pause settings. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: Restructure VFs initialization flowsAkeem G Abodunrin2-22/+48
This patch restructures how VFs are configured, and resources allocated. Instead of freeing resources that were never allocated, and resetting empty VFs that have never been created - the new flow will just allocate resources for number of requested VFs based on the availability. During VFs initialization process, global interrupt is disabled, and rearmed after getting MSIX vectors for VFs. This allows immediate mailbox communications, instead of delaying it till later and VFs. PF communications resulted to using polling instead of actual interrupt. The issue manifested when creating higher number of VFs (128 VFs) per PF. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: Assume that more than one Rx queue is rare in ice_napi_pollBrett Creeley1-5/+10
Currently we divide budget by the number of Rx queues per Rx ring container in ice_napi_poll even if there is only 1. This is an unnecessary divide for the normal case of 1 Rx ring per Rx ring container. Fix this by using an unlikely() call in the case where we actually need to divide. Also, we will always set budget_per_ring even if there are no Rx rings in the Rx ring container so we don't need to initialize it to 0. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20ice: Use the software based tail when checking for hung Tx ringBrett Creeley1-3/+3
Currently in ice_get_tx_pending we try to read a Tx ring's tail. This is then compared with the software based head (next_to_clean) to determine if we have pending work. This will never work because reading of the Tx ring's tail is no longer supported. Fix this by using the software based tail (next_to_use) to determine if there is pending work. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20Merge tag 'wireless-drivers-next-for-davem-2019-08-19' of ↵David S. Miller70-2907/+8606
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 5.4 First set of patches for 5.4. Major changes: brcmfmac * enable 160 MHz channel support rt2x00 * add support for PLANEX GW-USMicroN USB device rtw88 * add Bluetooth coexistance support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20Merge branch 'sctp-support-per-endpoint-auth-and-asconf-flags'David S. Miller10-136/+325
Xin Long says: ==================== sctp: support per endpoint auth and asconf flags This patchset mostly does 3 things: 1. add per endpint asconf flag and use asconf flag properly and add SCTP_ASCONF_SUPPORTED sockopt. 2. use auth flag properly and add SCTP_AUTH_SUPPORTED sockopt. 3. remove the 'global feature switch' to discard chunks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: remove net sctp.x_enable working as a global switchXin Long1-16/+12
The netns sctp feature flags shouldn't work as a global switch, which is mostly like a firewall/netfilter's job. Also, it will break asoc as it discard or accept chunks incorrectly when net sctp.x_enable is changed after the asoc is created. Since each type of chunk's processing function will check the corresp asoc's feature flag, this 'global switch' should be removed, and net sctp.x_enable will only work as the default feature flags for the future sctp sockets/endpoints. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: add SCTP_AUTH_SUPPORTED sockoptXin Long2-0/+87
SCTP_AUTH_SUPPORTED sockopt is used to set enpoint's auth flag. With this feature, each endpoint will have its own flag for its future asoc's auth_capable, instead of netns auth flag. Note that when both ep's auth_enable is enabled, endpoint auth related data should be initialized. If asconf_enable is also set, SCTP_CID_ASCONF/SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: add sctp_auth_init and sctp_auth_freeXin Long3-56/+76
This patch is to factor out sctp_auth_init and sctp_auth_free functions, and sctp_auth_init will also be used in the next patch for SCTP_AUTH_SUPPORTED sockopt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: use ep and asoc auth_enable properlyXin Long2-33/+44
sctp has per endpoint auth flag and per asoc auth flag, and the asoc one should be checked when coming to asoc and the endpoint one should be checked when coming to endpoint. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: add SCTP_ASCONF_SUPPORTED sockoptXin Long2-0/+83
SCTP_ASCONF_SUPPORTED sockopt is used to set enpoint's asconf flag. With this feature, each endpoint will have its own flag for its future asoc's asconf_capable, instead of netns asconf flag. Note that when both ep's asconf_enable and auth_enable are enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: check asoc peer.asconf_capable before processing asconfXin Long1-2/+4
asconf chunks should be dropped when the asoc doesn't support asconf feature. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: not set peer.asconf_capable in sctp_association_initXin Long1-9/+0
asoc->peer.asconf_capable is to be set during handshake, and its value should be initialized to 0. net->sctp.addip_noauth will be checked in sctp_process_init when processing INIT_ACK on client and COOKIE_ECHO on server. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20sctp: add asconf_enable in struct sctp_endpointXin Long4-20/+19
This patch is to make addip/asconf flag per endpoint, and its value is initialized by the per netns flag, net->sctp.addip_enable. It also replaces the checks of net->sctp.addip_enable with ep->asconf_enable in some places. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20net: remove empty inet_exit_netLi RongQing1-5/+0
Pointer members of an object with static storage duration, if not explicitly initialized, will be initialized to a NULL pointer. The net namespace API checks if this pointer is not NULL before using it, it are safe to remove the function. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20Merge branch 'ns-plugin-fixes'David S. Miller6-294/+295
Vlad Buslov says: ==================== Fix problems with using ns plugin Recent changes to plugin architecture broke some of the tests when running tdc without specifying a test group. Fix tests incompatible with ns plugin and modify tests to not reuse interface name of ns veth interface for dummy interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20tc-testing: concurrency: wrap piped rule update commandsVlad Buslov1-9/+9
Concurrent tests use several commands to update rules in parallel: 'find' prints names of batch files in tmp directory and pipes result to 'xargs' which runs instance of tc per batch file in parallel. This breaks when used with ns plugin that adds 'ip netns exec $NS' prefix to the command, which causes only first command in pipe to be executed in namespace: =====> Test e41d: Add 1M flower filters with 10 parallel tc instances -----> prepare stage ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/bin/mkdir tmp] list [['/bin/mkdir', 'tmp']] adjust_command: return command [ip netns exec tcut /bin/mkdir tmp] command "ip netns exec tcut /bin/mkdir tmp" ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev ens1f0 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'ens1f0', 'ingress']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc add dev ens1f0 ingress] command "ip netns exec tcut /sbin/tc qdisc add dev ens1f0 ingress" ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [./tdc_multibatch.py ens1f0 tmp 100000 10 add] list [['./tdc_multibatch.py', 'ens1f0', 'tmp', '100000', '10', 'add']] adjust_command: return command [ip netns exec tcut ./tdc_multibatch.py ens1f0 tmp 100000 10 add] command "ip netns exec tcut ./tdc_multibatch.py ens1f0 tmp 100000 10 add" -----> execute stage ns/SubPlugin.adjust_command adjust_command: stage is execute; inserting netns stuff in command [find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b] list [['find', 'tmp/add*', '-print', '|', 'xargs', '-n', '1', '-P', '10', '/sbin/tc', '-b'] ] adjust_command: return command [ip netns exec tcut find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b] command "ip netns exec tcut find tmp/add* -print | xargs -n 1 -P 10 /sbin/tc -b" exit: 123 exit: 0 Cannot find device "ens1f0" Cannot find device "ens1f0" Command failed tmp/add_0:1 Command failed tmp/add_1:1 Cannot find device "ens1f0" Command failed tmp/add_2:1 Cannot find device "ens1f0" Command failed tmp/add_4:1 Cannot find device "ens1f0" Command failed tmp/add_3:1 Cannot find device "ens1f0" Command failed tmp/add_5:1 Cannot find device "ens1f0" Command failed tmp/add_6:1 Cannot find device "ens1f0" Command failed tmp/add_8:1 Cannot find device "ens1f0" Command failed tmp/add_7:1 Cannot find device "ens1f0" Command failed tmp/add_9:1 Fix the issue by executing whole compound command in namespace by wrapping it in 'bash -c' invocation. Fixes: 489ce2f42514 ("tc-testing: Restore original behaviour for namespaces in tdc") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20tc-testing: use dedicated DUMMY interface name for dummy devVlad Buslov5-285/+286
A lot of tests reuse $DEV1 veth name for naming dummy device. This causes problem when tdc is invoked without specifying a test group and tries to execute all tests. In this case tdc instantiates ns plugin, which creates veth pair once before running tests. However, if any of the tests that reuse $DEV1 run before test that depend on ns plugin, it will delete $DEV1 as a part of teardown section: =====> Test 3b88: Delete ingress qdisc twice [3770/41080] -----> prepare stage ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/sbin/ip link add dev v0p1 type dummy || /bin/true] list [['/sbin/ip', 'link', 'add', 'dev', 'v0p1', 'type', 'dummy', '||', '/bin/true']] adjust_command: return command [ip netns exec tcut /sbin/ip link add dev v0p1 type dummy || /bin/true] command "ip netns exec tcut /sbin/ip link add dev v0p1 type dummy || /bin/true" ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'v0p1', 'ingress']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress] command "ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress" ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/sbin/tc qdisc del dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'del', 'dev', 'v0p1', 'ingress']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress] command "ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress" -----> execute stage ns/SubPlugin.adjust_command adjust_command: stage is execute; inserting netns stuff in command [/sbin/tc qdisc del dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'del', 'dev', 'v0p1', 'ingress']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress] command "ip netns exec tcut /sbin/tc qdisc del dev v0p1 ingress" -----> verify stage ns/SubPlugin.adjust_command adjust_command: stage is verify; inserting netns stuff in command [/sbin/tc qdisc show dev v0p1] list [['/sbin/tc', 'qdisc', 'show', 'dev', 'v0p1']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc show dev v0p1] command "ip netns exec tcut /sbin/tc qdisc show dev v0p1" -----> teardown stage ns/SubPlugin.adjust_command adjust_command: stage is teardown; inserting netns stuff in command [/sbin/ip link del dev v0p1 type dummy] list [['/sbin/ip', 'link', 'del', 'dev', 'v0p1', 'type', 'dummy']] adjust_command: return command [ip netns exec tcut /sbin/ip link del dev v0p1 type dummy] command "ip netns exec tcut /sbin/ip link del dev v0p1 type dummy" After this ns-dependent tests will fail because dev doesn't exist: =====> Test 901f: Add fw filter with prio at 32-bit maxixum -----> prepare stage ns/SubPlugin.adjust_command adjust_command: stage is setup; inserting netns stuff in command [/sbin/tc qdisc add dev v0p1 ingress] list [['/sbin/tc', 'qdisc', 'add', 'dev', 'v0p1', 'ingress']] adjust_command: return command [ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress] command "ip netns exec tcut /sbin/tc qdisc add dev v0p1 ingress" -----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 ingress" -----> prepare stage *** Error message: "Cannot find device "v0p1" " returncode 1; expected [0] -----> prepare stage *** Aborting test run. <_io.BufferedReader name=3> *** stdout *** <_io.BufferedReader name=5> *** stderr *** "-----> prepare stage" did not complete successfully Exception <class '__main__.PluginMgrTestFail'> ('setup', None, '"-----> prepare stage" did not complete successfully') (caught in test_runner, running test 477 901f Add fw filter with prio at 32-bit maxixum stage setup) --------------- traceback File "./tdc.py", line 371, in test_runner res = run_one_test(pm, args, index, tidx) File "./tdc.py", line 272, in run_one_test prepare_env(args, pm, 'setup', "-----> prepare stage", tidx["setup"]) File "./tdc.py", line 247, in prepare_env '"{}" did not complete successfully'.format(prefix)) --------------- Fix the issue by introducing standalone $DUMMY config variable and substitute all usage of $DEV1 in tests that don't depend on ns plugin. Fixes: 489ce2f42514 ("tc-testing: Restore original behaviour for namespaces in tdc") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20r8152: fix accessing skb after napi_gro_receiveHayes Wang1-1/+1
Fix accessing skb after napi_gro_receive which is caused by commit 47922fcde536 ("r8152: support skb_add_rx_frag"). Fixes: 47922fcde536 ("r8152: support skb_add_rx_frag") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge branch 'RTL8125-EEE'David S. Miller2-2/+53
Heiner Kallweit says: ==================== net: phy: realtek: support NBase-T MMD EEE registers on RTL8125 Add missing EEE-related constants, including the new MMD EEE registers for NBase-T / 802.3bz. Based on that emulate the new 802.3bz MMD EEE registers for 2.5Gbps EEE on RTL8125. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: phy: realtek: support NBase-T MMD EEE registers on RTL8125Heiner Kallweit1-2/+43
Emulate the 802.3bz MMD EEE registers for 2.5Gbps EEE on RTL8125. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: phy: add EEE-related constantsHeiner Kallweit1-0/+10
Add EEE-related constants. This includes the new MMD EEE registers for NBase-T / 802.3bz. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: flow_offload: convert block_ing_cb_list to regular list typeVlad Buslov1-6/+7
RCU list block_ing_cb_list is protected by rcu read lock in flow_block_ing_cmd() and with flow_indr_block_ing_cb_lock mutex in all functions that use it. However, flow_block_ing_cmd() needs to call blocking functions while iterating block_ing_cb_list which leads to following suspicious RCU usage warning: [ 401.510948] ============================= [ 401.510952] WARNING: suspicious RCU usage [ 401.510993] 5.3.0-rc3+ #589 Not tainted [ 401.510996] ----------------------------- [ 401.511001] include/linux/rcupdate.h:265 Illegal context switch in RCU read-side critical section! [ 401.511004] other info that might help us debug this: [ 401.511008] rcu_scheduler_active = 2, debug_locks = 1 [ 401.511012] 7 locks held by test-ecmp-add-v/7576: [ 401.511015] #0: 00000000081d71a5 (sb_writers#4){.+.+}, at: vfs_write+0x166/0x1d0 [ 401.511037] #1: 000000002bd338c3 (&of->mutex){+.+.}, at: kernfs_fop_write+0xef/0x1b0 [ 401.511051] #2: 00000000c921c634 (kn->count#317){.+.+}, at: kernfs_fop_write+0xf7/0x1b0 [ 401.511062] #3: 00000000a19cdd56 (&dev->mutex){....}, at: sriov_numvfs_store+0x6b/0x130 [ 401.511079] #4: 000000005425fa52 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x30/0x140 [ 401.511092] #5: 00000000c5822793 (rtnl_mutex){+.+.}, at: unregister_netdevice_notifier+0x35/0x140 [ 401.511101] #6: 00000000c2f3507e (rcu_read_lock){....}, at: flow_block_ing_cmd+0x5/0x130 [ 401.511115] stack backtrace: [ 401.511121] CPU: 21 PID: 7576 Comm: test-ecmp-add-v Not tainted 5.3.0-rc3+ #589 [ 401.511124] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 401.511127] Call Trace: [ 401.511138] dump_stack+0x85/0xc0 [ 401.511146] ___might_sleep+0x100/0x180 [ 401.511154] __mutex_lock+0x5b/0x960 [ 401.511162] ? find_held_lock+0x2b/0x80 [ 401.511173] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511179] ? mark_held_locks+0x49/0x70 [ 401.511194] ? __tcf_get_next_chain+0x1d/0xb0 [ 401.511198] __tcf_get_next_chain+0x1d/0xb0 [ 401.511251] ? uplink_rep_async_event+0x70/0x70 [mlx5_core] [ 401.511261] tcf_block_playback_offloads+0x39/0x160 [ 401.511276] tcf_block_setup+0x1b0/0x240 [ 401.511312] ? mlx5e_rep_indr_setup_tc_cb+0xca/0x290 [mlx5_core] [ 401.511347] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511359] tc_indr_block_get_and_ing_cmd+0x11b/0x1e0 [ 401.511404] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511414] flow_block_ing_cmd+0x7e/0x130 [ 401.511453] ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core] [ 401.511462] __flow_indr_block_cb_unregister+0x7f/0xf0 [ 401.511502] mlx5e_nic_rep_netdevice_event+0x75/0xb0 [mlx5_core] [ 401.511513] unregister_netdevice_notifier+0xe9/0x140 [ 401.511554] mlx5e_cleanup_rep_tx+0x6f/0xe0 [mlx5_core] [ 401.511597] mlx5e_detach_netdev+0x4b/0x60 [mlx5_core] [ 401.511637] mlx5e_vport_rep_unload+0x71/0xc0 [mlx5_core] [ 401.511679] esw_offloads_disable+0x5b/0x90 [mlx5_core] [ 401.511724] mlx5_eswitch_disable.cold+0xdf/0x176 [mlx5_core] [ 401.511759] mlx5_device_disable_sriov+0xab/0xb0 [mlx5_core] [ 401.511794] mlx5_core_sriov_configure+0xaf/0xd0 [mlx5_core] [ 401.511805] sriov_numvfs_store+0xf8/0x130 [ 401.511817] kernfs_fop_write+0x122/0x1b0 [ 401.511826] vfs_write+0xdb/0x1d0 [ 401.511835] ksys_write+0x65/0xe0 [ 401.511847] do_syscall_64+0x5c/0xb0 [ 401.511857] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 401.511862] RIP: 0033:0x7fad892d30f8 [ 401.511868] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 96 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 [ 401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8 [ 401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001 [ 401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a [ 401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002 [ 401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740 To fix the described incorrect RCU usage, convert block_ing_cb_list from RCU list to regular list and protect it with flow_indr_block_ing_cb_lock mutex in flow_block_ing_cmd(). Fixes: 1150ab0f1b33 ("flow_offload: support get multi-subsystem block") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller549-2903/+4711
Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds125-688/+1156
Pull networking fixes from David Miller: 1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov. 2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan. 3) Fix severe performance regression due to lack of SKB coalescing of fragments during local delivery, from Guillaume Nault. 4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk. 5) Fix batched events in skbedit packet action, from Roman Mashak. 6) Propagate VLAN TX offload to hw_enc_features in bond and team drivers, from Yue Haibing. 7) RXRPC local endpoint refcounting fix and read after free in rxrpc_queue_local(), from David Howells. 8) Fix endian bug in ibmveth multicast list handling, from Thomas Falcon. 9) Oops, make nlmsg_parse() wrap around the correct function, __nlmsg_parse not __nla_parse(). Fix from David Ahern. 10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin. 11) Fix memory leak in cxgb4, from Wenwen Wang. 12) Yet another race in AF_PACKET, from Eric Dumazet. 13) Fix false detection of retransmit failures in tipc, from Tuong Lien. 14) Use after free in ravb_tstamp_skb, from Tho Vu. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) ravb: Fix use-after-free ravb_tstamp_skb netfilter: nf_tables: map basechain priority to hardware priority net: sched: use major priority number as hardware priority wimax/i2400m: fix a memory leak bug net: cavium: fix driver name ibmvnic: Unmap DMA address of TX descriptor buffers after use bnxt_en: Fix to include flow direction in L2 key bnxt_en: Use correct src_fid to determine direction of the flow bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails bnxt_en: Improve RX doorbell sequence. bnxt_en: Fix VNIC clearing logic for 57500 chips. net: kalmia: fix memory leaks cx82310_eth: fix a memory leak bug bnx2x: Fix VF's VLAN reconfiguration in reload. Bluetooth: Add debug setting for changing minimum encryption key size tipc: fix false detection of retransmit failures lan78xx: Fix memory leaks MAINTAINERS: r8169: Update path to the driver MAINTAINERS: PHY LIBRARY: Update files in the record ...
2019-08-19keys: Fix description sizeDavid Howells1-4/+4
The maximum key description size is 4095. Commit f771fde82051 ("keys: Simplify key description management") inadvertantly reduced that to 255 and made sizes between 256 and 4095 work weirdly, and any size whereby size & 255 == 0 would cause an assertion in __key_link_begin() at the following line: BUG_ON(index_key->desc_len == 0); This can be fixed by simply increasing the size of desc_len in struct keyring_index_key to a u16. Note the argument length test in keyutils only checked empty descriptions and descriptions with a size around the limit (ie. 4095) and not for all the values in between, so it missed this. This has been addressed and https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa now exhaustively tests all possible lengths of type, description and payload and then some. The assertion failure looks something like: kernel BUG at security/keys/keyring.c:1245! ... RIP: 0010:__key_link_begin+0x88/0xa0 ... Call Trace: key_create_or_update+0x211/0x4b0 __x64_sys_add_key+0x101/0x200 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It can be triggered by: keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s Fixes: f771fde82051 ("keys: Simplify key description management") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-19Linux 5.3-rc5v5.3-rc5Linus Torvalds1-1/+1
2019-08-19net: hns: add phy_attached_info() to the hns driverYonglong Liu1-0/+2
This patch adds the call to phy_attached_info() to the hns driver to identify which exact PHY drivers is in use. Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19ravb: Fix use-after-free ravb_tstamp_skbTho Vu1-2/+6
When a Tx timestamp is requested, a pointer to the skb is stored in the ravb_tstamp_skb struct. This was done without an skb_get. There exists the possibility that the skb could be freed by ravb_tx_free (when ravb_tx_free is called from ravb_start_xmit) before the timestamp was processed, leading to a use-after-free bug. Use skb_get when filling a ravb_tstamp_skb struct, and add appropriate frees/consumes when a ravb_tstamp_skb struct is freed. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Tho Vu <tho.vu.wh@rvc.renesas.com> Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: ethernet: mediatek: Add MT7628/88 SoC supportStefan Roese4-112/+425
This patch adds support for the MediaTek MT7628/88 SoCs to the common MediaTek ethernet driver. Some minor changes are needed for this and a bigger change, as the MT7628 does not support QDMA (only PDMA). Signed-off-by: Stefan Roese <sr@denx.de> Cc: René van Dorst <opensource@vdorst.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: ethernet: mediatek: Rename NEXT_RX_DESP_IDX to NEXT_DESP_IDXStefan Roese2-3/+3
Rename the NEXT_RX_DESP_IDX macro to NEXT_DESP_IDX, so that it better can be used for TX ops as well. This will be used in the upcoming MT7628/88 support (same functionality for RX and TX in this macro). Signed-off-by: Stefan Roese <sr@denx.de> Cc: René van Dorst <opensource@vdorst.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: ethernet: mediatek: Rename MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUSStefan Roese2-5/+5
Currently all QDMA registers are named "MTK_QDMA_foo" in this driver with one exception: MTK_QMTK_INT_STATUS. This patch renames MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUS so that all macros follow this rule. Signed-off-by: Stefan Roese <sr@denx.de> Cc: René van Dorst <opensource@vdorst.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19dt-bindings: net: mediatek: Add support for MediaTek MT7628/88 SoCStefan Roese1-0/+1
Add compatible for the ethernet IP core on MT7628/88 SoCs. Its compatible with the older Ralink Rt5350F SoC. And OpenWrt already uses this compatible string for the MT76x8. Signed-off-by: Stefan Roese <sr@denx.de> Cc: René van Dorst <opensource@vdorst.com> Cc: Daniel Golle <daniel@makrotopia.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: John Crispin <john@phrozen.org> Cc: devicetree@vger.kernel.org Cc: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19be2net: eliminate enable field from be_aic_objIvan Vecera3-7/+9
Adaptive coalescing is managed per adapter not per event queue so it does not needed to store 'enable' flag for each event queue. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19Merge branch 'flow_offload-hardware-priority-fixes'David S. Miller9-17/+28
Pablo Neira Ayuso says: ==================== flow_offload hardware priority fixes This patchset contains two updates for the flow_offload users: 1) Pass the major tc priority to drivers so they do not have to lshift it. This is a preparation patch for the fix coming in patch #2. 2) Set the hardware priority from the netfilter basechain priority, some drivers break when using the existing hardware priority number that is set to zero. v5: fix patch 2/2 to address a clang warning and to simplify the priority mapping. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso3-3/+20
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: sched: use major priority number as hardware priorityPablo Neira Ayuso6-14/+8
tc transparently maps the software priority number to hardware. Update it to pass the major priority which is what most drivers expect. Update drivers too so they do not need to lshift the priority field of the flow_cls_common_offload object. The stmmac driver is an exception, since this code assumes the tc software priority is fine, therefore, lshift it just to be conservative. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19wimax/i2400m: fix a memory leak bugWenwen Wang1-1/+3
In i2400m_barker_db_init(), 'options_orig' is allocated through kstrdup() to hold the original command line options. Then, the options are parsed. However, if an error occurs during the parsing process, 'options_orig' is not deallocated, leading to a memory leak bug. To fix this issue, free 'options_orig' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19net: cavium: fix driver nameStephen Hemminger1-1/+1
The driver name gets exposed in sysfs under /sys/bus/pci/drivers so it should look like other devices. Change it to be common format (instead of "Cavium PTP"). This is a trivial fix that was observed by accident because Debian kernels were building this driver into kernel (bug). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19tipc: clean up skb list lock handling on send pathJon Maloy6-25/+26
The policy for handling the skb list locks on the send and receive paths is simple. - On the send path we never need to grab the lock on the 'xmitq' list when the destination is an exernal node. - On the receive path we always need to grab the lock on the 'inputq' list, irrespective of source node. However, when transmitting node local messages those will eventually end up on the receive path of a local socket, meaning that the argument 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the function tipc_sk_rcv(). This has been handled by always initializing the spinlock of the 'xmitq' list at message creation, just in case it may end up on the receive path later, and despite knowing that the lock in most cases never will be used. This approach is inaccurate and confusing, and has also concealed the fact that the stated 'no lock grabbing' policy for the send path is violated in some cases. We now clean up this by never initializing the lock at message creation, instead doing this at the moment we find that the message actually will enter the receive path. At the same time we fix the four locations where we incorrectly access the spinlock on the send/error path. This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock is initialised") which has now become redundant. CC: Eric Dumazet <edumazet@google.com> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>