summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-14i40evf: Add support to configure bw via tc toolHarshitha Ramamurthy3-3/+67
This patch adds support to configure bandwidth for the traffic classes via tc tool. The required information is passed to the PF which is used in the process of setting up the traffic classes. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14i40e: Delete queue channel for ADq on VFAvinash Dayanand1-0/+63
This patch takes care of freeing up all the VSIs, queues and other ADq related software and hardware resources, when a user requests for deletion of ADq on VF. Example command: tc qdisc del dev eth0 root Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14i40evf: Alloc queues for ADq on VFAvinash Dayanand2-0/+20
This patch allocates number of queues requested by the user as a part of TC command when ADq is enabled on a VF. In order to be consistent in design with PF implementation of ADq, don't allow to set channels via ethtool from VF when ADq is already enabled. This means the users will not be able to change the number of queues/channels via ethtool for a VF when ADq is ON. In order to be able to use set channels, users will be required to disable ADq first and then try setting the channels again. When ADq is enabled on VF, it goes through a reset during which VSIs and queues are re-configured. Meanwhile if we receive link status message from PF even before the queues are re-configured, just ignore this link up message. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14i40e: Enable ADq and create queue channel/s on VFAvinash Dayanand3-69/+379
This patch enables ADq and creates queue channels on a VF. An ADq enabled VF can have up to 4 VSIs and each one of them represents a traffic class and this is termed as a queue channel. Each of these VSIs can have up to 4 queues. This patch services the request for enabling ADq and adds queue channel based on the TC mqprio info provided by the user in the VF. Initially a check is made to see if spoof check is OFF, if not ADq will not be enabled. PF notifies VF for a reset in order to complete the creation of ADq resources i.e. creation of additional VSIs and allocation of queues as per TC information, all in the reset path. Steps: ====== 1. Turn off the spoof check 2. Enable ADq using tc mqprio command with or without rate limit. 3. Pass traffic. Example: ======== % ip link set dev eth0 vf 0 spoofchk off % tc qdisc add dev $iface root mqprio num_tc 4 map\ 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 queues\ 4@0 4@4 4@8 4@8 hw 1 mode channel Expected results: ================= 1. Total number of queues for the VF should be sum of queues of all TCs. 2. Traffic flow should be normal without errors. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14i40evf: add ndo_setup_tc callback to i40evfHarshitha Ramamurthy3-1/+257
This patch introduces the callback to the ndo_setup_tc function in the VF driver. We add a wrapper function to make room for the upcoming cloud filter patches which add calls to different functions from setup_tc. First, we add support for capability exchange for ADQ between the PF and VF. Next, we add support to take in the mqprio configuration and configure queues as per the traffic classes, rate limit and the priorities specified by the user. This is done by passing the channel config to the PF driver through a virtchannel message. The flags and bits added, track if ADq is enabled, set max number of traffic classes to 4 and provide ability to negotiate capability with the PF. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14virtchnl: Add virtchl structures to support queue channelsHarshitha Ramamurthy1-0/+40
This patch defines new structs in support of the virtchannel message that the VF sends to the PF to create a queue channel specified by the user via tc tool. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14i40evf: Fix link up issue when queues are disabledAvinash Dayanand3-8/+22
One of the previous patch fixes the link up issue by ignoring it if i40evf is not in __I40EVF_RUNNING state. However this doesn't fix the race condition when queues are disabled esp for ADq on VF. Hence check if all queues are enabled before starting all queues. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-14Merge branch 'net-dev-Make-protocol-ptr-dependent-on-CONFIG'David S. Miller5-2/+12
David Ahern says: ==================== net: dev: Make protocol ptr dependent on CONFIG Found these in a branch from 3-years ago. Still relevant today. Make decnet, ax25, and atalk ptrs in net_device based on their respective CONFIG. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14net: Remove atalk header from socket.cDavid Ahern1-1/+0
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14net: Make atalk_ptr depend on ATALK or IRDADavid Ahern2-0/+4
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14net: Make ax25_ptr depend on CONFIG_AX25David Ahern2-0/+4
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14net: Make dn_ptr depend on CONFIG_DECNETDavid Ahern2-1/+4
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14Merge branch '40GbE' of ↵David S. Miller6-138/+119
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-02-13 This series contains updates to i40e and i40evf. Wei Yongjun fixes a function that needed to be "static". Also fixes the use of GFP_KERNEL to GFP_ATOMIC when we have taken a spinlock. Mitch cleans up several info messages to not include the memory addresses being used on the off chance this information could be used maliciously. Alan provides several fixes to the broadcast filters starting with the triggering of overflow promiscuous in circumstances where we run out of space for broadcast filters to prevent traffic from being unexpectedly dropped. Refactored the code to improve the readability and maintainability when we are concerned about when and how overflow promiscuous is changed. Harshitha cleans up a message to make it more clear on what is being reset, so users are not confused and think the PF is resetting. Dave fixes an issue where the MAC, firmware version and NPAR checks used to determine if shutting off the firmware LLDP engine is supported or not, instead set a hardware flag which ethtool can use. Jake updates the VF driver to use __dev_uc_sync and __dev_mc_sync, like the PF driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13selftests: Add FIB onlink testsDavid Ahern1-0/+375
Add test cases verifying FIB onlink commands work as expected in various conditions - IPv4, IPv6, main table, and VRF. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13i40evf: Make VF reset warning message more clearHarshitha Ramamurthy1-1/+1
When the PF resets the VF, the VF puts out a warning message indicating that the VF received a reset message from the PF. Make this message more clear so that we do not mistakenly think that the PF is undergoing a reset. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40evf: use __dev_[um]c_sync routines in .set_rx_modeJacob Keller1-39/+56
Similar to changes done to the PF driver in commit 6622f5cdbaf3 ("i40e: make use of __dev_uc_sync and __dev_mc_sync"), replace our home-rolled method for updating the internal status of MAC filters with __dev_uc_sync and __dev_mc_sync. These new functions use internal state within the netdev struct in order to efficiently break the question of "which filters in this list need to be added or removed" into singular "add this filter" and "delete this filter" requests. This vastly improves our handling of .set_rx_mode especially with large number of MAC filters being added to the device, and even results in a simpler .set_rx_mode handler. Under some circumstances, such as when attached to a bridge, we may receive a request to delete our own permanent address. Prevent deletion of this address during i40evf_addr_unsync so that we don't accidentally stop receiving traffic. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: i40e: Change ethtool check from MAC to HW flagDave Ertman3-10/+13
The MAC, FW Version and NPAR check used to determine if shutting off the FW LLDP engine is supported is not using the usual feature check mechanism. This patch fixes the problem by moving the feature check to i40e_sw_init in order to set a flag in pf->hw_features that ethtool will use for priv_flags disable operation. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: do not force filter failure in overflow promiscuousAlan Brady1-14/+1
Broadcast filters can now cause overflow promiscuous to trigger when adding "too many" VLANs to all the ports of a device and the driver needs a way to exit overflow promiscuous once triggered. Currently the driver looks to see if there are "too many" filters and/or we have any failed filters to determine when it is safe to exit overflow promiscuous. If we trigger overflow promiscuous with broadcast filters, any new filters added will be "auto-failed" until we exit overflow promiscuous. Since the user can't manually remove the failed broadcast filters for VLANs (nor should we expect the user to do such), there is no way to exit overflow promiscuous without reloading the driver. The easiest way to do this is to remove the shortcut to "auto-fail" filters in overflow promiscuous. If the user removes the VLANs, the failed filters will be removed and since we're no longer "auto-failing" new filters, we'll eventually get a good set of filters and exit overflow promiscuous. This has the side benefit of making filter state more explicit in that if a filter says it's failed we know for a fact it failed and not just assuming it will if we're in overflow promiscuous. This is nice because if the user removes some filters and then adds some, even if we're in overflow promiscuous, the filter might succeed; we were just assuming it won't because the user hasn't rectified other existing failed filters. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: refactor promisc_changed in i40e_sync_vsi_filtersAlan Brady1-22/+20
This code here is quite complex and easy to screw up. Let's see if we can't improve the readability and maintainability a bit. This refactors out promisc_changed into two variables 'old_overflow' and 'new_overflow' which makes it a bit clearer when we're concerned about when and how overflow promiscuous is changed. This also makes so that we no longer need to pass a boolean pointer to i40e_aqc_add_filters. Instead we can simply check if we changed the overflow promiscuous flag since the function start. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40evf: Use an iterator of the same type as the listHarshitha Ramamurthy1-4/+7
When iterating through the linked list of VLAN filters, make the iterator the same type as that of the linked list. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: broadcast filters can trigger overflow promiscuousAlan Brady1-2/+4
When adding a bunch of VLANs to all the ports on a device, it's possible to run out of space for broadcast filters. The driver should trigger overflow promiscuous in this circumstance to prevent traffic from being unexpectedly dropped. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: don't leak memory addressesMitch Williams2-41/+12
Could a Bad Person do Bad Things to a server if they found these addresses printed in the log? Who knows? But let's not take that risk. Remove pointers from a bunch of printks. In some cases, I was able to adjust the message to indicate whether or not the value was null. In others, I just removed the entire message as there was really no hope of saving it. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40evf: use GFP_ATOMIC under spin lockWei Yongjun1-4/+4
A spin lock is taken here so we should use GFP_ATOMIC. Fixes: 504398f0a78e ("i40evf: use spinlock to protect (mac|vlan)_filter_list") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13i40e: Make local function i40e_get_link_speed staticWei Yongjun1-1/+1
Fixes the following sparse warning: drivers/net/ethernet/intel/i40e/i40e_main.c:5440:5: warning: symbol 'i40e_get_link_speed' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-13Merge branch 'selftests-fib_tests-simplifications-verbosity-and-a-race'David S. Miller1-246/+237
David Ahern says: ==================== selftests: fib_tests: simplifications, verbosity and a race Improve efficiency of fib_tests.sh and make the test result more verbose, from this summary: $ fib_tests.sh is failing in a VM: $ fib_tests.sh Running netdev unregister tests PASS: unicast route test PASS: multipath route test Running netdev down tests PASS: unicast route test PASS: multipath route test Running netdev carrier change tests PASS: local route carrier test FAIL: unicast route carrier test where a single entry actually corresponds to many checks to a much more verbse output that clarifies test cases: $fib_tests.sh Single path route carrier test .... Carrier down IPv4 fibmatch [ OK ] IPv6 fibmatch [ OK ] IPv4 linkdown flag set [FAIL] IPv6 linkdown flag set [FAIL] Second address added with carrier down IPv4 fibmatch [ OK ] IPv6 fibmatch [ OK ] IPv4 linkdown flag set [FAIL] IPv6 linkdown flag set [ OK ] And then fix the race in changing carrier down on dummy device to checking the corresponding routes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13selftests: fib_tests: sleep after changing carrierDavid Ahern1-0/+1
sleep for a second after setting carrier down to allow linkwatch to propagate the change to the routing stack via netdev_state_change. As it stands there is a race setting carrier down on the dummy device and then checking the linkdown flag in the routes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13selftests: fib_tests: Move admin of dummy0 to helpersDavid Ahern1-66/+34
Move setup and teardown of testns and dummy0 to helpers. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13selftests: fib_tests: Make test results more verboseDavid Ahern1-113/+135
fib_tests.sh is failing in a VM: $ fib_tests.sh Running netdev unregister tests PASS: unicast route test PASS: multipath route test Running netdev down tests PASS: unicast route test PASS: multipath route test Running netdev carrier change tests PASS: local route carrier test FAIL: unicast route carrier test The last test corresponds to fib_carrier_unicast_test which 12 places that could be failing. Be more verbose in the output so a failure is easier to track down and separate test setup failures with set -e and set +e pairs. With the verbose logging it is easier to see which checks are failing: $fib_tests.sh Single path route carrier test .... Carrier down IPv4 fibmatch [ OK ] IPv6 fibmatch [ OK ] IPv4 linkdown flag set [FAIL] IPv6 linkdown flag set [FAIL] Second address added with carrier down IPv4 fibmatch [ OK ] IPv6 fibmatch [ OK ] IPv4 linkdown flag set [FAIL] IPv6 linkdown flag set [ OK ] Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13selftests: fib_tests: simplify ip commands in a namespaceDavid Ahern1-105/+105
'ip netns exec testns ip' is more efficiently handled using 'ip -netns'; runs the ip command after switching the namespace and avoids an exec. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net/ipv4: Unexport fib_multipath_hash and fib_select_pathDavid Ahern2-2/+0
Do not export fib_multipath_hash or fib_select_path; both are only used by core ipv4 code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net/ipv4: Simplify fib_select_pathDavid Ahern1-6/+5
If flow oif is set and it is not an l3mdev, then fib_select_path can jump to the source address check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13Merge branch 'sctp-rename-sctp-diag-file-and-add-file-comments-for-it'David S. Miller2-0/+33
Xin Long says: ==================== sctp: rename sctp diag file and add file comments for it This patchset is to remove the sctp_ prefix for sctp diag file, and also to add the missing file comments for it. v1->v2: split them into two patches as Marcelo suggested. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13sctp: add file comments in diag.cXin Long1-0/+31
This patch is to add the missing file comments for sctp diag file. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13sctp: rename sctp_diag.c as diag.cXin Long2-0/+2
Remove 'sctp_' prefix for diag file, to keep consistent with other files' names. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum: Use NL_SET_ERR_MSG_MODArkadi Sharshevsky3-24/+16
Use NL_SET_ERR_MSG_MOD helper which adds the module name instead of specifying the prefix each time. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13Merge branch 'mlxsw-SPAN-cleanups'David S. Miller6-351/+433
Jiri Pirko says: ==================== mlxsw: SPAN cleanups In patch one of this short series, a misplaced pointer star is moved to the correct place. In the second patch, we observe that if SPAN entries carry their reference count anyway, it's redundant to also carry a "used" flag. In the third patch, SPAN support code is moved to a separate module. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum: Move SPAN code to separate modulePetr Machata6-348/+433
For the upcoming work on SPAN, it makes sense to move the current code to a module of its own. It already has a well-defined API boundary to the mirror management (which is used from matchall and ACL code). A couple more functions need to be exported for the functions that spectrum.c needs to use for MTU handling and subsystem init/fini. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum: Drop struct span_entry.usedPetr Machata2-5/+2
The member ref_count already determines whether a given SPAN entry is used, and is as easy to use as a dedicated boolean. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum: Fix a coding style nitPetr Machata1-2/+2
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13Merge branch 'mlxsw-IPIP-cleanups'David S. Miller3-91/+73
Jiri Pirko says: ==================== mlxsw: IPIP cleanups In the first patch, a forgotten #include is added. Even though the code compiles as-is, the include is necessary for modules that should include spectrum_ipip.h. The second patch corrects an assumption that IPv6 tunnels use struct ip_tunnel_parm to store tunnel parameters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum: Distinguish between IPv4/6 tunnelsPetr Machata3-91/+72
struct ip_tunnel_parm, where GRE and several other tunnel types hold information, is IPv4-specific. The current router / ipip code in mlxsw however uses it as if it were generic. Make it clear that it's not. Rename many functions from _params_ to _params4_. mlxsw_sp_ipip_parms_saddr() and _daddr() take a proto argument to dispatch on it. Move the dispatch logic to mlxsw_sp_ipip_netdev_saddr() and _daddr(), and replace with single-protocol functions. In struct mlxsw_sp_ipip_entry, move the "parms" field to a (for the time being, singleton) union. Update users throughout. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13mlxsw: spectrum_ipip: Add a forgotten includePetr Machata1-0/+1
struct ip_tunnel_parm, which is used in spectrum_ipip.h, is defined in if_tunnel.h. However, the former neglects to include the latter. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13dpaa_eth: fix incorrect commentJake Moroni1-1/+1
The comment stated that a thread was started, but that is not the case. Signed-off-by: Jake Moroni <mail@jakemoroni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13Merge branch 'Replacing-net_mutex-with-rw_semaphore'David S. Miller45-41/+116
Kirill Tkhai says: ==================== Replacing net_mutex with rw_semaphore this is the third version of the patchset introducing net_sem instead of net_mutex. The patchset adds net_sem in addition to net_mutex and allows pernet_operations to be "async". This flag means, the pernet_operations methods are safe to be executed with any other pernet_operations (un)initializing another net. If there are only async pernet_operations in the system, net_mutex is not used either for setup_net() or for cleanup_net(). The pernet_operations converted in this patchset allow to create minimal .config to have network working, and the changes improve the performance like you may see below: %for i in {1..10000}; do unshare -n bash -c exit; done *before* real 1m40,377s user 0m9,672s sys 0m19,928s *after* real 0m17,007s user 0m5,311s sys 0m11,779 (5.8 times faster) In the future, when all pernet_operations become async, we'll just remove this "async" field tree-wide. All the new logic is concentrated in patches [1-5/32]. The rest of patches converts specific operations: review, rationale of they can be converted, and setting of async flag. Kirill v3: Improved patches descriptions. Added comment into [5/32]. Added [32/32] converting netlink_tap_net_ops (new pernet operations introduced in 2018). v2: Single patch -> patchset with rationale of every conversion ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert netlink_tap_net_opsKirill Tkhai1-0/+1
These pernet_operations init just allocated net memory, and they obviously can be executed in parallel in any others. v3: New Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert diag_net_opsKirill Tkhai1-0/+1
These pernet operations just create and destroy netlink socket. The socket is pernet and else operations don't touch it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert default_device_opsKirill Tkhai1-0/+1
These pernet operations consist of exit() and exit_batch() methods. default_device_exit() moves not-local and virtual devices to init_net. There is nothing exciting, because this may happen in any time on a working system, and rtnl_lock() and synchronize_net() protect us from all cases of external dereference. The same for default_device_exit_batch(). Similar unregisteration may happen in any time on a system. Here several lists (like todo_list), which are accessed under rtnl_lock(). After rtnl_unlock() and netdev_run_todo() all the devices are flushed. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert loopback_net_opsKirill Tkhai1-0/+1
These pernet_operations have only init() method. It allocates memory for net_device, calls register_netdev() and assigns net::loopback_dev. register_netdev() is allowed be used without additional locks, as it's synchronized on rtnl_lock(). There are many examples of using this functon directly from ioctl(). The only difference, compared to ioctl(), is that net is not completely alive at this moment. But it looks like, there is no way for parallel pernet_operations to dereference the net_device, as the most of struct net_device lists, where it's linked, are related to net, and the net is not liked. The exceptions are net_device::unreg_list, close_list, todo_list, used for unregistration, and ::link_watch_list, where net_device may be linked to global lists. Unregistration of loopback_dev obviously can't happen, when loopback_net_init() is executing, as the net as alive. It occurs in default_device_ops, which currently requires net_mutex, and it behaves as a barrier at the moment. It will be considered in next patch. Speaking about link_watch_list, it seems, there is no way for loopback_dev at time of registration to be linked in lweventlist and be available for another pernet_operations. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert addrconf_opsKirill Tkhai1-0/+1
These pernet_operations (un)register sysctl, which are not touched by anybody else. So, it's safe to make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13net: Convert ipv4_sysctl_opsKirill Tkhai1-0/+1
These pernet_operations create and destroy sysctl, which are not touched by anybody else. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>