summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-07-27Merge branch 'tls-rx-decrypt-from-the-tcp-queue'Jakub Kicinski8-124/+723
Jakub Kicinski says: ==================== tls: rx: decrypt from the TCP queue This is the final part of my TLS Rx rework. It switches from strparser to decrypting data from skbs queued in TCP. We don't need the full strparser for TLS, its needs are very basic. This set gives us a small but measurable (6%) performance improvement (continuous stream). ==================== Link: https://lore.kernel.org/r/20220722235033.2594446-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: do not use the standard strparserJakub Kicinski5-69/+558
TLS is a relatively poor fit for strparser. We pause the input every time a message is received, wait for a read which will decrypt the message, start the parser, repeat. strparser is built to delineate the messages, wrap them in individual skbs and let them float off into the stack or a different socket. TLS wants the data pages and nothing else. There's no need for TLS to keep cloning (and occasionally skb_unclone()'ing) the TCP rx queue. This patch uses a pre-allocated skb and attaches the skbs from the TCP rx queue to it as frags. TLS is careful never to modify the input skb without CoW'ing / detaching it first. Since we call TCP rx queue cleanup directly we also get back the benefit of skb deferred free. Overall this results in a 6% gain in my benchmarks. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: device: add input CoW helperJakub Kicinski3-10/+21
Wrap the remaining skb_cow_data() into a helper, so it's easier to replace down the lane. The new version will change the skb so make sure relevant pointers get reloaded after the call. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tcp: allow tls to decrypt directly from the tcp rcv queueJakub Kicinski2-1/+43
Expose TCP rx queue accessor and cleanup, so that TLS can decrypt directly from the TCP queue. The expectation is that the caller can access the skb returned from tcp_recv_skb() and up to inq bytes worth of data (some of which may be in ->next skbs) and then call tcp_read_done() when data has been consumed. The socket lock must be held continuously across those two operations. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: device: keep the zero copy status with offloadJakub Kicinski3-5/+35
The non-zero-copy path assumes a full skb with decrypted contents. This means the device offload would have to CoW the data. Try to keep the zero-copy status instead, copy the data to user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: don't free the output in case of zero-copyJakub Kicinski1-13/+13
In the future we'll want to reuse the input skb in case of zero-copy so we shouldn't always free darg.skb. Move the freeing of darg.skb into the non-zc cases. All cases will now free ctx->recv_pkt (inside let tls_rx_rec_done()). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: factor SW handling out of tls_rx_one_record()Jakub Kicinski1-36/+57
After recent changes the SW side of tls_rx_one_record() can be nicely encapsulated in its own function. Move the pad handling as well. This will be useful for ->zc handling in tls_decrypt_device(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: wrap recv_pkt accesses in helpersJakub Kicinski2-5/+11
To allow for the logic to change later wrap accesses which interrogate the input skb in helper functions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge branch 'implement-dev-info-and-dev-flash-for-line-cards'Jakub Kicinski12-171/+1043
Jiri Pirko says: ==================== Implement dev info and dev flash for line cards This patchset implements two features: 1) "devlink dev info" is exposed for line card (patches 6-9) 2) "devlink dev flash" is implemented for line card gearbox flashing (patch 10) For every line card, "a nested" auxiliary device is created which allows to bind the features mentioned above (patch 4). The relationship between line card and its auxiliary dev devlink is carried over extra line card netlink attribute (patches 3 and 5). The first patch removes devlink_mutex from devlink_register/unregister() which eliminates possible deadlock during devlink reload command. The second patchset follows up with putting net pointer check into new helper. Examples: $ devlink lc show pci/0000:01:00.0 lc 1 pci/0000:01:00.0: lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0 supported_types: 16x100G $ devlink dev show auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0 $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 fw.psid MT_0000000749 running: ini.version 4 fw 19.2010.1312 $ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 ==================== Link: https://lore.kernel.org/r/20220725082925.366455-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26selftests: mlxsw: Check line card info on activated line cardJiri Pirko1-0/+24
Once line card is activated, check the FW version and PSID are exposed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26selftests: mlxsw: Check line card info on provisioned line cardJiri Pirko1-0/+30
Once line card is provisioned, check if HW revision and INI version are exposed on associated nested auxiliary device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Implement line card device flashingJiri Pirko4-10/+323
Implement flash_update() devlink op for the line card devlink instance to allow user to update line card gearbox FW using MDDT register and mlxfw. Example: $ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Expose device PSID over device infoJiri Pirko3-0/+35
Use tunneled MGIR to obtain PSID of line card device and extend device_info_get() op to fill up the info with that. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 fw.psid MT_0000000749 running: ini.version 4 fw 19.2010.1312 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: reg: Add Management DownStream Device Tunneling RegisterJiri Pirko1-0/+90
The MDDT register allows to deliver query and request messages (PRM registers, commands) to a DownStream device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Probe active line cards for devices and expose FW versionJiri Pirko3-0/+69
In case the line card is active, go over all possible existing devices (gearboxes) on it and expose FW version of the flashable one. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 running: ini.version 4 fw 19.2010.1312 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: reg: Extend MDDQ by device_infoJiri Pirko1-1/+82
Extend existing MDDQ register by possibility to query information about devices residing on a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Expose HW revision and INI versionJiri Pirko4-0/+61
Implement info_get() to expose HW revision of a linecard and loaded INI version. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 running: ini.version 4 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Introduce per line card auxiliary deviceJiri Pirko6-3/+194
In order to be eventually able to expose line card gearbox version and possibility to flash FW, model the line card as a separate device on auxiliary bus. Add the auxiliary device for provisioned line card in order to be able to expose provisioned line card info over devlink dev info. When the line card becomes active, there may be other additional info added to the output. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: introduce nested devlink entity for line cardJiri Pirko3-0/+46
For the purpose of exposing device info and allow flash update which is going to be implemented in follow-up patches, introduce a possibility for a line card to expose relation to nested devlink entity. The nested devlink entity represents the line card. Example: $ devlink lc show pci/0000:01:00.0 lc 1 pci/0000:01:00.0: lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0 supported_types: 16x100G $ devlink dev show auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: move net check into devlinks_xa_for_each_registered_get()Jiri Pirko1-96/+39
Benefit from having devlinks iterator helper devlinks_xa_for_each_registered_get() and move the net pointer check inside. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: make sure that devlink_try_get() works with valid pointer ↵Jiri Pirko1-91/+80
during xarray iteration Remove dependency on devlink_mutex during devlinks xarray iteration. The reason is that devlink_register/unregister() functions taking devlink_mutex would deadlock during devlink reload operation of devlink instance which registers/unregisters nested devlink instances. The devlinks xarray consistency is ensured internally by xarray. There is a reference taken when working with devlink using devlink_try_get(). But there is no guarantee that devlink pointer picked during xarray iteration is not freed before devlink_try_get() is called. Make sure that devlink_try_get() works with valid pointer. Achieve it by: 1) Splitting devlink_put() so the completion is sent only after grace period. Completion unblocks the devlink_unregister() routine, which is followed-up by devlink_free() 2) During devlinks xa_array iteration, get devlink pointer from xa_array holding RCU read lock and taking reference using devlink_try_get() before unlock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_putLuiz Augusto von Dentz2-13/+49
This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: Always set event mask on suspendAbhishek Pandit-Subedi1-3/+3
When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: mgmt: Fix double free on error pathDan Carpenter1-1/+0
Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()Tetsuo Handa1-2/+1
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26ice: do not setup vlan for loopback VSIMaciej Fijalkowski1-3/+5
Currently loopback test is failiing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)Maciej Fijalkowski1-1/+2
Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix VSIs unable to share unicast MACAnirudh Venkataramanan2-40/+2
The driver currently does not allow two VSIs in the same PF domain to have the same unicast MAC address. This is incorrect in the sense that a policy decision is being made in the driver when it must be left to the user. This approach was causing issues when rebooting the system with VFs spawned not being able to change their MAC addresses. Such errors were present in dmesg: [ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1 Fix that by removing this restriction. Doing this also allows us to remove some additional code that's checking if a unicast MAC filter already exists. Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix tunnel checksum offload with fragmented trafficPrzemyslaw Patynowski1-3/+5
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Offload params: td_offset, cd_tunnel_params were corrupted, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: 69e66c04c672 ("ice: Add mpls+tso support") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix max VLANs available for VFPrzemyslaw Patynowski1-1/+2
Legacy VLAN implementation allows for untrusted VF to have 8 VLAN filters, not counting VLAN 0 filters. Current VLAN_V2 implementation lowers available filters for VF, by counting in VLAN 0 filter for both TPIDs. Fix this by counting only non zero VLAN filters. Without this patch, untrusted VF would not be able to access 8 VLAN filters. Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26netfilter: nft_queue: only allow supported familes and hooksFlorian Westphal1-0/+27
Trying to use 'queue' statement in ingress (for example) triggers a splat on reinject: WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291 ... because nf_reinject cannot find the ruleset head. The netdev family doesn't support async resume at the moment anyway, so disallow loading such rulesets with a more appropriate error message. v2: add 'validate' callback and also check hook points, v1 did allow ingress use in 'table inet', but that doesn't work either. (Pablo) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_tables: add rescheduling points during loop detection walksFlorian Westphal1-0/+6
Add explicit rescheduling points during ruleset walk. Switching to a faster algorithm is possible but this is a much smaller change, suitable for nf tree. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_queue: do not allow packet truncation below transport header ↵Florian Westphal1-1/+6
offset Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26ice: Add support for PPPoE hardware offloadMarcin Szycik6-2/+259
Add support for creating PPPoE filters in switchdev mode. Add support for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto. Example filter: tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \ 1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to use the new fields. ICE COMMS DDP package is required to create a filter as it contains PPPoE profiles. Added a warning message when loaded DDP package does not contain required profiles. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26flow_offload: Introduce flow_match_pppoeWojciech Drewek2-0/+13
Allow to offload PPPoE filters by adding flow_rule_match_pppoe. Drivers can extract PPPoE specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26net/sched: flower: Add PPPoE filterWojciech Drewek2-0/+67
Add support for PPPoE specific fields for tc-flower. Those fields can be provided only when protocol was set to ETH_P_PPP_SES. Defines, dump, load and set are being done here. Overwrite basic.n_proto only in case of PPP_IP and PPP_IPV6, otherwise leave it as ETH_P_PPP_SES. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26Merge tag 's390-5.19-7' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Alexander GordeevL - Prevent relatively slow PRNO TRNG random number operation from being called from interrupt context. That could for example cause some network loads to timeout. * tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/archrandom: prevent CPACF trng invocations in interrupt context
2022-07-26flow_dissector: Add PPPoE dissectorsWojciech Drewek3-7/+73
Allow to dissect PPPoE specific fields which are: - session ID (16 bits) - ppp protocol (16 bits) - type (16 bits) - this is PPPoE ethertype, for now only ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC in the future The goal is to make the following TC command possible: # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \ flower \ pppoe_sid 12 \ ppp_proto ip \ action drop Note that only PPPoE Session is supported. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26mm: fix NULL pointer dereference in wp_page_reuse()Qi Zheng1-1/+1
The vmf->page can be NULL when the wp_page_reuse() is invoked by wp_pfn_shared(), it will cause the following panic: BUG: kernel NULL pointer dereference, address: 000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 18 PID: 923 Comm: Xorg Not tainted 5.19.0-rc8.bm.1-amd64 #263 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g14 RIP: 0010:_compound_head+0x0/0x40 [...] Call Trace: wp_page_reuse+0x1c/0xa0 do_wp_page+0x1a5/0x3f0 __handle_mm_fault+0x8cf/0xd20 handle_mm_fault+0xd5/0x2a0 do_user_addr_fault+0x1d0/0x680 exc_page_fault+0x78/0x170 asm_exc_page_fault+0x22/0x30 To fix it, this patch performs a NULL pointer check before dereferencing the vmf->page. Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-26bridge: Do not send empty IFLA_AF_SPEC attributeBenjamin Poirier1-2/+6
After commit b6c02ef54913 ("bridge: Netlink interface fix."), br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a bridge vlan dump is requested but an interface does not have any vlans configured. iproute2 ignores such an empty attribute since commit b262a9becbcb ("bridge: Fix output with empty vlan lists") but older iproute2 versions as well as other utilities have their output changed by the cited kernel commit, resulting in failed test cases. Regardless, emitting an empty attribute is pointless and inefficient. Avoid this change by canceling the attribute if no AF_SPEC data was added. Fixes: b6c02ef54913 ("bridge: Netlink interface fix.") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26Merge branch 'octeontx2-minor-tc-fixes'Paolo Abeni1-33/+73
Subbaraya Sundeep says: ==================== Octeontx2 minor tc fixes This patch set fixes two problems found in tc code wrt to ratelimiting and when installing UDP/TCP filters. Patch 1: CN10K has different register format compared to CN9xx hence fixes that. Patch 2: Check flow mask also before installing a src/dst port filter, otherwise installing for one port installs for other one too. ==================== Link: https://lore.kernel.org/r/1658650874-16459-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26octeontx2-pf: Fix UDP/TCP src and dst port tc filtersSubbaraya Sundeep1-12/+18
Check the mask for non-zero value before installing tc filters for L4 source and destination ports. Otherwise installing a filter for source port installs destination port too and vice-versa. Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26octeontx2-pf: cn10k: Fix egress ratelimit configurationSunil Goutham1-21/+55
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2 to CN10K. CN10K supports larger burst size. Fix burst exponent and burst mantissa configuration for CN10K. Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps' passed by stack is also u64. Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26Merge branch 'octeontx2-minor-tc-fixes'Paolo Abeni1-33/+73
Subbaraya Sundeep says: ==================== Octeontx2 minor tc fixes This patch set fixes two problems found in tc code wrt to ratelimiting and when installing UDP/TCP filters. Patch 1: CN10K has different register format compared to CN9xx hence fixes that. Patch 2: Check flow mask also before installing a src/dst port filter, otherwise installing for one port installs for other one too. ==================== Link: https://lore.kernel.org/r/1658650874-16459-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26octeontx2-pf: Fix UDP/TCP src and dst port tc filtersSubbaraya Sundeep1-12/+18
Check the mask for non-zero value before installing tc filters for L4 source and destination ports. Otherwise installing a filter for source port installs destination port too and vice-versa. Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26octeontx2-pf: cn10k: Fix egress ratelimit configurationSunil Goutham1-21/+55
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2 to CN10K. CN10K supports larger burst size. Fix burst exponent and burst mantissa configuration for CN10K. Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps' passed by stack is also u64. Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26sfc/siena: fix repeated words in commentswangjianli1-1/+1
Delete the redundant word 'in'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Link: https://lore.kernel.org/r/20220724075207.21080-1-wangjianli@cdjrlc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26sfc/falcon: fix repeated words in commentswangjianli1-1/+1
Delete the redundant word 'in'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Link: https://lore.kernel.org/r/20220724074746.19550-1-wangjianli@cdjrlc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26Merge branch 'add-mtu-change-with-stmmac-interface-running'Jakub Kicinski7-313/+459
Christian Marangi says: ==================== Add MTU change with stmmac interface running This series is to permit MTU change while the interface is running. Major rework are needed to permit to allocate a new dma conf based on the new MTU before applying it. This is to make sure there is enough space to allocate all the DMA queue before releasing the stmmac driver. This was tested with a simple way to stress the network while the interface is running. 2 ssh connection to the device: - One generating simple traffic with while true; do free; done - The other making the mtu change with a delay of 1 second The connection is correctly stopped and recovered after the MTU is changed. The first 2 patch of this series are minor fixup that fix problems presented while testing this. One fix a problem when we renable a queue while we are generating a new dma conf. The other is a corner case that was notice while stressing the driver and turning down the interface while there was some traffic. (this is a follow-up of a simpler patch that wanted to add the same feature. It was suggested to first try to check if it was possible to apply the new configuration. Posting as RFC as it does major rework for the new concept of DMA conf) ==================== Link: https://lore.kernel.org/r/20220723142933.16030-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26sctp: fix sleep in atomic context bug in timer handlersDuoming Zhou1-1/+1
There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>