Age  Commit message  Author  Files  Lines
2022-07-27tcp: allow tls to decrypt directly from the tcp rcv queueJakub Kicinski2-1/+43
Expose TCP rx queue accessor and cleanup, so that TLS can decrypt directly from the TCP queue. The expectation is that the caller can access the skb returned from tcp_recv_skb() and up to inq bytes worth of data (some of which may be in ->next skbs) and then call tcp_read_done() when data has been consumed. The socket lock must be held continuously across those two operations. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: device: keep the zero copy status with offloadJakub Kicinski3-5/+35
The non-zero-copy path assumes a full skb with decrypted contents. This means the device offload would have to CoW the data. Try to keep the zero-copy status instead, copy the data to user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: don't free the output in case of zero-copyJakub Kicinski1-13/+13
In the future we'll want to reuse the input skb in case of zero-copy so we shouldn't always free darg.skb. Move the freeing of darg.skb into the non-zc cases. All cases will now free ctx->recv_pkt (inside tls_rx_rec_done()). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: factor SW handling out of tls_rx_one_record()Jakub Kicinski1-36/+57
After recent changes the SW side of tls_rx_one_record() can be nicely encapsulated in its own function. Move the pad handling as well. This will be useful for ->zc handling in tls_decrypt_device(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27tls: rx: wrap recv_pkt accesses in helpersJakub Kicinski2-5/+11
To allow for the logic to change later wrap accesses which interrogate the input skb in helper functions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Merge branch 'implement-dev-info-and-dev-flash-for-line-cards'Jakub Kicinski12-171/+1043
Jiri Pirko says:

====================
Implement dev info and dev flash for line cards

This patchset implements two features:
1) "devlink dev info" is exposed for line card (patches 6-9)
2) "devlink dev flash" is implemented for line card gearbox flashing (patch 10)

For every line card, "a nested" auxiliary device is created which allows to bind the features mentioned above (patch 4).

The relationship between line card and its auxiliary dev devlink is carried over extra line card netlink attribute (patches 3 and 5).

The first patch removes devlink_mutex from devlink_register/unregister() which eliminates possible deadlock during devlink reload command. The second patch follows up with putting net pointer check into new helper.

Examples:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
       16x100G

$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
  versions:
      fixed:
        hw.revision 0
        fw.psid MT_0000000749
      running:
        ini.version 4
        fw 19.2010.1312

$ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
====================

Link: https://lore.kernel.org/r/20220725082925.366455-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26selftests: mlxsw: Check line card info on activated line cardJiri Pirko1-0/+24
Once line card is activated, check the FW version and PSID are exposed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26selftests: mlxsw: Check line card info on provisioned line cardJiri Pirko1-0/+30
Once line card is provisioned, check if HW revision and INI version are exposed on associated nested auxiliary device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Implement line card device flashingJiri Pirko4-10/+323
Implement flash_update() devlink op for the line card devlink instance to allow user to update line card gearbox FW using MDDT register and mlxfw. Example: $ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Expose device PSID over device infoJiri Pirko3-0/+35
Use tunneled MGIR to obtain PSID of line card device and extend device_info_get() op to fill up the info with that. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 fw.psid MT_0000000749 running: ini.version 4 fw 19.2010.1312 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: reg: Add Management DownStream Device Tunneling RegisterJiri Pirko1-0/+90
The MDDT register allows to deliver query and request messages (PRM registers, commands) to a DownStream device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Probe active line cards for devices and expose FW versionJiri Pirko3-0/+69
In case the line card is active, go over all possible existing devices (gearboxes) on it and expose FW version of the flashable one. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 running: ini.version 4 fw 19.2010.1312 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: reg: Extend MDDQ by device_infoJiri Pirko1-1/+82
Extend existing MDDQ register by possibility to query information about devices residing on a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Expose HW revision and INI versionJiri Pirko4-0/+61
Implement info_get() to expose HW revision of a linecard and loaded INI version. Example: $ devlink dev info auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0: versions: fixed: hw.revision 0 running: ini.version 4 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26mlxsw: core_linecards: Introduce per line card auxiliary deviceJiri Pirko6-3/+194
In order to be eventually able to expose line card gearbox version and possibility to flash FW, model the line card as a separate device on auxiliary bus. Add the auxiliary device for provisioned line card in order to be able to expose provisioned line card info over devlink dev info. When the line card becomes active, there may be other additional info added to the output. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: introduce nested devlink entity for line cardJiri Pirko3-0/+46
For the purpose of exposing device info and allow flash update which is going to be implemented in follow-up patches, introduce a possibility for a line card to expose relation to nested devlink entity. The nested devlink entity represents the line card. Example: $ devlink lc show pci/0000:01:00.0 lc 1 pci/0000:01:00.0: lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0 supported_types: 16x100G $ devlink dev show auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: move net check into devlinks_xa_for_each_registered_get()Jiri Pirko1-96/+39
Benefit from having devlinks iterator helper devlinks_xa_for_each_registered_get() and move the net pointer check inside. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26net: devlink: make sure that devlink_try_get() works with valid pointer ↵Jiri Pirko1-91/+80
during xarray iteration Remove dependency on devlink_mutex during devlinks xarray iteration. The reason is that devlink_register/unregister() functions taking devlink_mutex would deadlock during devlink reload operation of devlink instance which registers/unregisters nested devlink instances. The devlinks xarray consistency is ensured internally by xarray. There is a reference taken when working with devlink using devlink_try_get(). But there is no guarantee that devlink pointer picked during xarray iteration is not freed before devlink_try_get() is called. Make sure that devlink_try_get() works with valid pointer. Achieve it by: 1) Splitting devlink_put() so the completion is sent only after grace period. Completion unblocks the devlink_unregister() routine, which is followed-up by devlink_free() 2) During devlinks xa_array iteration, get devlink pointer from xa_array holding RCU read lock and taking reference using devlink_try_get() before unlock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26Documentation: kunit: Add CLI args for kunit_toolSadiya Kazi1-1/+62
Some kunit_tool command line arguments are missing in run_wrapper.rst. Document them. Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Maíra Canal <mairacanal@riseup.net> Signed-off-by: Sadiya Kazi <sadiyakazi@google.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-07-26Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_putLuiz Augusto von Dentz2-13/+49
This fixes the following trace, which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed. The code now relies on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed.

  refcount_t: increment on 0; use-after-free.
  BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
  Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

  CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28
  Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT)
  Workqueue: hci0 hci_rx_work
  Call trace:
   dump_backtrace+0x0/0x378
   show_stack+0x20/0x2c
   dump_stack+0x124/0x148
   print_address_description+0x80/0x2e8
   __kasan_report+0x168/0x188
   kasan_report+0x10/0x18
   __asan_load4+0x84/0x8c
   refcount_dec_and_test+0x20/0xd0
   l2cap_chan_put+0x48/0x12c
   l2cap_recv_frame+0x4770/0x6550
   l2cap_recv_acldata+0x44c/0x7a4
   hci_acldata_packet+0x100/0x188
   hci_rx_work+0x178/0x23c
   process_one_work+0x35c/0x95c
   worker_thread+0x4cc/0x960
   kthread+0x1a8/0x1c4
   ret_from_fork+0x10/0x18

Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
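The "hold unless zero" pattern named in this message can be illustrated with a small user-space model. This is a Python sketch, not kernel code: the `Channel` class and its method names are hypothetical stand-ins, and only the idea (refuse to take a new reference once the count has already dropped to zero) is taken from the commit message.

```python
import threading


class Channel:
    """Toy stand-in for a reference-counted channel object (hypothetical)."""

    def __init__(self):
        self._refs = 1            # the creator holds the initial reference
        self._lock = threading.Lock()
        self.destroyed = False

    def hold_unless_zero(self):
        """Take a reference only if the object is still alive."""
        with self._lock:
            if self._refs == 0:
                return False      # too late: the object is being torn down
            self._refs += 1
            return True

    def put(self):
        """Drop a reference; the last put marks the object as destroyed."""
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.destroyed = True


chan = Channel()
got_ref = chan.hold_unless_zero()     # succeeds while the channel is alive
chan.put()
chan.put()                            # final put: channel is destroyed
late_ref = chan.hold_unless_zero()    # refused instead of resurrecting it
```

A caller that gets `False` back must treat the channel as gone and skip it, which is what prevents the increment-on-zero reported in the trace.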
2022-07-26Bluetooth: Always set event mask on suspendAbhishek Pandit-Subedi1-3/+3
When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26Bluetooth: mgmt: Fix double free on error pathDan Carpenter1-1/+0
Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()Tetsuo Handa1-2/+1
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26ice: do not setup vlan for loopback VSIMaciej Fijalkowski1-3/+5
Currently loopback test is failing due to the error returned from ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)Maciej Fijalkowski1-1/+2
Tx side sets EOP and RS bits on descriptors to indicate that a particular descriptor is the last one and needs to generate an irq when it was sent. These bits should not be checked on completion path regardless of whether it's the Tx or the Rx. DD bit serves this purpose and it indicates that a particular descriptor is either for Rx or was successfully Txed. EOF is also set as loopback test does not xmit fragmented frames. Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead of EOP and RS pair. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix VSIs unable to share unicast MACAnirudh Venkataramanan2-40/+2
The driver currently does not allow two VSIs in the same PF domain to have the same unicast MAC address. This is incorrect in the sense that a policy decision is being made in the driver when it must be left to the user. This approach was causing issues when rebooting the system with VFs spawned not being able to change their MAC addresses. Such errors were present in dmesg: [ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1 Fix that by removing this restriction. Doing this also allows us to remove some additional code that's checking if a unicast MAC filter already exists. Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix tunnel checksum offload with fragmented trafficPrzemyslaw Patynowski1-3/+5
Fix checksum offload on VXLAN tunnels. When the MPLS protocol is not used, set the l4 header to the transport header of the skb. This fixes the case where a user tries to offload checksums of VXLAN tunneled traffic.

Steps for reproduction (requires link partner with tunnels):
  ip l s enp130s0f0 up
  ip a f enp130s0f0
  ip a a 10.10.110.2/24 dev enp130s0f0
  ip l s enp130s0f0 mtu 1600
  ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789
  ip l s vxlan12_sut up
  ip a a 20.10.110.2/24 dev vxlan12_sut
  iperf3 -c 20.10.110.1 #should connect

Offload params td_offset and cd_tunnel_params were corrupted because the l4 header pointed to the wrong address. The NIC would then drop those packets internally due to incorrect TX descriptor data, which increased the GLV_TEPC register.

Fixes: 69e66c04c672 ("ice: Add mpls+tso support")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26ice: Fix max VLANs available for VFPrzemyslaw Patynowski1-1/+2
Legacy VLAN implementation allows for untrusted VF to have 8 VLAN filters, not counting VLAN 0 filters. Current VLAN_V2 implementation lowers available filters for VF, by counting in VLAN 0 filter for both TPIDs. Fix this by counting only non zero VLAN filters. Without this patch, untrusted VF would not be able to access 8 VLAN filters. Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26Merge remote-tracking branch 'linux-integrity/kexec-keyrings' into ↵Mimi Zohar10-127/+172
next-integrity

From the cover letter:

Currently when loading a kernel image via the kexec_file_load() system call, x86 can make use of three keyrings, i.e. the .builtin_trusted_keys, .secondary_trusted_keys and .platform keyrings, to verify a signature. However, arm64 and s390 can only use the .builtin_trusted_keys and .platform keyring respectively. For example, one resulting problem is that kexec'ing a kernel image would be rejected with the error "Lockdown: kexec: kexec of unsigned images is restricted; see man kernel_lockdown.7".

This patch set enables arm64 and s390 to make use of the same keyrings as x86 to verify the signature of a kexec'ed kernel image.

The recently introduced .machine keyring impacts the roots of trust by linking the .machine keyring to the .secondary keyring. The roots of trust for different keyrings are described as follows.

.builtin_trusted_keys: Keys may be built into the kernel during build or inserted into memory reserved for keys post build. The root of trust is based on verification of the kernel image signature. For example, on a physical system in a secure boot environment, this trust is rooted in hardware.

.machine: If the end-users choose to trust the keys provided by the first-stage UEFI bootloader shim, i.e. Machine Owner Keys (MOK keys), the keys will be added to this keyring, which is linked to the .secondary_trusted_keys keyring in the same way as the .builtin_trusted_keys keyring. Shim has built-in keys from a Linux distribution or the end-users-enrolled keys. So the root of trust of this keyring is either a Linux distribution vendor or the end-users.

.secondary_trusted_keys: Certificates signed by keys on the .builtin_trusted_keys, .machine, or existing keys on the .secondary_trusted_keys keyrings may be loaded onto the .secondary_trusted_keys keyring. This establishes a signature chain of trust based on keys loaded on either the .builtin_trusted_keys or .machine keyrings, if configured and enabled.

.platform: The .platform keyring consists of UEFI db and MOK keys which are used by shim to verify the first boot kernel's image signature. If end-users choose to trust MOK keys and the kernel has the .machine keyring enabled, the .platform keyring only consists of UEFI db keys since the MOK keys are added to the .machine keyring instead. Because the end-users could also enroll their own MOK keys, the root of trust could be hardware and the end-users.
2022-07-26netfilter: nft_queue: only allow supported familes and hooksFlorian Westphal1-0/+27
Trying to use 'queue' statement in ingress (for example) triggers a splat on reinject: WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291 ... because nf_reinject cannot find the ruleset head. The netdev family doesn't support async resume at the moment anyway, so disallow loading such rulesets with a more appropriate error message. v2: add 'validate' callback and also check hook points, v1 did allow ingress use in 'table inet', but that doesn't work either. (Pablo) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_tables: add rescheduling points during loop detection walksFlorian Westphal1-0/+6
Add explicit rescheduling points during ruleset walk. Switching to a faster algorithm is possible but this is a much smaller change, suitable for nf tree. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26netfilter: nf_queue: do not allow packet truncation below transport header ↵Florian Westphal1-1/+6
offset Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
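The kind of length check this title describes can be sketched in a few lines. This is a Python model under stated assumptions, not the kernel change itself: `check_replacement_length` and its parameters are hypothetical names, and the only behaviour taken from the message is that a replacement payload shorter than the transport header offset must be rejected before the stack later pulls the IP header.

```python
import errno


def check_replacement_length(new_len: int, transport_offset: int) -> int:
    """Return 0 if a replacement payload of new_len bytes is acceptable.

    transport_offset models the distance from the start of the packet to
    the transport header. Anything shorter would leave fewer bytes than
    the later header pull removes, so it is rejected with -EINVAL.
    """
    if new_len < transport_offset:
        return -errno.EINVAL
    return 0


one_byte = check_replacement_length(1, 20)       # the reported 1-byte case
exact_header = check_replacement_length(20, 20)  # header only, no payload
normal = check_replacement_length(60, 20)        # ordinary packet
```

Rejecting the verdict up front keeps the failure visible to the user-space program that sent it, rather than surfacing later as a malformed skb.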
2022-07-26powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-PSumeet Pawnikar1-0/+2
Add Alder Lake-N and Raptor Lake-P to the list of processor models for which Power Limit4 is supported by the Intel RAPL driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-26ACPI: CPPC: Do not prevent CPPC from working in the futureRafael J. Wysocki2-31/+25
There is a problem with the current revision checks in is_cppc_supported() that they essentially prevent the CPPC support from working if a new _CPC package format revision being a proper superset of the v3 and only causing _CPC to return a package with more entries (while retaining the types and meaning of the entries defined by the v3) is introduced in the future and used by the platform firmware. In that case, as long as the number of entries in the _CPC return package is at least CPPC_V3_NUM_ENT, it should be perfectly fine to use the v3 support code and disregard the additional package entries added by the new package format revision. For this reason, drop is_cppc_supported() altogether, put the revision checks directly into acpi_cppc_processor_probe() so they are easier to follow and rework them to take the case mentioned above into account. Fixes: 4773e77cdc9b ("ACPI / CPPC: Add support for CPPC v3") Cc: 4.18+ <stable@vger.kernel.org> # 4.18+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
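The forward-compatible check described above can be modelled as follows. This is a Python sketch under assumptions: the function name is hypothetical, the numeric values of the constants are illustrative guesses, and the exact-count handling for revisions 2 and 3 is not stated in the message. Only the rule for newer revisions (fall back to v3 handling when at least CPPC_V3_NUM_ENT entries are present) comes from the text.

```python
CPPC_V2_REV = 2
CPPC_V3_REV = 3
CPPC_V2_NUM_ENT = 21   # assumed value, for illustration only
CPPC_V3_NUM_ENT = 23   # assumed value, for illustration only


def select_cppc_support(cpc_rev: int, num_ent: int):
    """Return (revision, entries) to parse, or None if unsupported."""
    if cpc_rev < CPPC_V2_REV:
        return None
    if cpc_rev == CPPC_V2_REV:
        return (CPPC_V2_REV, num_ent) if num_ent == CPPC_V2_NUM_ENT else None
    if cpc_rev == CPPC_V3_REV:
        return (CPPC_V3_REV, num_ent) if num_ent == CPPC_V3_NUM_ENT else None
    # Newer revision: treat it as a superset of v3 and ignore extra entries.
    if num_ent >= CPPC_V3_NUM_ENT:
        return (CPPC_V3_REV, CPPC_V3_NUM_ENT)
    return None


future_ok = select_cppc_support(4, 25)    # newer revision, extra entries
future_bad = select_cppc_support(4, 22)   # newer revision, too few entries
```

The design choice is to clamp to the newest revision the code understands instead of failing on any unknown revision number.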
2022-07-26ACPI: PM: x86: Print messages regarding LPS0 idle supportRafael J. Wysocki2-1/+6
Because suspend-to-idle is always supported and on x86 it is the only way to suspend the system if S3 is not supported by the platform, the kernel attempts to enter low-power S0 idle in the suspend-to-idle flow regardless of whether or not the ACPI_FADT_LOW_POWER_S0 flag is set in the FADT. However, if that flag is not set, residency counters associated with low-power S0 idle may not count and the platform may refuse to put the EC into a low-power mode, for example. For this reason, print diagnostic messages when the platform should achieve significant energy savings in low-power S0 idle (because the ACPI_FADT_LOW_POWER_S0 flag is set in the FADT) and when suspend-to-idle becomes the default suspend method (because low-power S0 idle should be equally or more efficient than S3, if available). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2022-07-26PM: QoS: Add check to make sure CPU freq is non-negativeShivnandan Kumar1-2/+2
CPU frequency should never be negative. If some client driver calls freq_qos_update_request with a negative value, which will be very high in absolute terms, then frequency QoS sets max CPU freq at fmax as it considers its absolute value, but it will add a plist node with negative priority. A plist node has priority from INT_MIN (highest) to INT_MAX (lowest). Once priority is set as negative, another client will not be able to reduce CPU frequency. Adding a check to make sure CPU freq is non-negative will fix this problem. Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
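A small model shows why one negative request is so damaging. This is a Python sketch, not the kernel implementation: `MaxFreqConstraint` and `as_unsigned_32` are hypothetical names, and the model assumes the effective maximum-frequency limit is the smallest value among all requests, which is how a priority-sorted list with INT_MIN first behaves.

```python
import errno


class MaxFreqConstraint:
    """Toy aggregate of per-client maximum-frequency requests (kHz)."""

    def __init__(self):
        self._requests = {}

    def update_request(self, client: str, khz: int) -> int:
        # The check the commit adds, in spirit: refuse negative frequencies.
        if khz < 0:
            return -errno.EINVAL
        self._requests[client] = khz
        return 0

    def effective_limit(self) -> int:
        # The lowest requested ceiling wins.
        return min(self._requests.values())


def as_unsigned_32(value: int) -> int:
    """How a negative 32-bit value looks when read as unsigned."""
    return value & 0xFFFFFFFF


# Without the check, -1 sorts ahead of every legitimate request...
unchecked_winner = min([-1, 1_000_000])
# ...and reads as an enormous frequency, so no client can lower the limit.
seen_as = as_unsigned_32(-1)

qos = MaxFreqConstraint()
qos.update_request("thermal", 2_000_000)
rejected = qos.update_request("buggy-driver", -1)
qos.update_request("battery-saver", 1_000_000)
limit = qos.effective_limit()
```

With the check in place the bad request never enters the list, so the later, lower request takes effect as expected.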
2022-07-26PM: hibernate: defer device probing when resuming from hibernationTetsuo Handa1-1/+12
syzbot is reporting hung task at misc_open() [1], for there is a race window of AB-BA deadlock which involves probe_count variable. Currently wait_for_device_probe() from snapshot_open() from misc_open() can sleep forever with misc_mtx held if probe_count cannot become 0.

When a device is probed by hub_event() work function, probe_count is incremented before the probe function starts, and probe_count is decremented after the probe function completed.

There are three cases that can prevent probe_count from dropping to 0.

(a) A device being probed stopped responding (i.e. broken/malicious hardware).

(b) A process emulating a USB device using /dev/raw-gadget interface stopped responding for some reason.

(c) New device probe requests keep coming in before existing device probe requests complete.

The phenomenon syzbot is reporting is (b). A process which is holding system_transition_mutex and misc_mtx is waiting for probe_count to become 0 inside wait_for_device_probe(), but the probe function which is called from hub_event() work function is waiting for the processes which are blocked at mutex_lock(&misc_mtx) to respond via /dev/raw-gadget interface.

This patch mitigates (b) by deferring wait_for_device_probe() from snapshot_open() to snapshot_write() and snapshot_ioctl(). Please note that the possibility of (b) remains as long as any thread which is emulating a USB device via /dev/raw-gadget interface can be blocked by uninterruptible blocking operations (e.g. mutex_lock()).

Please also note that (a) and (c) are not addressed. Regarding (c), we should change the code to wait for only one device which contains the image for resuming from hibernation. I don't know how to address (a), for use of timeout for wait_for_device_probe() might result in loss of user data in the image. Maybe we should require the userland to wait for the image device before opening /dev/snapshot interface.
Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1] Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-26ublk_drv: fix double shift bugDan Carpenter1-2/+2
The test/clear_bit() functions take a bit number, but this code is passing a shifted value. It's the equivalent of saying BIT(BIT(0)) instead of just BIT(0). This doesn't affect runtime because numbers are small and it's done consistently. Fixes: fa362045564e ("ublk: simplify ublk_ch_open and ublk_ch_release") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/Yt/2R/+MJf/MSoyl@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-26ice: Add support for PPPoE hardware offloadMarcin Szycik6-2/+259
Add support for creating PPPoE filters in switchdev mode. Add support for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto. Example filter: tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \ 1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR Changes in iproute2 are required to use the new fields. ICE COMMS DDP package is required to create a filter as it contains PPPoE profiles. Added a warning message when loaded DDP package does not contain required profiles. Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26flow_offload: Introduce flow_match_pppoeWojciech Drewek2-0/+13
Allow to offload PPPoE filters by adding flow_rule_match_pppoe. Drivers can extract PPPoE specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26net/sched: flower: Add PPPoE filterWojciech Drewek2-0/+67
Add support for PPPoE specific fields for tc-flower. Those fields can be provided only when protocol was set to ETH_P_PPP_SES. Defines, dump, load and set are being done here. Overwrite basic.n_proto only in case of PPP_IP and PPP_IPV6, otherwise leave it as ETH_P_PPP_SES. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26Merge tag 's390-5.19-7' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fix from Alexander Gordeev: - Prevent relatively slow PRNO TRNG random number operation from being called from interrupt context. That could for example cause some network loads to timeout. * tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/archrandom: prevent CPACF trng invocations in interrupt context
2022-07-26flow_dissector: Add PPPoE dissectorsWojciech Drewek3-7/+73
Allow to dissect PPPoE specific fields which are: - session ID (16 bits) - ppp protocol (16 bits) - type (16 bits) - this is PPPoE ethertype, for now only ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC in the future The goal is to make the following TC command possible: # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \ flower \ pppoe_sid 12 \ ppp_proto ip \ action drop Note that only PPPoE Session is supported. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
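To make the dissected fields concrete, here is a user-space sketch that pulls the session ID and PPP protocol out of a PPPoE session payload. It is Python, not the kernel dissector: the function name is hypothetical, the header layout (4-bit version, 4-bit type, 8-bit code, 16-bit session ID, 16-bit length) follows RFC 2516 as I understand it, and the PPP protocol field is assumed to be the uncompressed 16-bit form.

```python
import struct

PPP_IP = 0x0021     # PPP protocol number for IPv4
PPP_IPV6 = 0x0057   # PPP protocol number for IPv6


def dissect_pppoe_session(payload: bytes) -> dict:
    """Parse the 6-byte PPPoE header plus the 2-byte PPP protocol field.

    payload is everything after the Ethernet header of a frame whose
    ethertype is ETH_P_PPP_SES.
    """
    if len(payload) < 8:
        raise ValueError("too short for PPPoE header and PPP protocol")
    ver_type, code, session_id, length, ppp_proto = struct.unpack(
        "!BBHHH", payload[:8]
    )
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,
        "code": code,
        "session_id": session_id,
        "length": length,
        "ppp_proto": ppp_proto,
    }


# Version 1, type 1, code 0 (session data), session ID 12, PPP proto IPv4.
sample = bytes([0x11, 0x00, 0x00, 0x0C, 0x00, 0x16, 0x00, 0x21])
fields = dissect_pppoe_session(sample)
```

The sample corresponds to the `pppoe_sid 12 ppp_proto ip` match in the tc command above: `fields["session_id"]` is 12 and `fields["ppp_proto"]` equals `PPP_IP`.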
2022-07-26mm: fix NULL pointer dereference in wp_page_reuse()Qi Zheng1-1/+1
The vmf->page can be NULL when the wp_page_reuse() is invoked by wp_pfn_shared(), it will cause the following panic: BUG: kernel NULL pointer dereference, address: 000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 18 PID: 923 Comm: Xorg Not tainted 5.19.0-rc8.bm.1-amd64 #263 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g14 RIP: 0010:_compound_head+0x0/0x40 [...] Call Trace: wp_page_reuse+0x1c/0xa0 do_wp_page+0x1a5/0x3f0 __handle_mm_fault+0x8cf/0xd20 handle_mm_fault+0xd5/0x2a0 do_user_addr_fault+0x1d0/0x680 exc_page_fault+0x78/0x170 asm_exc_page_fault+0x22/0x30 To fix it, this patch performs a NULL pointer check before dereferencing the vmf->page. Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-26spi: a3700: support BE for AC5 SPI driverNoam1-2/+2
Signed-off-by: Noam <lnoam@marvell.com> Tested-by: Raz Adashi <raza@marvell.com> Reviewed-by: Raz Adashi <raza@marvell.com> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Link: https://lore.kernel.org/r/20220726130038.20995-1-vadym.kochan@plvision.eu Signed-off-by: Mark Brown <broonie@kernel.org>
2022-07-26drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()Nathan Chancellor1-1/+1
When booting a kernel compiled with clang's CFI protection (CONFIG_CFI_CLANG), there is a CFI failure in drm_simple_kms_crtc_mode_valid() when trying to call simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():

  [ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
  ...
  [ 0.324928] Call trace:
  [ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60
  [ 0.325053] __cfi_check_fail+0x3c/0x44
  [ 0.325120] __cfi_slowpath_diag+0x178/0x200
  [ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80
  [ 0.325279] __drm_helper_update_and_validate+0x31c/0x464
  ...

The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs' expects a return type of 'enum drm_mode_status', not 'int'. Correct it to fix the CFI failure.

Cc: stable@vger.kernel.org
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <tpgxyz@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-nathan@kernel.org
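CFI checks an indirect call against the function type of the pointer it goes through, so the callback's return type has to match the slot it is stored in. A small sketch of the corrected pattern, with hypothetical stand-ins (`enum demo_mode_status`, `struct demo_pipe_funcs`, `demo_mode_valid()`, `check_mode()`) in place of the DRM types:

```c
/* Stand-in for 'enum drm_mode_status'; names and values are illustrative. */
enum demo_mode_status {
	DEMO_MODE_OK = 0,
	DEMO_MODE_BAD = 1,
};

/* Stand-in for the funcs table; the slot fixes the full function type. */
struct demo_pipe_funcs {
	enum demo_mode_status (*mode_valid)(int width, int height);
};

/* Return type matches the slot, so the indirect call is type-correct. */
static enum demo_mode_status demo_mode_valid(int width, int height)
{
	return (width > 0 && height > 0) ? DEMO_MODE_OK : DEMO_MODE_BAD;
}

static const struct demo_pipe_funcs demo_funcs = {
	.mode_valid = demo_mode_valid,
};

/* Caller side: dispatches through the pointer, as a helper layer would. */
static enum demo_mode_status check_mode(const struct demo_pipe_funcs *funcs,
					int width, int height)
{
	return funcs->mode_valid(width, height);
}
```

Declaring `demo_mode_valid()` as returning `int` instead would give it a different function type from the slot, which is the kind of mismatch the commit corrects.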
2022-07-26hwmon: (occ) Replace open-coded variant of %*phN specifierAndy Shevchenko1-6/+2
printf()-like functions in the kernel have extensions, such as %*phN to dump small pieces of memory as hex bytes. Replace custom approach with the direct use of %*phN. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220726143110.4809-1-andriy.shevchenko@linux.intel.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
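Standard C printf() has no `%*phN`, so a user-space approximation helps show what the specifier produces: each byte as two lowercase hex digits with no separator. The helper name `hex_no_sep()` is hypothetical, and the sketch is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Approximate the kernel's "%*phN" output in user space: write each byte
 * of src as two lowercase hex digits, with no separator between bytes.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if dst cannot hold 2 * n digits plus the terminator.
 */
static int hex_no_sep(char *dst, size_t dst_len, const uint8_t *src, size_t n)
{
	if (dst_len < 2 * n + 1)
		return -1;

	for (size_t i = 0; i < n; i++)
		snprintf(dst + 2 * i, 3, "%02x", src[i]);
	dst[2 * n] = '\0';

	return (int)(2 * n);
}
```

In kernel code the same result comes from a single format call that passes the length and the buffer pointer, which is why the open-coded loop could be dropped.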
2022-07-26selftests/bpf: Attach to socketcall() in test_probe_userIlya Leoshkevich2-13/+51
test_probe_user fails on architectures where libc uses socketcall(SYS_CONNECT) instead of connect(). Fix by attaching to socketcall as well. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220726134008.256968-3-iii@linux.ibm.com
2022-07-26libbpf: Extend BPF_KSYSCALL documentationIlya Leoshkevich1-4/+11
Explicitly list known quirks. Mention that socket-related syscalls can be invoked via socketcall(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220726134008.256968-2-iii@linux.ibm.com
2022-07-26bpf, devmap: Compute proper xdp_frame len redirecting framesLorenzo Bianconi1-2/+2
Even if it is currently forbidden to XDP_REDIRECT a multi-frag xdp_frame into a devmap, compute proper xdp_frame length in __xdp_enqueue and is_valid_dst routines running xdp_get_frame_len(). Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/894d99c01139e921bdb6868158ff8e67f661c072.1658596075.git.lorenzo@kernel.org
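The difference between the head length and the full frame length can be sketched as follows. `struct frame_model`, `frame_total_len()` and `fits_dst()` are hypothetical stand-ins used only to illustrate the idea; they are not the kernel's xdp_frame or its helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a frame with optional fragments. */
struct frame_model {
	size_t linear_len;	/* bytes in the head buffer */
	bool has_frags;		/* true for a multi-frag frame */
	size_t frags_size;	/* total fragment bytes, valid if has_frags */
};

/* Full length: head buffer plus all fragments, when present. */
static size_t frame_total_len(const struct frame_model *f)
{
	size_t len = f->linear_len;

	if (f->has_frags)
		len += f->frags_size;
	return len;
}

/* Size check against a destination limit, using the full length. */
static bool fits_dst(const struct frame_model *f, size_t max_len)
{
	return frame_total_len(f) <= max_len;
}
```

A frame with a 1000-byte head and 3000 bytes of fragments would pass a 1500-byte limit if only the head were counted, but fails once the full 4000 bytes are used.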