summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-05-20bpf: tcp: Avoid taking fast sock lock in iteratorAditi Ghag1-3/+2
This is a preparatory commit to replace `lock_sock_fast` with `lock_sock`,and facilitate BPF programs executed from the TCP sockets iterator to be able to destroy TCP sockets using the bpf_sock_destroy kfunc (implemented in follow-up commits). Previously, BPF TCP iterator was acquiring the sock lock with BH disabled. This led to scenarios where the sockets hash table bucket lock can be acquired with BH enabled in some path versus disabled in other. In such situation, kernel issued a warning since it thinks that in the BH enabled path the same bucket lock *might* be acquired again in the softirq context (BH disabled), which will lead to a potential dead lock. Since bpf_sock_destroy also happens in a process context, the potential deadlock warning is likely a false alarm. Here is a snippet of annotated stack trace that motivated this change: ``` Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&h->lhash2[i].lock); local_bh_disable(); lock(&h->lhash2[i].lock); kernel imagined possible scenario: local_bh_disable(); /* Possible softirq */ lock(&h->lhash2[i].lock); *** Potential Deadlock *** process context: lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock with BH enabled __inet_hash+0x4b/0x210 inet_csk_listen_start+0xe6/0x100 inet_listen+0x95/0x1d0 __sys_listen+0x69/0xb0 __x64_sys_listen+0x14/0x20 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc bpf_sock_destroy run from iterator: lock_acquire+0xcd/0x330 _raw_spin_lock+0x33/0x40 ------> Acquire (bucket) lhash2.lock with BH disabled inet_unhash+0x9a/0x110 tcp_set_state+0x6a/0x210 tcp_abort+0x10d/0x200 bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9 bpf_iter_run_prog+0x1ff/0x340 ------> lock_sock_fast that acquires sock lock with BH disabled bpf_iter_tcp_seq_show+0xca/0x190 bpf_seq_read+0x177/0x450 ``` Also, Yonghong reported a deadlock for non-listening TCP sockets that this change resolves. Previously, `lock_sock_fast` held the sock spin lock with BH which was again being acquired in `tcp_abort`: ``` watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331] RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500 Call Trace: <TASK> _raw_spin_lock+0x84/0x90 tcp_abort+0x13c/0x1f0 bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89 bpf_iter_run_prog+0x1aa/0x2c0 ? preempt_count_sub+0x1c/0xd0 ? from_kuid_munged+0x1c8/0x210 bpf_iter_tcp_seq_show+0x14e/0x1b0 bpf_seq_read+0x36c/0x6a0 bpf_iter_tcp_seq_show lock_sock_fast __lock_sock_fast spin_lock_bh(&sk->sk_lock.slock); /* * Fast path return with bottom halves disabled and * sock::sk_lock.slock held.* */ ... tcp_abort local_bh_disable(); spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk) ``` With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning: ``` lock_sock lock_sock_nested spin_lock_bh(&sk->sk_lock.slock); : spin_unlock_bh(&sk->sk_lock.slock); ``` Acked-by: Yonghong Song <yhs@meta.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-05-20Merge tag 'scsi-fixes' of ↵Linus Torvalds5-13/+17
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six small fixes. Four in drivers and the two core changes should be read together as a correction to a prior iorequest_cnt fix that exposed us to a potential use after free" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed scsi: Revert "scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed" scsi: storvsc: Don't pass unused PFNs to Hyper-V host scsi: ufs: core: Fix MCQ nr_hw_queues scsi: ufs: core: Rename symbol sizeof_utp_transfer_cmd_desc() scsi: ufs: core: Fix MCQ tag calculation
2023-05-20Bluetooth: btnxpuart: Fix compiler warningsNeeraj Sanjay Kale1-3/+3
This fixes the follwing compiler warning reported by kernel test robot: drivers/bluetooth/btnxpuart.c:1332:34: warning: unused variable 'nxpuart_of_match_table' [-Wunused-const-variable] Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305161345.eClvTYQ9-lkp@intel.com/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-05-20Bluetooth: Unlink CISes when LE disconnects in hci_conn_delRuihan Li1-15/+6
Currently, hci_conn_del calls hci_conn_unlink for BR/EDR, (e)SCO, and CIS connections, i.e., everything except LE connections. However, if (e)SCO connections are unlinked when BR/EDR disconnects, CIS connections should also be unlinked when LE disconnects. In terms of disconnection behavior, CIS and (e)SCO connections are not too different. One peculiarity of CIS is that when CIS connections are disconnected, the CIS handle isn't deleted, as per [BLUETOOTH CORE SPECIFICATION Version 5.4 | Vol 4, Part E] 7.1.6 Disconnect command: All SCO, eSCO, and CIS connections on a physical link should be disconnected before the ACL connection on the same physical connection is disconnected. If it does not, they will be implicitly disconnected as part of the ACL disconnection. ... Note: As specified in Section 7.7.5, on the Central, the handle for a CIS remains valid even after disconnection and, therefore, the Host can recreate a disconnected CIS at a later point in time using the same connection handle. Since hci_conn_link invokes both hci_conn_get and hci_conn_hold, hci_conn_unlink should perform both hci_conn_put and hci_conn_drop as well. However, currently it performs only hci_conn_put. This patch makes hci_conn_unlink call hci_conn_drop as well, which simplifies the logic in hci_conn_del a bit and may benefit future users of hci_conn_unlink. But it is noted that this change additionally implies that hci_conn_unlink can queue disc_work on conn itself, with the following call stack: hci_conn_unlink(conn) [conn->parent == NULL] -> hci_conn_unlink(child) [child->parent == conn] -> hci_conn_drop(child->parent) -> queue_delayed_work(&conn->disc_work) Queued disc_work after hci_conn_del can be spurious, so during the process of hci_conn_del, it is necessary to make the call to cancel_delayed_work(&conn->disc_work) after invoking hci_conn_unlink. Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-05-20Bluetooth: Fix UAF in hci_conn_hash_flush againRuihan Li2-12/+23
Commit 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") reintroduced a previously fixed bug [1] ("KASAN: slab-use-after-free Read in hci_conn_hash_flush"). This bug was originally fixed by commit 5dc7d23e167e ("Bluetooth: hci_conn: Fix possible UAF"). The hci_conn_unlink function was added to avoid invalidating the link traversal caused by successive hci_conn_del operations releasing extra connections. However, currently hci_conn_unlink itself also releases extra connections, resulted in the reintroduced bug. This patch follows a more robust solution for cleaning up all connections, by repeatedly removing the first connection until there are none left. This approach does not rely on the inner workings of hci_conn_del and ensures proper cleanup of all connections. Meanwhile, we need to make sure that hci_conn_del never fails. Indeed it doesn't, as it now always returns zero. To make this a bit clearer, this patch also changes its return type to void. Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-bluetooth/000000000000aa920505f60d25ad@google.com/ Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-05-20Bluetooth: Refcnt drop must be placed last in hci_conn_unlinkRuihan Li1-3/+3
If hci_conn_put(conn->parent) reduces conn->parent's reference count to zero, it can immediately deallocate conn->parent. At the same time, conn->link->list has its head in conn->parent, causing use-after-free problems in the latter list_del_rcu(&conn->link->list). This problem can be easily solved by reordering the two operations, i.e., first performing the list removal with list_del_rcu and then decreasing the refcnt with hci_conn_put. Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Closes: https://lore.kernel.org/linux-bluetooth/CABBYNZ+1kce8_RJrLNOXd_8=Mdpb=2bx4Nto-hFORk=qiOkoCg@mail.gmail.com/ Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-05-20Bluetooth: Fix potential double free caused by hci_conn_unlinkRuihan Li1-9/+12
The hci_conn_unlink function is being called by hci_conn_del, which means it should not call hci_conn_del with the input parameter conn again. If it does, conn may have already been released when hci_conn_unlink returns, leading to potential UAF and double-free issues. This patch resolves the problem by modifying hci_conn_unlink to release only conn's child links when necessary, but never release conn itself. Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-bluetooth/000000000000484a8205faafe216@google.com/ Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
2023-05-20NFSv4.2: Fix a potential double free with READ_PLUSAnna Schumaker1-2/+10
kfree()-ing the scratch page isn't enough, we also need to set the pointer back to NULL to avoid a double-free in the case of a resend. Fixes: fbd2a05f29a9 (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19SUNRPC: Don't change task->tk_status after the call to rpc_exit_taskTrond Myklebust1-3/+2
Some calls to rpc_exit_task() may deliberately change the value of task->tk_status, for instance because it gets checked by the RPC call's rpc_release() callback. That makes it wrong to reset the value to task->tk_rpc_status. In particular this causes a bug where the rpc_call_done() callback tries to fail over a set of pNFS/flexfiles writes to a different IP address, but the reset of task->tk_status causes nfs_commit_release_pages() to immediately mark the file as having a fatal error. Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()") Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19NFS: Convert kmap_atomic() to kmap_local_folio()Fabio M. De Francesco1-2/+2
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}(). Therefore, replace kmap_atomic() with kmap_local_folio() in nfs_readdir_folio_array_append(). kmap_atomic() disables page-faults and preemption (the latter only for !PREEMPT_RT kernels), However, the code within the mapping/un-mapping in nfs_readdir_folio_array_append() does not depend on the above-mentioned side effects. Therefore, a mere replacement of the old API with the new one is all that is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Fixes: ec108d3cc766 ("NFS: Convert readdir page array functions to use a folio") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19Merge tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-clientLinus Torvalds2-1/+15
Pull ceph fixes from Ilya Dryomov: "A workaround for a just discovered bug in MClientSnap encoding which goes back to 2017 (marked for stable) and a fixup to quieten a static checker" * tag 'ceph-for-6.4-rc3' of https://github.com/ceph/ceph-client: ceph: force updating the msg pointer in non-split case ceph: silence smatch warning in reconnect_caps_cb()
2023-05-19Merge tag 'pm-6.4-rc3' of ↵Linus Torvalds4-26/+32
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues in the cpupower utility and get rid of a spurious warning message printed to the kernel log by the ACPI cpufreq driver after recent changes. Specifics: - Get rid of a warning message printed by the ACPI cpufreq driver after recent changes in it when anohter CPU performance scaling driver is registered already when it starts (Petr Pavlu) - Make cpupower read TSC on each CPU right before reading MPERF so as to reduce the potential time difference between the TSC and MPERF accesses and improve the C0 percentage calculation (Wyes Karny) - Fix a possible file handle leak and clean up the code in the sysfs_get_enabled() function in cpupower (Hao Zeng)" * tag 'pm-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ACPI: Prevent a warning when another frequency driver is loaded cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19Merge tag 'acpi-6.4-rc3' of ↵Linus Torvalds1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Add an ACPI IRQ override quirk for LG UltraPC 17U70P so as to make the internal keyboard work on that machine (Rubén Gómez)" * tag 'acpi-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Add IRQ override quirk for LG UltraPC 17U70P
2023-05-19Merge tag 'docs-6.4-fixes' of git://git.lwn.net/linuxLinus Torvalds9-32/+40
Pull documentation fixes from Jonathan Corbet: "Four straightforward documentation fixes" * tag 'docs-6.4-fixes' of git://git.lwn.net/linux: Documentation/filesystems: ramfs-rootfs-initramfs: use :Author: Documentation/filesystems: sharedsubtree: add section headings docs: quickly-build-trimmed-linux: various small fixes and improvements Documentation: use capitalization for chapters and acronyms
2023-05-19Merge tag 's390-6.4-2' of ↵Linus Torvalds15-55/+34
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Add check whether the required facilities are installed before using the s390-specific ChaCha20 implementation - Key blobs for s390 protected key interface IOCTLs commands PKEY_VERIFYKEY2 and PKEY_VERIFYKEY3 may contain clear key material. Zeroize copies of these keys in kernel memory after creating protected keys - Set CONFIG_INIT_STACK_NONE=y in defconfigs to avoid extra overhead of initializing all stack variables by default - Make sure that when a new channel-path is enabled all subchannels are evaluated: with and without any devices connected on it - When SMT thread CPUs are added to CPU topology masks the nr_cpu_ids limit is not checked and could be exceeded. Respect the nr_cpu_ids limit and avoid a warning when CONFIG_DEBUG_PER_CPU_MAPS is set - The pointer to IPL Parameter Information Block is stored in the absolute lowcore as a virtual address. Save it as the physical address for later use by dump tools - Fix a Queued Direct I/O (QDIO) problem on z/VM guests using QIOASSIST with dedicated (pass through) QDIO-based devices such as FCP, real OSA or HiperSockets - s390's struct statfs and struct statfs64 contain padding, which field-by-field copying does not set. Initialize the respective structures with zeros before filling them and copying to userspace - Grow s390 compat_statfs64, statfs and statfs64 structures f_spare array member to cover padding and simplify things - Remove obsolete SCHED_BOOK and SCHED_DRAWER configs - Remove unneeded S390_CCW_IOMMU and S390_AP_IOM configs * tag 's390-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU s390/Kconfig: remove obsolete configs SCHED_{BOOK,DRAWER} s390/uapi: cover statfs padding by growing f_spare statfs: enforce statfs[64] structure initialization s390/qdio: fix do_sqbs() inline assembly constraint s390/ipl: fix IPIB virtual vs physical address confusion s390/topology: honour nr_cpu_ids when adding CPUs s390/cio: include subchannels without devices also for evaluation s390/defconfigs: set CONFIG_INIT_STACK_NONE=y s390/pkey: zeroize key blobs s390/crypto: use vector instructions only if available for ChaCha20
2023-05-19Merge tag 'arm64-fixes' of ↵Linus Torvalds6-16/+14
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A mixture of compiler/static checker resolutions and a couple of MTE fixes: - Avoid erroneously marking untagged pages with PG_mte_tagged - Always reset KASAN tags for destination page in copy_page() - Mark PMU header functions 'static inline' - Fix some sparse warnings due to missing casts" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not set PG_mte_tagged if tags were not initialized arm64: Also reset KASAN tag if page is not PG_mte_tagged arm64: perf: Mark all accessor functions inline ARM: perf: Mark all accessor functions inline arm64: vdso: Pass (void *) to virt_to_page() arm64/mm: mark private VM_FAULT_X defines as vm_fault_t
2023-05-19KVM: Fix vcpu_array[0] racesMichal Luczaj1-6/+10
In kvm_vm_ioctl_create_vcpu(), add vcpu to vcpu_array iff it's safe to access vcpu via kvm_get_vcpu() and kvm_for_each_vcpu(), i.e. when there's no failure path requiring vcpu removal and destruction. Such order is important because vcpu_array accessors may end up referencing vcpu at vcpu_array[0] even before online_vcpus is set to 1. When online_vcpus=0, any call to kvm_get_vcpu() goes through array_index_nospec() and ends with an attempt to xa_load(vcpu_array, 0): int num_vcpus = atomic_read(&kvm->online_vcpus); i = array_index_nospec(i, num_vcpus); return xa_load(&kvm->vcpu_array, i); Similarly, when online_vcpus=0, a kvm_for_each_vcpu() does not iterate over an "empty" range, but actually [0, ULONG_MAX]: xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \ (atomic_read(&kvm->online_vcpus) - 1)) In both cases, such online_vcpus=0 edge case, even if leading to unnecessary calls to XArray API, should not be an issue; requesting unpopulated indexes/ranges is handled by xa_load() and xa_for_each_range(). However, this means that when the first vCPU is created and inserted in vcpu_array *and* before online_vcpus is incremented, code calling kvm_get_vcpu()/kvm_for_each_vcpu() already has access to that first vCPU. This should not pose a problem assuming that once a vcpu is stored in vcpu_array, it will remain there, but that's not the case: kvm_vm_ioctl_create_vcpu() first inserts to vcpu_array, then requests a file descriptor. If create_vcpu_fd() fails, newly inserted vcpu is removed from the vcpu_array, then destroyed: vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus); r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT); kvm_get_kvm(kvm); r = create_vcpu_fd(vcpu); if (r < 0) { xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx); kvm_put_kvm_no_destroy(kvm); goto unlock_vcpu_destroy; } atomic_inc(&kvm->online_vcpus); This results in a possible race condition when a reference to a vcpu is acquired (via kvm_get_vcpu() or kvm_for_each_vcpu()) moments before said vcpu is destroyed. Signed-off-by: Michal Luczaj <mhal@rbox.co> Message-Id: <20230510140410.1093987-2-mhal@rbox.co> Cc: stable@vger.kernel.org Fixes: c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray", 2021-12-08) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19KVM: VMX: Fix header file dependency of asm/vmx.hJacob Xu1-0/+2
Include a definition of WARN_ON_ONCE() before using it. Fixes: bb1fcc70d98f ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jacob Xu <jacobhxu@google.com> [reworded commit message; changed <asm/bug.h> to <linux/bug.h>] Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220225012959.1554168-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19KVM: Don't enable hardware after a restart/shutdown is initiatedSean Christopherson1-1/+16
Reject hardware enabling, i.e. VM creation, if a restart/shutdown has been initiated to avoid re-enabling hardware between kvm_reboot() and machine_{halt,power_off,restart}(). The restart case is especially problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on SVM) blocks INIT, which results in the restart/reboot hanging as BIOS is unable to wake and rendezvous with APs. Note, this bug, and the original issue that motivated the addition of kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. In a "normal" reboot, userspace will gracefully teardown userspace before triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process that might do ioctl(KVM_CREATE_VM) is long gone. Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20230512233127.804012-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdownSean Christopherson1-15/+11
Use syscore_ops.shutdown to disable hardware virtualization during a reboot instead of using the dedicated reboot_notifier so that KVM disables virtualization _after_ system_state has been updated. This will allow fixing a race in KVM's handling of a forced reboot where KVM can end up enabling hardware virtualization between kernel_restart_prepare() and machine_restart(). Rename KVM's hook to match the syscore op to avoid any possible confusion from wiring up a "reboot" helper to a "shutdown" hook (neither "shutdown nor "reboot" is completely accurate as the hook handles both). Opportunistically rewrite kvm_shutdown()'s comment to make it less VMX specific, and to explain why kvm_rebooting exists. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: kvmarm@lists.linux.dev Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: kvm-riscv@lists.infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20230512233127.804012-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19Merge tag 'sound-6.4-rc3' of ↵Linus Torvalds33-96/+330
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes that have been gathered since rc1: - Lots of small ASoC SOF Intel fixes - A couple of UAF and NULL-dereference fixes - Quirks and updates for HD-audio, USB-audio and ASoC AMD - A few minor build / sparse warning fixes - MAINTAINERS and DT updates" * tag 'sound-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (38 commits) ALSA: hda: Add NVIDIA codec IDs a3 through a7 to patch table ALSA: oss: avoid missing-prototype warnings ALSA: cs46xx: mark snd_cs46xx_download_image as static ALSA: hda: Fix Oops by 9.1 surround channel names ASoC: SOF: topology: Fix tuples array allocation ASoC: SOF: Separate the tokens for input and output pin index MAINTAINERS: Remove self from Cirrus Codec drivers ASoC: cs35l56: Prevent unbalanced pm_runtime in dsp_work() on SoundWire ASoC: SOF: topology: Fix logic for copying tuples ASoC: SOF: pm: save io region state in case of errors in resume ASoC: MAINTAINERS: drop Krzysztof Kozlowski from Samsung audio ASoC: mediatek: mt8186: Fix use-after-free in driver remove path ASoC: SOF: ipc3-topology: Make sure that only one cmd is sent in dai_config ASoC: SOF: sof-client-probes: fix pm_runtime imbalance in error handling ASoC: SOF: pcm: fix pm_runtime imbalance in error handling ASoC: SOF: debug: conditionally bump runtime_pm counter on exceptions ASoC: SOF: Intel: hda-mlink: add helper to program SoundWire PCMSyCM registers ASoC: SOF: Intel: hda-mlink: initialize instance_offset member ASoC: SOF: Intel: hda-mlink: use 'ml_addr' parameter consistently ASoC: SOF: Intel: hda-mlink: fix base_ptr computation ...
2023-05-19net/mlx5e: E-Switch, Initialize E-Switch for eswitch managerRoi Dayan1-2/+2
Initialize eswitch instance for a function which is eswitch manager but not a vport group manager. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: devlink, Only show PF related devlink warning when neededRoi Dayan1-2/+1
Limit the PF related warning to show if device is actually a PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: E-Switch, Use metadata matching for RoCE loopback ruleRoi Dayan3-34/+40
Use metadata matching for RoCE loopback rule if device is configured to use metadata for source port matching. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: E-Switch, Use RoCE version 2 for loopback trafficRoi Dayan1-2/+2
Could be port initializing eswitch doesn't support RoCE version 1 but all ports should support RoCE version 2. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Add a check that log_max_l2_table is validRoi Dayan2-2/+6
If log_max_l2_table is 0 there is no really room for one L2 address. and should be treated as not supported. Do the check in MPFS init and for vport context events which both used to update L2 address. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch: move debug print of adding mac to correct placeRoi Dayan1-3/+4
Move the debug print inside the if clause that actually does the change. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Check device is PF when stopping esw offloadsRoi Dayan1-1/+1
Checking sriov is done on the pci device so it can return true on other devices like SF but nothing should be done in this case. Add a check that the device is PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: Remove redundant vport_group_manager cap checkRoi Dayan1-4/+1
It's enough to check for esw_manager cap for get the esw flow table caps. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rulesRoi Dayan1-19/+43
Like other rules use metadata matching if supported instead of source_port. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Allow get vport api if esw existsRoi Dayan1-1/+1
We could have an esw manager device which is not a vport group manager. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Update when to set other vport contextRoi Dayan3-3/+6
Other vport context should be set if vport number is not 0. In case of ECPF, vport 0 represents the host PF representor so also need to set other vport context. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: Remove redundant __func__ arg from fs_err() callsRoi Dayan1-7/+5
fs_err() already logs the function name. remote the arg so the function name will not be logged twice. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5e: E-Switch, Remove flow_source check for metadata matchingRoi Dayan1-3/+0
There is no reason to check for flow_source cap to allow metadata matching. When flow_source match is being used the flow_source cap is being checked. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: E-Switch, Remove redundant checkRoi Dayan1-4/+0
The call to mlx5_eswitch_enable() also does the same check and if E-Switch not supported it returns 0 without any change. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: Remove redundant esw multiport validate functionRoi Dayan1-22/+1
The function didn't validate the value and doesn't require value validation as it will always be valid true or false values. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19Merge branch 'bpf: Show target_{obj,btf}_id for tracing link'Alexei Starovoitov2-2/+15
Yafang Shao says: ==================== The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. For some other link types like perf_event and kprobe_multi, it is not easy to find which functions are attached either. We may support ->fill_link_info for them in the future. v1->v2: - Skip showing them in the plain output for the old kernels. (Quentin) - Coding improvement. (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpftool: Show target_{obj,btf}_id in tracing link infoYafang Shao1-0/+6
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ tools/bpf/bpftool/bpftool link show 2: tracing prog 13 prog_type tracing attach_type trace_fentry target_obj_id 1 target_btf_id 13964 pids trace(10673) $ tools/bpf/bpftool/bpftool link show -j [{"id":2,"type":"tracing","prog_id":13,"prog_type":"tracing","attach_type":"trace_fentry","target_obj_id":1,"target_btf_id":13964,"pids":[{"pid":10673,"comm":"trace"}]}] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230517103126.68372-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpf: Show target_{obj,btf}_id in tracing link fdinfoYafang Shao1-2/+9
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ cat /proc/10673/fdinfo/10 pos: 0 flags: 02000000 mnt_id: 15 ino: 2094 link_type: tracing link_id: 2 prog_tag: a04f5eef06a7f555 prog_id: 13 attach_type: 24 target_obj_id: 1 target_btf_id: 13964 Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230517103126.68372-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19bpf: Fix mask generation for 32-bit narrow loads of 64-bit fieldsWill Deacon1-1/+1
A narrow load from a 64-bit context field results in a 64-bit load followed potentially by a 64-bit right-shift and then a bitwise AND operation to extract the relevant data. In the case of a 32-bit access, an immediate mask of 0xffffffff is used to construct a 64-bit BPP_AND operation which then sign-extends the mask value and effectively acts as a glorified no-op. For example: 0: 61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0) results in the following code generation for a 64-bit field: ldr x7, [x7] // 64-bit load mov x10, #0xffffffffffffffff and x7, x7, x10 Fix the mask generation so that narrow loads always perform a 32-bit AND operation: ldr x7, [x7] // 64-bit load mov w10, #0xffffffff and w7, w7, w10 Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Krzesimir Nowak <krzesimir@kinvolk.io> Cc: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230518102528.1341-1-will@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-19ice: use src VSI instead of src MAC in slow-pathMichal Swiatkowski9-102/+40
The use of a source MAC to direct packets from the VF to the corresponding port representor is only ok if there is only one MAC on a VF. To support this functionality when the number of MACs on a VF is greater, it is necessary to match a source VSI instead of a source MAC. Let's use the new switch API that allows matching on metadata. If MAC isn't used in match criteria there is no need to handle adding rule after virtchnl command. Instead add new rule while port representor is being configured. Remove rule_added field, checking for sp_rule can be used instead. Remove also checking for switchdev running in deleting rule as it can be called from unroll context when running flag isn't set. Checking for sp_rule covers both context (with and without running flag). Rules are added in eswitch configuration flow, so there is no need to have replay function. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: allow matching on meta dataMichal Swiatkowski5-105/+95
Add meta data matching criteria in the same place as protocol matching criteria. There is no need to add meta data as special words after parsing all lookups. Trade meta data in the same why as other lookups. The one difference between meta data lookups and protocol lookups is that meta data doesn't impact how the packets looks like. Because of that ignore it when filling testing packet. Match on tunnel type meta data always if tunnel type is different than TNL_LAST. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: specify field names in ice_prot_ext initMichal Swiatkowski1-23/+28
Anonymous initializers are now discouraged. Define ICE_PROTCOL_ENTRY macro to rewrite anonymous initializers to named one. No functional changes here. Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: remove redundant Rx field from rule infoMichal Swiatkowski4-22/+14
Information about the direction is currently stored in sw_act.flag. There is no need to duplicate it in another field. Setting direction flag doesn't mean that there is a match criteria for direction in rule. It is only a information for HW from where switch id should be collected (VSI or port). In current implementation of advance rule handling, without matching for direction meta data, we can always set one the same flag and everything will work the same. Ability to match on direction meta data will be added in follow up patches. Recipe 0, 3 and 9 loaded from package has direction match criteria, but they are handled in other function. Move ice_adv_rule_info fields to avoid holes. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19ice: define meta data to match in switchMichal Swiatkowski3-16/+183
Add description for each meta data. Redefine tunnel mask to match only tunneled MAC and tunneled VLAN. It shouldn't try to match other flags (previously it was 0xff, it is redundant). VLAN mask was 0xd000, change it to 0xf000. 4 last bits are flags depending on the same field in packets (VLAN tag). Because of that, It isn't harmful to match also on ITAG. Group all MDID and MDID offsets into enums to keep things organized. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-05-19perf bench syscall: Fix __NR_execve undeclared build errorTiezhu Yang1-0/+3
The __NR_execve definition for i386 was deleted by mistake in the commit ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark"), add it to fix the build error on i386. Fixes: ece7f7c0507cc147 ("perf bench syscall: Add fork syscall benchmark") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: loongson-kernel@lists.loongnix.cn Closes: https://lore.kernel.org/all/CA+G9fYvgBR1iB0CorM8OC4AM_w_tFzyQKHc+rF6qPzJL=TbfDQ@mail.gmail.com/ Link: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-19Merge branch 'pm-tools'Rafael J. Wysocki2-24/+30
Merge cpupower utility fixes for 6.4-rc3: - Read TSC on each CPU right before reading MPERF so as to reduce the potential time difference between the TSC and MPERF accesses and improve the C0 percentage calculation (Wyes Karny). - Fix a possible file handle leak and clean up the code in sysfs_get_enabled() (Hao Zeng). * pm-tools: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19Merge tag 'linux-cpupower-6.4-rc3' of ↵Rafael J. Wysocki2-24/+30
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull cpupower utility fixes for 6.4-rc3 from Shuah Khan: "This cpupower fixes update for Linux 67.4-rc3 consists of: - a resource leak fix - fix drift in C0 percentage calculation due to System-wide TSC read. To lower this drift read TSC per CPU and also just after mperf read. This technique improves C0 percentage calculation in Mperf monitor" * tag 'linux-cpupower-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: cpupower: Make TSC read per CPU for Mperf monitor cpupower:Fix resource leaks in sysfs_get_enabled()
2023-05-19fbdev: omapfb: panel-tpo-td043mtea1: fix error code in probe()Dan Carpenter1-1/+2
This was using the wrong variable, "r", instead of "ddata->vcc_reg", so it returned success instead of a negative error code. Fixes: 0d3dbeb8142a ("video: fbdev: omapfb: panel-tpo-td043mtea1: Make use of the helper function dev_err_probe()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-19perf test attr: Fix python SafeConfigParser() deprecation warningIan Rogers1-3/+3
Address the warning: ``` tests/attr.py:155: DeprecationWarning: The SafeConfigParser class has been renamed to ConfigParser in Python 3.2. This alias will be removed in Python 3.12. Use ConfigParser directly instead. parser = configparser.SafeConfigParser() ``` by removing the word 'Safe'. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20230517225707.2682235-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>