summaryrefslogtreecommitdiff
path: root/net/unix/af_unix.c
diff options
context:
space:
mode:
authorYan Zhao <yan.y.zhao@intel.com>2025-04-01 18:47:27 +0300
committerPaolo Bonzini <pbonzini@redhat.com>2025-04-04 14:11:59 +0300
commitc0b8dcabb2cddc98c265548632c39e97422f61b6 (patch)
treee9e9c870dd28a9d7a95001868e8fab6ffa6f5d07 /net/unix/af_unix.c
parent6bad6ecc63b75af294ff3f56f54d6b857c8964a5 (diff)
downloadlinux-c0b8dcabb2cddc98c265548632c39e97422f61b6.tar.xz
KVM: VMX: Use separate subclasses for PI wakeup lock to squash false positive
Use a separate subclass when acquiring KVM's per-CPU posted interrupts wakeup lock in the scheduled out path, i.e. when adding a vCPU on the list of vCPUs to wake, to workaround a false positive deadlock. Chain exists of: &p->pi_lock --> &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu)); lock(&rq->__lock); lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu)); lock(&p->pi_lock); *** DEADLOCK *** In the wakeup handler, the callchain is *always*: sysvec_kvm_posted_intr_wakeup_ipi() | --> pi_wakeup_handler() | --> kvm_vcpu_wake_up() | --> try_to_wake_up(), and the lock order is: &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) --> &p->pi_lock. For the schedule out path, the callchain is always (for all intents and purposes; if the kernel is preemptible, kvm_sched_out() can be called from something other than schedule(), but the beginning of the callchain will be the same point in vcpu_block()): vcpu_block() | --> schedule() | --> kvm_sched_out() | --> vmx_vcpu_put() | --> vmx_vcpu_pi_put() | --> pi_enable_wakeup_handler() and the lock order is: &rq->__lock --> &per_cpu(wakeup_vcpus_on_cpu_lock, cpu) I.e. lockdep sees AB+BC ordering for schedule out, and CA ordering for wakeup, and complains about the A=>C versus C=>A inversion. In practice, deadlock can't occur between schedule out and the wakeup handler as they are mutually exclusive. The entirely of the schedule out code that runs with the problematic scheduler locks held, does so with IRQs disabled, i.e. can't run concurrently with the wakeup handler. Use a subclass instead disabling lockdep entirely, and tell lockdep that both subclasses are being acquired when loading a vCPU, as the sched_out and sched_in paths are NOT mutually exclusive, e.g. CPU 0 CPU 1 --------------- --------------- vCPU0 sched_out vCPU1 sched_in vCPU1 sched_out vCPU 0 sched_in where vCPU0's sched_in may race with vCPU1's sched_out, on CPU 0's wakeup list+lock. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-ID: <20250401154727.835231-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'net/unix/af_unix.c')
0 files changed, 0 insertions, 0 deletions