powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC

Running perf fuzzer showed below in dmesg logs: "Can't find PMC that caused IRQ" This means a PMU exception happened, but none of the PMC's (Performance Monitor Counter) were found to be overflown. There are some corner cases that clears the PMCs after PMI gets masked. In such cases, the perf interrupt handler will not find the active PMC values that had caused the overflow and thus leads to this message while replaying. Case 1: PMU Interrupt happens during replay of other interrupts and counter values gets cleared by PMU callbacks before replay: During replay of interrupts like timer, __do_irq() and doorbell exception, we conditionally enable interrupts via may_hard_irq_enable(). This could potentially create a window to generate a PMI. Since irq soft mask is set to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before perf interrupt is replayed and the PMU events could be deleted or stopped. This will change the PMU SPR values and resets the counters. Snippet of ftrace log showing PMU callbacks invoked in __do_irq(): <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq <<>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable <<>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts Case 2: PMI's masked during local_* operations, example local_add(). If the local_add() operation happens within a local_irq_save(), replay of PMI will be during local_irq_restore(). Similar to case 1, this could also create a window before replay where PMU events gets deleted or stopped. Fix it by updating the PMU callback function power_pmu_disable() to check for pending perf interrupt. If there is an overflown PMC and pending perf interrupt indicated in paca, clear the PMI bit in paca to drop that sample. Clearing of PMI bit is done in power_pmu_disable() since disable is invoked before any event gets deleted/stopped. With this fix, if there are more than one event running in the PMU, there is a chance that we clear the PMI bit for the event which is not getting deleted/stopped. The other events may still remain active. Hence to make sure we don't drop valid sample in such cases, another check is added in power_pmu_enable. This checks if there is an overflown PMC found among the active events and if so enable back the PMI bit. Two new helper functions are introduced to clear/set the PMI, ie clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function pmi_irq_pending() is introduced to give a warning if there is pending PMI bit in paca, but no PMC is overflown. Also there are corner cases which result in performance monitor interrupts being triggered during power_pmu_disable(). This happens since PMXE bit is not cleared along with disabling of other MMCR0 bits in the pmu_disable. Such PMI's could leave the PMU running and could trigger PMI again which will set MMCR0 PMAO bit. This could lead to spurious interrupts in some corner cases. Example, a timer after power_pmu_del() which will re-enable interrupts and triggers a PMI again since PMAO bit is still set. But fails to find valid overflow since PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along with disabling of other MMCR0 bits in power_pmu_disable(). We can't just replay PMI any time. Hence this approach is preferred rather than replaying PMI before resetting overflown PMC. Patch also documents core-book3s on a race condition which can trigger these PMC messages during idle path in PowerNV. Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Reported-by: Nageswara R Sastry <nasastry@in.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
author: Athira Rajeev <atrajeev@linux.vnet.ibm.com> 2021-07-21 08:48:29 +0300
committer: Michael Ellerman <mpe@ellerman.id.au> 2021-11-30 09:15:49 +0300
commit: 2c9ac51b850d84ee496b0a5d832ce66d411ae552 (patch)
tree: 881b389df2141155d70732c46b57f9a770f087ff /arch/powerpc/perf
parent: f05cab0034babaa9b3dfaf6003ee6493496a8180 (diff)
download: linux-2c9ac51b850d84ee496b0a5d832ce66d411ae552.tar.xz
1 files changed, 57 insertions, 1 deletions
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 8d4ff93462fb..1f1ded29a06e 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -857,6 +857,19 @@ static void write_pmc(int idx, unsigned long val)
 	}
 }
 
+static int any_pmc_overflown(struct cpu_hw_events *cpuhw)
+{
+	int i, idx;
+
+	for (i = 0; i < cpuhw->n_events; i++) {
+		idx = cpuhw->event[i]->hw.idx;
+		if ((idx) && ((int)read_pmc(idx) < 0))
+			return idx;
+	}
+
+	return 0;
+}
+
 /* Called from sysrq_handle_showregs() */
 void perf_event_print_debug(void)
 {
@@ -1281,11 +1294,13 @@ static void power_pmu_disable(struct pmu *pmu)
 
 		/*
 		 * Set the 'freeze counters' bit, clear EBE/BHRBA/PMCC/PMAO/FC56
+		 * Also clear PMXE to disable PMI's getting triggered in some
+		 * corner cases during PMU disable.
 		 */
 		val  = mmcr0 = mfspr(SPRN_MMCR0);
 		val |= MMCR0_FC;
 		val &= ~(MMCR0_EBE | MMCR0_BHRBA | MMCR0_PMCC | MMCR0_PMAO |
-			 MMCR0_FC56);
+			 MMCR0_PMXE | MMCR0_FC56);
 		/* Set mmcr0 PMCCEXT for p10 */
 		if (ppmu->flags & PPMU_ARCH_31)
 			val |= MMCR0_PMCCEXT;
@@ -1299,6 +1314,23 @@ static void power_pmu_disable(struct pmu *pmu)
 		mb();
 		isync();
 
+		/*
+		 * Some corner cases could clear the PMU counter overflow
+		 * while a masked PMI is pending. One such case is when
+		 * a PMI happens during interrupt replay and perf counter
+		 * values are cleared by PMU callbacks before replay.
+		 *
+		 * If any PMC corresponding to the active PMU events are
+		 * overflown, disable the interrupt by clearing the paca
+		 * bit for PMI since we are disabling the PMU now.
+		 * Otherwise provide a warning if there is PMI pending, but
+		 * no counter is found overflown.
+		 */
+		if (any_pmc_overflown(cpuhw))
+			clear_pmi_irq_pending();
+		else
+			WARN_ON(pmi_irq_pending());
+
 		val = mmcra = cpuhw->mmcr.mmcra;
 
 		/*
@@ -1390,6 +1422,15 @@ static void power_pmu_enable(struct pmu *pmu)
 	 * (possibly updated for removal of events).
 	 */
 	if (!cpuhw->n_added) {
+		/*
+		 * If there is any active event with an overflown PMC
+		 * value, set back PACA_IRQ_PMI which would have been
+		 * cleared in power_pmu_disable().
+		 */
+		hard_irq_disable();
+		if (any_pmc_overflown(cpuhw))
+			set_pmi_irq_pending();
+
 		mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
 		mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
 		if (ppmu->flags & PPMU_ARCH_31)
@@ -2337,6 +2378,14 @@ static void __perf_event_interrupt(struct pt_regs *regs)
 				break;
 			}
 		}
+
+		/*
+		 * Clear PACA_IRQ_PMI in case it was set by
+		 * set_pmi_irq_pending() when PMU was enabled
+		 * after accounting for interrupts.
+		 */
+		clear_pmi_irq_pending();
+
 		if (!active)
 			/* reset non active counters that have overflowed */
 			write_pmc(i + 1, 0);
@@ -2356,6 +2405,13 @@ static void __perf_event_interrupt(struct pt_regs *regs)
 			}
 		}
 	}
+
+	/*
+	 * During system wide profling or while specific CPU is monitored for an
+	 * event, some corner cases could cause PMC to overflow in idle path. This
+	 * will trigger a PMI after waking up from idle. Since counter values are _not_
+	 * saved/restored in idle path, can lead to below "Can't find PMC" message.
+	 */
 	if (unlikely(!found) && !arch_irq_disabled_regs(regs))
 		printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");
author	Athira Rajeev <atrajeev@linux.vnet.ibm.com>	2021-07-21 08:48:29 +0300
committer	Michael Ellerman <mpe@ellerman.id.au>	2021-11-30 09:15:49 +0300
commit	2c9ac51b850d84ee496b0a5d832ce66d411ae552 (patch)
tree	881b389df2141155d70732c46b57f9a770f087ff /arch/powerpc/perf
parent	f05cab0034babaa9b3dfaf6003ee6493496a8180 (diff)
download	linux-2c9ac51b850d84ee496b0a5d832ce66d411ae552.tar.xz