summaryrefslogtreecommitdiff
path: root/arch/powerpc/perf
AgeCommit message (Collapse)AuthorFilesLines
2023-02-22powerpc/imc-pmu: Revert nest_init_lock to being a mutexMichael Ellerman1-7/+7
commit ad53db4acb415976761d7302f5b02e97f2bd097e upstream. The recent commit 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") fixed warnings (and possible deadlocks) in the IMC PMU driver by converting the locking to use spinlocks. It also converted the init-time nest_init_lock to a spinlock, even though it's not used at runtime in IRQ disabled sections or while holding other spinlocks. This leads to warnings such as: BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 1, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd06109f4-dirty #1 Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV Call Trace: dump_stack_lvl+0x74/0xa8 (unreliable) __might_resched+0x178/0x1a0 __cpuhp_setup_state+0x64/0x1e0 init_imc_pmu+0xe48/0x1250 opal_imc_counters_probe+0x30c/0x6a0 platform_probe+0x78/0x110 really_probe+0x104/0x420 __driver_probe_device+0xb0/0x170 driver_probe_device+0x58/0x180 __driver_attach+0xd8/0x250 bus_for_each_dev+0xb4/0x140 driver_attach+0x34/0x50 bus_add_driver+0x1e8/0x2d0 driver_register+0xb4/0x1c0 __platform_driver_register+0x38/0x50 opal_imc_driver_init+0x2c/0x40 do_one_initcall+0x80/0x360 kernel_init_freeable+0x310/0x3b8 kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64 Fix it by converting nest_init_lock back to a mutex, so that we can call sleeping functions while holding it. There is no interaction between nest_init_lock and the runtime spinlocks used by the actual PMU routines. Fixes: 76d588dddc45 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Tested-by: Kajol Jain<kjain@linux.ibm.com> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18powerpc/imc-pmu: Fix use of mutex in IRQs disabled sectionKajol Jain1-70/+66
commit 76d588dddc459fefa1da96e0a081a397c5c8e216 upstream. Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event. Command to trigger the warning: # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5 Performance counter stats for 'sleep 5': 0 thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ 5.002117947 seconds time elapsed 0.000131000 seconds user 0.001063000 seconds sys Below is snippet of the warning in dmesg: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec preempt_count: 2, expected: 0 4 locks held by perf-exec/2869: #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90 #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0 #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510 #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510 irq event stamp: 4806 hardirqs last enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0 hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510 softirqs last enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV Call Trace: dump_stack_lvl+0x98/0xe0 (unreliable) __might_resched+0x2f8/0x310 __mutex_lock+0x6c/0x13f0 thread_imc_event_add+0xf4/0x1b0 event_sched_in+0xe0/0x210 merge_sched_in+0x1f0/0x600 visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0 ctx_flexible_sched_in+0xcc/0x140 ctx_sched_in+0x20c/0x2a0 ctx_resched+0x104/0x1c0 perf_event_exec+0x340/0x510 begin_new_exec+0x730/0xef0 load_elf_binary+0x3f8/0x1e10 ... do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0 WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0 CPU: 36 PID: 2869 Comm: sleep Tainted: G W 6.2.0-rc2-00011-g1247637727f2 #61 Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV NIP: c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670 REGS: c00000004d2134e0 TRAP: 0700 Tainted: G W (6.2.0-rc2-00011-g1247637727f2) MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 48002824 XER: 00000000 CFAR: c00000000013fb64 IRQMASK: 1 The above warning triggered because the current imc-pmu code uses mutex lock in interrupt disabled sections. The function mutex_lock() internally calls __might_resched(), which will check if IRQs are disabled and in case IRQs are disabled, it will trigger the warning. Fix the issue by changing the mutex lock to spinlock. Fixes: 8f95faaac56c ("powerpc/powernv: Detect and create IMC device") Reported-by: Michael Petlan <mpetlan@redhat.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Fix comments, trim oops in change log, add reported-by tags] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18powerpc/hv-gpci: Fix hv_gpci event listKajol Jain4-1/+57
[ Upstream commit 03f7c1d2a49acd30e38789cd809d3300721e9b0e ] Based on getPerfCountInfo v1.018 documentation, some of the hv_gpci events were deprecated for platform firmware that supports counter_info_version 0x8 or above. Fix the hv_gpci event list by adding a new attribute group called "hv_gpci_event_attrs_v6" and a "ENABLE_EVENTS_COUNTERINFO_V6" macro to enable these events for platform firmware that supports counter_info_version 0x6 or below. And assigning the hv_gpci event list based on output counter info version of underlying plaform. Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18powerpc/perf: callchain validate kernel stack pointer boundsNicholas Piggin1-0/+1
[ Upstream commit 32c5209214bd8d4f8c4e9d9b630ef4c671f58e79 ] The interrupt frame detection and loads from the hypothetical pt_regs are not bounds-checked. The next-frame validation only bounds-checks STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another test for this. The user could set r1 to be equal to the address matching the first interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page due to the kernel redzone, and induce the kernel to load the marker from there. Possibly this could cause a crash at least. If the user could induce the previous page to contain a valid marker, then it might be able to direct perf to read specific memory addresses in a way that could be transmitted back to the user in the perf data. Fixes: 20002ded4d93 ("perf_counter: powerpc: Add callchain support") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-4-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14powerpc/perf: Fix the threshold compare group constraint for power9Kajol Jain1-1/+2
[ Upstream commit ab0cc6bbf0c812731c703ec757fcc3fc3a457a34 ] Thresh compare bits for a event is used to program thresh compare field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9). When scheduling events as a group, all events in that group should match value in threshold bits (like thresh compare, thresh control, thresh select). Otherwise event open for the sibling events should fail. But in the current code, incase thresh compare bits are not valid, we are not failing in group_constraint function which can result in invalid group schduling. Fix the issue by returning -1 incase event is threshold and threshold compare value is not valid. Thresh control bits in the event code is used to program thresh_ctl field in Monitor Mode Control Register A (MMCRA: 48-55). In below example, the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and PM_THRESH_MET (8734340101ec) is expected to fail as both event request different thresh control bits and invalid thresh compare value. Result before the patch changes: [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1 Performance counter stats for 'sleep 1': 11,048 r8735340401e0 1,967 r8734340101ec 1.001354036 seconds time elapsed 0.001421000 seconds user 0.000000000 seconds sys Result after the patch changes: [command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (r8735340401e0). /bin/dmesg | grep -i perf may provide additional information. Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506061015.43916-2-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27powerpc/perf: Fix power9 event alternativesAthira Rajeev1-4/+4
[ Upstream commit 0dcad700bb2776e3886fe0a645a4bf13b1e747cd ] When scheduling a group of events, there are constraint checks done to make sure all events can go in a group. Example, one of the criteria is that events in a group cannot use the same PMC. But platform specific PMU supports alternative event for some of the event codes. During perf_event_open(), if any event group doesn't match constraint check criteria, further lookup is done to find alternative event. By current design, the array of alternatives events in PMU code is expected to be sorted by column 0. This is because in find_alternative() the return criteria is based on event code comparison. ie. "event < ev_alt[i][0])". This optimisation is there since find_alternative() can be called multiple times. In power9 PMU code, the alternative event array is not sorted properly and hence there is breakage in finding alternative events. To work with existing logic, fix the alternative event array to be sorted by column 0 for power9-pmu.c Results: With alternative events, multiplexing can be avoided. That is, for example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event, PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be programmed in a different PMC. Before: # perf stat -e r3e054,r300fc Performance counter stats for 'system wide': 1057860 r3e054 (50.21%) 379 r300fc (49.79%) 0.944329741 seconds time elapsed Since both the events are using PMC3 in this case, they are multiplexed here. After: # perf stat -e r3e054,r300fc Performance counter stats for 'system wide': 1006948 r3e054 182 r300fc Fixes: 91e0bd1e6251 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-15powerpc/perf: Don't use perf_hw_context for trace IMC PMUAthira Rajeev1-1/+5
[ Upstream commit 0198322379c25215b2778482bf1221743a76e2b5 ] Trace IMC (In-Memory collection counters) in powerpc is useful for application level profiling. For trace_imc, presently task context (task_ctx_nr) is set to perf_hw_context. But perf_hw_context should only be used for CPU PMU. See commit 26657848502b ("perf/core: Verify we have a single perf_hw_context PMU"). So for trace_imc, even though it is per thread PMU, it is preferred to use sw_context in order to be able to do application level monitoring. Hence change the task_ctx_nr to use perf_sw_context. Fixes: 012ae244845f ("powerpc/perf: Trace imc PMU functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Update subject & incorporate notes into change log, reflow comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-22powerpc/perf/hv-gpci: Fix counter value parsingKajol Jain1-1/+1
commit f9addd85fbfacf0d155e83dbee8696d6df5ed0c7 upstream. H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in the result buffer. Result buffer has specific format defined in the PAPR specification. One of the fields is counter offset and width of the counter data returned. Counter data are returned in a unsigned char array in big endian byte order. To get the final counter data, the values must be left shifted byte at a time. But commit 220a0c609ad17 ("powerpc/perf: Add support for the hv gpci (get performance counter info) interface") made the shifting bitwise and also assumed little endian order. Because of that, hcall counters values are reported incorrectly. In particular this can lead to counters go backwards which messes up the counter prev vs now calculation and leads to huge counter value reporting: #: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000 time counts unit events 1.000078854 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 2.000213293 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 3.000320107 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 4.000428392 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 5.000537864 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 6.000649087 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 7.000760312 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 8.000865218 16,448 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 9.000978985 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 10.001088891 16,384 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 11.001201435 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 12.001307937 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ Fix the shifting logic to correct match the format, ie. read bytes in big endian order. Fixes: e4f226b1580b ("powerpc/perf/hv-gpci: Increase request buffer size") Cc: stable@vger.kernel.org # v4.6+ Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14powerpc/perf: Fix PMU constraint check for EBB eventsAthira Rajeev1-2/+2
[ Upstream commit 10f8f96179ecc7f69c927f6d231f6d02736cea83 ] The power PMU group constraints includes check for EBB events to make sure all events in a group must agree on EBB. This will prevent scheduling EBB and non-EBB events together. But in the existing check, settings for constraint mask and value is interchanged. Patch fixes the same. Before the patch, PMU selftest "cpu_event_pinned_vs_ebb_test" fails with below in dmesg logs. This happens because EBB event gets enabled along with a non-EBB cpu event. [35600.453346] cpu_event_pinne[41326]: illegal instruction (4) at 10004a18 nip 10004a18 lr 100049f8 code 1 in cpu_event_pinned_vs_ebb_test[10000000+10000] Test results after the patch: $ ./pmu/ebb/cpu_event_pinned_vs_ebb_test test: cpu_event_pinned_vs_ebb tags: git_version:v5.12-rc5-93-gf28c3125acd3-dirty Binding to cpu 8 EBB Handler is at 0x100050c8 read error on event 0x7fffe6bd4040! PM_RUN_INST_CMPL: result 9872 running/enabled 37930432 success: cpu_event_pinned_vs_ebb This bug was hidden by other logic until commit 1908dc911792 (perf: Tweak perf_event_attr::exclusive semantics). Fixes: 4df489991182 ("powerpc/perf: Add power8 EBB support") Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Mention commit 1908dc911792] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1617725761-1464-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17powerpc/perf: Record counter overflow always if SAMPLE_IP is unsetAthira Rajeev1-4/+15
[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ] While sampling for marked events, currently we record the sample only if the SIAR valid bit of Sampled Instruction Event Register (SIER) is set. SIAR_VALID bit is used for fetching the instruction address from Sampled Instruction Address Register(SIAR). But there are some usecases, where the user is interested only in the PMU stats at each counter overflow and the exact IP of the overflow event is not required. Dropping SIAR invalid samples will fail to record some of the counter overflows in such cases. Example of such usecase is dumping the PMU stats (event counts) after some regular amount of instructions/events from the userspace (ex: via ptrace). Here counter overflow is indicated to userspace via signal handler, and captured by monitoring and enabling I/O signaling on the event file descriptor. In these cases, we expect to get sample/overflow indication after each specified sample_period. Perf event attribute will not have PERF_SAMPLE_IP set in the sample_type if exact IP of the overflow event is not requested. So while profiling if SAMPLE_IP is not set, just record the counter overflow irrespective of SIAR_VALID check. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Reflow comment and if formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30powerpc/perf: Exclude kernel samples while counting events in user space.Athira Rajeev1-0/+10
commit aa8e21c053d72b6639ea5a7f1d3a1d0209534c94 upstream. Perf event attritube supports exclude_kernel flag to avoid sampling/profiling in supervisor state (kernel). Based on this event attr flag, Monitor Mode Control Register bit is set to freeze on supervisor state. But sometimes (due to hardware limitation), Sampled Instruction Address Register (SIAR) locks on to kernel address even when freeze on supervisor is set. Patch here adds a check to drop those samples. Cc: stable@vger.kernel.org Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30powerpc/perf: Fix crash with is_sier_available when pmu is not setAthira Rajeev1-0/+3
[ Upstream commit f75e7d73bdf73f07b0701a6d21c111ef5d9021dd ] On systems without any specific PMU driver support registered, running 'perf record' with —intr-regs will crash ( perf record -I <workload> ). The relevant portion from crash logs and Call Trace: Unable to handle kernel paging request for data at address 0x00000068 Faulting instruction address: 0xc00000000013eb18 Oops: Kernel access of bad area, sig: 11 [#1] CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1 NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80 REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le) NIP [c00000000013eb18] is_sier_available+0x18/0x30 LR [c000000000139f2c] perf_reg_value+0x6c/0xb0 Call Trace: [c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable) [c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0 [c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0 [c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0 [c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0 [c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480 [c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520 [c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0 [c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120 When perf record session is started with "-I" option, capturing registers on each sample calls is_sier_available() to check for the SIER (Sample Instruction Event Register) availability in the platform. This function in core-book3s accesses 'ppmu->flags'. If a platform specific PMU driver is not registered, ppmu is set to NULL and accessing its members results in a crash. Fix the crash by returning false in is_sier_available() if ppmu is not set. Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/perf/hv-gpci: Fix starting index valueKajol Jain1-3/+3
[ Upstream commit 0f9866f7e85765bbda86666df56c92f377c3bc10 ] Commit 9e9f60108423f ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated") adds a framework for defining gpci counters. In this patch, they adds starting_index value as '0xffffffffffffffff'. which is wrong as starting_index is of size 32 bits. Because of this, incase we try to run hv-gpci event we get error. In power9 machine: command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000 event syntax error: '..bie_count_and_time_tlbie_instructions_issued/' \___ value too big for format, maximum is 4294967295 This patch fix this issue and changes starting_index value to '0xffffffff' After this patch: command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000 1.000085786 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 2.000287818 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ 2.439113909 17,408 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ Fixes: 9e9f60108423 ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201003074943.338618-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraintsAthira Rajeev1-0/+10
[ Upstream commit 3b6c3adbb2fa42749c3d38cfc4d4d0b7e096bb7b ] PMU counter support functions enforces event constraints for group of events to check if all events in a group can be monitored. Incase of event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ), not all constraints are applicable, say the threshold or sample bits. But current code includes pmc5 and pmc6 in some group constraints (like IC_DC Qualifier bits) which is actually not applicable and hence results in those events not getting counted when scheduled along with group of other events. Patch fixes this by excluding PMC5/6 from constraints which are not relevant for it. Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1600672204-1610-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01powerpc/perf: Implement a global lock to avoid races between trace, core and ↵Anju T Sudhakar1-24/+149
thread imc events. [ Upstream commit a36e8ba60b991d563677227f172db69e030797e6 ] IMC(In-memory Collection Counters) does performance monitoring in two different modes, i.e accumulation mode(core-imc and thread-imc events), and trace mode(trace-imc events). A cpu thread can either be in accumulation-mode or trace-mode at a time and this is done via the LDBAR register in POWER architecture. The current design does not address the races between thread-imc and trace-imc events. Patch implements a global id and lock to avoid the races between core, trace and thread imc events. With this global id-lock implementation, the system can either run core, thread or trace imc events at a time. i.e. to run any core-imc events, thread/trace imc events should not be enabled/monitored. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03powerpc/perf: Fix crashes with generic_compat_pmu & BHRBAlexey Kardashevskiy1-5/+14
commit b460b512417ae9c8b51a3bdcc09020cd6c60ff69 upstream. The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This add a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb() as the only caller checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0. Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03powerpc/perf: Fix soft lockups due to missed interrupt accountingAthira Rajeev1-0/+4
[ Upstream commit 17899eaf88d689529b866371344c8f269ba79b5f ] Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 ↵Kajol Jain1-10/+0
events run [ Upstream commit b4ac18eead28611ff470d0f47a35c4e0ac080d9c ] Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter values")' added to print _change_ in the counter value rather then raw value for 24x7 counters. Incase of transactions, the event count is set to 0 at the beginning of the transaction. It also sets the event's prev_count to the raw value at the time of initialization. Because of setting event count to 0, we are seeing some weird behaviour, whenever we run multiple 24x7 events at a time. For example: command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/, hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}" -C 0 -I 1000 sleep 100 1.000121704 120 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 1.000121704 5 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 2.000357733 8 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 2.000357733 10 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.000641884 56 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 4.000641884 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 5.000791887 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ Getting these large values in case we do -I. As we are setting event_count to 0, for interval case, overall event_count is not coming in incremental order. As we may can get new delta lesser then previous count. Because of which when we print intervals, we are getting negative value which create these large values. This patch removes part where we set event_count to 0 in function 'h_24x7_event_read'. There won't be much impact as we do set event->hw.prev_count to the raw value at the time of initialization to print change value. With this patch In power9 platform command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/, hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}" -C 0 -I 1000 sleep 100 1.000117685 93 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 1.000117685 1 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 2.000349331 98 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 2.000349331 2 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 3.000495900 131 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 3.000495900 4 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.000645920 204 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ 4.000645920 61 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/ 4.284169997 22 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/ Suggested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200525104308.9814-2-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-20powerpc/perf: fix imc allocation failure handlingNicholas Piggin1-11/+18
The alloc_pages_node return value should be tested for failure before being passed to page_address. Tested-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190724084638.24982-3-npiggin@gmail.com
2019-07-14Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds2-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-04powerpc/perf/24x7: use rb_entryGeliang Tang1-1/+1
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/perf: Use cpumask_last() to determine the designated cpu for ↵Anju T Sudhakar1-2/+12
nest/core units. Nest and core IMC (In-Memory Collection counters) assigns a particular cpu as the designated target for counter data collection. During system boot, the first online cpu in a chip gets assigned as the designated cpu for that chip(for nest-imc) and the first online cpu in a core gets assigned as the designated cpu for that core(for core-imc). If the designated cpu goes offline, the next online cpu from the same chip(for nest-imc)/core(for core-imc) is assigned as the next target, and the event context is migrated to the target cpu. Currently, cpumask_any_but() function is used to find the target cpu. Though this function is expected to return a `random` cpu, this always returns the next online cpu. If all cpus in a chip/core is offlined in a sequential manner, starting from the first cpu, the event migration has to happen for all the cpus which goes offline. Since the migration process involves a grace period, the total time taken to offline all the cpus will be significantly high. Example: In a system which has 2 sockets, with NUMA node0 CPU(s): 0-87 NUMA node8 CPU(s): 88-175 Time taken to offline cpu 88-175: real 2m56.099s user 0m0.191s sys 0m0.000s Use cpumask_last() to choose the target cpu, when the designated cpu goes online, so the migration will happen only when the last_cpu in the mask goes offline. This way the time taken to offline all cpus in a chip/core can be reduced. With the patch: Time taken to offline cpu 88-175: real 0m12.207s user 0m0.171s sys 0m0.000s Offlining all cpus in reverse order is also taken care because, cpumask_any_but() is used to find the designated cpu if the last cpu in the mask goes offline. Since cpumask_any_but() always return the first cpu in the mask, that becomes the designated cpu and migration will happen only when the first_cpu in the mask goes offline. Example: With the patch, Time taken to offline cpu from 175-88: real 0m9.330s user 0m0.110s sys 0m0.000s Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-02Merge tag 'powerpc-5.2-3' of ↵Linus Torvalds3-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A minor fix to our IMC PMU code to print a less confusing error message when the driver can't initialise properly. A fix for a bug where a user requesting an unsupported branch sampling filter can corrupt PMU state, preventing the PMU from counting properly. And finally a fix for a bug in our support for kexec_file_load(), which prevented loading a kernel and initramfs. Most versions of kexec don't yet use kexec_file_load(). Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi Bangoria, Thiago Jung Bauermann" * tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() powerpc/perf: Fix MMCRA corruption by bhrb_filter powerpc/powernv: Return for invalid IMC domain
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner21-105/+21
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 138Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.067492367@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112Thomas Gleixner2-10/+2
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 4 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091650.480557885@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22powerpc/perf: Fix MMCRA corruption by bhrb_filterRavi Bangoria3-2/+10
Consider a scenario where user creates two events: 1st event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY; fd = perf_event_open(attr, 0, 1, -1, 0); This sets cpuhw->bhrb_filter to 0 and returns valid fd. 2nd event: attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL; fd = perf_event_open(attr, 0, 1, -1, 0); It overrides cpuhw->bhrb_filter to -1 and returns with error. Now if power_pmu_enable() gets called by any path other than power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1. Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework") Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Trace imc PMU functionsAnju T Sudhakar1-1/+204
Add PMU functions to support trace-imc. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Trace imc events detection and cpuhotplugAnju T Sudhakar1-0/+104
Patch detects trace-imc events, does memory initilizations for each online cpu, and registers cpuhotplug call-backs. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Add privileged access check for thread_imcMadhavan Srinivasan1-0/+3
Add code to restrict user access to thread_imc pmu since some event report privilege level information. Fixes: f74c89bd80fb3 ("powerpc/perf: Add thread IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Rearrange setting of ldbar for thread-imcAnju T Sudhakar1-11/+17
LDBAR holds the memory address allocated for each cpu. For thread-imc the mode bit (i.e bit 1) of LDBAR is set to accumulation. Currently, ldbar is loaded with per cpu memory address and mode set to accumulation at boot time. To enable trace-imc, the mode bit of ldbar should be set to 'trace'. So to accommodate trace-mode of IMC, reposition setting of ldbar for thread-imc to thread_imc_event_add(). Also reset ldbar at thread_imc_event_del(). Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Fix loop exit condition in nest_imc_event_initAnju T Sudhakar1-1/+1
The data structure (i.e struct imc_mem_info) to hold the memory address information for nest imc units is allocated based on the number of nodes in the system. nest_imc_event_init() traverse this struct array to calculate the memory base address for the event-cpu. If we fail to find a match for the event cpu's chip-id in imc_mem_info struct array, then the do-while loop will iterate until we crash. Fix this by changing the loop exit condition based on the number of non zero vbase elements in the array, since the allocation is done for nr_chips + 1. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Return accordingly on invalid chip-id inAnju T Sudhakar1-0/+5
Nest hardware counter memory resides in a per-chip reserve-memory. During nest_imc_event_init(), chip-id of the event-cpu is considered to calculate the base memory addresss for that cpu. Return, proper error condition if the chip_id calculated is invalid. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support") Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Remove PM_BR_CMPL_ALT from power9 event listMadhavan Srinivasan1-2/+0
PM_BR_CMPL_ALT event is not supported, remove it from the power9 event list. Fixes: 24bedcb7c811 ("powerpc/perf: Fix branch event code for power9") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: Add generic compat mode pmu driverMadhavan Srinivasan4-2/+238
Most of the power processor generation performance monitoring unit (PMU) driver code is bundled in the kernel and one of those is enabled/registered based on the oprofile_cpu_type check at the boot. But things get little tricky incase of "compat" mode boot. IBM POWER System Server based processors has a compactibility mode feature, which simpily put is, Nth generation processor (lets say POWER8) will act and appear in a mode consistent with an earlier generation (N-1) processor (that is POWER7). And in this "compat" mode boot, kernel modify the "oprofile_cpu_type" to be Nth generation (POWER8). If Nth generation pmu driver is bundled (POWER8), it gets registered. Key dependency here is to have distro support for latest processor performance monitoring support. Patch here adds a generic "compat-mode" performance monitoring driver to be register in absence of powernv platform specific pmu driver. Driver supports only "cycles" and "instruction" events. "0x0001e" used as event code for "cycles" and "0x00002" used as event code for "instruction" events. New file called "generic-compat-pmu.c" is created to contain the driver specific code. And base raw event code format modeled on PPMU_ARCH_207S. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02powerpc/perf: init pmu from core-book3sMadhavan Srinivasan9-19/+46
Currenty pmu driver file for each ppc64 generation processor has a __init call in itself. Refactor the code by moving the __init call to core-books.c. This also clean's up compat mode pmu driver registration. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Use SPDX tag for license] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-07Merge tag 'powerpc-5.1-1' of ↵Linus Torvalds2-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Enable THREAD_INFO_IN_TASK to move thread_info off the stack. - A big series from Christoph reworking our DMA code to use more of the generic infrastructure, as he said: "This series switches the powerpc port to use the generic swiotlb and noncoherent dma ops, and to use more generic code for the coherent direct mapping, as well as removing a lot of dead code." - Increase our vmalloc space to 512T with the Hash MMU on modern CPUs, allowing us to support machines with larger amounts of total RAM or distance between nodes. - Two series from Christophe, one to optimise TLB miss handlers on 6xx, and another to optimise the way STRICT_KERNEL_RWX is implemented on some 32-bit CPUs. - Support for KCOV coverage instrumentation which means we can run syzkaller and discover even more bugs in our code. And as always many clean-ups, reworks and minor fixes etc. Thanks to: Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea Arcangeli, Andrew Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir Singh, Brajeswar Ghosh, Breno Leitao, Christian Lamparter, Christian Zigotzky, Christophe Leroy, Christoph Hellwig, Corentin Labbe, Daniel Axtens, David Gibson, Diana Craciun, Firoz Khan, Gustavo A. R. Silva, Igor Stoppa, Joe Lawrence, Joel Stanley, Jonathan Neuschäfer, Jordan Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce, Meelis Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot, Nicholas Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran, Paul Mackerras, Peter Xu, PrasannaKumar Muralidharan, Qian Cai, Rashmica Gupta, Reza Arbab, Robert P. J. Day, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Sandipan Das, Sergey Senozhatsky, Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav Jain, YueHaibing" * tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/32: Clear on-stack exception marker upon exception return powerpc: Remove export of save_stack_trace_tsk_reliable() powerpc/mm: fix "section_base" set but not used powerpc/mm: Fix "sz" set but not used warning powerpc/mm: Check secondary hash page table powerpc: remove nargs from __SYSCALL powerpc/64s: Fix unrelocated interrupt trampoline address test powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables powerpc/fsl: Fix the flush of branch predictor. powerpc/powernv: Make opal log only readable by root powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C powerpc/64s: Fix data interrupts vs d-side MCE reentrancy powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy powerpc/64s: system reset interrupt preserve HSRRs powerpc/64s: Fix HV NMI vs HV interrupt recoverability test powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback selftests/powerpc: Remove duplicate header powerpc sstep: Add support for modsd, modud instructions ...
2019-02-19Merge branch 'fixes' into nextMichael Ellerman1-0/+6
There's a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.
2019-02-04Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar1-0/+6
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-31powerpc/perf: Add mem access events to sysfsMadhavan Srinivasan2-0/+28
Add mem-loads/mem-stores events to sysfs. The event is formed based on raw event encoding. Primary PMU event used here is PM_MRK_INST_CMPL along with MMCRA[SM] modes and Thresholding bit Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-21perf/core, arch/powerpc: use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable ↵Andrew Murray3-36/+3
PMUs For PowerPC PMUs that do not support context exclusion let's advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will prevent us from handling events where any exclusion flags are set. Let's also remove the now unnecessary check for exclusion flags. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: robin.murphy@arm.com Cc: suzuki.poulose@arm.com Link: https://lkml.kernel.org/r/1547128414-50693-10-git-send-email-andrew.murray@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-08powerpc/perf: Update perf_regs structure to include MMCRAMadhavan Srinivasan1-0/+6
On each sample, Monitor Mode Control Register A (MMCRA) content is saved in pt_regs. MMCRA does not have a entry as-is in the pt_regs but instead, MMCRA content is saved in the "dsisr" register of pt_regs. Patch adds another entry to the perf_regs structure to include the "MMCRA" printing which internally maps to the "dsisr" of pt_regs. It also check for the MMCRA availability in the platform and present value accordingly mpe: This was the 2nd patch in a series with commit 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER") but I accidentally only merged the 1st patch, so merge this one now. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21Powerpc/perf: Wire up PMI throttlingRavi Bangoria1-1/+10
Commit 14c63f17b1fde ("perf: Drop sample rate when sampling is too slow") introduced a way to throttle PMU interrupts if we're spending too much time just processing those. Wire up powerpc PMI handler to use this infrastructure. We have throttling of the *rate* of interrupts, but this adds throttling based on the *time taken* to process the interrupts. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Remove l2 bus events from HW cache event arrayMadhavan Srinivasan1-6/+2
Remove PM_L2_ST_MISS and PM_L2_ST from HW cache event array since these are bus events. And these needs to be programmed in groups. Hence remove them. Fixes: f1fb60bfde65 ('powerpc/perf: Export Power9 generic and cache events to sysfs') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Add constraints for power9 l2/l3 bus eventsMadhavan Srinivasan4-15/+40
In previous generation processors, both bus events and direct events of performance monitoring unit can be individually programmabled and monitored in PMCs. But in Power9, L2/L3 bus events are always available as a "bank" of 4 events. To obtain the counts for any of the l2/l3 bus events in a given bank, the user will have to program PMC4 with corresponding l2/l3 bus event for that bank. Patch enforce two contraints incase of L2/L3 bus events. 1)Any L2/L3 event when programmed is also expected to program corresponding PMC4 event from that group. 2)PMC4 event should always been programmed first due to group constraint logic limitation For ex. consider these L3 bus events PM_L3_PF_ON_CHIP_MEM (0x460A0), PM_L3_PF_MISS_L3 (0x160A0), PM_L3_CO_MEM (0x260A0), PM_L3_PF_ON_CHIP_CACHE (0x360A0), 1) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r160A0,r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < > 2) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r260A0,r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r260A0,r360A0}" < > 3) This is an INVALID group for L3 Bus event monitoring, since it is missing PMC4 event. perf stat -e "{r360A0}" < > And this is a VALID group for L3 Bus events: perf stat -e "{r460A0,r360A0}" < > Patch here implements group constraint logic suggested by Michael Ellerman. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Fix unit_sel/cache_sel checksMadhavan Srinivasan2-9/+20
Raw event code has couple of fields "unit" and "cache" in it, to capture the "unit" to monitor for a given pmcxsel and cache reload qualifier to program in MMCR1. isa207_get_constraint() refers "unit" field to update the MMCRC (L2/L3) Event bus control fields with "cache" bits of the raw event code. These are power8 specific and not supported by PowerISA v3.0 pmu. So wrap the checks to be power8 specific. Also, "cache" bit field is referred to update MMCR1[16:17] and this check can be power8 specific. Fixes: 7ffd948fae4cd ('powerpc/perf: factor out power8 pmu functions') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Cleanup cache_sel bits commentMadhavan Srinivasan1-10/+2
Update the raw event code comment in power9-pmu.c with respect to "cache" bits, since power9 MMCRC does not support these. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Update perf_regs structure to include SIERMadhavan Srinivasan2-0/+15
On each sample, Sample Instruction Event Register (SIER) content is saved in pt_regs. SIER does not have a entry as-is in the pt_regs but instead, SIER content is saved in the "dar" register of pt_regs. Patch adds another entry to the perf_regs structure to include the "SIER" printing which internally maps to the "dar" of pt_regs. It also check for the SIER availability in the platform and present value accordingly Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20powerpc/perf: Fix thresholding counter data for unknown typeMadhavan Srinivasan1-1/+6
MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value. Thresholding counter can be used to count latency cycles such as load miss to reload. But threshold counter value is not relevant when the sampled instruction type is unknown or reserved. Patch to fix the thresholding counter value to zero when sampled instruction type is unknown or reserved. Fixes: 170a315f41c6('powerpc/perf: Support to export MMCRA[TEC*] field to userspace') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-25powerpc/perf: Declare static identifier a suchBreno Leitao1-3/+3
There are three symbols (two variables and a function) that are being used solely in the same file (imc-pmu.c), thus, these symbols should be static, but they are not. This was detected by sparse: arch/powerpc/perf/imc-pmu.c:31:20: warning: symbol 'nest_imc_refc' was not declared. Should it be static? arch/powerpc/perf/imc-pmu.c:37:20: warning: symbol 'core_imc_refc' was not declared. Should it be static? arch/powerpc/perf/imc-pmu.c:46:16: warning: symbol 'imc_event_to_pmu' was not declared. Should it be static? This patch simply adds the 'static' storage-class definition to these symbols, thus, restricting their usage only in the imc-pmu.c file. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>