Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 4ff3ba4db5943cac1045e3e4a3c0463ea10f6930 ]
Valid domain value is in range 1 to HV_PERF_DOMAIN_MAX. Current code has
check for domain value greater than or equal to HV_PERF_DOMAIN_MAX. But
the check for domain value 0 is missing.
Fix this issue by adding check for domain value 0.
Before:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Error:
The sys_perf_event_open() syscall returned with 5 (Input/output error) for
event (hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/).
/bin/dmesg | grep -i perf may provide additional information.
Result from dmesg:
[ 37.819387] hv-24x7: hcall failed: [0 0x60040000 0x100 0] => ret
0xfffffffffffffffc (-4) detail=0x2000000 failing ix=0
After:
# ./perf stat -v -e hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ sleep 1
Using CPUID 00800200
Control descriptor is not initialized
Warning:
hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/ event is not supported by the kernel.
failed to read counter hv_24x7/CPM_ADJUNCT_INST,domain=0,core=1/
Fixes: ebd4a5a3ebd9 ("powerpc/perf/hv-24x7: Minor improvements")
Reported-by: Krishan Gopal Sarawast <krishang@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230825055601.360083-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 03f7c1d2a49acd30e38789cd809d3300721e9b0e ]
Based on getPerfCountInfo v1.018 documentation, some of the
hv_gpci events were deprecated for platform firmware that
supports counter_info_version 0x8 or above.
Fix the hv_gpci event list by adding a new attribute group
called "hv_gpci_event_attrs_v6" and a "ENABLE_EVENTS_COUNTERINFO_V6"
macro to enable these events for platform firmware
that supports counter_info_version 0x6 or below. And assigning
the hv_gpci event list based on output counter info version
of underlying plaform.
Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 32c5209214bd8d4f8c4e9d9b630ef4c671f58e79 ]
The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.
The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. Possibly this could cause a crash at least. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.
Fixes: 20002ded4d93 ("perf_counter: powerpc: Add callchain support")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-4-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ab0cc6bbf0c812731c703ec757fcc3fc3a457a34 ]
Thresh compare bits for a event is used to program thresh compare
field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9).
When scheduling events as a group, all events in that group should
match value in threshold bits (like thresh compare, thresh control,
thresh select). Otherwise event open for the sibling events should fail.
But in the current code, incase thresh compare bits are not valid,
we are not failing in group_constraint function which can result
in invalid group schduling.
Fix the issue by returning -1 incase event is threshold and threshold
compare value is not valid.
Thresh control bits in the event code is used to program thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In below example,
the scheduling of group events PM_MRK_INST_CMPL (873534401e0) and
PM_THRESH_MET (8734340101ec) is expected to fail as both event
request different thresh control bits and invalid thresh compare value.
Result before the patch changes:
[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Performance counter stats for 'sleep 1':
11,048 r8735340401e0
1,967 r8734340101ec
1.001354036 seconds time elapsed
0.001421000 seconds user
0.000000000 seconds sys
Result after the patch changes:
[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (r8735340401e0).
/bin/dmesg | grep -i perf may provide additional information.
Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506061015.43916-2-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0dcad700bb2776e3886fe0a645a4bf13b1e747cd ]
When scheduling a group of events, there are constraint checks done to
make sure all events can go in a group. Example, one of the criteria is
that events in a group cannot use the same PMC. But platform specific
PMU supports alternative event for some of the event codes. During
perf_event_open(), if any event group doesn't match constraint check
criteria, further lookup is done to find alternative event.
By current design, the array of alternatives events in PMU code is
expected to be sorted by column 0. This is because in
find_alternative() the return criteria is based on event code
comparison. ie. "event < ev_alt[i][0])". This optimisation is there
since find_alternative() can be called multiple times. In power9 PMU
code, the alternative event array is not sorted properly and hence there
is breakage in finding alternative events.
To work with existing logic, fix the alternative event array to be
sorted by column 0 for power9-pmu.c
Results:
With alternative events, multiplexing can be avoided. That is, for
example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event,
PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be
programmed in a different PMC.
Before:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1057860 r3e054 (50.21%)
379 r300fc (49.79%)
0.944329741 seconds time elapsed
Since both the events are using PMC3 in this case, they are
multiplexed here.
After:
# perf stat -e r3e054,r300fc
Performance counter stats for 'system wide':
1006948 r3e054
182 r300fc
Fixes: 91e0bd1e6251 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f9addd85fbfacf0d155e83dbee8696d6df5ed0c7 upstream.
H_GetPerformanceCounterInfo (0xF080) hcall returns the counter data in
the result buffer. Result buffer has specific format defined in the PAPR
specification. One of the fields is counter offset and width of the
counter data returned.
Counter data are returned in a unsigned char array in big endian byte
order. To get the final counter data, the values must be left shifted
byte at a time. But commit 220a0c609ad17 ("powerpc/perf: Add support for
the hv gpci (get performance counter info) interface") made the shifting
bitwise and also assumed little endian order. Because of that, hcall
counters values are reported incorrectly.
In particular this can lead to counters go backwards which messes up the
counter prev vs now calculation and leads to huge counter value
reporting:
#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
-C 0 -I 1000
time counts unit events
1.000078854 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
2.000213293 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
3.000320107 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
4.000428392 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
5.000537864 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
6.000649087 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
7.000760312 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
8.000865218 16,448 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
9.000978985 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
10.001088891 16,384 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
11.001201435 0 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
12.001307937 18,446,744,073,709,535,232 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
Fix the shifting logic to correct match the format, ie. read bytes in
big endian order.
Fixes: e4f226b1580b ("powerpc/perf/hv-gpci: Increase request buffer size")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry<rnsastry@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210813082158.429023-1-kjain@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 10f8f96179ecc7f69c927f6d231f6d02736cea83 ]
The power PMU group constraints includes check for EBB events to make
sure all events in a group must agree on EBB. This will prevent
scheduling EBB and non-EBB events together. But in the existing check,
settings for constraint mask and value is interchanged. Patch fixes the
same.
Before the patch, PMU selftest "cpu_event_pinned_vs_ebb_test" fails with
below in dmesg logs. This happens because EBB event gets enabled along
with a non-EBB cpu event.
[35600.453346] cpu_event_pinne[41326]: illegal instruction (4)
at 10004a18 nip 10004a18 lr 100049f8 code 1 in
cpu_event_pinned_vs_ebb_test[10000000+10000]
Test results after the patch:
$ ./pmu/ebb/cpu_event_pinned_vs_ebb_test
test: cpu_event_pinned_vs_ebb
tags: git_version:v5.12-rc5-93-gf28c3125acd3-dirty
Binding to cpu 8
EBB Handler is at 0x100050c8
read error on event 0x7fffe6bd4040!
PM_RUN_INST_CMPL: result 9872 running/enabled 37930432
success: cpu_event_pinned_vs_ebb
This bug was hidden by other logic until commit 1908dc911792 (perf:
Tweak perf_event_attr::exclusive semantics).
Fixes: 4df489991182 ("powerpc/perf: Add power8 EBB support")
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Mention commit 1908dc911792]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1617725761-1464-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ]
While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register(SIAR). But there are some
usecases, where the user is interested only in the PMU stats at each
counter overflow and the exact IP of the overflow event is not
required. Dropping SIAR invalid samples will fail to record some of
the counter overflows in such cases.
Example of such usecase is dumping the PMU stats (event counts) after
some regular amount of instructions/events from the userspace (ex: via
ptrace). Here counter overflow is indicated to userspace via signal
handler, and captured by monitoring and enabling I/O signaling on the
event file descriptor. In these cases, we expect to get
sample/overflow indication after each specified sample_period.
Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter
overflow irrespective of SIAR_VALID check.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Reflow comment and if formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit aa8e21c053d72b6639ea5a7f1d3a1d0209534c94 upstream.
Perf event attritube supports exclude_kernel flag to avoid
sampling/profiling in supervisor state (kernel). Based on this event
attr flag, Monitor Mode Control Register bit is set to freeze on
supervisor state. But sometimes (due to hardware limitation), Sampled
Instruction Address Register (SIAR) locks on to kernel address even
when freeze on supervisor is set. Patch here adds a check to drop
those samples.
Cc: stable@vger.kernel.org
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606289215-1433-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0f9866f7e85765bbda86666df56c92f377c3bc10 ]
Commit 9e9f60108423f ("powerpc/perf/{hv-gpci, hv-common}: generate
requests with counters annotated") adds a framework for defining
gpci counters.
In this patch, they adds starting_index value as '0xffffffffffffffff'.
which is wrong as starting_index is of size 32 bits.
Because of this, incase we try to run hv-gpci event we get error.
In power9 machine:
command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
-C 0 -I 1000
event syntax error: '..bie_count_and_time_tlbie_instructions_issued/'
\___ value too big for format, maximum is 4294967295
This patch fix this issue and changes starting_index value to '0xffffffff'
After this patch:
command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000
1.000085786 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
2.000287818 1,024 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
2.439113909 17,408 hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
Fixes: 9e9f60108423 ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201003074943.338618-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3b6c3adbb2fa42749c3d38cfc4d4d0b7e096bb7b ]
PMU counter support functions enforces event constraints for group of
events to check if all events in a group can be monitored. Incase of
event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ), not
all constraints are applicable, say the threshold or sample bits. But
current code includes pmc5 and pmc6 in some group constraints (like
IC_DC Qualifier bits) which is actually not applicable and hence
results in those events not getting counted when scheduled along with
group of other events. Patch fixes this by excluding PMC5/6 from
constraints which are not relevant for it.
Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1600672204-1610-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 17899eaf88d689529b866371344c8f269ba79b5f ]
Performance monitor interrupt handler checks if any counter has
overflown and calls record_and_restart() in core-book3s which invokes
perf_event_overflow() to record the sample information. Apart from
creating sample, perf_event_overflow() also does the interrupt and
period checks via perf_event_account_interrupt().
Currently we record information only if the SIAR (Sampled Instruction
Address Register) valid bit is set (using siar_valid() check) and
hence the interrupt check.
But it is possible that we do sampling for some events that are not
generating valid SIAR, and hence there is no chance to disable the
event if interrupts are more than max_samples_per_tick. This leads to
soft lockup.
Fix this by adding perf_event_account_interrupt() in the invalid SIAR
code path for a sampling event. ie if SIAR is invalid, just do
interrupt check and don't record the sample information.
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
events run
[ Upstream commit b4ac18eead28611ff470d0f47a35c4e0ac080d9c ]
Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter
values")' added to print _change_ in the counter value rather then raw
value for 24x7 counters. Incase of transactions, the event count
is set to 0 at the beginning of the transaction. It also sets
the event's prev_count to the raw value at the time of initialization.
Because of setting event count to 0, we are seeing some weird behaviour,
whenever we run multiple 24x7 events at a time.
For example:
command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
-C 0 -I 1000 sleep 100
1.000121704 120 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
1.000121704 5 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
2.000357733 8 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
2.000357733 10 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
4.000641884 56 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
4.000641884 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
5.000791887 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
Getting these large values in case we do -I.
As we are setting event_count to 0, for interval case, overall event_count is not
coming in incremental order. As we may can get new delta lesser then previous count.
Because of which when we print intervals, we are getting negative value which create
these large values.
This patch removes part where we set event_count to 0 in function
'h_24x7_event_read'. There won't be much impact as we do set event->hw.prev_count
to the raw value at the time of initialization to print change value.
With this patch
In power9 platform
command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
-C 0 -I 1000 sleep 100
1.000117685 93 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
1.000117685 1 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
2.000349331 98 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
2.000349331 2 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
3.000495900 131 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
3.000495900 4 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
4.000645920 204 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
4.000645920 61 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
4.284169997 22 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
Suggested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200525104308.9814-2-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2d46d4877b1afd14059393a48bdb8ce27955174c ]
Raw event code has couple of fields "unit" and "cache" in it, to capture
the "unit" to monitor for a given pmcxsel and cache reload qualifier to
program in MMCR1.
isa207_get_constraint() refers "unit" field to update the MMCRC (L2/L3)
Event bus control fields with "cache" bits of the raw event code.
These are power8 specific and not supported by PowerISA v3.0 pmu. So wrap
the checks to be power8 specific. Also, "cache" bit field is referred to
update MMCR1[16:17] and this check can be power8 specific.
Fixes: 7ffd948fae4cd ('powerpc/perf: factor out power8 pmu functions')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 110df8bd3e418b3476cae80babe8add48a8ea523 upstream.
imc_common_cpuhp_mem_free() is the common function for all
IMC (In-memory Collection counters) domains to unregister cpuhotplug
callback and free memory. Since kfree of memory allocated for
nest-imc (per_nest_pmu_arr) is in the common code, all
domains (core/nest/thread) can do the kfree in the failure case.
This could potentially create a call trace as shown below, where
core(/thread/nest) imc pmu initialization fails and in the failure
path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr),
which is allocated by successfully registered nest units.
The call trace is generated in a scenario where core-imc
initialization is made to fail and a cpuhotplug is performed in a p9
system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access
per_nest_pmu_arr, which is already freed by core-imc.
NIP [c000000000cb6a94] mutex_lock+0x34/0x90
LR [c000000000cb6a88] mutex_lock+0x28/0x90
Call Trace:
mutex_lock+0x28/0x90 (unreliable)
perf_pmu_migrate_context+0x90/0x3a0
ppc_nest_imc_cpu_offline+0x190/0x1f0
cpuhp_invoke_callback+0x160/0x820
cpuhp_thread_fun+0x1bc/0x270
smpboot_thread_fn+0x250/0x290
kthread+0x1a8/0x1b0
ret_from_kernel_thread+0x5c/0x74
To address this scenario do the kfree(per_nest_pmu_arr) only in case
of nest-imc initialization failure, and when there is no other nest
units registered.
Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 73ce9aec65b17433e18163d07eb5cb6bf114bd6c upstream.
IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds
nest pmu information. Current value for the macro is 32 based on
the initial number of nest pmu units supported by the nest microcode.
But going forward, microcode could support more nest units. Instead
of static storage, patch to fix the code to dynamically allocate an
array based on the number of nest imc units found in the device tree.
Fixes:8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3202e35ec1c8fc19cea24253ff83edf702a60a02 upstream.
Consider a scenario where user creates two events:
1st event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
fd = perf_event_open(attr, 0, 1, -1, 0);
This sets cpuhw->bhrb_filter to 0 and returns valid fd.
2nd event:
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
fd = perf_event_open(attr, 0, 1, -1, 0);
It overrides cpuhw->bhrb_filter to -1 and returns with error.
Now if power_pmu_enable() gets called by any path other than
power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1.
Fixes: 3925f46bb590 ("powerpc/perf: Enable branch stack sampling framework")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a913e5e8b43be1d3897a141ce61c1ec071cad89c ]
Nest hardware counter memory resides in a per-chip reserve-memory.
During nest_imc_event_init(), chip-id of the event-cpu is considered to
calculate the base memory addresss for that cpu. Return, proper error
condition if the chip_id calculated is invalid.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 885dcd709ba91 ("powerpc/perf: Add nest IMC PMU support")
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 17cfccc91545682513541924245abb876d296063 ]
MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
Thresholding counter can be used to count latency cycles such as
load miss to reload. But threshold counter value is not relevant
when the sampled instruction type is unknown or reserved. Patch to
fix the thresholding counter value to zero when sampled instruction
type is unknown or reserved.
Fixes: 170a315f41c6('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit d2032678e57fc508d7878307badde8f89b632ba3 upstream.
Currently memory is allocated for core-imc based on cpu_present_mask,
which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads
per core) as the array index to access the memory.
Under some circumstances firmware marks a CPU as GUARDed CPU and boot the
system, until cleared of errors, these CPU's are unavailable for all
subsequent boots. GUARDed CPUs are possible but not present from linux
view, so it blows a hole when we assume the max length of our allocation
is driven by our max present cpus, where as one of the cpus might be online
and be beyond the max present cpus, due to the hole.
So (cpu number / threads per core) value bounds the array index and leads
to memory overflow.
Call trace observed during a guard test:
Faulting instruction address: 0xc000000000149f1c
cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
sp:c000003fea3036a0
msr:9000000000009033
dar:c9c54b2c91dbf6b7
current = 0xc000003fea2c0000
paca = 0xc00000000fddd880 softe: 3 irq_happened: 0x01
pid = 1, comm = swapper/104
Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
(Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018
enter ? for help
call trace:
__kmalloc+0x1a8/0x1ac
(unreliable)
init_imc_pmu+0x7f4/0xbf0
opal_imc_counters_probe+0x3fc/0x43c
platform_drv_probe+0x48/0x80
driver_probe_device+0x22c/0x308
__driver_attach+0xa0/0xd8
bus_for_each_dev+0x88/0xb4
driver_attach+0x2c/0x40
bus_add_driver+0x1e8/0x228
driver_register+0xd0/0x114
__platform_driver_register+0x50/0x64
opal_imc_driver_init+0x24/0x38
do_one_initcall+0x150/0x15c
kernel_init_freeable+0x250/0x254
kernel_init+0x1c/0x150
ret_from_kernel_thread+0x5c/0xc8
Allocating memory for core-imc based on cpu_possible_mask, which has
bit 'cpu' set iff cpu is populatable, will fix this issue.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Fixes: 39a846db1d57 ("powerpc/perf: Add core IMC PMU support")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e1ebd0e5b9d0a10ba65e63a3514b6da8c6a5a819 ]
Current code in power_pmu_disable() does not clear the sampling
registers like Sampling Instruction Address Register (SIAR) and
Sampling Data Address Register (SDAR) after disabling the PMU. Since
these are userspace readable and could contain kernel addresses, add
code to explicitly clear the content of these registers.
Also add a "context synchronizing instruction" to enforce no further
updates to these registers as suggested by Power ISA v3.0B. From
section 9.4, on page 1108:
"If an mtspr instruction is executed that changes the value of a
Performance Monitor register other than SIAR, SDAR, and SIER, the
change is not guaranteed to have taken effect until after a
subsequent context synchronizing instruction has been executed (see
Chapter 11. "Synchronization Requirements for Context Alterations"
on page 1133)."
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Massage change log and add ISA reference]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bb19af816025d495376bd76bf6fbcf4244f9a06d ]
The current Branch History Rolling Buffer (BHRB) code does not check
for any privilege levels before updating the data from BHRB. This
could leak kernel addresses to userspace even when profiling only with
userspace privileges. Add proper checks to prevent it.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ad2b6e01024ef23bddc3ce0bcb115ecd8c520b7e ]
Oops is observed during boot:
Faulting instruction address: 0xc000000000248340
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]
pc: c000000000248340: event_function_call+0x50/0x1f0
lr: c00000000024878c: perf_remove_from_context+0x3c/0x100
sp: c000000ff66fbad0
msr: 9000000000009033
dar: 7d20e2a6f92d03c0
pid = 14, comm = cpuhp/0
While registering the cpuhotplug callbacks for nest-imc, if we fail in
the cpuhotplug online path for any random node in a multi node
system (because the opal call to stop nest-imc counters fails for that
node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who
successfully returned from cpuhotplug online path.
This call trace is generated since in the ppc_nest_imc_cpu_offline()
path we are trying to migrate the event context, when nest-imc
counters are not even initialized.
Patch to add a check to ensure that nest-imc is registered before
migrating the event context.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5aa04b3eb6fca63d2e9827be656dcadc26d54e11 ]
When user tries to group imc (In-Memory Collections) event with
normal event, (sometime) kernel crashes with following log:
Faulting instruction address: 0x00000000
[link register ] c00000000010ce88 power_check_constraints+0x128/0x980
...
c00000000010e238 power_pmu_event_init+0x268/0x6f0
c0000000002dc60c perf_try_init_event+0xdc/0x1a0
c0000000002dce88 perf_event_alloc+0x7b8/0xac0
c0000000002e92e0 SyS_perf_event_open+0x530/0xda0
c00000000000b004 system_call+0x38/0xe0
'event_base' field of 'struct hw_perf_event' is used as flags for
normal hw events and used as memory address for imc events. While
grouping these two types of events, collect_events() tries to
interpret imc 'event_base' as a flag, which causes a corruption
resulting in a crash.
Consider only those events which belongs to 'perf_hw_context' in
collect_events().
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f41d84dddc66b164ac16acf3f584c276146f1c48 upstream.
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:
Unable to handle kernel paging request for data at address 0xd000000019c41764
Faulting instruction address: 0xc000000000084a14
NIP [c000000000084a14] branch_target+0x4/0x70
LR [c0000000000eb828] record_and_restart+0x568/0x5c0
Call Trace:
[c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
[c0000000000ec378] perf_event_interrupt+0x298/0x460
[c000000000027964] performance_monitor_exception+0x54/0x70
[c000000000009ba4] performance_monitor_common+0x114/0x120
Fix it by deferefencing the addresses safely.
Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 05c14c03138532a3cb2aa29c2960445c8753343b ]
In the hv-24x7 code there is a function memord() which tries to
implement a sort function return -1, 0, 1. However one of the
conditions is incorrect, such that it can never be true, because we
will have already returned.
I don't believe there is a bug in practice though, because the
comparisons are an optimisation prior to calling memcmp().
Fix it by swapping the second comparision, so it can be true.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f3f1dfd600ff82b18b7ea73d80eb27f476a6aa97 upstream.
init_imc_pmu() uses topology_physical_package_id() to detect the
node id of the processor it is on to get local memory, but that's
wrong, and can lead to crashes. Fix it to use cpu_to_node().
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Reported-By: Rob Lippert <rlippert@google.com>
Tested-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.14.
This is bigger than I like to send at rc7, but that's at least partly
because I didn't send any fixes last week. If it wasn't for the IMC
driver, which is new and getting heavy testing, the diffstat would
look a bit better. I've also added ftrace on big endian to my test
suite, so we shouldn't break that again in future.
- A fix to the handling of misaligned paste instructions (P9 only),
where a change to a #define has caused the check for the
instruction to always fail.
- The preempt handling was unbalanced in the radix THP flush (P9
only). Though we don't generally use preempt we want to keep it
working as much as possible.
- Two fixes for IMC (P9 only), one when booting with restricted
number of CPUs and one in the error handling when initialisation
fails due to firmware etc.
- A revert to fix function_graph on big endian machines, and then a
rework of the reverted patch to fix kprobes blacklist handling on
big endian machines.
Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras"
* tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf: Fix core-imc hotplug callback failure during imc initialization
powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text
Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols"
powerpc/64s/radix: Fix preempt imbalance in TLB flush
powerpc: Fix check for copy/paste instructions in alignment handler
powerpc/perf: Fix IMC allocation routine
|
|
Call trace observed during boot:
nest_capp0_imc performance monitor hardware support registered
nest_capp1_imc performance monitor hardware support registered
core_imc memory allocation for cpu 56 failed
Unable to handle kernel paging request for data at address 0xffa400010
Faulting instruction address: 0xc000000000bf3294
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0]
pc: c000000000bf3294: mutex_lock+0x34/0x90
lr: c000000000bf3288: mutex_lock+0x28/0x90
sp: c000000ff38ffb50
msr: 9000000002009033
dar: ffa400010
dsisr: 80000
current = 0xc000000ff383de00
paca = 0xc000000007ae0000 softe: 0 irq_happened: 0x01
pid = 13, comm = cpuhp/0
Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017
0:mon> t
[c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470
[c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
[c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0
[c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0
[c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0
[c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0
[c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74
While registering the cpuhoplug callbacks for core-imc, if we fails
in the cpuhotplug online path for any random core (either because opal call to
initialize the core-imc counters fails or because memory allocation fails for
that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who
successfully returned from cpuhotplug online path.
But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event
context, when core-imc counters are not even initialized. Thus creating the
above stack dump.
Add a check to see if core-imc counters are enabled or not in the cpuhotplug
offline path before migrating the context to handle this failing scenario.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When setting nr_cpus=1, we observed a crash in IMC code during boot
due to a missing allocation: basically, IMC code is taking the number
of threads into account in imc_mem_init() and if we manually set
nr_cpus for a value that is not multiple of the number of threads per
core, an integer division in that function will discard the decimal
portion, leading IMC to not allocate one mem_info struct. This causes
a NULL pointer dereference later, on is_core_imc_mem_inited().
This patch just rounds that division up, fixing the bug.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Panic observed with latest firmware, and upstream kernel:
NIP init_imc_pmu+0x8c/0xcf0
LR init_imc_pmu+0x2f8/0xcf0
Call Trace:
init_imc_pmu+0x2c8/0xcf0 (unreliable)
opal_imc_counters_probe+0x300/0x400
platform_drv_probe+0x64/0x110
driver_probe_device+0x3d8/0x580
__driver_attach+0x14c/0x1a0
bus_for_each_dev+0x8c/0xf0
driver_attach+0x34/0x50
bus_add_driver+0x298/0x350
driver_register+0x9c/0x180
__platform_driver_register+0x5c/0x70
opal_imc_driver_init+0x2c/0x40
do_one_initcall+0x64/0x1d0
kernel_init_freeable+0x280/0x374
kernel_init+0x24/0x160
ret_from_kernel_thread+0x5c/0x74
While registering nest imc at init, cpu-hotplug callback
nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
memory and cpuhotplug setup.
But when cleaning up the attribute group, we are dereferencing the
attribute element array without checking whether the backing element
is not NULL. This causes the kernel panic.
Add a check for the backing element prior to dereferencing the
attribute element, to handle the failing case gracefully.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Stack trace output during a stress test:
[ 4.310049] Freeing initrd memory: 22592K
[ 4.310646] rtas_flash: no firmware flash support
[ 4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[ 4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[ 4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[ 4.313588] Call Trace:
[ 4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[ 4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[ 4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[ 4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[ 4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[ 4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[ 4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[ 4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[ 4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[ 4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[ 4.314313] Mem-Info:
[ 4.314356] active_anon:0 inactive_anon:0 isolated_anon:0
core_imc_mem_init() at system boot use alloc_pages_node() to get memory
and alloc_pages_node() throws this stack dump when tried to allocate
memory from a node which has no memory behind it. Add a ___GFP_NOWARN
flag in allocation request as a fix.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Venkat R.B <venkatb3@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Nest/core pmu units are enabled only when it is used. A reference count is
maintained for the events which uses the nest/core pmu units. Currently in
*_imc_counters_release function a WARN() is used for notification of any
underflow of ref count.
The case where event ref count hit a negative value is, when perf session is
started, followed by offlining of all cpus in a given core.
i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
ref->count to zero, if the current cpu which is about to offline is the last
cpu in a given core and make an OPAL call to disable the engine in that core.
And on perf session termination, perf->destroy (core_imc_counters_release) will
first decrement the ref->count for this core and based on the ref->count value
an opal call is made to disable the core-imc engine.
Now, since cpuhotplug path already clears the ref->count for core and disabled
the engine, perf->destroy() decrementing again at event termination make it
negative which in turn fires the WARN_ON. The same happens for nest units.
Add a check to see if the reference count is alreday zero, before decrementing
the count, so that the ref count will not hit a negative value.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Kernel crashes if power pmu is not registered and user tries to dump
regs with 'echo p > /proc/sysrq-trigger'. Sample log:
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000000d52f0
NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230
LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50
Call Trace:
printk+0x38/0x4c (unreliable)
__handle_sysrq+0xe4/0x270
write_sysrq_trigger+0x64/0x80
proc_reg_write+0x80/0xd0
__vfs_write+0x40/0x200
vfs_write+0xc8/0x240
SyS_write+0x60/0x110
system_call+0x58/0x6c
Fixes: 5f6d0380c640 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Nothing really major this release, despite quite a lot of activity.
Just lots of things all over the place.
Some things of note include:
- Access via perf to a new type of PMU (IMC) on Power9, which can
count both core events as well as nest unit events (Memory
controller etc).
- Optimisations to the radix MMU TLB flushing, mostly to avoid
unnecessary Page Walk Cache (PWC) flushes when the structure of the
tree is not changing.
- Reworks/cleanups of do_page_fault() to modernise it and bring it
closer to other architectures where possible.
- Rework of our page table walking so that THP updates only need to
send IPIs to CPUs where the affected mm has run, rather than all
CPUs.
- The size of our vmalloc area is increased to 56T on 64-bit hash MMU
systems. This avoids problems with the percpu allocator on systems
with very sparse NUMA layouts.
- STRICT_KERNEL_RWX support on PPC32.
- A new sched domain topology for Power9, to capture the fact that
pairs of cores may share an L2 cache.
- Power9 support for VAS, which is a new mechanism for accessing
coprocessors, and initial support for using it with the NX
compression accelerator.
- Major work on the instruction emulation support, adding support for
many new instructions, and reworking it so it can be used to
implement the emulation needed to fixup alignment faults.
- Support for guests under PowerVM to use the Power9 XIVE interrupt
controller.
And probably that many things again that are almost as interesting,
but I had to keep the list short. Plus the usual fixes and cleanups as
always.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"
* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
powerpc/xive: Fix section __init warning
powerpc: Fix kernel crash in emulation of vector loads and stores
powerpc/xive: improve debugging macros
powerpc/xive: add XIVE Exploitation Mode to CAS
powerpc/xive: introduce H_INT_ESB hcall
powerpc/xive: add the HW IRQ number under xive_irq_data
powerpc/xive: introduce xive_esb_write()
powerpc/xive: rename xive_poke_esb() in xive_esb_read()
powerpc/xive: guest exploitation of the XIVE interrupt controller
powerpc/xive: introduce a common routine xive_queue_page_alloc()
powerpc/sstep: Avoid used uninitialized error
axonram: Return directly after a failed kzalloc() in axon_ram_probe()
axonram: Improve a size determination in axon_ram_probe()
axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
powerpc/powernv/npu: Move tlb flush before launching ATSD
powerpc/macintosh: constify wf_sensor_ops structures
powerpc/iommu: Use permission-specific DEVICE_ATTR variants
powerpc/eeh: Delete an error out of memory message at init time
powerpc/mm: Use seq_putc() in two functions
macintosh: Convert to using %pOF instead of full_name
...
|
|
For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.
Add a new sample type for physical address.
perf already has a facility to collect data virtual address. This patch
introduces a function to convert the virtual address to physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.
- For kernel direct mapping addresses, virt_to_phys is used to convert
the virtual addresses to physical address.
- For user virtual addresses, __get_user_pages_fast is used to walk the
pages tables for user physical address.
- This does not work for vmalloc addresses right now. These are not
resolved, but code to do that could be added.
The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.
For security, the physical address can only be exposed to root or
privileged user.
Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Bring in the commit to rename find_linux_pte_or_hugepte() which touches
arch and KVM code, and might need to be merged with the kvmppc tree to
avoid conflicts.
|
|
Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.
For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
nest_imc_refc is a reference count struct, used to track number of
active perf sessions using the nest units.
Currently the code accesses nest_imc_refc using node_id, which is
incorrect, the array is indexed by node number. Meaning in the case of
sparse node ids we index off the end of the array.
Fix it to use get_nest_pmu_ref() which uses the existing per-cpu
variable local_nest_imc_refc.
Fixes: 885dcd709ba91 ('powerpc/perf: Add nest IMC PMU support')
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In a multi node system with discontiguous node ids, nest event values
are not showing up properly. eg. lscpu output:
NUMA node0 CPU(s): 0-15
NUMA node8 CPU(s): 16-31
Nest event values on such systems can be counted on CPUs <= 15:
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 0-14 -I 1000 sleep 1000
# time counts unit events
1.000294577 30,17,24,42,880 nest_powerbus0_imc/PM_PB_CYC/
But not on CPUs >= 16:
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
# time counts unit events
1.000049902 <not supported> nest_powerbus0_imc/PM_PB_CYC/
This is because, when fetching the reference count, the node id (which
may be sparse) is used as the array index, not the node number (which
is 0 based and contiguous).
Fix it by using the node number as the array index.
$./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
# time counts unit events
1.000241961 26,12,35,28,704 nest_powerbus0_imc/PM_PB_CYC/
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
[mpe: Change log tweaks for clarity and brevity]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This function is not called with the nest_init_lock held, and it also
unlocks the nest_init_lock immediately below, so it's fairly clear
that this is a typo and should be locking the lock.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Fixes: 34922527a2bc ("powerpc/perf: Add power9 event list macros for generic and cache events")
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add couple of more events (PM_LD_MISS_L1 and PM_BR_2PATH) to
power9 event list and power9_event_alternatives array (these
events can be counted in more than one PMC).
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
There are some hardware events on Power systems which only count when
the processor is not idle, and there are some fixed-function counters
which count such events. For example, the "run cycles" event counts
cycles when the processor is not idle. If the user asks to count
cycles, we can use "run cycles" if this is a per-task event, since the
processor is running when the task is running, by definition. We can't
use "run cycles" if the user asks for "cycles" on a system-wide
counter.
Currently in power8 this check is done using PPMU_ONLY_COUNT_RUN flag
in power8_get_alternatives() function. Based on the flag, events are
switched if needed. This function should also be enabled in power9, so
factor out the code to isa207_get_alternatives().
Fixes: efe881afdd999 ('powerpc/perf: Factor out event_alternative function')
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Commit 20dd4c624d251 ('powerpc/perf: Fix SDAR_MODE value for continous
sampling on Power9') set the default sdar_mode value in MMCRA[SDAR_MODE]
to be used as 0b01 (Update on TLB miss). And this value is set if sdar_mode
from event is zero, or we are in continous sampling mode in power9 dd1.
But it is preferred to have the sdar_mode value for power9 as
0b10 (Update on dcache miss) for better sampling updates instead
of 0b01 (Update on TLB miss).
From Anton:
Using a bandwidth test case with a 1MB footprint, I profiled cycles and
chose TLB updates of the SDAR:
$ perf record -d -e r000400000000001E:u ./bw2001 1M
^
SDAR TLB
$ perf report -D | grep PERF_RECORD_SAMPLE | sed 's/.*addr: //' | sort -u | wc -l
4
I get 4 unique addresses. If I ran with dcache misses:
$ perf record -d -e r000800000000001E:u ./bw2001 1M
^
SDAR dcache miss
$ perf report -D|grep PERF_RECORD_SAMPLE| sed 's/.*addr: //'|sort -u | wc -l
5217
I get 5217 unique addresses. No surprises here, but it does show why
TLB misses is the wrong event to default to - we get very little useful
information out of it.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add support to register Thread In-Memory Collection PMU counters.
Patch adds thread IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add support to register Core In-Memory Collection PMU counters.
Patch adds core IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add support to register Nest In-Memory Collection PMU counters.
Patch adds a new device file called "imc-pmu.c" under powerpc/perf
folder to contain all the device PMU functions.
Device tree parser code added to parse the PMU events information
and create sysfs event attributes for the PMU.
Cpumask attribute added along with Cpu hotplug online/offline functions
specific for nest PMU. A new state "CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE"
added for the cpu hotplug callbacks. Error handle path frees the memory
and unregisters the CPU hotplug callbacks.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Similar to POWER8, POWER9 can count run cycles and run instructions
completed on more than one PMU.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|