path: root/arch/s390
2024-09-07  s390/facility: Disable compile time optimization for decompressor code  (Heiko Carstens, 1 file, -2/+4)
Disable compile time optimizations of test_facility() for the decompressor. The decompressor should not contain any optimized code depending on the architecture level set the kernel image is compiled for to avoid unexpected operation exceptions. Add a __DECOMPRESSOR check to test_facility() to enforce that facilities are always checked during runtime for the decompressor. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
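The pattern described above can be sketched in plain C: a facility test with a compile-time fast path that a __DECOMPRESSOR define disables, so the decompressor always falls back to the runtime bit test. This is a minimal userspace illustration with made-up facility lists, not the actual arch/s390 facility.h code.

    #include <stdio.h>

    /* Facilities implied by the architecture level set the build targets
     * (hypothetical example bits, MSB-first numbering as on s390). */
    #define ALS_FACILITY_MAX 64
    static const unsigned char als_facilities[ALS_FACILITY_MAX / 8] = { 0xff, 0xf0 };

    /* Facility list queried from the machine at runtime (stand-in for STFLE). */
    static unsigned char stfle_facilities[256 / 8];

    static int __test_facility(unsigned long nr, const unsigned char *list)
    {
        return (list[nr >> 3] >> (7 - (nr & 7))) & 1;
    }

    static inline int test_facility(unsigned long nr)
    {
    #ifndef __DECOMPRESSOR
        /* Compile-time shortcut: facilities guaranteed by the build target. */
        if (__builtin_constant_p(nr) && nr < ALS_FACILITY_MAX &&
            __test_facility(nr, als_facilities))
            return 1;
    #endif
        /* Decompressor (and non-constant callers): always check at runtime. */
        return __test_facility(nr, stfle_facilities);
    }

    int main(void)
    {
        printf("facility 1: %d\n", test_facility(1));
        return 0;
    }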
2024-09-07  s390/boot: Increase minimum architecture to z10  (Heiko Carstens, 1 file, -4/+0)
The decompressor code is partially compiled with march=z900 so it is possible to print an error message in case a kernel is booted on a machine which misses facilities to execute the kernel. Given that the decompressor code also includes header files from the core kernel this causes problems for inline assemblies and other code where the minimum assumed architecture level is set to z10 in the meantime. If such code is also used in the decompressor (e.g. inline functions) z900 support must be implemented again. In order to avoid this and to keep things simple just raise the minimum architecture level to z10 for the decompressor just like for the kernel. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-07  s390/als: Remove obsolete comment  (Heiko Carstens, 1 file, -8/+0)
The bss section of the decompressor is part of the compressed kernel image since commit 980d5f9ab36b ("s390/boot: enable .bss section for compressed kernel"). Remove a now incorrect comment that states that the bss section must not be accessed. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05  s390/sha3: Fix SHA3 selftests failures  (Ingo Franzki, 3 files, -0/+7)
Since commit "s390/sha3: Support sha3 performance enhancements" the selftests of the sha3_256_s390 and sha3_512_s390 kernel digests sometimes fail with:
    alg: shash: sha3-256-s390 test failed (wrong result) on test vector 3, cfg="import/export"
    alg: self-tests for sha3-256 using sha3-256-s390 failed (rc=-22)
or with
    alg: ahash: sha3-256-s390 test failed (wrong result) on test vector 3, cfg="digest misaligned splits crossing pages"
    alg: self-tests for sha3-256 using sha3-256-s390 failed (rc=-22)
The first failure is because the newly introduced context field 'first_message_part' is not copied during export and import operations. Because of that the value of 'first_message_part' is more or less random after an import into a newly allocated context and may or may not fit to the state of the imported SHA3 operation, causing an invalid hash when it does not fit. Save the 'first_message_part' field in the currently unused field 'partial' of struct sha3_state, even though the meaning of 'partial' is not exactly the same as 'first_message_part'. For the caller the returned state blob is opaque and it must only be ensured that the state can be imported later on by the module that exported it. The second failure is when on entry of s390_sha_update() the flag 'first_message_part' is on, and kimd is called in the first 'if (index)' block as well as in the second 'if (len >= bsize)' block. In this case, the 'first_message_part' is turned off after the first kimd, but the function code incorrectly retains the NIP flag. Reset the NIP flag after the first kimd unconditionally besides turning 'first_message_part' off. Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: 88c02b3f79a6 ("s390/sha3: Support sha3 performance enhancements") Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Joerg Schmidbauer <jschmidb@de.ibm.com> Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05  s390/pkey: Add AES xts and HMAC clear key token support  (Harald Freudenberger, 1 file, -0/+4)
Add support for deriving protected keys from clear key token for AES xts and HMAC keys via PCKMO instruction. Add support for protected key generation and unwrap of protected key tokens for these key types. Furthermore 4 new sysfs attributes are introduced:
- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_128
- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_256
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_512
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_1024
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05  s390/cpacf: Add MSA 10 and 11 new PCKMO functions  (Harald Freudenberger, 1 file, -12/+16)
Add the defines for the new PCKMO functions covering MSA 10 (AES XTS "double" keys) and MSA 11 (HMAC keys) support. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05  s390/mm: Add cond_resched() to cmm_alloc/free_pages()  (Gerald Schaefer, 1 file, -1/+17)
Adding/removing large amounts of pages at once to/from the CMM balloon can result in rcu_sched stalls or workqueue lockups, because of busy looping w/o cond_resched(). Prevent this by adding a cond_resched(). cmm_free_pages() holds a spin_lock while looping, so it cannot be added directly to the existing loop. Instead, introduce a wrapper function that operates on a maximum of 256 pages at once, and add it there. Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
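The batching idea can be sketched in userspace, with a pthread mutex standing in for the spinlock and sched_yield() standing in for cond_resched(); only the 256-page batch size is taken from the description above, everything else (names, page bookkeeping) is illustrative:

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define CMM_BATCH 256

    static pthread_mutex_t cmm_lock = PTHREAD_MUTEX_INITIALIZER;
    static long cmm_pages;                  /* pages currently in the balloon */

    /* Original-style inner loop: must be called with cmm_lock held. */
    static long __cmm_free_pages(long nr)
    {
        long freed = 0;

        while (nr-- > 0 && cmm_pages > 0) {
            cmm_pages--;                    /* give one page back to the system */
            freed++;
        }
        return freed;
    }

    /* Wrapper: hold the lock for at most CMM_BATCH pages, then reschedule. */
    static long cmm_free_pages(long nr)
    {
        long freed = 0, chunk;

        while (nr > 0) {
            chunk = nr > CMM_BATCH ? CMM_BATCH : nr;
            pthread_mutex_lock(&cmm_lock);
            freed += __cmm_free_pages(chunk);
            pthread_mutex_unlock(&cmm_lock);
            nr -= chunk;
            sched_yield();                  /* cond_resched() equivalent */
        }
        return freed;
    }

    int main(void)
    {
        cmm_pages = 1 << 20;
        printf("freed %ld pages\n", cmm_free_pages(1 << 20));
        return 0;
    }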
2024-09-05  s390/pai_ext: Update PAI extension 1 counters  (Thomas Richter, 1 file, -0/+9)
Update the internal array of PAI extension 1 NNPA counter string table to support specialized processor instrumentation assist instructions. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05  s390/pai_crypto: Add support for MSA 10 and 11 pai counters  (Thomas Richter, 1 file, -0/+16)
Update the internal array of PAI crypto counter string table with new counters supported with Message Security Assist extension (MSA) 10 and MSA 11. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Finn Callies <fcallies@linux.ibm.com> Tested-by: Finn Callies <fcallies@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-04  arch, mm: move definition of node_data to generic code  (Mike Rapoport (Microsoft), 3 files, -20/+1)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-04  dma-mapping: clearly mark DMA ops as an architecture feature  (Christoph Hellwig, 1 file, -1/+1)
DMA ops are a helper for architectures and not for drivers to override the DMA implementation. Unfortunately driver authors keep ignoring this. Make the fact more clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers overriding their dma_ops depend on that. These drivers should probably be marked broken, but we can give them a bit of a grace period for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6 Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-09-02  s390/mm/fault: convert do_secure_storage_access() from follow_page() to folio_walk  (David Hildenbrand, 1 file, -6/+10)
Let's get rid of another follow_page() user and perform the conversion under PTL: Note that this is also what follow_page_pte() ends up doing. Unfortunately we cannot currently optimize out the additional reference, because arch_make_folio_accessible() must be called with a raised refcount to protect against concurrent conversion to secure. We can just move the arch_make_folio_accessible() under the PTL, like follow_page_pte() would. We'll effectively drop the "writable" check implied by FOLL_WRITE: follow_page_pte() would also not check that when calling arch_make_folio_accessible(), so there is no good reason for doing that here. We'll lose the secretmem check from follow_page() as well, about which we shouldn't really care. Link: https://lkml.kernel.org/r/20240802155524.517137-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-02  s390/uv: convert gmap_destroy_page() from follow_page() to folio_walk  (David Hildenbrand, 1 file, -6/+12)
Let's get rid of another follow_page() user and perform the UV calls under PTL -- which likely should be fine. No need for an additional reference while holding the PTL: uv_destroy_folio() and uv_convert_from_secure_folio() raise the refcount, so any concurrent make_folio_secure() would see an unexpected reference and cannot set PG_arch_1 concurrently. Do we really need a writable PTE? Likely yes, because the "destroy" part is, in comparison to the export, a destructive operation. So we'll keep the writability check for now. We'll lose the secretmem check from follow_page(). Likely we don't care about that here. Link: https://lkml.kernel.org/r/20240802155524.517137-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-02  s390/uv: drop arch_make_page_accessible()  (David Hildenbrand, 2 files, -7/+0)
All code was converted to using arch_make_folio_accessible(), let's drop arch_make_page_accessible(). Link: https://lkml.kernel.org/r/20240729183844.388481-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-29  s390/hiperdispatch: Add hiperdispatch debug counters  (Mete Durlu, 1 file, -0/+77)
Add three counters to follow and understand hiperdispatch behavior:
* adjustment_count (amount of capacity adjustments triggered)
* greedy_time_ms (time spent while all cpus are on high capacity)
* conservative_time_ms (time spent while only entitled cpus are on high capacity)
These counters can be found under /sys/kernel/debug/s390/hiperdispatch/. Time counters are in <msec> format and only cover the time spent when hiperdispatch is active. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hiperdispatch: Add hiperdispatch debug attributes  (Mete Durlu, 1 file, -2/+82)
Add two attributes for debug purposes. They can be found under /sys/devices/system/cpu/hiperdispatch/:
* hd_stime_threshold : allows user to adjust steal time threshold
* hd_delay_factor : allows user to adjust delay factor of hiperdispatch work (after topology updates, delayed work is always delayed extra by this factor)
hd_stime_threshold can have values between 0-100 as it represents a percentage value. hd_delay_factor can have values greater than 1. It is multiplied with the default delay to achieve a longer interval, pushing back the next hiperdispatch adjustment after a topology update. Ex: if the delay interval is 250ms and the delay factor is 4, the delayed interval is now 1000ms (1 sec). After each capacity adjustment or topology change, work has a delayed interval of 1 sec for one interval. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hiperdispatch: Add hiperdispatch sysctl interface  (Mete Durlu, 2 files, -0/+72)
Expose hiperdispatch controls via sysctl. The user can now toggle hiperdispatch via assigning 0 or 1 to the s390.hiperdispatch attribute. When hiperdispatch is toggled on, it tries to adjust CPU capacities while the system is in vertical polarization, to gain performance benefits from different CPU polarizations. Disabling hiperdispatch reverts the CPU capacities to their default (HIGH_CAPACITY) and stops the dynamic adjustments. Introduce a kconfig option HIPERDISPATCH_ON which allows users to use hiperdispatch by default on vertical polarization. Using the sysctl attribute s390.hiperdispatch overrides this behavior. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hiperdispatch: Add trace events  (Mete Durlu, 2 files, -0/+63)
Add trace events to debug hiperdispatch behavior and track domain rebuilding. Two events provide information about the decision making of hiperdispatch and the adjustments made. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hiperdispatch: Add steal time averaging  (Mete Durlu, 1 file, -1/+10)
The measurements done by hiperdispatch can have sudden spikes and dips during run time. To prevent these outliers from affecting the decision making process and causing adjustment overhead, use a weighted average of the steal time. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
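A weighted average of this kind is typically an exponential moving average. The following sketch smooths a noisy steal-time sample so a single spike barely moves the value; the 3/4-to-1/4 weighting is an assumption for illustration, not the factor hiperdispatch uses:

    #include <stdio.h>

    #define NEW_SAMPLE_WEIGHT 1   /* new sample contributes 1/4 ... */
    #define OLD_AVG_WEIGHT    3   /* ... old average contributes 3/4 */

    static unsigned long steal_avg;   /* smoothed steal time, in percent */

    static unsigned long update_steal_avg(unsigned long sample)
    {
        steal_avg = (OLD_AVG_WEIGHT * steal_avg + NEW_SAMPLE_WEIGHT * sample) /
                    (OLD_AVG_WEIGHT + NEW_SAMPLE_WEIGHT);
        return steal_avg;
    }

    int main(void)
    {
        unsigned long samples[] = { 5, 6, 80, 7, 5, 4 };   /* one spike */

        for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
            printf("sample %lu%% -> avg %lu%%\n", samples[i],
                   update_steal_avg(samples[i]));
        return 0;
    }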
2024-08-29  s390/hiperdispatch: Introduce hiperdispatch  (Mete Durlu, 4 files, -5/+228)
When LPAR is in vertical polarization, CPUs get different polarization values, namely vertical high, vertical medium and vertical low. These values represent the likelihood of the CPU getting physical runtime. Vertical high CPUs will always get runtime and others get varying runtime depending on the load the CEC is under. Vertical high and vertical medium CPUs are considered the CPUs which the current LPAR has the entitlement to run on. The vertical lows, on the other hand, are borrowed CPUs which are only given to the LPAR by the hypervisor when the other LPARs are not utilizing them. Using the CPU capacities, hint the Linux scheduler that it should prioritise vertical high and vertical medium CPUs over vertical low CPUs. By tracking various system statistics hiperdispatch determines when to adjust CPU capacities. After each adjustment, rebuilding of scheduler domains is necessary to notify the scheduler about capacity changes, but since this operation is costly it should be done as sparsely as possible. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/smp: Add cpu capacities  (Mete Durlu, 4 files, -0/+29)
The Linux scheduler allows architectures to assign capacity values to individual CPUs. This hints the scheduler at the performance differences between CPUs and allows more efficient task distribution among them. Implement helper methods to set and get CPU capacities for s390. This is particularly helpful in vertical polarization configurations of LPARs. On vertical polarization an LPAR's CPUs can get different polarization values depending on the CEC configuration. CPUs with different polarization values can perform differently from each other; using CPU capacities this can be reflected to the Linux scheduler. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
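A minimal sketch of such set/get helpers over a per-CPU array follows; the names and the low-capacity value are assumptions, only the idea of two capacity levels consumed by the scheduler (e.g. through arch_scale_cpu_capacity()) comes from the description above:

    #include <stdio.h>

    #define NR_CPUS            64
    #define CPU_CAPACITY_HIGH  1024   /* the scheduler's full-capacity scale */
    #define CPU_CAPACITY_LOW    590   /* reduced capacity; value assumed */

    static unsigned long cpu_capacity[NR_CPUS];

    static void smp_set_cpu_capacity(int cpu, unsigned long capacity)
    {
        cpu_capacity[cpu] = capacity;
    }

    /* What the scheduler side would consume for task placement decisions. */
    static unsigned long smp_cpu_capacity(int cpu)
    {
        return cpu_capacity[cpu];
    }

    int main(void)
    {
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            smp_set_cpu_capacity(cpu, CPU_CAPACITY_HIGH);
        smp_set_cpu_capacity(1, CPU_CAPACITY_LOW);   /* e.g. a vertical low CPU */
        printf("cpu 0: %lu, cpu 1: %lu\n", smp_cpu_capacity(0), smp_cpu_capacity(1));
        return 0;
    }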
2024-08-29  s390/topology: Add config option to switch to vertical during boot  (Tobias Huschle, 2 files, -0/+10)
By default, all systems on s390 start in horizontal cpu polarization. Selecting the new config option SCHED_TOPOLOGY_VERTICAL makes it possible to build a kernel that switches to vertical polarization during boot. Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/topology: Add sysctl handler for polarization  (Tobias Huschle, 1 file, -13/+45)
Provide an additional path to set the polarization of the system, such that a user no longer relies only on the sysfs interface and is able to configure the polarization for every reboot via sysctl control files. The new sysctl can be set as follows:
- s390.polarization=0 for horizontal polarization
- s390.polarization=1 for vertical polarization
Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/wti: Add debugfs file to display missed grace periods per cpu  (Tobias Huschle, 1 file, -0/+25)
Introduce a new debug file which makes it possible to determine how many warning track grace periods were missed on each CPU. The new file can be found as /sys/kernel/debug/s390/wti and is formatted as:
    CPU0 CPU1 [...] CPUx
    xyz  xyz  [...] xyz
Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/wti: Add wti accounting for missed grace periods  (Tobias Huschle, 1 file, -1/+50)
A virtual CPU that has received a warning-track interrupt may fail to acknowledge the interrupt within the warning-track grace period. While this is usually not a problem, it will become necessary to investigate if there is a large number of such missed warning-track interrupts. Therefore, it is necessary to track these events. The information is tracked through the s390 debug facility and can be found under /sys/kernel/debug/s390dbf/wti/. The hex_ascii output is formatted as:
    <pid> <symbol>
The values pid and current psw are collected when a warning track interrupt is received. Symbol is either the kernel symbol matching the collected psw or redacted to <user> when running in user space. Each line represents the currently executing process when a warning track interrupt was received which was then not acknowledged within its grace period. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/wti: Prepare graceful CPU pre-emption on wti reception  (Tobias Huschle, 2 files, -1/+142)
When a warning track interrupt is received, the kernel has only a very limited amount of time to make sure that the CPU can be yielded as gracefully as possible before being pre-empted by the hypervisor. The interrupt handler for the wti therefore unparks a kernel thread which has been created at boot, re-using the CPU hotplug kernel thread infrastructure. These threads exist per CPU and are assigned the highest possible real-time priority. This makes sure that said threads will execute as soon as possible, as the scheduler should pre-empt any other running user tasks to run the real-time thread. Furthermore, the interrupt handler disables all I/O interrupts to prevent additional interrupt processing on the soon-preempted CPU. Interrupt handlers are likely to take kernel locks, which in the worst case will be kept while the interrupt handler is pre-empted from its underlying physical CPU. In that case, all tasks or interrupt handlers on other CPUs would have to wait for the pre-empted CPU to be dispatched again. By preventing further interrupt processing, this risk is minimized. Once the CPU gets dispatched again, the real-time kernel thread regains control, reenables interrupts and parks itself again. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
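The "parked real-time helper per CPU" pattern can be illustrated with a userspace analog: a thread pinned to one CPU at the highest SCHED_FIFO priority that sleeps until it is unparked. This is a sketch of the pattern only, not the kernel's smpboot/wti code, and setting SCHED_FIFO requires the right privileges:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  kick = PTHREAD_COND_INITIALIZER;
    static int unparked;

    static void *wti_helper(void *arg)
    {
        pthread_mutex_lock(&lock);
        while (!unparked)                   /* "parked" */
            pthread_cond_wait(&kick, &lock);
        pthread_mutex_unlock(&lock);
        puts("helper: yielding the CPU gracefully");
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        struct sched_param sp = { .sched_priority = sched_get_priority_max(SCHED_FIFO) };
        cpu_set_t cpus;

        pthread_create(&tid, NULL, wti_helper, NULL);
        CPU_ZERO(&cpus);
        CPU_SET(0, &cpus);                  /* per-CPU: pin to CPU 0 */
        pthread_setaffinity_np(tid, sizeof(cpus), &cpus);
        if (pthread_setschedparam(tid, SCHED_FIFO, &sp))
            puts("note: real-time priority needs CAP_SYS_NICE");

        pthread_mutex_lock(&lock);          /* "unpark" on wti reception */
        unparked = 1;
        pthread_cond_signal(&kick);
        pthread_mutex_unlock(&lock);
        pthread_join(tid, NULL);
        return 0;
    }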
2024-08-29  s390/wti: Introduce infrastructure for warning track interrupt  (Tobias Huschle, 6 files, -2/+33)
The warning-track interrupt (wti) provides a notification that the receiving CPU will be pre-empted from its physical CPU within a short time frame. This time frame is called grace period and depends on the machine type. Giving up the CPU on time may prevent a task from getting stuck while holding a resource. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hypfs: Remove obsoleted declaration for hypfs_dbfs_exit  (Gaosheng Cui, 1 file, -1/+0)
The hypfs_dbfs_exit() function has been removed since commit 3325b4d85799 ("s390/hypfs: factor out filesystem code"), so its declaration is now useless; remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/ftrace: Avoid extra serialization for graph caller patching  (Vasily Gorbik, 1 file, -14/+2)
The only context where ftrace_enable_ftrace_graph_caller() or ftrace_disable_ftrace_graph_caller() is called also calls ftrace_arch_code_modify_post_process(), which already performs text_poke_sync_lock():
    ftrace_run_update_code()
      arch_ftrace_update_code()
        ftrace_modify_all_code()
          ftrace_enable_ftrace_graph_caller()/ftrace_disable_ftrace_graph_caller()
      ftrace_arch_code_modify_post_process()
        text_poke_sync_lock()
Remove the redundant serialization. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/ftrace: Use get/copy_from_kernel_nofault consistently  (Vasily Gorbik, 1 file, -2/+5)
Use get/copy_from_kernel_nofault to access the kernel text consistently. Replace memcmp() in ftrace_init_nop() to ensure that in case of inconsistencies in the 'mcount' table, the kernel reports a failure instead of potentially crashing. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/ftrace: Avoid trampolines if possible  (Vasily Gorbik, 1 file, -6/+53)
When a sequential instruction fetching facility is present, it is safe to patch ftrace NOPs in function prologues. All of them are 8-byte aligned. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/kprobes: Avoid stop machine if possible  (Vasily Gorbik, 1 file, -2/+13)
Avoid stop machine on kprobes arm/disarm when sequential instruction fetching is present. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/setup: Recognize sequential instruction fetching facility  (Vasily Gorbik, 2 files, -0/+4)
When the sequential instruction fetching facility is present, certain guarantees are provided for code patching. In particular, atomic overwrites within 8 aligned bytes are safe from an instruction-fetching point of view. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
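The guarantee can be restated as: a patch may be applied as a single atomic store only if all patched bytes fall within one aligned 8-byte block. A small illustrative check plus an aligned atomic store on ordinary data (no real instruction patching) might look like:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdalign.h>
    #include <stdio.h>

    static int patch_within_aligned_8(uintptr_t addr, size_t len)
    {
        return len <= 8 && (addr & ~7UL) == ((addr + len - 1) & ~7UL);
    }

    int main(void)
    {
        alignas(8) uint64_t slot = 0x0101010101010101ULL;   /* old 8 bytes */

        printf("patch at +2, len 4: %s\n",
               patch_within_aligned_8((uintptr_t)&slot + 2, 4) ? "single store ok" : "needs heavier sync");
        printf("patch at +6, len 4: %s\n",
               patch_within_aligned_8((uintptr_t)&slot + 6, 4) ? "single store ok" : "needs heavier sync");

        /* The overwrite itself: one aligned 8-byte atomic store. */
        __atomic_store_n(&slot, 0x2222222233333333ULL, __ATOMIC_RELEASE);
        printf("slot is now %#llx\n", (unsigned long long)slot);
        return 0;
    }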
2024-08-29  s390/entry: Unify save_area_sync and save_area_async  (Sven Schnelle, 4 files, -17/+16)
In the past two save areas existed because interrupt handlers and system call / program check handlers were entered with interrupts enabled. To prevent a handler from overwriting the save areas of the previous handler, interrupts used the async save area, while the system call and program check handlers used the sync save area. Since the removal of the critical section cleanup from entry.S, handlers are entered with interrupts disabled. When the interrupts are re-enabled, the save area is no longer needed. Therefore merge both save areas into one. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/sha3: Support sha3 performance enhancements  (Joerg Schmidbauer, 5 files, -10/+38)
On newer machines the SHA3 performance of CPACF instructions KIMD and KLMD can be enhanced by using additional modifier bits. This allows the application to omit initializing the ICV, but also affects the internal processing of the instructions. Performance is mostly gained when processing short messages. The new CPACF feature is backwards compatible with older machines, i.e. the new modifier bits are ignored on older machines. However, to save the ICV initialization, the application must detect the MSA level and omit the ICV initialization only if this feature is supported. Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Joerg Schmidbauer <jschmidb@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/pkey: Introduce pkey base with handler registry and handler modules  (Harald Freudenberger, 3 files, -0/+7)
Introduce pkey base kernel code with a simple pkey handler registry. Regroup the pkey code into these kernel modules:
- pkey is the pkey api supporting the ioctls, sysfs and in-kernel api. Also the pkey base code which offers the handler registry and handler wrapping invocation functions is integrated there. This module is automatically loaded in via CPU feature if the MSA feature is available.
- pkey-cca is the CCA related handler code kernel module, offering a CCA specific implementation for pkey. This module is loaded in via MODULE_DEVICE_TABLE when a CEX[4-8] card becomes available.
- pkey-ep11 is the EP11 related handler code kernel module offering an EP11 specific implementation for pkey. This module is loaded in via MODULE_DEVICE_TABLE when a CEX[4-8] card becomes available.
- pkey-pckmo is the PCKMO related handler code kernel module. This module is loaded in via CPU feature if the MSA feature is available, but on init a check for availability of the pckmo instruction is performed.
The handler modules register via a pkey_handler struct at the pkey base code, and the pkey customer (currently the pkey api code) fetches a handler via the pkey handler registry functions and calls the unified handler functions via the pkey base handler functions. As a result the pkey-cca, pkey-ep11 and pkey-pckmo modules become independent of each other and it becomes possible to write new handlers which offer another kind of implementation without implicit dependencies on other handler implementations and/or kernel device drivers. For each of these 4 kernel modules there is an individual Kconfig entry: CONFIG_PKEY for the base and api, CONFIG_PKEY_CCA for the PKEY CCA support handler, CONFIG_PKEY_EP11 for the EP11 support handler and CONFIG_PKEY_PCKMO for the pckmo support. Both CEX related handler modules (PKEY CCA and PKEY EP11) have a dependency on the zcrypt api of the zcrypt device driver. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
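The registry pattern described above (handlers register an ops structure with the base code, the API layer looks one up and dispatches through it) can be sketched as follows; structure and function names are illustrative, not the actual pkey_base.h interface:

    #include <stdio.h>
    #include <stddef.h>

    enum key_type { KEY_CCA, KEY_EP11, KEY_PCKMO };

    struct pkey_handler {
        const char *name;
        int (*is_supported_key)(enum key_type type);
        int (*key_to_protkey)(const void *key, void *protkey);
        struct pkey_handler *next;
    };

    static struct pkey_handler *handler_list;   /* the registry */

    static void pkey_handler_register(struct pkey_handler *h)
    {
        h->next = handler_list;
        handler_list = h;
    }

    static struct pkey_handler *pkey_handler_for_key(enum key_type type)
    {
        for (struct pkey_handler *h = handler_list; h; h = h->next)
            if (h->is_supported_key(type))
                return h;
        return NULL;
    }

    /* Example handler, as a module like pkey-pckmo would provide. */
    static int pckmo_supported(enum key_type type) { return type == KEY_PCKMO; }
    static int pckmo_to_protkey(const void *key, void *protkey)
    {
        (void)key; (void)protkey;
        puts("pckmo handler: derived protected key");
        return 0;
    }
    static struct pkey_handler pckmo_handler = {
        .name = "pkey-pckmo",
        .is_supported_key = pckmo_supported,
        .key_to_protkey = pckmo_to_protkey,
    };

    int main(void)
    {
        pkey_handler_register(&pckmo_handler);

        struct pkey_handler *h = pkey_handler_for_key(KEY_PCKMO);
        return h ? h->key_to_protkey(NULL, NULL) : 1;
    }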
2024-08-29  s390/pkey: Rework and split PKEY kernel module code  (Harald Freudenberger, 2 files, -4/+4)
This is a huge rework of all the pkey kernel module code. The goal is to split the code into individual parts with a dedicated calling interface:
- move all the sysfs related code into pkey_sysfs.c
- all the CCA related code goes to pkey_cca.c
- the EP11 stuff has been moved to pkey_ep11.c
- the PCKMO related code is now in pkey_pckmo.c
The CCA, EP11 and PCKMO code may be seen as "handlers" with a similar calling interface. The new header file pkey_base.h declares this calling interface. The remaining code in pkey_api.c handles the ioctl, the pkey module things and the "handler" independent code on top of the calling interface invoking the handlers. This regrouping of the code will be the base for a real pkey kernel module split into a pkey base module which acts as a dispatcher and handler modules providing their service. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/crypto: Add hardware acceleration for HMAC modes  (Holger Dengler, 6 files, -8/+401)
Add new shash exploiting the HMAC hardware accelerations for SHA224, SHA256, SHA384 and SHA512 introduced with message-security assist extension 11. Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/crypto: Add hardware acceleration for full AES-XTS mode  (Holger Dengler, 2 files, -3/+119)
Add new cipher exploiting the full AES-XTS hardware acceleration introduced with message-security assist extension 10. The full AES-XTS cipher is registered as preferred cipher in addition to the discrete AES-XTS variant. Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/hypfs_diag: Remove unused dentry variable  (Mete Durlu, 1 file, -6/+1)
Remove a leftover dentry variable after the hypfs refactoring. Before commit 2fcb3686e160, hypfs_diag.c and other hypfs files were using debugfs_create_file() explicitly for creating debugfs files and were storing the returned pointer. After the refactor, common debugfs file operations and also the related dentry pointers have been moved into hypfs_dbfs.c and redefined as new common mechanisms. Therefore the dentry variable and the debugfs_remove() function calls in hypfs_diag.c are now redundant. Current code is not affected since the dentry pointer in hypfs_diag is implicitly assigned to NULL and debugfs_remove() returns without an error if the passed pointer is NULL. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/disassembler: Add instructions  (Vasily Gorbik, 2 files, -4/+33)
Add more instructions to the kernel disassembler. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390: Always enable EXPOLINE_EXTERN if supported  (Vasily Gorbik, 1 file, -10/+6)
Since commit ba05b39d54ee ("s390/expoline: Make modules use kernel expolines"), there is no longer any reason not to use CONFIG_EXPOLINE_EXTERN when supported by the compiler. On the positive side:
- there is only a single set of expolines generated and used by both the kernel code and modules,
- it eliminates expolines "comdat" sections, which can confuse tools like kpatch.
Always enable EXPOLINE_EXTERN if supported by the compiler. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/disassembler: Update instruction mnemonics to latest spec  (Jens Remus, 1 file, -7/+7)
Over the course of CPU generations a few instructions got extended, changing their base mnemonic, while keeping the former as an extended mnemonic. Update the instruction mnemonics in the disassembler to their latest base mnemonic as documented in the latest IBM z/Architecture Principles of Operation specification [1]. With the IBM z14 the base mnemonics of the following vector instructions have been changed:
- Vector FP Load Lengthened (VFLL)
- Vector FP Load Rounded (VFLR)
With Message-Security-Assist Extension 5 Perform Pseudorandom Number Operation (PPNO) has been renamed to Perform Random Number Operation (PRNO). With Vector Enhancements Facility 2 the base mnemonics of the following vector instructions have been changed:
- Vector FP Convert from Fixed (VCFPS)
- Vector FP Convert from Logical (VCFPL)
- Vector FP Convert to Fixed (VCSFP)
- Vector FP Convert to Logical (VCLFP)
[1] IBM z/Architecture Principles of Operation, SA22-7832-13, IBM z16, https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf
Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29  s390/disassembler: Use proper format specifiers for operand values  (Jens Remus, 1 file, -6/+6)
Treat register numbers as unsigned. Treat signed operand values as signed. This resolves multiple instances of the Cppcheck warning:
    warning: %i in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. [invalidPrintfArgType_sint]
Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
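A tiny illustration of the warning class being fixed, printing an unsigned register number and a signed displacement with matching format specifiers (not the disassembler code itself):

    #include <stdio.h>

    int main(void)
    {
        unsigned int reg = 15;          /* register numbers are unsigned */
        int disp = -8;                  /* displacements may be signed   */

        printf("r%u,%i\n", reg, disp);  /* %u for unsigned, %i for signed */
        return 0;
    }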
2024-08-27  s390/ftrace: Avoid calling unwinder in ftrace_return_address()  (Vasily Gorbik, 2 files, -20/+16)
ftrace_return_address() is called extremely often from performance-critical code paths when debugging features like CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig, ftrace selftests on my LPAR currently execute ftrace_return_address() as follows:
    ftrace_return_address(0)  -             0 times (common code uses __builtin_return_address(0) instead)
    ftrace_return_address(1)  - 2,986,805,401 times (with this patch applied)
    ftrace_return_address(2)  -           140 times
    ftrace_return_address(>2) -             0 times
The use of __builtin_return_address(n) was replaced by return_address() with an unwinder call by commit cae74ba8c295 ("s390/ftrace: Use unwinder instead of __builtin_return_address()") because __builtin_return_address(n) simply walks the stack backchain and doesn't check for reaching the stack top. For shallow stacks with fewer than "n" frames, this results in reads at low addresses and random memory accesses. While calling the fully functional unwinder "works", it is very slow for this purpose. Moreover, potentially following stack switches and walking past IRQ context is simply the wrong thing to do for ftrace_return_address(). Reimplement return_address() to essentially be __builtin_return_address(n) with checks for reaching the stack top. Since the ftrace_return_address(n) argument is always a constant, keep the implementation in the header, allowing both GCC and Clang to unroll the loop and optimize it to the bare minimum. Fixes: cae74ba8c295 ("s390/ftrace: Use unwinder instead of __builtin_return_address()") Cc: stable@vger.kernel.org Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
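The reimplementation described above amounts to following the stack backchain n times while checking that the chain never leaves the known stack bounds. A simplified, self-contained sketch (frame layout reduced and the stack built synthetically for demonstration, not the actual arch/s390 return_address()):

    #include <stdio.h>

    struct stack_frame {
        unsigned long back_chain;       /* 0 or address of the caller's frame */
        unsigned long gprs[9];          /* simplified register save area */
    };

    static unsigned long return_address(unsigned long sp, unsigned int n,
                                        unsigned long stack_low,
                                        unsigned long stack_high)
    {
        const struct stack_frame *sf = (const struct stack_frame *)sp;

        while (n-- > 0) {
            /* Stack-top check: a shallow stack yields 0 instead of a
             * read from a random low address. */
            if (sf->back_chain < stack_low || sf->back_chain >= stack_high)
                return 0;
            sf = (const struct stack_frame *)sf->back_chain;
        }
        return sf->gprs[8];             /* caller's saved return address */
    }

    int main(void)
    {
        struct stack_frame frames[3] = { 0 };
        unsigned long low = (unsigned long)&frames[0];
        unsigned long high = (unsigned long)&frames[3];

        frames[0].back_chain = (unsigned long)&frames[1];   /* callee  */
        frames[1].back_chain = (unsigned long)&frames[2];   /* caller  */
        frames[1].gprs[8] = 0x1234;
        frames[2].gprs[8] = 0x5678;

        printf("depth 1: %#lx\n", return_address((unsigned long)&frames[0], 1, low, high));
        printf("depth 5: %#lx\n", return_address((unsigned long)&frames[0], 5, low, high));
        return 0;
    }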
2024-08-27  s390/build: Avoid relocation information in final vmlinux  (Jens Remus, 2 files, -5/+41)
Since commit 778666df60f0 ("s390: compile relocatable kernel without -fPIE") the kernel vmlinux ELF file is linked with --emit-relocs to preserve all relocations, so that all absolute relocations can be extracted using the 'relocs' tool to adjust them during boot. Port and adapt Petr Pavlu's x86 commit 9d9173e9ceb6 ("x86/build: Avoid relocation information in final vmlinux") to s390 to strip all relocations from the final vmlinux ELF file to optimize its size. Following is his original commit message with minor adaptions for s390: The Linux build process on s390 roughly consists of compiling all input files, statically linking them into a vmlinux ELF file, and then taking and turning this file into an actual bzImage bootable file. vmlinux has in this process two main purposes:
1) It is an intermediate build target on the way to produce the final bootable image.
2) It is a file that is expected to be used by debuggers and standard ELF tooling to work with the built kernel.
For the second purpose, a vmlinux file is typically collected by various package build recipes, such as distribution spec files, including the kernel's own tar-pkg target. When building the kernel vmlinux contains also relocation information produced by using the --emit-relocs linker option. This is utilized by subsequent build steps to create relocs.S and produce a relocatable image. However, the information is not needed by debuggers and other standard ELF tooling. The issue is then that the collected vmlinux file and hence distribution packages end up unnecessarily large because of this extra data. The following is a size comparison of vmlinux v6.10 with and without the relocation information:
    | Configuration      | With relocs | Stripped relocs |
    | defconfig          | 696 MB      | 320 MB          |
    | -CONFIG_DEBUG_INFO | 48 MB       | 32 MB           |
Optimize a resulting vmlinux by adding a postlink step that splits the relocation information into relocs.S and then strips it from the vmlinux binary. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-27  s390/ftrace: Use kernel ftrace trampoline for modules  (Vasily Gorbik, 1 file, -24/+0)
Now that both the kernel modules area and the kernel image itself are located within 4 GB, there is no longer a need to maintain a separate ftrace_plt trampoline. Use the existing trampoline in the kernel. Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-27  s390/ftrace: Remove unused ftrace_plt_template*  (Vasily Gorbik, 1 file, -2/+0)
Unused since commit b860b9346e2d ("s390/ftrace: remove dead code"). Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-25  Merge tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds, 8 files, -33/+85)
Pull s390 fixes from Vasily Gorbik:
- Fix KASLR base offset to account for symbol offsets in the vmlinux ELF file, preventing tool breakages like the drgn debugger
- Fix potential memory corruption of physmem_info during kernel physical address randomization
- Fix potential memory corruption due to overlap between the relocated lowcore and identity mapping by correctly reserving lowcore memory
- Fix performance regression and avoid randomizing identity mapping base by default
- Fix unnecessary delay of AP bus binding complete uevent to prevent startup lag in KVM guests using AP
* tag 's390-6.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
    s390/boot: Avoid possible physmem_info segment corruption
    s390/ap: Refine AP bus bindings complete processing
    s390/mm: Pin identity mapping base to zero
    s390/mm: Prevent lowcore vs identity mapping overlap
2024-08-22  s390/early: Dump register contents and call trace for early crashes  (Heiko Carstens, 3 files, -4/+24)
If the early program check handler cannot resolve a program check, dump register contents and a call trace to the console before loading a disabled wait psw. This makes debugging much easier. Emit an extra message with early_printk() for cases where regular printk() via the early console is not yet working, so that at least some information is available. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>