summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)AuthorFilesLines
2024-12-17s390/diag: Move diag.c to diag specific folderSumanth Korikkar3-3/+3
Move implementation of s390 diagnose code to diag specific folder. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/diag324: Retrieve power readings via diag 0x324Sumanth Korikkar9-1/+272
Retrieve electrical power readings for resources in a computing environment via diag 0x324. diag 0x324 stores the power readings in the power information block (pib). Provide power readings from pib via diag324 ioctl interface. diag324 ioctl provides new pib to the user only if the threshold time has passed since the last call. Otherwise, cache data is returned. When there are no active readers, cleanup of pib buffer is performed. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/diag: Create misc device /dev/diagSumanth Korikkar2-0/+46
Create a misc device /dev/diag to fetch diagnose specific information from the kernel and provide it to userspace. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/lib: Use exrl instead of ex in string functionsSven Schnelle1-10/+5
exrl is present in all machines currently supported in the linux kernel, therefore prefer it over ex. This saves one instruction and doesn't need an additional register to hold the address of the target instruction. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/mm: Simplify noexec page protection handlingHeiko Carstens10-77/+106
By default page protection definitions like PAGE_RX have the _PAGE_NOEXEC bit set. For older machines without the instruction execution protection facility this bit is not allowed to be used in page table entries, and therefore must be removed. This is done at a couple of page table walkers, but also at some but not all page table modification functions like ptep_modify_prot_commit(). Avoid all of this and change the page, segment and region3 protection definitions so that the noexec bit is masked out automatically if the instruction execution-protection facility is not available. This is similar to what also various other architectures do which had to solve the same problem. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/mm: Remove unused PAGE_KERNEL_EXEC and friendsHeiko Carstens1-15/+0
Remove unused PAGE_KERNEL_EXEC, SEGMENT_KERNEL_EXEC, and REGION3_KERNEL_EXEC. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-17s390/mm: Remove incorrect commentHeiko Carstens1-7/+0
Remove an outdated comment that is also located at a random place. The generic statement that read permissions imply execute permissions is wrong since the instruction execution-protection facility is available. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/pci: Add pci_msg debug view to PCI reportNiklas Schnelle3-5/+54
Using the newly introduced debug_dump() mechanism add formatted content of pci_debug_msg_id to the PCI report. The formatting is based on the existing sprintf format but removes caller pointer and area index and adds an column header. This will allow the platform to collect this log data together with hardware errors. This sets the reverse flag such that the newest log entries get added to the PCI report even if not all debug log entries fit. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/debug: Add a reverse mode for debug_dump()Niklas Schnelle2-3/+86
In this mode debug_dump() writes the debug log starting at the newest entry followed by earlier entries. To this end add a debug_prev_entry() helper analogous to debug_next_entry() a helper to get the latest entry which is one before the active entry and a helper to iterate either forward or backward. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/debug: Add debug_dump() to write debug view to a string bufferNiklas Schnelle2-0/+50
The debug_dump() function allows to get the content of a debug log and view pair in a string buffer. One future application of this is to provide debug logs to the platform to be collected with hardware error logs during recovery. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/debug: Split private data alloc/free out of file operationsNiklas Schnelle1-30/+48
Split the allocation respectively freeing of file_private_info_t out of open() respectively close(). This will be used in a follow on change to access to debug views without going through the s390dbf filesystem. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/debug: Simplify and document debug_next_entry() logicNiklas Schnelle1-10/+15
Contrary to convention debug_next_entry() returns a falsy 0 value if there are more entries and a truthy 1 value when there are no more entries. As there is only one caller just reverse this logic to be less surprising and document the behavior in a kdoc comment. Also replace the goto with an early return. In the future this allows using it in a do {} while (debug_next_entry(...)) loop. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/pci: Report PCI error recovery results via SCLPNiklas Schnelle5-5/+178
Add a mechanism with which the status of PCI error recovery runs is reported to the platform. Together with the status supply additional information that may aid in problem determination. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/mm: Consider KMSAN modules metadata for paging levelsVasily Gorbik1-0/+2
The calculation determining whether to use three- or four-level paging didn't account for KMSAN modules metadata. Include this metadata in the virtual memory size calculation to ensure correct paging mode selection and avoiding potentially unnecessary physical memory size limitations. Fixes: 65ca73f9fb36 ("s390/mm: define KMSAN metadata for vmalloc and modules") Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-16s390/mm/hugetlbfs: Remove huge_pte_none() / huge_pte_none_mostly()Heiko Carstens1-16/+7
Slightly cleanup arch/s390/include/asm/hugetlb.h: - Remove huge_pte_none() / huge_pte_none_mostly() which are identical to the generic variants - Coding style adjustments Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390: Add KERNEL_IMAGE_BASE to kasan.configVasily Gorbik1-0/+1
Although Kconfig specifies: config KERNEL_IMAGE_BASE hex "Kernel image base address" range 0x100000 0x1FFFFFE0000000 if !KASAN range 0x100000 0x1BFFFFE0000000 if KASAN default 0x3FFE0000000 if !KASAN default 0x7FFFE0000000 if KASAN Running make defconfig or make debug_defconfig followed by make kasan.config results in a suboptimal CONFIG_KERNEL_IMAGE_BASE=0x3FFE0000000. Add CONFIG_KERNEL_IMAGE_BASE=0x7FFFE0000000 to kasan.config to address that. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/abs_lowcore: Include linux/smp.h for get_cpu() and put_cpu()Vasily Gorbik1-0/+1
Add missing include of <linux/smp.h> in abs_lowcore.h to provide declarations for get_cpu() and put_cpu() used in the code. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390: Remove __bootdata annotations from declarationsVasily Gorbik8-19/+20
For consistency, remove the `__bootdata` and `__bootdata_preserved` section annotations from variable declarations in header files. Section annotations should be applied to definitions, not declarations. This change moves the annotations to the variable definitions in the corresponding source files. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/preempt: Optimize __preempt_count_dec_and_test()Heiko Carstens1-1/+1
Use __atomic_add_const_and_test() within __preempt_count_dec_and_test(). With this it is possible to decrease preempt_count by one and test if need_resched is set with one instruction, if the compiler has support for flag output operand constraints. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/atomic: Provide arch_atomic_*_and_test() implementationsHeiko Carstens2-0/+109
Provide arch_atomic_*_and_test() implementations which make use of flag output constraints, and allow the compiler to generate slightly better code. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390: Remove superfluous new lines from inline assembliesHeiko Carstens3-10/+10
GCC uses the number of lines of an inline assembly to calculate its length (number of instructions). This has an impact on GCCs inlining decisions. Therefore remove superfluous new lines from a couple of inline assemblies, so that their real size is reflected. Also use an "asm inline" statement for the fpu_lfpc_safe() inline assembly to enforce that GCC assumes the minimum size for this inline assembly, since it contains various statements which make it appear much larger than the resulting code is. Suggested-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/preempt: Adjust coding styleHeiko Carstens1-2/+1
Just remove a line break which reduces readability. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/preempt: Remove special pre MARCH_HAS_Z196_FEATURES implementationHeiko Carstens1-52/+0
Remove the preempt count implementation for pre MARCH_HAS_Z196_FEATURES builds. If the kernel is compiled with PREEMPT=n, which is the default for all distributions, this has close to zero impact in the generated code. Therefore remove the alternative implementation to keep things simple. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/preempt: Add commentsHeiko Carstens1-0/+26
The s390 preempt_count implementation is more or less a copy of the x86 implementation using different instructions. For clarification how this works also add all comments from x86 with some minor modifications. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/atomic: Consistent layering between atomic.h and atomic_ops.hHeiko Carstens2-26/+26
With commit c8a91c285d8c ("s390/atomic: move remaining inline assemblies to atomic_ops.h") all remaining atomic inline assemblies have been moved to atomic_ops.h. However the result is inconsistent: the functions in atomic_ops.h are supposed to be used with integral types like int and long pointers, while the functions in atomic.h work with atomic types. This layering got violated with the named commit. Therefore adjust this now, and also use consistent variable names in atomic_ops.h. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-15s390/atomic: Implement arch_atomic_inc() / arch_atomic_dec()Heiko Carstens1-0/+24
Implement arch_atomic_inc() / arch_atomic_dec() functions which result in a single instruction if compiled for z196 or newer architectures. Reduces the kernel image size by ~6K (defconfig): bloat-o-meter: add/remove: 0/0 grow/shrink: 12/1005 up/down: 106/-6404 (-6298) Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-11s390/ipl: Fix never less than zero warningAlexander Gordeev1-1/+1
DEFINE_IPL_ATTR_STR_RW() macro produces "unsigned 'len' is never less than zero." warning when sys_vmcmd_on_*_store() callbacks are defined. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412081614.5uel8F6W-lkp@intel.com/ Fixes: 247576bf624a ("s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-10s390/setup: Cleanup stack_alloc() and stack_free()Heiko Carstens1-6/+6
Some small cleanups to stack_alloc() and stack_free(): - Rename ret to stack to reflect what the variable is used for - Whitespace removal Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-10s390/Kconfig: Select VMAP_STACK unconditionallyHeiko Carstens5-62/+4
There is no point in supporting !VMAP_STACK kernel builds. VMAP_STACK has proven to work since many years. Also, since KASAN_VMALLOC is supported, kernels built with !VMAP_STACK are completely untested. Therefore select VMAP_STACK unconditionally and remove all config options and code required for !VMAP_STACK builds. Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-10s390/Kconfig: Select KASAN_VMALLOC if KASAN is enabledHeiko Carstens3-10/+4
Reduce the number of to be considered config options and select KASAN_VMALLOC if KASAN is enabled. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-10s390/mm: Fix DirectMap accountingHeiko Carstens1-3/+3
With uncoupling of physical and virtual address spaces population of the identity mapping was changed to use the type POPULATE_IDENTITY instead of POPULATE_DIRECT. This breaks DirectMap accounting: > cat /proc/meminfo DirectMap4k: 55296 kB DirectMap1M: 18446744073709496320 kB Adjust all locations of update_page_count() in vmem.c to use POPULATE_IDENTITY instead of POPULATE_DIRECT as well. With this accounting is correct again: > cat /proc/meminfo DirectMap4k: 54264 kB DirectMap1M: 8334336 kB Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Cc: stable@vger.kernel.org Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-12-10lib/crc32test: delete obsolete crc32test.cEric Biggers1-1/+0
Delete crc32test.c, since it has been superseded by crc_kunit.c. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Cc: Vinicius Peixoto <vpeixoto@lkcamp.dev> Link: https://lore.kernel.org/r/20241202012056.209768-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-09perf/core: Export perf_exclude_event()Namhyung Kim1-3/+3
While at it, rename the same function in s390 cpum_sf PMU. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20241203180441.1634709-2-namhyung@kernel.org
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra1-1/+1
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-02Merge tag 'v6.13-rc1' into perf/core, to refresh the branchIngo Molnar111-1636/+2587
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-12-02s390/crc32: expose CRC32 functions through libEric Biggers11-322/+96
Move the s390 CRC32 assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/s390/crypto/crc32-vx.c to arch/s390/lib/crc32-glue.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20241202010844.144356-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-29Merge tag 's390-6.13-2' of ↵Linus Torvalds25-197/+336
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Add swap entry for hugetlbfs support - Add PTE_MARKER support for hugetlbs mappings; this fixes a regression (possible page fault loop) which was introduced when support for UFFDIO_POISON for hugetlbfs was added - Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support - Mark IRQ entries in entry code, so that stack tracers can filter out the non-IRQ parts of stack traces. This fixes stack depot capacity limit warnings, since without filtering the number of unique stack traces is huge - In PCI code fix leak of struct zpci_dev object, and fix potential double remove of hotplug slot - Fix pagefault_disable() / pagefault_enable() unbalance in arch_stack_user_walk_common() - A couple of inline assembly optimizations, more cmpxchg() to try_cmpxchg() conversions, and removal of usages of xchg() and cmpxchg() on one and two byte memory areas - Various other small improvements and cleanups * tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (27 commits) Revert "s390/mm: Allow large pages for KASAN shadow mapping" s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8() s390/spinlock: Use R constraint for arch_load_niai4() s390/spinlock: Generate shorter code for arch_spin_unlock() s390/spinlock: Remove condition code clobber from arch_spin_unlock() s390/spinlock: Use symbolic names in inline assemblies s390: Support PREEMPT_DYNAMIC s390/pci: Fix potential double remove of hotplug slot s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails s390/mm/hugetlbfs: Add missing includes s390/mm: Add PTE_MARKER support for hugetlbfs mappings s390/mm: Introduce region-third and segment table swap entries s390/mm: Introduce region-third and segment table entry present bits s390/mm: Rearrange region-third and segment table entry SW bits KVM: s390: Increase size of union sca_utility to four bytes KVM: s390: Remove one byte cmpxchg() usage KVM: s390: Use try_cmpxchg() instead of cmpxchg() loops s390/ap: Replace xchg() with WRITE_ONCE() s390/mm: Allow large pages for KASAN shadow mapping s390: Add ARCH_HAS_PREEMPT_LAZY support ...
2024-11-29Revert "s390/mm: Allow large pages for KASAN shadow mapping"Vasily Gorbik1-11/+1
This reverts commit ff123eb7741638d55abf82fac090bb3a543c1e74. Allowing large pages for KASAN shadow mappings isn't inherently wrong, but adding POPULATE_KASAN_MAP_SHADOW to large_allowed() exposes an issue in can_large_pud() and can_large_pmd(). Since commit d8073dc6bc04 ("s390/mm: Allow large pages only for aligned physical addresses"), both can_large_pud() and can_large_pmd() call _pa() to check if large page physical addresses are aligned. However, _pa() has a side effect: it allocates memory in POPULATE_KASAN_MAP_SHADOW mode. This results in massive memory leaks. The proper fix would be to address both large_allowed() and _pa()'s side effects, but for now, revert this change to avoid the leaks. Fixes: ff123eb77416 ("s390/mm: Allow large pages for KASAN shadow mapping") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8()Heiko Carstens1-2/+22
Add a new variant of arch_cmpxchg_niai8() which makes use of the flag output constraint, which allows the compiler to generate slightly better code. Also rename arch_cmpxchg_niai8() to arch_try_cmpxchg_niai8() which reflects the purpose of the function and makes it consistent with other "try" variants. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/spinlock: Use R constraint for arch_load_niai4()Heiko Carstens1-1/+1
The load instruction used within arch_load_niai4() has a short displacement and index register. Therefore use the R constraint to reflect this. The used Q constraint does consider an index register. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/spinlock: Generate shorter code for arch_spin_unlock()Heiko Carstens1-3/+3
Use mvhhi instead of sth to write a zero to spinlocks. Compared to the sth variant this avoids the load of zero to a register, and reduces register pressure. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/spinlock: Remove condition code clobber from arch_spin_unlock()Heiko Carstens1-1/+1
Both instructions in arch_spin_unlock() do not clobber the condition code. Therefore remove the condition code clobber from the inline assembly. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/spinlock: Use symbolic names in inline assembliesHeiko Carstens2-8/+9
Improve readability and use symbolic names. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390: Support PREEMPT_DYNAMICHeiko Carstens2-4/+19
Select HAVE_PREEMPT_DYNAMIC_KEY and add the pieces which are required to support PREEMPT_DYNAMIC. See commit 99cf983cc8bc ("sched/preempt: Add PREEMPT_DYNAMIC using static keys") and commit 1b2d3451ee50 ("arm64: Support PREEMPT_DYNAMIC") for more details. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/pci: Fix potential double remove of hotplug slotNiklas Schnelle1-25/+9
In commit 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") the zpci_exit_slot() was moved from zpci_device_reserved() to zpci_release_device() with the intention of keeping the hotplug slot around until the device is actually removed. Now zpci_release_device() is only called once all references are dropped. Since the zPCI subsystem only drops its reference once the device is in the reserved state it follows that zpci_release_device() must only deal with devices in the reserved state. Despite that it contains code to tear down from both configured and standby state. For the standby case this already includes the removal of the hotplug slot so would cause a double removal if a device was ever removed in either configured or standby state. Instead of causing a potential double removal in a case that should never happen explicitly WARN_ON() if a device in non-reserved state is released and get rid of the dead code cases. Fixes: 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/pci: Fix leak of struct zpci_dev when zpci_add_device() failsNiklas Schnelle2-6/+25
Prior to commit 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") the IOMMU was initialized and the device was registered as part of zpci_create_device() with the struct zpci_dev freed if either resulted in an error. With that commit this was moved into a separate function called zpci_add_device(). While this new function logs when adding failed, it expects the caller not to use and to free the struct zpci_dev on error. This difference between it and zpci_create_device() was missed while changing the callers and the incompletely initialized struct zpci_dev may get used in zpci_scan_configured_device in the error path. This then leads to a crash due to the device not being registered with the zbus. It was also not freed in this case. Fix this by handling the error return of zpci_add_device(). Since in this case the zdev was not added to the zpci_list it can simply be discarded and freed. Also make this more explicit by moving the kref_init() into zpci_add_device() and document that zpci_zdev_get()/zpci_zdev_put() must be used after adding. Cc: stable@vger.kernel.org Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-28s390/mm/hugetlbfs: Add missing includesHeiko Carstens1-0/+2
Add missing includes to fix this randconfig compile error: All errors (new ones prefixed by >>): In file included from mm/pagewalk.c:5: In file included from include/linux/hugetlb.h:798: >> arch/s390/include/asm/hugetlb.h:94:31: error: call to undeclared function 'is_pte_marker'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 94 | return huge_pte_none(pte) || is_pte_marker(pte); | ^ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411281002.IPkRpIcR-lkp@intel.com/ Fixes: 487ef5d4d912 ("s390/mm: Add PTE_MARKER support for hugetlbfs mappings") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-27s390/mm: Add PTE_MARKER support for hugetlbfs mappingsGerald Schaefer2-2/+3
Commit 8a13897fb0daa ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") added support for PTE_MARKER_POISONED for hugetlbfs, but PTE_MARKER also needs support for swap entries. For s390, swap entries were only supported on PTE level, not on the PMD/PUD levels that are used for large hugetlbfs mappings. Therefore, when writing a PTE_MARKER_POISONED entry, the resulting entry on PMD/PUD level would be an invalid / empty entry. Further access would then generate a pagefault loop, instead of the expected SIGBUS. It is a loop inside the kernel, but interruptible and uffd fault handling also calls schedule() in between, so at least it won't completely block the system. Previous commits prepared support for swap entries on PMD/PUD levels. PTE_MARKER support for hugetlbfs can now be enabled by simply adding an extra is_pte_marker() check to huge_pte_none_mostly(). Fault handling code also needs to be adjusted to expect the VM_FAULT_HWPOISON_LARGE fault flag, which was not possible on s390 before. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-27s390/mm: Introduce region-third and segment table swap entriesGerald Schaefer2-5/+71
Introduce region-third (PUD) and segment table (PMD) swap entries, and make hugetlbfs RSTE <-> PTE conversion code aware of them, so that they can be used for hugetlbfs PTE_MARKER entries. Future work could also build on this to enable THP_SWAP and THP_MIGRATION for s390. Similar to PTE swap entries, bits 0-51 can be used to store the swap offset, but bits 57-61 cannot be used for swap type because that overlaps with the INVALID and TABLE TYPE bits. PMD/PUD swap entries must be invalid, and have a correct table type so that pud_folded() check still works. Bits 53-57 can be used for swap type, but those include the PROTECT bit. So unlike swap PTEs, the PROTECT bit cannot be used to mark the swap entry. Use the "Common-Segment/Region" bit 59 instead for that. Also remove the !MACHINE_HAS_NX check in __set_huge_pte_at(). Otherwise, that would clear the _SEGMENT_ENTRY_NOEXEC bit also for swap entries, where it is used for encoding the swap type. The architecture only requires this bit to be 0 for PTEs, with !MACHINE_HAS_NX, not for segment or region-third entries. And the check is also redundant, because after __pte_to_rste() conversion, for non-swap PTEs it would only be set if it was already set in the PTE, which should never be the case for !MACHINE_HAS_NX. This is a prerequisite for hugetlbfs PTE_MARKER support on s390, which is needed to fix a regression introduced with commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs"). That commit depends on the availability of swap entries for hugetlbfs, which were not available for s390 so far. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-27s390/mm: Introduce region-third and segment table entry present bitsGerald Schaefer3-24/+47
Introduce region-third and segment table entry present SW bits, and adjust pmd/pud_present() accordingly. Also add pmd/pud_present() checks to pmd/pud_leaf(), to return false for future swap entries. Same logic applies to pmd_trans_huge(), make that return pmd_leaf() instead of duplicating the same check. huge_pte_offset() also needs to be adjusted, current code would return NULL for !pud_present(). Use the same logic as in the generic version, which allows for !pud_present() swap entries. Similar to PTE, bit 63 can be used for the new SW present bit in region and segment table entries. For segment-table entries (PMD) the architecture says that "Bits 62-63 are available for programming", so they are safe to use. The same is true for large leaf region-third-table entries (PUD). However, for non-leaf region-third-table entries, bits 62-63 indicate the TABLE LENGTH and both must be set to 1. But such entries would always be considered as present, so it is safe to use bit 63 as PRESENT bit for PUD. They also should not conflict with bit 62 potentially later used for preserving SOFT_DIRTY in swap entries, because they are not swap entries. Valid PMDs / PUDs should always have the present bit set, so add it to the various pgprot defines, and also _SEGMENT_ENTRY which is OR'ed e.g. in pmd_populate(). _REGION3_ENTRY wouldn't need any change, as the present bit is already included in the TABLE LENGTH, but also explicitly add it there, for completeness, and just in case the bit would ever be changed. gmap code needs some adjustment, to also OR the _SEGMENT_ENTRY, like it is already done gmap_shadow_pgt() when creating new PMDs, but not in __gmap_link(). Otherwise, the gmap PMDs would not be considered present, e.g. when using pmd_leaf() checks in gmap code. The various WARN_ON checks in gmap code also need adjustment, to tolerate the new present bit. This is a prerequisite for hugetlbfs PTE_MARKER support on s390, which is needed to fix a regression introduced with commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs"). That commit depends on the availability of swap entries for hugetlbfs, which were not available for s390 so far. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>