Age | Commit message (Collapse) | Author | Files | Lines |
|
Use the perf_callchain_user() code as blueprint to also add support for
USER_STACKTRACE. To describe how to use this cite the commit message of the
LoongArch implementation which came with commit 4d7bf939df08 ("LoongArch:
Add USER_STACKTRACE support"), but replace -fno-omit-frame-pointer option
with the s390 specific -mbackchain option:
====================================================================== To
get the best stacktrace output, you can compile your userspace programs
with frame pointers (at least glibc + the app you are tracing).
1, export "CC = gcc -mbackchain";
2, compile your programs with "CC";
3, use uprobe to get stacktrace output.
...
echo 'p:malloc /usr/lib64/libc.so.6:0x0a4704 size=%r2:u64' > uprobe_events
echo 'p:free /usr/lib64/libc.so.6:0x0a4d50 ptr=%r2:u64' >> uprobe_events
echo 'comm == "demo"' > ./events/uprobes/malloc/filter
echo 'comm == "demo"' > ./events/uprobes/free/filter
echo 1 > ./options/userstacktrace
echo 1 > ./options/sym-userobj
...
======================================================================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Implement load_unaligned_zeropad() and enable DCACHE_WORD_ACCESS to
speed up string operations in fs/dcache.c and fs/namei.c.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options")
- kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
couple of macros to args.h")
- gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
commands")
- vsprintf inclusion rationalization from Andy Shevchenko
("lib/vsprintf: Rework header inclusions")
- Switch the handling of kdump from a udev scheme to in-kernel
handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
hot un/plug")
- Many singleton patches to various parts of the tree
* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
document while_each_thread(), change first_tid() to use for_each_thread()
drivers/char/mem.c: shrink character device's devlist[] array
x86/crash: optimize CPU changes
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
crash: hotplug support for kexec_load()
x86/crash: add x86 crash hotplug support
crash: memory and CPU hotplug sysfs attributes
kexec: exclude elfcorehdr from the segment digest
crash: add generic infrastructure for crash hotplug support
crash: move a few code bits to setup support of crash hotplug
kstrtox: consistently use _tolower()
kill do_each_thread()
nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
scripts/bloat-o-meter: count weak symbol sizes
treewide: drop CONFIG_EMBEDDED
lockdep: fix static memory detection even more
lib/vsprintf: declare no_hash_pointers in sprintf.h
lib/vsprintf: split out sprintf() and friends
kernel/fork: stop playing lockless games for exe_file replacement
adfs: delete unused "union adfs_dirtail" definition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
|
|
This patch is a minor cleanup to the series "refactor Kconfig to
consolidate KEXEC and CRASH options".
In that series, a new option ARCH_DEFAULT_KEXEC was introduced in order to
obtain the equivalent behavior of s390 original Kconfig settings for
KEXEC. As it turns out, this new option did not fully provide the
equivalent behavior, rather a "select KEXEC" did.
As such, the ARCH_DEFAULT_KEXEC is not needed anymore, so remove it.
Link: https://lkml.kernel.org/r/20230802161750.2215-1-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The Kconfig refactor to consolidate KEXEC and CRASH options utilized
option names of the form ARCH_SUPPORTS_<option>. Thus rename the
ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow
the same.
Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.
Link: https://lkml.kernel.org/r/20230712161545.87870-13-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Arm disabled hugetlb vmemmap optimization [1] because hugetlb vmemmap
optimization includes an update of both the permissions (writeable to
read-only) and the output address (pfn) of the vmemmap ptes. That is not
supported without unmapping of pte(marking it invalid) by some
architectures.
With DAX vmemmap optimization we don't require such pte updates and
architectures can enable DAX vmemmap optimization while having hugetlb
vmemmap optimization disabled. Hence split DAX optimization support into
a different config.
s390, loongarch and riscv don't have devdax support. So the DAX config is
not enabled for them. With this change, arm64 should be able to select
DAX optimization
[1] commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
Link: https://lkml.kernel.org/r/20230724190759.483013-8-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
By taking GENERIC_IOREMAP method, the generic generic_ioremap_prot(),
generic_iounmap(), and their generic wrapper ioremap_prot(), ioremap() and
iounmap() are all visible and available to arch. Arch needs to provide
wrapper functions to override the generic versions if there's arch
specific handling in its ioremap_prot(), ioremap() or iounmap(). This
change will simplify implementation by removing duplicated code with
generic_ioremap_prot() and generic_iounmap(), and has the equivalent
functioality as before.
Here, add wrapper functions ioremap_prot() and iounmap() for s390's
special operation when ioremap() and iounmap().
And also replace including <asm-generic/io.h> with <asm/io.h> in
arch/s390/kernel/perf_cpum_sf.c, otherwise building error will be seen
because macro defined in <asm/io.h> can't be seen in perf_cpum_sf.c.
Link: https://lkml.kernel.org/r/20230706154520.11257-11-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A build failure was reported when sha256() is not present:
gcc-13.1.0-nolibc/s390-linux/bin/s390-linux-ld: arch/s390/kernel/cert_store.o: in function `check_certificate_hash':
arch/s390/kernel/cert_store.c:267: undefined reference to `sha256'
Therefore make CONFIG_CERT_STORE select CRYPTO_LIB_SHA256.
Fixes: 8cf57d7217c3 ("s390: add support for user-defined certificates")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/all/8ecb57fb-4560-bdfc-9e55-63e3b0937132@infradead.org/
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230728100430.1567328-1-svens@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
and fix all in-tree references.
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add support for tracing return values in the function graph tracer.
This requires return_to_handler() to record gpr2 and the frame pointer
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The s390_hypfs filesystem is deprecated and shouldn't be used due to its
rather odd semantics. It creates a whole directory structure with static
file contents so a user can read a consistent state while within that
directory.
Writing to its update attribute will remove and rebuild nearly the whole
filesystem, so that again a user can read a consistent state, even if
multiple files need to be read.
Given that this wastes a lot of CPU cycles, and involves a lot of code,
binary interfaces have been added quite a couple of years ago, which simply
pass the binary data to user space, and let user space decode the data.
This is the preferred and only way how the data should be retrieved.
The assumption is that there are no users of the s390_hypfs filesystem.
However instead of just removing the code, and having to revert in case
there are actually users, factor the filesystem code out and make it only
available via a new config option.
This config option is supposed to be disabled. If it turns out there are no
complaints the filesystem code can be removed probably in a couple of
years.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Enable receiving the user-defined certificates from the s390x
hypervisor via new diagnose 0x320 calls, and make them available to the
Linux root user as 'cert_store_key' type keys in a so-called
'cert_store' keyring.
New user-space interfaces:
/sys/firmware/cert_store/refresh
Writing to this attribute re-fetches certificates via DIAG 0x320
/sys/firmware/cert_store/cs_status
Reading from this attribute returns either of:
"uninitialized"
If no certificate has been retrieved yet
"ok"
If certificates have been successfully retrieved
"failed (<number>)"
If certificate retrieval failed with reason code <number>
New debug trace areas:
/sys/kernel/debug/s390dbf/cert_store_msg
/sys/kernel/debug/s390dbf/cert_store_hexdump
Usage example:
To initiate request for certificates available to the system as root:
$ echo 1 > /sys/firmware/cert_store/refresh
Upon success the '/sys/firmware/cert_store/cs_status' contains
the value 'ok'.
$ cat /sys/firmware/cert_store/cs_status
ok
Get the ID of the keyring 'cert_store':
$ keyctl search @us keyring cert_store
OR
$ keyctl link @us @s; keyctl request keyring cert_store
Obtain list of IDs of certificates:
$ keyctl rlist <cert_store keyring ID>
Display certificate content as hex-dump:
$ keyctl read <certificate ID>
Read certificate contents as binary data:
$ keyctl pipe <certificate ID> >cert_data
Display certificate description:
$ keyctl describe <certificate ID>
The certificate description has the following format:
<64 bytes certificate name in EBCDIC> ':'
<certificate index as obtained from hypervisor> ':'
<certificate store token obtained from hypervisor>
The certificate description in /proc/keys has certificate name
represented in ASCII.
Users can read but cannot update the content of the certificate.
Signed-off-by: Anastasia Eskova <anastasia.eskova@ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which requires
some core mm changes to function properly.
One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.
But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.
So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.
Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
create a new arch config HAS_HUGE_PAGE that can be used to tell if
pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
compiler would not be able to find pmd_mkwrite_novma().
No functional change.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Fix the style of protected key API driver source: use x-mas tree for
all local variable declarations
- Rework protected key API driver to not use the struct pkey_protkey
and pkey_clrkey anymore. Both structures have a fixed size buffer,
but with the support of ECC protected key these buffers are not big
enough. Use dynamic buffers internally and transparently for
userspace
- Add support for a new 'non CCA clear key token' with ECC clear keys
supported: ECC P256, ECC P384, ECC P521, ECC ED25519 and ECC ED448.
This makes it possible to derive a protected key from the ECC clear
key input via PKEY_KBLOB2PROTK3 ioctl, while currently the only way
to derive is via PCKMO instruction
- The s390 PMU of PAI crypto and extension 1 NNPA counters use atomic_t
for reference counting. Replace this with the proper data type
refcount_t
- Select ARCH_SUPPORTS_INT128, but limit this to clang for now, since
gcc generates inefficient code, which may lead to stack overflows
- Replace one-element array with flexible-array member in struct
vfio_ccw_parent and refactor the rest of the code accordingly. Also,
prefer struct_size() over sizeof() open- coded versions
- Introduce OS_INFO_FLAGS_ENTRY pointing to a flags field and
OS_INFO_FLAG_REIPL_CLEAR flag that informs a dumper whether the
system memory should be cleared or not once dumped
- Fix a hang when a user attempts to remove a VFIO-AP mediated device
attached to a guest: add VFIO_DEVICE_GET_IRQ_INFO and
VFIO_DEVICE_SET_IRQS IOCTLs and wire up the VFIO bus driver callback
to request a release of the device
- Fix calculation for R_390_GOTENT relocations for modules
- Allow any user space process with CAP_PERFMON capability read and
display the CPU Measurement facility counter sets
- Rework large statically-defined per-CPU cpu_cf_events data structure
and replace it with dynamically allocated structures created when a
perf_event_open() system call is invoked or /dev/hwctr device is
accessed
* tag 's390-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpum_cf: rework PER_CPU_DEFINE of struct cpu_cf_events
s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process
s390/module: fix rela calculation for R_390_GOTENT
s390/vfio-ap: wire in the vfio_device_ops request callback
s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctl
s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctl
s390/pkey: add support for ecc clear key
s390/pkey: do not use struct pkey_protkey
s390/pkey: introduce reverse x-mas trees
s390/zcore: conditionally clear memory on reipl
s390/ipl: add REIPL_CLEAR flag to os_info
vfio/ccw: use struct_size() helper
vfio/ccw: replace one-element array with flexible-array member
s390: select ARCH_SUPPORTS_INT128
s390/pai_ext: replace atomic_t with refcount_t
s390/pai_crypto: replace atomic_t with refcount_t
|
|
These don't do anything anymore, the only user of the symbol was
VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects
IOMMU_API.
When this was added VFIO was wrongly doing "depends on IOMMU_API" which
required some contortions like this to ensure IOMMU_API was turned on.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-eb322ce2e547+188f-rm_iommu_ccw_jgg@nvidia.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Commit f1045056c726 ("topology/sysfs: rework book and drawer topology
ifdefery") activates the book and drawer topology, previously activated by
CONFIG_SCHED_{BOOK,DRAWER}, dependent on the existence of certain macro
definitions. Hence, since then, CONFIG_SCHED_{BOOK,DRAWER} have no effect
and any further purpose.
Remove the obsolete configs SCHED_{BOOK,DRAWER}.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20230508040916.16733-1-lukas.bulwahn@gmail.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
s390 has instructions to support 128 bit arithmetics, e.g. a 64 bit
multiply instruction with a 128 bit result. Also 128 bit integer
artithmetics are already used in s390 specific architecture code (see
e.g. read_persistent_clock64()).
Therefore select ARCH_SUPPORTS_INT128.
However limit this to clang for now, since gcc generates inefficient code,
which may lead to stack overflows, when compiling
lib/crypto/curve25519-hacl64.c which depends on ARCH_SUPPORTS_INT128. The
gcc generated functions have 6kb stack frames, compared to only 1kb of the
code generated with clang.
If the kernel is compiled with -Os library calls for __ashlti3(),
__ashrti3(), and __lshrti3() may be generated. Similar to arm64
and riscv provide assembler implementations for these functions.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") makes
config GCC11_NO_ARRAY_BOUNDS to be for disabling -Warray-bounds in any gcc
version 11 and upwards, and with that, removes the GCC12_NO_ARRAY_BOUNDS
config as it is now covered by the semantics of GCC11_NO_ARRAY_BOUNDS.
As GCC11_NO_ARRAY_BOUNDS is yes by default, there is no need for the s390
architecture to explicitly select GCC11_NO_ARRAY_BOUNDS. Hence, the select
GCC12_NO_ARRAY_BOUNDS in arch/s390/Kconfig can simply be dropped.
Remove the unneeded "select GCC12_NO_ARRAY_BOUNDS".
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for stackleak feature. Also allow specifying
architecture-specific stackleak poison function to enable faster
implementation. On s390, the mvc-based implementation helps decrease
typical overhead from a factor of 3 to just 25%
- Convert all assembler files to use SYM* style macros, deprecating the
ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS
- Improve KASLR to also randomize module and special amode31 code base
load addresses
- Rework decompressor memory tracking to support memory holes and
improve error handling
- Add support for protected virtualization AP binding
- Add support for set_direct_map() calls
- Implement set_memory_rox() and noexec module_alloc()
- Remove obsolete overriding of mem*() functions for KASAN
- Rework kexec/kdump to avoid using nodat_stack to call purgatory
- Convert the rest of the s390 code to use flexible-array member
instead of a zero-length array
- Clean up uaccess inline asm
- Enable ARCH_HAS_MEMBARRIER_SYNC_CORE
- Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
DEBUG_FORCE_FUNCTION_ALIGN_64B
- Resolve last_break in userspace fault reports
- Simplify one-level sysctl registration
- Clean up branch prediction handling
- Rework CPU counter facility to retrieve available counter sets just
once
- Other various small fixes and improvements all over the code
* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
s390/stackleak: provide fast __stackleak_poison() implementation
stackleak: allow to specify arch specific stackleak poison function
s390: select ARCH_USE_SYM_ANNOTATIONS
s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
s390: wire up memfd_secret system call
s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
s390/mm: use BIT macro to generate SET_MEMORY bit masks
s390/relocate_kernel: adjust indentation
s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
s390/entry: use SYM* macros instead of ENTRY(), etc.
s390/purgatory: use SYM* macros instead of ENTRY(), etc.
s390/kprobes: use SYM* macros instead of ENTRY(), etc.
s390/reipl: use SYM* macros instead of ENTRY(), etc.
s390/head64: use SYM* macros instead of ENTRY(), etc.
s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
s390/mcount: use SYM* macros instead of ENTRY(), etc.
s390/crc32le: use SYM* macros instead of ENTRY(), etc.
s390/crc32be: use SYM* macros instead of ENTRY(), etc.
s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
s390/amode31: use SYM* macros instead of ENTRY(), etc.
...
|
|
All old style assembly annotations have been converted for s390. Select
ARCH_USE_SYM_ANNOTATIONS to make sure the old macros like ENTRY() aren't
available anymore. This prevents that new code which uses the old macros
will be added again.
This follows what has been done for x86 with commit 2ce0d7f9766f ("x86/asm:
Provide a Kconfig symbol for disabling old assembly annotations") and for
arm64 with commit 50479d58eaa3 ("arm64: Disable old style assembly
annotations").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Implement the set_direct_map_*() API, which allows to invalidate and set
default permissions to pages within the direct mapping.
Note that kernel_page_present(), which is also supposed to be part of this
API, is intentionally not implemented. The reason for this is that
kernel_page_present() is only used (and currently only makes sense) for
suspend/resume, which isn't supported on s390.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to
indicate devdax and hugetlb vmemmap optimization support. Hence rename
that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP
Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Tarun Sahu <tsahu@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for
KASAN") switched s390 to use the generic checksum functions, so that KASAN
instrumentation also works checksum functions by avoiding architecture
specific inline assemblies.
There is however the problem that the generic csum_partial() function
returns a 32 bit value with a 16 bit folded checksum, while the original
s390 variant does not fold to 16 bit. This in turn causes that the
ipib_checksum in lowcore contains different values depending on kernel
config options.
The ipib_checksum is used by system dumpers to verify if pointers in
lowcore point to valid data. Verification is done by comparing checksum
values. The system dumpers still use 32 bit checksum values which are not
folded, and therefore the checksum verification fails (incorrectly).
Symptom is that reboot after dump does not work anymore when a KASAN
instrumented kernel is dumped.
Fix this by not using the generic checksum implementation. Instead add an
explicit kasan_check_read() so that KASAN knows about the read access from
within the inline assembly.
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN")
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
This is the s390 variant of "x86/mm: try VMA lock-based page fault handling
first".
Link: https://lkml.kernel.org/r/20230314132808.1266335-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add support for the stackleak feature. Whenever the kernel returns to user
space the kernel stack is filled with a poison value.
Enabling this feature is quite expensive: e.g. after instrumenting the
getpid() system call function to have a 4kb stack the result is an
increased runtime of the system call by a factor of 3.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
s390 trivially supports the ARCH_HAS_MEMBARRIER_SYNC_CORE requirements
since the used lpswe(y) instruction to return from any kernel context to
user space performs CPU serialization. This is very similar to arm, arm64
and powerpc.
See commit 70216e18e519 ("membarrier: Provide core serializing command,
*_SYNC_CORE") for further details.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit
d49a0626216b ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT").
Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default
function alignment. For all other compilers, which is only clang, select
a function alignment of 16 bytes which reflects the default function
alignment for clang.
Also change the __ALIGN define to follow whatever the value of
CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and
assembler functions is the same.
In result everything still uses the default function alignment for both
compilers. However in addition this is now also true for all assembly
functions, so that all functions have a consistent alignment.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Add empty command line parameter handling stubs to kernel for all
command line parameters which are handled in the decompressor. This
avoids invalid "Unknown kernel command line parameters" messages from
the kernel, and also avoids that these will be incorrectly passed to
user space. This caused already confusion, therefore add the empty
stubs
- Add missing phys_to_virt() handling to machine check handler
- Introduce and use a union to be used for zcrypt inline assemblies.
This makes sure that only a register wide member of the union is
passed as input and output parameter to inline assemblies, while
usual C code uses other members of the union to access bit fields of
it
- Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
atomically read a 128-bit value from memory. This replaces the
(mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
code. Currently gcc does not generate the used lpq instruction if
__READ_ONCE() is used for aligned 128-bit accesses, therefore use
this s390 specific helper
- Simplify machine check handler code if a task needs to be killed
because of e.g. register corruption due to a machine malfunction
- Perform CPU reset to clear pending interrupts and TLB entries on an
already stopped target CPU before delegating work to it
- Generate arch/s390/boot/vmlinux.map link map for the decompressor,
when CONFIG_VMLINUX_MAP is enabled for debugging purposes
- Fix segment type handling for dcssblk devices. It incorrectly always
returned type "READ/WRITE" even for read-only segements, which can
result in a kernel panic if somebody tries to write to a read-only
device
- Sort config S390 select list again
- Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
test
* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/Kconfig: sort config S390 select list again
s390/extmem: return correct segment type in __segment_load()
s390/decompressor: add link map saving
s390/smp: perform cpu reset before delegating work to target cpu
s390/mcck: cleanup user process termination path
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
s390/nmi: fix virtual-physical address confusion
s390/setup: do not complain about parameters handled in decompressor
|
|
Keep the config S390 select list sorted.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Pull VFIO updates from Alex Williamson:
- Remove redundant resource check in vfio-platform (Angus Chen)
- Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing
removal of arbitrary kernel limits in favor of cgroup control (Yishai
Hadas)
- mdev tidy-ups, including removing the module-only build restriction
for sample drivers, Kconfig changes to select mdev support,
documentation movement to keep sample driver usage instructions with
sample drivers rather than with API docs, remove references to
out-of-tree drivers in docs (Christoph Hellwig)
- Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann)
- Make mlx5 migration support match device support, improve source and
target flows to improve pre-copy support and reduce downtime (Yishai
Hadas)
- Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu)
- Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye
Xingchen)
- Avoid propagating missing reset error in vfio-platform if reset
requirement is relaxed by module option (Tomasz Duszynski)
- Range size fixes in mlx5 variant driver for missed last byte and
stricter range calculation (Yishai Hadas)
- Fixes to suspended vaddr support and locked_vm accounting, excluding
mdev configurations from the former due to potential to indefinitely
block kernel threads, fix underflow and restore locked_vm on new mm
(Steve Sistare)
- Update outdated vfio documentation due to new IOMMUFD interfaces in
recent kernels (Yi Liu)
- Resolve deadlock between group_lock and kvm_lock, finally (Matthew
Rosato)
- Fix NULL pointer in group initialization error path with IOMMUFD (Yan
Zhao)
* tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits)
vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd
docs: vfio: Update vfio.rst per latest interfaces
vfio: Update the kdoc for vfio_device_ops
vfio/mlx5: Fix range size calculation upon tracker creation
vfio: no need to pass kvm pointer during device open
vfio: fix deadlock between group lock and kvm lock
vfio: revert "iommu driver notify callback"
vfio/type1: revert "implement notify callback"
vfio/type1: revert "block on invalid vaddr"
vfio/type1: restore locked_vm
vfio/type1: track locked_vm per dma
vfio/type1: prevent underflow of locked_vm via exec()
vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
vfio: platform: ignore missing reset if disabled at module init
vfio/mlx5: Improve the target side flow to reduce downtime
vfio/mlx5: Improve the source side flow upon pre_copy
vfio/mlx5: Check whether VF is migratable
samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS
vfio/mdev: Use sysfs_emit() to instead of sprintf()
vfio-mdev: add back CONFIG_VFIO dependency
...
|
|
CONFIG_VFIO_MDEV cannot be selected when VFIO itself is
disabled, otherwise we get a link failure:
WARNING: unmet direct dependencies detected for VFIO_MDEV
Depends on [n]: VFIO [=n]
Selected by [y]:
- SAMPLE_VFIO_MDEV_MTTY [=y] && SAMPLES [=y]
- SAMPLE_VFIO_MDEV_MDPY [=y] && SAMPLES [=y]
- SAMPLE_VFIO_MDEV_MBOCHS [=y] && SAMPLES [=y]
/home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_remove':
mdpy.c:(.text+0x1e1): undefined reference to `vfio_unregister_group_dev'
/home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_probe':
mdpy.c:(.text+0x149e): undefined reference to `_vfio_alloc_device'
Fixes: 8bf8c5ee1f38 ("vfio-mdev: turn VFIO_MDEV into a selectable symbol")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230126211211.1762319-1-arnd@kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
VFIO_MDEV is just a library with helpers for the drivers. Stop making
it a user choice and just select it by the drivers that use the helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20230110091009.474427-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
That's an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
Replace kretprobe with rethook on x86") to s390.
Replaces the kretprobe code with rethook on s390. With this patch,
kretprobe on s390 uses the rethook instead of kretprobe specific
trampoline code.
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Factor out handle_write() function and simplify 3215 console write
operation
- When 3170 terminal emulator is connected to the 3215 console driver
the boot time could be very long due to limited buffer space or
missing operator input. Add con3215_drop command line parameter and
con3215_drop sysfs attribute file to instruct the kernel drop console
data when such conditions are met
- Fix white space errors in 3215 console driver
- Move enum paiext_mode definition to a header file and rename it to
paievt_mode to indicate this is now used for several events. Rename
PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with
PAI_MODE_SAMPLING
- Simplify the logic of PMU pai_crypto mapped buffer reference counter
and make it consistent with PMU pai_ext
- Rename PMU pai_crypto mapped buffer structure member users to
active_events to make it consistent with PMU pai_ext
- Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This
results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per
2G hugetlb page (~1.6%)
- Use generic serial.h, bugs.h, shmparam.h and vga.h header files and
scrap s390-specific versions
- The generic percpu setup code does not expect the s390-like
implementation and emits a warning. To get rid of that warning and
provide sane CPU-to-node and CPU-to-CPU distance mappings implementat
a minimal version of setup_per_cpu_areas()
- Use kstrtobool() instead of strtobool() for re-IPL sysfs device
attributes
- Avoid unnecessary lookup of a pointer to MSI descriptor when setting
IRQ affinity for a PCI device
- Get rid of "an incompatible function type cast" warning by changing
debug_sprintf_format_fn() function prototype so it matches the
debug_format_proc_t function type
- Remove unused info_blk_hdr__pcpus() and get_page_state() functions
- Get rid of clang "unused unused insn cache ops function" warning by
moving s390_insn definition to a private header
- Get rid of clang "unused function" warning by making function
raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is
enabled
- Use kstrobool() to parse sclp_con_drop parameter to make it identical
to the con3215_drop parameter and allow passing values like "yes" and
"true"
- Use sysfs_emit() for all SCLP sysfs show functions, which is the
current standard way to generate output strings
- Make SCLP con_drop sysfs attribute also writable and allow to change
its value during runtime. This makes SCLP console drop handling
consistent with the 3215 device driver
- Virtual and physical addresses are indentical on s390. However, there
is still a confusion when pointers are directly casted to physical
addresses or vice versa. Use correct address converters
virt_to_phys() and phys_to_virt() for s390 channel IO drivers
- Support for power managemant has been removed from s390 since quite
some time. Remove unused power managemant code from the appldata
device driver
- Allow memory tools like KASAN see memory accesses from the checksum
code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does
- Add support of ECKD DASDs disks so it could be used as boot and dump
devices
- Follow checkpatch recommendations and use octal values instead of
S_IRUGO and S_IWUSR for dump device attributes in sysfs
- Changes to vx-insn.h do not cause a recompile of C files that use
asm(".include \"asm/vx-insn.h\"\n") magic to access vector
instruction macros from inline assemblies. Add wrapper include header
file to avoid this problem
- Use vector instruction macros instead of byte patterns to increase
register validation routine readability
- The current machine check register validation handling does not take
into account various scenarios and might lead to killing a wrong user
process or potentially ignore corrupted FPU registers. Simplify logic
of the machine check handler and stop the whole machine if the
previous context was kerenel mode. If the previous context was user
mode, kill the current task
- Introduce sclp_emergency_printk() function which can be used to emit
a message in emergency cases. It is supposed to be used in cases
where regular console device drivers may not work anymore, e.g.
unrecoverable machine checks
Keep the early Service-Call Control Block so it can also be used
after initdata has been freed to allow sclp_emergency_printk()
implementation
- In case a system will be stopped because of an unrecoverable machine
check error print the machine check interruption code to give a hint
of what went wrong
- Move storage error checking from the assembly entry code to C in
order to simplify machine check handling. Enter the handler with DAT
turned on, which simplifies the entry code even more
- The machine check extended save areas are allocated using a private
"nmi_save_areas" slab cache which guarantees a required power-of-two
alignment. Get rid of that cache in favour of kmalloc()
* tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (38 commits)
s390/nmi: get rid of private slab cache
s390/nmi: move storage error checking back to C, enter with DAT on
s390/nmi: print machine check interruption code before stopping system
s390/sclp: introduce sclp_emergency_printk()
s390/sclp: keep sclp_early_sccb
s390/nmi: rework register validation handling
s390/nmi: use vector instruction macros instead of byte patterns
s390/vx: add vx-insn.h wrapper include file
s390/ipl: use octal values instead of S_* macros
s390/ipl: add eckd dump support
s390/ipl: add eckd support
vfio/ccw: identify CCW data addresses as physical
vfio/ccw: sort out physical vs virtual pointers usage
s390/checksum: support GENERIC_CSUM, enable it for KASAN
s390/appldata: remove power management callbacks
s390/cio: sort out physical vs virtual pointers usage
s390/sclp: allow to change sclp_console_drop during runtime
s390/sclp: convert to use sysfs_emit()
s390/sclp: use kstrobool() to parse sclp_con_drop parameter
s390/3270: make raw3270_state_final() depend on CONFIG_TN3270_CONSOLE
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
|
|
This is the s390 variant of commit d911c67e10b4 ("x86: kasan: kmsan:
support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN"). Even
though most of the s390 specific checksum code is written in C there is
still the csum_partial() inline assembly which could prevent KASAN and
KMSAN from seeing all memory accesses.
Therefore switch to GENERIC_CSUM if KASAN is enabled just like x86.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
s390 allows to enable CONFIG_NUMA, mainly to enable a couple of system
calls which are only present if NUMA is enabled. The NUMA specific system
calls are required by a couple of applications, which wouldn't work if the
system calls wouldn't be present.
The NUMA implementation itself maps all CPUs and memory to node 0. A
special case is the generic percpu setup code, which doesn't expect an s390
like implementation and therefore emits a message/warning:
"percpu: cpu 0 has no node -1 or node-local memory".
In order to get rid of this message, and also to provide sane CPU to node
and CPU distance mappings implement a minimal setup_per_cpu_areas()
function, which is very close to the generic variant.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP for s390.
With this, vmemmap pages used to back struct pages for compound tail
pages of hugetlb pages are freed and remapped to compound head page
frame as RO, see also Documentation/vm/vmemmap_dedup.rst.
For 1M hugetlb pages, this results in freeing 3 of 4 vmemmap pages,
saving 12K of memory for each 1M hugetlb page (~1.2%).
/sys/kernel/debug/kernel_page_tables will show the impact:
---[ vmemmap Area Start ]---
[...]
0x0000037202d84000-0x0000037202d85000 4K PTE RW NX
0x0000037202d85000-0x0000037202d88000 12K PTE RO NX
For 2G hugetlb pages, this results in freeing 8191 of 8192 vmemmap
pages, saving 32764K of memory for each 2G hugetlb page (~1.6%)
/sys/kernel/debug/kernel_page_tables will show the impact:
---[ vmemmap Area Start ]---
[...]
0x000003720a000000-0x000003720a001000 4K PTE RW NX
0x000003720a001000-0x000003720c000000 32764K PTE RO NX
The memory savings come with some costs:
- vmemmap mapping for compound hugetlb pages is not a PMD mapping any
more, but split to 4K PTE mappings, and it will not be coalesced back
to PMD mapping after freeing hugetlb pages from the pool.
Apart from theoretical performance impact, this will also (slightly)
relativize the memory savings because of additional 2K PTE pagetable
allocations.
- Workload using "on the fly" hugetlb allocations via
"nr_overcommit_hugepages" instead of using the hugetlb pool via
"nr_hugepages" will suffer from considerably increased fault handling
time, see also description from commit 78f39084b41d
("mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl").
- Freeing hugetlb pages from the pool will require re-allocation of the
freed struct pages, and therefore needs some memory available to the
kernel. This might fail in memory constrained scenarios.
- For the same reason, memory offline might fail even for ZONE_MOVABLE
when hugetlb pages are present (but not for s390, since we do not
support ARCH_ENABLE_HUGEPAGE_MIGRATION, and therefore cannot have
hugetlb pages in ZONE_MOVABLE).
- General increased complexity and overhead in kernel handling of
compound (head) pages.
Therefore, this feature is disabled by default, and has to be enabled
explicitly either by adding "hugetlb_free_vmemmap=on" kernel parameter,
or during run-time via "/proc/sys/vm/hugetlb_optimize_vmemmap" sysctl.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Nathan Chancellor reported several link errors on s390 with
CONFIG_RELOCATABLE disabled, after binutils commit 906f69cf65da ("IBM
zSystems: Issue error for *DBL relocs on misaligned symbols"). The binutils
commit reveals potential miscompiles that might have happened already
before with linker script defined symbols at odd addresses.
A similar bug was recently fixed in the kernel with commit c9305b6c1f52
("s390: fix nospec table alignments").
See https://github.com/ClangBuiltLinux/linux/issues/1747 for an analysis
from Ulich Weigand.
Therefore always build a relocatable kernel to avoid this problem. There is
hardly any use-case for non-relocatable kernels, so this shouldn't be
controversial.
Link: https://github.com/ClangBuiltLinux/linux/issues/1747
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221030182202.2062705-1-hca@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The s390 architecture uses either a cmpxchg loop (old systems)
or the laa add-to-memory instruction (new systems) to implement
this_cpu_add(), both of which are NMI safe. This means that the old
and more-efficient srcu_read_lock() may be used in NMI context, without
the need for srcu_read_lock_nmisafe(). Therefore, add the new Kconfig
option ARCH_HAS_NMI_SAFE_THIS_CPU_OPS to arch/s390/Kconfig, which will
cause NEED_SRCU_NMI_SAFE to be deselected, thus preserving the current
srcu_read_lock() behavior.
[ paulmck: Apply Christian Borntraeger feedback. ]
Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <linux-s390@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Though there's been a decent amount of RNG-related development during
this last cycle, not all of it is coming through this tree, as this
cycle saw a shift toward tackling early boot time seeding issues,
which took place in other trees as well.
Here's a summary of the various patches:
- The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
option have been removed, as they overlapped with the more widely
supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
"random.trust_cpu". This change allowed simplifying a bit of arch
code.
- x86's RDRAND boot time test has been made a bit more robust, with
RDRAND disabled if it's clearly producing bogus results. This would
be a tip.git commit, technically, but I took it through random.git
to avoid a large merge conflict.
- The RNG has long since mixed in a timestamp very early in boot, on
the premise that a computer that does the same things, but does so
starting at different points in wall time, could be made to still
produce a different RNG state. Unfortunately, the clock isn't set
early in boot on all systems, so now we mix in that timestamp when
the time is actually set.
- User Mode Linux now uses the host OS's getrandom() syscall to
generate a bootloader RNG seed and later on treats getrandom() as
the platform's RDRAND-like faculty.
- The arch_get_random_{seed_,}_long() family of functions is now
arch_get_random_{seed_,}_longs(), which enables certain platforms,
such as s390, to exploit considerable performance advantages from
requesting multiple CPU random numbers at once, while at the same
time compiling down to the same code as before on platforms like
x86.
- A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
Uros.
- A comment spelling fix"
More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:
https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/
* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: correct spelling of "overwrites"
random: handle archrandom with multiple longs
um: seed rng using host OS rng
random: use try_cmpxchg in _credit_init_bits
timekeeping: contribute wall clock to rng on time change
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
random: remove CONFIG_ARCH_RANDOM
|
|
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.
- MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
but does *NOT* want to call it for each VMA.
- MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
invalidate across multiple VMAs if possible.
With these it is possible to capture the three forms:
1) empty stubs;
select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS
2) start: flush_cache_range(), end: empty;
select MMU_GATHER_MERGE_VMAS
3) start: flush_cache_range(), end: flush_tlb_range();
default
Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.
Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options were introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.
Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.
The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.
Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. For now it leaves "nordrand" for now, as the
removal of that will take a different route.
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since commit 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"),
s390 builds the purgatory without using bin2c.
Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c.
Fixes: 4c0f032d4963 ("s390/purgatory: Omit use of bin2c")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220613170902.1775211-1-masahiroy@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small cleanup removing "export" of an __init function
- a small series adding a new infrastructure for platform flags
- a series adding generic virtio support for Xen guests (frontend side)
* tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: unexport __init-annotated xen_xlate_map_ballooned_pages()
arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices
xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
xen/grant-dma-iommu: Introduce stub IOMMU driver
dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops
xen/virtio: Enable restricted memory access using Xen grant mappings
xen/grant-dma-ops: Add option to restrict memory access under Xen
xen/grants: support allocating consecutive grants
arm/xen: Introduce xen_setup_dma_ops()
virtio: replace arch_has_restricted_virtio_memory_access()
kernel: add platform_has() infrastructure
|
|
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people
disabled the '-Warray-bounds' warning for gcc-12, because the new logic
in gcc would cause warnings for their use of the S390_lowcore macro,
which accesses absolute pointers.
It turns out gcc-12 has many other issues in this area, so this takes
that s390 warning disable logic, and turns it into a kernel build config
entry instead.
Part of the intent is that we can make this all much more targeted, and
use this conflig flag to disable it in only particular configurations
that cause problems, with the s390 case as an example:
select GCC12_NO_ARRAY_BOUNDS
and we could do that for other configuration cases that cause issues.
Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
targeted way, and disable the warning only for particular uses: again
the s390 case as an example:
KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)
but this ends up just doing it globally in the top-level Makefile, since
the current issues are spread fairly widely all over:
KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
We'll try to limit this later, since the gcc-12 problems are rare enough
that *much* of the kernel can be built with it without disabling this
warning.
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Instead of using arch_has_restricted_virtio_memory_access() together
with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those
with platform_has() and a new platform feature
PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Borislav Petkov <bp@suse.de>
|