summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
36 hourstesting/selftests/mm: add soft-dirty merge self-testLorenzo Stoakes1-1/+126
commit c7ba92bcfea34f6b4afc744c3b65c8f7420fefe0 upstream. Assert that we correctly merge VMAs containing VM_SOFTDIRTY flags now that we correctly handle these as sticky. In order to do so, we have to account for the fact the pagemap interface checks soft dirty PTEs and additionally that newly merged VMAs are marked VM_SOFTDIRTY. We do this by using use unfaulted anon VMAs, establishing one and clearing references on that one, before establishing another and merging the two before checking that soft-dirty is propagated as expected. We check that this functions correctly with mremap() and mprotect() as sample cases, because VMA merge of adjacent newly mapped VMAs will automatically be made soft-dirty due to existing logic which does so. We are therefore exercising other means of merging VMAs. Link: https://lkml.kernel.org/r/d5a0f735783fb4f30a604f570ede02ccc5e29be9.1763399675.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Andrey Vagin <avagin@gmail.com> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursmm: propagate VM_SOFTDIRTY on mergeLorenzo Stoakes1-12/+6
commit 6707915e030a3258868355f989b80140c1a45bbe upstream. Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2. Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. Now we have the concept of making VMA flags 'sticky', that is that they both don't prevent merge and, importantly, are propagated to merged VMAs, this seems a sensible alternative to the existing special-casing of VM_SOFTDIRTY. We additionally add a self-test that demonstrates that this logic behaves as expected. This patch (of 2): Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. This is because currently we simply ignore VM_SOFTDIRTY for the purposes of merge, so one VMA may possess the flag and another not, and whichever happens to be the target VMA will be the one upon which the merge is performed which may or may not have VM_SOFTDIRTY set. Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one which solves this issue. Additionally update VMA userland tests to propagate changes. [akpm@linux-foundation.org: update comments, per Lorenzo] Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local Link: https://lkml.kernel.org/r/cover.1763399675.git.ljs@kernel.org Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Andrey Vagin <avagin@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Fixes: 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursmm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD oneLorenzo Stoakes1-0/+26
commit ab04b530e7e8bd5cf9fb0c1ad20e0deee8f569ec upstream. Gather all the VMA flags whose presence implies that page tables must be copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this rather than specifying individual flags in vma_needs_copy(). We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies that there may be metadata contained in the page tables (that is - guard markers) which would will not and cannot be propagated upon fork. This was already being done manually previously in vma_needs_copy(), but this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and VM_UFFD_WP all of which imply the same. Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too - because equally a flag being VM_STICKY indicates that the VMA contains metadat that is not propagated by being faulted in - i.e. that the VMA metadata does not fully describe the VMA alone, and thus we must propagate whatever metadata there is on a fork. However, for maximum flexibility, we do not make this necessarily the case here. Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursmm: implement sticky VMA flagsLorenzo Stoakes1-0/+28
commit 64212ba02e66e705cabce188453ba4e61e9d7325 upstream. It is useful to be able to designate that certain flags are 'sticky', that is, if two VMAs are merged one with a flag of this nature and one without, the merged VMA sets this flag. As a result we ignore these flags for the purposes of determining VMA flag differences between VMAs being considered for merge. This patch therefore updates the VMA merge logic to perform this action, with flags possessing this property being described in the VM_STICKY bitmap. Those flags which ought to be ignored for the purposes of VMA merge are described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also updated to use. As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it already had this behaviour, alongside VM_STICKY as sticky flags by implication must not disallow merge. Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its own right, but this change is out of scope for this series. The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result of this change, once the VMA flag is set upon guard region installation, VMAs with guard ranges will now not have their merge behaviour impacted as a result and can be freely merged with other VMAs without VM_MAYBE_GUARD set. Also update the comments for vma_modify_flags() to directly reference sticky flags now we have established the concept. We also update the VMA userland tests to account for the changes. Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursmm: update vma_modify_flags() to handle residual flags, documentLorenzo Stoakes1-1/+2
commit 9119d6c2095bb20292cb9812dd70d37f17e3bd37 upstream. The vma_modify_*() family of functions each either perform splits, a merge or no changes at all in preparation for the requested modification to occur. When doing so for a VMA flags change, we currently don't account for any flags which may remain (for instance, VM_SOFTDIRTY) despite the requested change in the case that a merge succeeded. This is made more important by subsequent patches which will introduce the concept of sticky VMA flags which rely on this behaviour. This patch fixes this by passing the VMA flags parameter as a pointer and updating it accordingly on merge and updating callers to accommodate for this. Additionally, while we are here, we add kdocs for each of the vma_modify_*() functions, as the fact that the requested modification is not performed is confusing so it is useful to make this abundantly clear. We also update the VMA userland tests to account for this change. Link: https://lkml.kernel.org/r/23b5b549b0eaefb2922625626e58c2a352f3e93c.1763460113.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
36 hoursmm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smapsLorenzo Stoakes1-0/+1
commit 5dba5cc2e0ffa76f2f6c8922a04469dc9602c396 upstream. Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4. Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level. This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU. This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these. The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present. It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason. To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which: a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible). b. When they are merged, the resultant VMA must have the flag set. The VMA logic is updated to propagate these flags correctly. Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes. We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue. Additionally, we add the ability for allow-listed VMA flags to be atomically writable with only mmap/VMA read locks held. The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure does not cause any races by being allowed to do so. This allows us to maintain guard region installation as a read-locked operation and not endure the overhead of obtaining a write lock here. Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented. This patch (of 9): Currently, if a user needs to determine if guard regions are present in a range, they have to scan all VMAs (or have knowledge of which ones might have guard regions). Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to pagemap") and the related commit a516403787e0 ("fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions"), users can use either /proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this operation at a virtual address level. This is not ideal, and it gives no visibility at a /proc/$pid/smaps level that guard regions exist in ranges. This patch remedies the situation by establishing a new VMA flag, VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is uncertain because we cannot reasonably determine whether a MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and additionally VMAs may change across merge/split). We utilise 0x800 for this flag which makes it available to 32-bit architectures also, a flag that was previously used by VM_DENYWRITE, which was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and hasn't bee reused yet. We also update the smaps logic and documentation to identify these VMAs. Another major use of this functionality is that we can use it to identify that we ought to copy page tables on fork. We do not actually implement usage of this flag in mm/madvise.c yet as we need to allow some VMA flags to be applied atomically under mmap/VMA read lock in order to avoid the need to acquire a write lock for this purpose. Link: https://lkml.kernel.org/r/cover.1763460113.git.ljs@kernel.org Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysselftests: mptcp: add test for extra_subflows underflow on userspace PMTao Cui1-0/+4
commit 06fd2bec7aebf393288e4b78924482fe170caabc upstream. Add a test to verify that when userspace PM fails to create a subflow (e.g. using an unreachable address), the extra_subflows counter is not decremented below zero. Fixes: 77e4b94a3de6 ("mptcp: update userspace pm infos") Cc: stable@vger.kernel.org Signed-off-by: Tao Cui <cuitao@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260602-net-mptcp-misc-fixes-7-1-rc7-v2-6-856831229976@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daystracing/probes: Point the error offset correctly for eprobe argument errorMasami Hiramatsu (Google)1-1/+1
commit 85e0f27dd1396307913ffc5745b0c05137e9beac upstream. Fix to point the error offset correctly for eprobe argument error. In the cleanup commit 1b8b0cd754cd ("tracing/probes: Move event parameter fetching code to common parser"), due to incorrect backward compatibility aimed at conforming to the test specifications, the error location was set to 0 when a non-existent formal parameter was specified for Eprobe. However, this should be corrected in both the test and the implementation to point correct error position. Link: https://lore.kernel.org/all/177967567399.209006.1451571244515632097.stgit@devnote2/ Fixes: 1b8b0cd754cd ("tracing/probes: Move event parameter fetching code to common parser") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysverification/rvgen: Fix ltl2k writing True as a literalGabriele Monaco1-4/+5
[ Upstream commit df996599cc69a9b74ff437c67751cf8a61f62e39 ] The rvgen parser for LTL stores literal true values in the python representation (capitalised True), this doesn't build in C. The Literal class should already handle this case but ASTNode skips its strigification method and converts the value (true/false) directly. Fix by delegating ASTNode stringification to the Literal and Variable classes instead of bypassing them. Fixes: 97ffa4ce6ab32 ("verification/rvgen: Add support for linear temporal logic") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260514152055.229162-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysverification/rvgen: Fix options shared among commandsGabriele Monaco1-4/+6
[ Upstream commit 5f845ad706c0b394ae274e9a930044f78bef782e ] After rvgen was refactored to use subparsers, the common options (-a and -D) were left in the main parser. This meant that they needed to be called /before/ the subcommand and using them without subcommand was allowed. This is not the original intent. rvgen -D "some description" container -n name Define the options as parent in the subparsers to allow them to be used from both subcommands together with other options. rvgen container -n name -D "some description" Fixes: 5270a0e3041c ("verification/dot2k: Replace is_container() hack with subparsers") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260514152055.229162-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daystools/rv: Fix cleanup after failed trace setupGabriele Monaco1-1/+1
[ Upstream commit 33ec2269a4155cad7e9e42c92327dcaa9aee59a7 ] Currently if ikm_setup_trace_instance() fails, the tool returns without any cleanup, if rv was called with both -t and -r, this means the reactor is not going to be cleared. Jump to the cleanup label to restore the reactor if necessary. Fixes: 6d60f89691fc9 ("tools/rv: Add in-kernel monitor interface") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260514152055.229162-5-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daystools/rv: Fix substring match when listing container monitorsGabriele Monaco1-2/+6
[ Upstream commit ba0247c5aa3fcb2890a92a97a88c70fe5ce704a6 ] When listing monitors within a specific container (rv list <container>), the tool incorrectly matched monitors if the requested container name was only a prefix of the actual container (e.g., 'rv list sche' would incorrectly list monitors from 'sched:'). Fix this by ensuring the container name is an exact match and is immediately followed by the ':' separator. Fixes: eba321a16fc6 ("tools/rv: Add support for nested monitors") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260514152055.229162-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daystools/rv: Fix substring match bug in monitor name searchGabriele Monaco1-23/+25
[ Upstream commit a963fbf3166f2e178ac38b6c3c186a0c98092fb9 ] __ikm_find_monitor_name() relies on strstr() to find a monitor by name, which fails if the target monitor is a substring of a previously listed monitor. Fix it by tokenizing the available_monitors file and matching full tokens instead. Fixes: eba321a16fc6 ("tools/rv: Add support for nested monitors") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260514152055.229162-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daystools/rv: Ensure monitor name and desc are NUL-terminatedGabriele Monaco1-3/+4
[ Upstream commit 08904765bb941f98306ae6841c33cfd299343faf ] ikm_fill_monitor_definition() copies monitor name and description with strncpy(), but does not guarantee NUL termination when source strings are equal to or longer than the destination buffers. Clamp copies to sizeof(dst) - 1 and explicitly append '\0' for both fields to keep them safe for later string operations. Suggested-by: unknownbbqrx <dev@unknownbbqr.xyz> Fixes: 6d60f89691fc9 ("tools/rv: Add in-kernel monitor interface") Link: https://lore.kernel.org/r/20260604120946.90302-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysselftests: harness: fix pidfd leak in __wait_for_testGeliang Tang1-0/+1
[ Upstream commit 0eb307d61317b42b120ab02099b597226318358a ] Fix the pidfd leak in kselftest_harness.h's __wait_for_test() where childfd = syscall(__NR_pidfd_open, t->pid, 0) is never closed. Fixes: 73a3cde97677 ("selftests: harness: Implement test timeouts through pidfd") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://patch.msgid.link/a82e275ccfb2609a1984d90ab559fa3af78f1e81.1776678050.git.tanggeliang@kylinos.cn Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-09tools: ynl: add scope qualifier for definitionsJakub Kicinski1-2/+29
commit fbf5df34a4dbcd09d433dd4f0916bf9b2ddb16de upstream. Using definitions in kernel policies is awkward right now. On one hand we want defines for max values and such. On the other we don't have a way of adding kernel-only defines. Adding unnecessary defines to uAPI is a bad idea, we won't be able to delete them. And when it comes to policy user space should just query it via the policy dump, not use hard coded defines. Add a "scope" property to definitions, which will let us tell the codegen that a definition is for kernel use only. Support following values: - uapi: render into the uAPI header (default, today's behavior) - kernel: render to kernel header only - user: same as kernel but for the user-side generated header Definitions may have a header property (definition is "external", provided by existing header). Extend the scope to headers, too. If definition has both scope and header properties we will only generate the includes in the right scope. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20260510192904.3987113-8-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-09selftests: mptcp: drop nanoseconds width specifierMatthieu Baerts (NGI0)2-8/+8
[ Upstream commit 01ff78e4b3d98689184c52d97f9575dfbdc3b10f ] Using the format specifier +%s%3N with GNU date is honoured, and only prints 3 digits of the nanoseconds portion of the seconds since epoch, which corresponds to the milliseconds. The uutils implementation of date currently does not honour this, and always prints all 9 digits. This is a known issue [1], but can be worked around by adapting this test to use nanoseconds instead of microseconds, and then divide it by 1e6. This fix is similar to what has been done on systemd side [2], and it is needed to run the selftests on Ubuntu 26.04, containing uutils 0.8.0. Note that the Fixes tag is there even if this patch doesn't fix an issue in the kernel selftests, but it is useful for those using uutils 0.8.0. Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp") Cc: stable@vger.kernel.org Link: https://github.com/uutils/coreutils/issues/11658 [1] Link: https://github.com/systemd/systemd/pull/41627 [2] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260515-net-mptcp-misc-fixes-7-1-rc4-v2-6-701e96419f2f@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> [ kept `timeout ${timeout_test}` wrapper in do_transfer() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-09cxl/test: Update mock dev array before calling platform_device_add()Li Ming1-62/+43
[ Upstream commit d90f236f8b9e354848bd226f581db27755ab901d ] CXL test environment hits the following error sometimes. cxl_mem mem9: endpoint7 failed probe All mock memdevs are platform firmware devices added by cxl_test module, and cxl_test module also provides a platform device driver for them to create a memdev device to CXL subsystem. cxl_test module uses cxl_rcd/mem_single/mem arrays to store different types of mock memdevs. CXL drivers calls registered mock functions for a mock memdev by checking if a given memdev is in these arrays. When cxl_test module adds these mock memdevs, it always calls platform_device_add() before adding them to a suitable mock memdev array. However, there is a small window where CXL drivers calls mock function for a added memdev before it added to a mock memdev array. In above case, cxl endpoint driver considers a added memdev was not a mock memdev, then calling devm_cxl_endpoint_decoders_setup() for it rather than mock_endpoint_decoders_setup(). An appropriate solution is that adding a new mock device to a mock device array before calling platform_device_add() for it. It can guarantee the new mock device is visible to CXL subsystem. This patch introduces a new helped called cxl_mock_platform_device_add() to handle the issue, and uses the function for all mock devices addition. Fixes: 3a2b97b3210b ("cxl/test: Improve init-order fidelity relative to real-world systems") Signed-off-by: Li Ming <ming.li@zohomail.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260520121457.234404-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-09tools/bootconfig: Fix buf leaks in apply_xbcHongtao Lee1-1/+3
[ Upstream commit f42d01aadcedd7bbf4f9a466cabe25c1781dedad ] If data calloc failed, free the buf before return. Link: https://lore.kernel.org/all/20260520030126.147782-1-lihongtao@kylinos.cn/ Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command") Signed-off-by: Hongtao Lee <lihongtao@kylinos.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01selftests: net: Fix checksums in xdp_nativeNimrod Oren1-25/+30
[ Upstream commit dfc077043351a81887d1e4c9ac244e9243f3cbf2 ] Data adjustment cases failed with "Data exchange failed" when using IPv4 because the program did not update the IP and UDP checksums in the IPv4 branch. The issue was masked when both IPv4 and IPv6 were configured, since the test harness prefers IPv6. While here, generalize csum_fold_helper() to fold twice so it works for any 32-bit input. Fixes: 0b65cfcef9c5 ("selftests: drv-net: Test tail-adjustment support") Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Nimrod Oren <noren@nvidia.com> Link: https://patch.msgid.link/20260520153928.3371765-1-noren@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01selftests: ublk: cap nthreads to kernel's actual nr_hw_queuesMing Lei1-0/+11
[ Upstream commit 87d0740b7c4cc847be1b6f307ab6d8547cb1a726 ] dev->nthreads is derived from the user-requested queue count before the ADD command, but the kernel may reduce nr_hw_queues (capped to nr_cpu_ids). When the VM has fewer CPUs than requested queues, the daemon creates more handler threads than there are kernel queues. In non-batch mode, the extra threads access uninitialized queues (q_depth=0), submit zero io_uring SQEs, and block forever in io_cqring_wait. In batch mode, the extra threads cause similar hangs during device removal. In both cases, the stuck threads prevent the daemon from closing the char device, holding the last ublk_device reference and causing ublk_ctrl_del_dev() to hang in wait_event_interruptible(). Fix by capping dev->nthreads to the kernel-returned nr_hw_queues after the ADD command completes. per_io_tasks mode is excluded because threads interleave across all queues, so nthreads > nr_hw_queues is valid. Fixes: abe54c160346 ("selftests: ublk: kublk: decouple ublk_queues from ublk server threads") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260513101941.1373998-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01selftests/mm: run_vmtests.sh: fix destructive tests invocationLuiz Capitulino1-1/+1
commit 3432cbb291aabf85f8af4b9d1ec37179168ff999 upstream. Destructive tests should be invoked with -d command-line option, but this won't work today since 'd' is missing in getopts command-line. This commit fixes it. Link: https://lore.kernel.org/214fd9e4-5398-4c26-859e-c982c2e277c3@redhat.com Fixes: f16ff3b692ad ("selftests/mm: run_vmtests.sh: add missing tests") Signed-off-by: Luiz Capitulino <luizcap@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-01mm/memory: fix spurious warning when unmapping device-private/exclusive pagesAlistair Popple1-0/+50
commit be3f38d05cc5a7c3f13e51994c5dd043ab604d28 upstream. Device private and exclusive entries are only supported for anonymous folios. This condition is tested in __migrate_device_pages() and make_device_exclusive() using folio_test_anon(). However the unmap path tests this assumption using vma_is_anonymous(). This is wrong because whilst anonymous VMAs can only contain folios where folio_test_anon() is true the opposite relation does not hold. A folio for which folio_test_anon() is true does not imply vma_is_anonymous() is true. Such a condition can occur if for example a folio is part of a private filebacked mapping. In this case vma_is_anonymous() is false as the mapping is filebacked, but folio_test_anon() may be true, thus permitting devices to migrate the folio to device private memory. This can lead to the following spurious warnings during process teardown: [ 772.737706] ------------[ cut here ]------------ [ 772.739201] WARNING: mm/memory.c:1754 at unmap_page_range.cold+0x26/0x18a, CPU#17: hmm-tests/2041 [ 772.742050] Modules linked in: test_hmm nvidia_uvm(O) nvidia(O) [ 772.743959] CPU: 17 UID: 0 PID: 2041 Comm: hmm-tests Tainted: G W O 7.0.0+ #387 PREEMPT(full) [ 772.747104] Tainted: [W]=WARN, [O]=OOT_MODULE [ 772.748509] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 [ 772.752117] RIP: 0010:unmap_page_range.cold+0x26/0x18a [ 772.753780] Code: 7e fe ff ff 48 89 4c 24 78 4c 89 44 24 38 e8 f2 ff b1 00 48 8b 4c 24 78 4c 8b 44 24 38 48 8b 44 24 18 48 83 78 48 00 74 04 90 <0f> 0b 90 48 89 ca b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 80 3c 02 [ 772.759602] RSP: 0018:ffff888112607550 EFLAGS: 00010286 [ 772.761310] RAX: ffff88811bbf4dc0 RBX: dffffc0000000000 RCX: ffffea03e9bfffd8 [ 772.763583] RDX: 1ffff1102377e9c1 RSI: 0000000000000008 RDI: ffff88811bbf4e08 [ 772.765914] RBP: 0000000000000006 R08: ffff8881059f7448 R09: ffffed10224c0e68 [ 772.768184] R10: ffff888112607347 R11: 0000000000000001 R12: 0000000000000001 [ 772.770461] R13: ffffea03e9bfffc0 R14: ffff888112607908 R15: ffffea03e9bfffc0 [ 772.772782] FS: 00007f327caa2780(0000) GS:ffff888427b7d000(0000) knlGS:0000000000000000 [ 772.775328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 772.777187] CR2: 00007f327ca89000 CR3: 00000001994d5000 CR4: 00000000000006f0 [ 772.779135] Call Trace: [ 772.779792] <TASK> [ 772.780317] ? dmirror_interval_invalidate+0x1a3/0x290 [test_hmm] [ 772.781873] ? vm_normal_page_pud+0x2b0/0x2b0 [ 772.782992] ? __rwlock_init+0x150/0x150 [ 772.784006] ? lock_release+0x216/0x2b0 [ 772.785008] ? __mmu_notifier_invalidate_range_start+0x505/0x6e0 [ 772.786522] ? lock_release+0x216/0x2b0 [ 772.787498] ? unmap_single_vma+0xb6/0x210 [ 772.788573] unmap_vmas+0x27d/0x520 [ 772.789506] ? unmap_single_vma+0x210/0x210 [ 772.790607] ? mas_update_gap.part.0+0x620/0x620 [ 772.791834] unmap_region+0x19e/0x350 [ 772.792769] ? remove_vma+0x130/0x130 [ 772.793684] ? mas_alloc_nodes+0x1f2/0x300 [ 772.794730] vms_complete_munmap_vmas+0x8c1/0xe20 [ 772.795926] ? unmap_region+0x350/0x350 [ 772.796917] do_vmi_align_munmap+0x36a/0x4e0 [ 772.798018] ? lock_release+0x216/0x2b0 [ 772.799024] ? vma_shrink+0x620/0x620 [ 772.799983] do_vmi_munmap+0x150/0x2c0 [ 772.800939] __vm_munmap+0x161/0x2c0 [ 772.801872] ? expand_downwards+0xd60/0xd60 [ 772.802948] ? clockevents_program_event+0x1ef/0x540 [ 772.804217] ? lock_release+0x216/0x2b0 [ 772.805158] __x64_sys_munmap+0x59/0x80 [ 772.805776] do_syscall_64+0xfc/0x670 [ 772.806336] ? irqentry_exit+0xda/0x580 [ 772.806976] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 772.807772] RIP: 0033:0x7f327cbb2717 [ 772.808323] Code: 73 01 c3 48 8b 0d f9 76 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 76 0d 00 f7 d8 64 89 01 48 [ 772.811337] RSP: 002b:00007ffde7f57d38 EFLAGS: 00000202 ORIG_RAX: 000000000000000b [ 772.812564] RAX: ffffffffffffffda RBX: 00007f327cc9c000 RCX: 00007f327cbb2717 [ 772.813733] RDX: 0000000000000000 RSI: 0000000000400000 RDI: 00007f327c289000 [ 772.814867] RBP: 0000000000421360 R08: 000000000000001a R09: 0000000000000000 [ 772.815991] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffde7f57d74 [ 772.817121] R13: 00007f327c689010 R14: 0000000000100000 R15: 00007f327c289000 [ 772.818272] </TASK> [ 772.818614] irq event stamp: 0 [ 772.819159] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 772.820174] hardirqs last disabled at (0): [<ffffffff82a57ab3>] copy_process+0x19f3/0x6440 [ 772.821511] softirqs last enabled at (0): [<ffffffff82a57b00>] copy_process+0x1a40/0x6440 [ 772.822869] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 772.823871] ---[ end trace 0000000000000000 ]--- Fix this by using the same check for folio_test_anon() in zap_nonpresent_ptes(). Also add a hmm-test case for this. Link: https://lore.kernel.org/20260501065116.2057242-1-apopple@nvidia.com Fixes: 999dad824c39 ("mm/shmem: persist uffd-wp bit across zapping for file-backed") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: Arsen Arsenović <aarsenovic@baylibre.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-23selftests/bpf: Remove test_access_variable_arrayVenkat Rao Bagalkote2-35/+0
commit aacee214d57636fa1f63007c65f333b5ea75a7a0 upstream. test_access_variable_array relied on accessing struct sched_domain::span to validate variable-length array handling via BTF. Recent scheduler refactoring removed or hid this field, causing the test to fail to build. Given that this test depends on internal scheduler structures that are subject to refactoring, and equivalent variable-length array coverage already exists via bpf_testmod-based tests, remove test_access_variable_array entirely. Link: https://lore.kernel.org/all/177434340048.1647592.8586759362906719839.tip-bot2@tip-bot2/ Signed-off-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Tested-by: Naveen Kumar Thummalapenta <naveen66@linux.ibm.com> Link: https://lore.kernel.org/r/20260410105404.91126-1-venkat88@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-23rtla: Fix parse_cpu_set() bug introduced by strtoi()Costa Shulyupin1-6/+4
[ Upstream commit 6ea8a206108fe8b5940c2797afc54ae9f5a7bbdd ] The patch 'Replace atoi() with a robust strtoi()' introduced a bug in parse_cpu_set(), which relies on partial parsing of the input string. The function parses CPU specifications like '0-3,5' by incrementing a pointer through the string. strtoi() rejects strings with trailing characters, causing parse_cpu_set() to fail on any CPU list with multiple entries. Restore the original use of atoi() in parse_cpu_set(). Fixes: 7e9dfccf8f11 ("rtla: Replace atoi() with a robust strtoi()") Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20260112192642.212848-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23kselftest/arm64: Include <asm/ptrace.h> for user_gcs definitionLeo Yan2-6/+1
[ Upstream commit bb7235e226888607e6aac1288062fcb1ac105589 ] kselftest includes kernel uAPI headers with option: -isystem $(top_srcdir)/usr/include Include <asm/ptrace.h> in libc-gcs.c for the definition of struct user_gcs from the uAPI headers, and remove the redundant definition in gcs-util.h. This fixes a compilation error on systems where the toolchain defines NT_ARM_GCS. Fixes: a505a52b4e29 ("kselftest/arm64: Add a GCS test program built with the system libc") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23tools/power turbostat: Fix unrecognized option '-P'David Arcari1-1/+1
[ Upstream commit ce012c966b518c53475ba9a4e979242d7322d819 ] The '-P' short option (shorthand for --no-perf) is not present in the optstring of the second call to getopt_long_only(). This results in the "unrecognized option" error when the tool reaches the main parsing loop. Add 'P' to the second getopt_long_only() call to ensure it is consistently recognized. Fixes: a0e86c90b83c ("tools/power turbostat: Add --no-perf option") Signed-off-by: David Arcari <darcari@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23tools/power turbostat: Fix and document --header_iterationsLen Brown2-12/+12
[ Upstream commit 96718ad296af4a6d984b3a09276b165ab6a3b0c8 ] The "header_iterations" option is commonly used to de-clutter the screen of redundant header label rows in an interactive session: Eg. every 10 rows: $ sudo turbostat --header_iterations 10 -S -q -i 1 But --header_iterations was missing from turbostat.8 Also turbostat help advertised the "-N" short option that did not actually work: $ turbostat --help -N, --header_iterations num print header every num iterations Repair "-N" Document "--header_iterations" on turbostat.8 Signed-off-by: Len Brown <len.brown@intel.com> Stable-dep-of: ce012c966b51 ("tools/power turbostat: Fix unrecognized option '-P'") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23tools/power turbostat: Use strtoul() for iteration parsingKaushlendra Kumar1-6/+8
[ Upstream commit 8e5c0cc326f2e95a71bb6e6063e65caa60c8f951 ] Replace strtod() with strtoul() and check errno for -n/-N options, since num_iterations and header_iterations are unsigned long counters. Reject zero and conversion errors; negative inputs wrap to large positive values per standard unsigned semantics. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Stable-dep-of: ce012c966b51 ("tools/power turbostat: Fix unrecognized option '-P'") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23tools/power turbostat.8: Document the "--force" optionLen Brown1-3/+7
[ Upstream commit 785953cf6e63aa5a9fcdfa9577b1411e0281c4bc ] Starting in turbostat v2025.01.14, turbostat refused to run on unsupported hardware, pointing to "RUN THE LATEST VERSION" on turbostat(8). At that time, turbostat supported and advertised the "--force" parameter to run anyway (with unsupported results). Also document "--force" on turbostat.8. Signed-off-by: Len Brown <len.brown@intel.com> Stable-dep-of: ce012c966b51 ("tools/power turbostat: Fix unrecognized option '-P'") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23tcp: send a challenge ACK on SEG.ACK > SND.NXTJiayuan Chen1-1/+3
[ Upstream commit 42726ec644cbdde0035c3e0417fee8ed9547e120 ] RFC 5961 Section 5.2 validates an incoming segment's ACK value against the range [SND.UNA - MAX.SND.WND, SND.NXT] and states: "All incoming segments whose ACK value doesn't satisfy the above condition MUST be discarded and an ACK sent back." Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") opted Linux into this mitigation and implements the challenge ACK on the lower side (SEG.ACK < SND.UNA - MAX.SND.WND), but the symmetric upper side (SEG.ACK > SND.NXT) still takes the pre-RFC-5961 path and silently returns SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, even though RFC 793 Section 3.9 (now RFC 9293 Section 3.10.7.4) has always required: "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return." Complete the mitigation by sending a challenge ACK on that branch, reusing the existing tcp_send_challenge_ack() path which already enforces the per-socket RFC 5961 Section 7 rate limit via __tcp_oow_rate_limited(). FLAG_NO_CHALLENGE_ACK is honoured for symmetry with the lower-edge case. Update the existing tcp_ts_recent_invalid_ack.pkt selftest, which drives this exact path, to consume the new challenge ACK. Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260422123605.320000-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf util: Kill die() prototype, dead for a long timeArnaldo Carvalho de Melo1-1/+0
[ Upstream commit e5cce1b9c82fbd48e2f1f7a25a9fad8ee228176f ] In fef2a735167a827a ("perf tools: Kill die()") the die() function was removed, but not the prototype in util.h, now when building with LIBPERL=1, during a 'make -C tools/perf build-test' routine test, it is failing as perl likes die() calls and then this clashes with this remnant, remove it. Fixes: fef2a735167a827a ("perf tools: Kill die()") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf maps: Fix copy_from that can break sorted by name orderIan Rogers1-10/+3
[ Upstream commit f552b132e4d5248715828e7e5c2bf7889bf05b2e ] When an parent is copied into a child the name array is populated in address not name order. Make sure the name array isn't flagged as sorted. Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf maps: Fix fixup_overlap_and_insert that can break sorted by name orderIan Rogers1-0/+2
[ Upstream commit c4f3ff3289380437d26177e8f2fe4b7507816ee3 ] When an entry in the address array is replaced, the corresponding name entry is replaced. The entries names may sort differently and so it is important that the sorted by name property be cleared on the maps. Fixes: 0d11fab32714 ("perf maps: Fixup maps_by_name when modifying maps_by_address") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf cgroup: Update metric leader in evlist__expand_cgroupIan Rogers1-7/+23
[ Upstream commit c9ef786c0970991578397043f1c819229e2b7197 ] When the evlist is expanded the metric leader wasn't being updated. As the original evsel is deleted this creates a use-after-free in stat-shadow's prepare_metric. This was detected running the "perf stat --bpf-counters --for-each-cgroup test" with sanitizers. The change itself puts the copied evsel into the priv field (known unused because of evsel__clone use) and then in a second pass over the list updates the copied values using the priv pointer. Fixes: d1c5a0e86a4e ("perf stat: Add --for-each-cgroup option") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Sun Jian <sun.jian.kdev@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf expr: Return -EINVAL for syntax error in expr__find_ids()Leo Yan1-1/+2
[ Upstream commit 3a61fd866ef9aaa1d3158b460f852b74a2df07f4 ] expr__find_ids() propagates the parser return value directly. For syntax errors, the parser can return a positive value, but callers treat it as success, e.g., for below case on Arm64 platform: metric expr 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) for backend_bound parsing metric: 100 * (STALL_SLOT_BACKEND / (CPU_CYCLES * #slots) - BR_MIS_PRED * 3 / CPU_CYCLES) Failure to read '#slots' literal: #slots = nan syntax error Convert positive parser returns in expr__find_ids() to -EINVAL, as a result, the error value will be respected by callers. Before: perf stat -C 5 Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Failure to read '#slots'Segmentation fault After: perf stat -C 5 Failure to read '#slots'Cannot find metric or group `Default' Fixes: ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf tools: Fix module symbol resolution for non-zero .text sh_addrChuck Lever1-2/+6
[ Upstream commit 9a82bfde4775b7a87cd1a7e791f46f83ae442848 ] When perf resolves symbols from kernel module ELF files (ET_REL), it converts symbol addresses to file offsets so that sample IPs can be matched to the correct symbol. The conversion adjusts each symbol's st_value: sym->st_value -= shdr->sh_addr - shdr->sh_offset; For vmlinux (ET_EXEC), st_value is a virtual address and sh_addr is the section's virtual base, so subtracting sh_addr and adding sh_offset correctly yields a file offset. For kernel modules (ET_REL), st_value is a section-relative offset. The module loader ignores sh_addr entirely and places symbols at module_base + st_value. Converting to file offset requires only adding sh_offset; subtracting sh_addr introduces an error equal to sh_addr bytes. When .text has sh_addr == 0 -- the historical norm for simple modules -- both formulas produce the same result and the bug is latent. As modules gain more metadata sections before .text (.note, .static_call.text, etc.), the linker assigns .text a non-zero sh_addr, exposing the defect. For example, nfsd.ko on this kernel has sh_addr=0xa80, kvm-intel.ko has sh_addr=0x1e90. The effect is that all .text symbols in affected modules shift by sh_addr bytes relative to sample IPs, causing perf report to attribute samples to incorrect, nearby symbols. This was observed as 13% of LLC-load-miss samples misattributed to nfsd_file_get_dio_attrs when the actual hot function was nfsd_cache_lookup, approximately 0xa80 bytes away in the symbol table. Use the existing dso__rel() flag (already set for ET_REL modules) to select the correct adjustment: add sh_offset for ET_REL, subtract (sh_addr - sh_offset) for ET_EXEC/ET_DYN. Fixes: 0131c4ec794a ("perf tools: Make it possible to read object code from kernel modules") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf stat: Fix opt->value type for parse_cache_levelIan Rogers1-20/+23
[ Upstream commit 44311ae84ad9177fb311aee856027861c22f17b2 ] Commit f5803651b4a4 ("perf stat: Choose the most disaggregate command line option") changed aggregation option handling for `perf stat` but not `perf stat report` leading to parse_cache_level being passed a struct in the `perf stat` case but erroneously an aggr_mode enum value for `perf stat report`. Change the `perf stat report` aggregation handling to use the same opt_aggr_mode as `perf stat`. Also, just pass the boolean for consistency with other boolean argument handling. Fixes: f5803651b4a4 ("perf stat: Choose the most disaggregate command line option") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf lock: Fix option value type in parse_max_stackIan Rogers1-1/+1
[ Upstream commit cfaade34b52aa1ec553044255702c4b31b57c005 ] The value is a void* and the address of an int, max_stack_depth, is set up in the perf lock options. The parse_max_stack function treats the int* as a long*, make this more correct by declaring the value to be an int*. Fixes: 0a277b622670 ("perf lock contention: Check --max-stack option") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf: tools: cs-etm: Fix print issue for Coresight debug in ETE/TRBE traceMike Leach1-38/+13
[ Upstream commit 6c478e7b3eba3f387a2d6c749e3e3ee0f8ad1c53 ] Building perf with CORESIGHT=1 and the optional CSTRACE_RAW=1 enables additional debug printing of raw trace data when using command:- perf report --dump. This raw trace prints the CoreSight formatted trace frames, which may be used to investigate suspected issues with trace quality / corruption / decode. These frames are not present in ETE + TRBE trace. This fix removes the unnecessary call to print these frames. This fix also rationalises implementation - original code had helper function that unnecessarily repeated initialisation calls that had already been made. Due to an addtional fault with the OpenCSD library, this call when ETE/TRBE are being decoded will cause a segfault in perf. This fix also prevents that problem for perf using older (<= 1.8.0 version) OpenCSD libraries. Fixes: 68ffe3902898 ("perf tools: Add decoder mechanic to support dumping trace data") Reported-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Mike Leach <mike.leach@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf branch: Avoid incrementing NULLIan Rogers1-0/+3
[ Upstream commit c969a9d7bbf46f983c4a48566b3b2f7340b02296 ] If the entry is NULL the value is meaningless so early return NULL to avoid an increment of NULL. This was happening in calls from has_stitched_lbr when running the "perf record LBR tests". The return value isn't used in that case, so returning NULL as no effect. Fixes: 42bbabed09ce ("perf tools: Add hw_idx in struct branch_stack") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf trace: Avoid an ERR_PTR in syscall_statsIan Rogers1-5/+7
[ Upstream commit d05073adda0f047e9b2115a2932bcb2797eab238 ] hashmap__new may return an ERR_PTR and previously this would be assigned to syscall_stats meaning all use of syscall_stats needs to test for NULL (uninitialized) or an ERR_PTR. Given the only reason hashmap__new can fail is ENOMEM, just use NULL to indicate the allocation failure and avoid the code having to test for NULL and IS_ERR. Fixes: 96f202eab813 (perf trace: Fix IS_ERR() vs NULL check bug) Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23perf trace: Fix IS_ERR() vs NULL check bugwangguangju1-1/+1
[ Upstream commit 96f202eab8133f94479b14a32902c636e9bdf6af ] The alloc_syscall_stats() function always returns an error pointer (ERR_PTR) on failure. So replace NULL check with IS_ERR() check after calling delete_syscall_stats() function. Fixes: ef2da619b132c6f74 ("perf trace: Convert syscall_stats to hashmap") Signed-off-by: wangguangju <wangguangju@hygon.cn> Reviewed-by: Howard Chu <howardchu95@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23libbpf: Prevent double close and leak of btf objectsJiri Olsa1-10/+11
[ Upstream commit 380044c40b1636a72fd8f188b5806be6ae564279 ] Sashiko found possible double close of btf object fd [1], which happens when strdup in load_module_btfs fails at which point the obj->btf_module_cnt is already incremented. The error path close btf fd and so does later cleanup code in bpf_object_post_load_cleanup function. Also libbpf_ensure_mem failure leaves btf object not assigned and it's leaked. Replacing the err_out label with break to make the error path less confusing as suggested by Alan. Incrementing obj->btf_module_cnt only if there's no failure and releasing btf object in error path. Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules") [1] https://sashiko.dev/#/patchset/20260324081846.2334094-1-jolsa%40kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260416100034.1610852-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23bpf: allow UTF-8 literals in bpf_bprintf_prepare()Yihan Ding1-1/+2
[ Upstream commit b960430ea8862ef37ce53c8bf74a8dc79d3f2404 ] bpf_bprintf_prepare() only needs ASCII parsing for conversion specifiers. Plain text can safely carry bytes >= 0x80, so allow UTF-8 literals outside '%' sequences while keeping ASCII control bytes rejected and format specifiers ASCII-only. This keeps existing parsing rules for format directives unchanged, while allowing helpers such as bpf_trace_printk() to emit UTF-8 literal text. Update test_snprintf_negative() in the same commit so selftests keep matching the new plain-text vs format-specifier split during bisection. Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf") Signed-off-by: Yihan Ding <dingyihan@uniontech.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416120142.1420646-2-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23rtla/utils: Fix resource leak in set_comm_sched_attr()Wander Lairson Costa1-5/+6
[ Upstream commit 5b6dc659ad792c72b3ff1be8039ae2945e030928 ] The set_comm_sched_attr() function opens the /proc directory via opendir() but fails to call closedir() on its successful exit path. If the function iterates through all processes without error, it returns 0 directly, leaking the DIR stream pointer. Fix this by refactoring the function to use a single exit path. A retval variable is introduced to track the success or failure status. All exit points now jump to a unified out label that calls closedir() before the function returns, ensuring the resource is always freed. Fixes: dada03db9bb19 ("rtla: Remove procps-ng dependency") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260309195040.1019085-18-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23rtla: Replace atoi() with a robust strtoi()Wander Lairson Costa3-8/+41
[ Upstream commit 7e9dfccf8f11c26208211457c4597a466135b56a ] The atoi() function does not perform error checking, which can lead to undefined behavior when parsing invalid or out-of-range strings. This can cause issues when parsing user-provided numerical inputs, such as signal numbers, PIDs, or CPU lists. To address this, introduce a new strtoi() helper function that safely converts a string to an integer. This function validates the input and checks for overflows, returning a negative value on failure. Replace all calls to atoi() with the new strtoi() function and add proper error handling to make the parsing more robust and prevent potential issues. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20260106133655.249887-5-wander@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Stable-dep-of: 5b6dc659ad79 ("rtla/utils: Fix resource leak in set_comm_sched_attr()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23rtla: Fix -C/--cgroup interfaceIvan Pravdin6-76/+55
[ Upstream commit 7b71f3a6986c93defbb72bb6c143e04122720cb1 ] Currently, user can only specify cgroup to the tracer's thread the following ways: `-C[cgroup]` `-C[=cgroup]` `--cgroup[=cgroup]` If user tries to specify cgroup as `-C [cgroup]` or `--cgroup [cgroup]`, the parser silently fails and rtla's cgroup is used for the tracer threads. To make interface more user-friendly, allow user to specify cgroup in the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`. Refactor identical logic between -t/--trace and -C/--cgroup into a common function. Change documentation to reflect this user interface change. Fixes: a957cbc02531 ("rtla: Add -C cgroup support") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com> Stable-dep-of: 5b6dc659ad79 ("rtla/utils: Fix resource leak in set_comm_sched_attr()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23ktest: Run POST_KTEST hooks on failure and cancellationRicardo B. Marlière1-5/+22
[ Upstream commit bc6e165a452da909cef0efbc286e6695624db372 ] PRE_KTEST can be useful for setting up the environment and POST_KTEST to tear it down, however POST_KTEST only runs on the normal end-of-run path. It is skipped when ktest exits through dodie() or cancel_test(). Final cleanup hooks are skipped. Factor the final hook execution into run_post_ktest(), call it from the normal exit path and from the early exit paths, and guard it so the hook runs at most once. Cc: John Hawley <warthog9@eaglescrag.net> Cc: Andrea Righi <arighi@nvidia.com> Cc: Marcos Paulo de Souza <mpdesouza@suse.com> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Fernando Fernandez Mancera <fmancera@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Link: https://patch.msgid.link/20260307-ktest-fixes-v1-8-565d412f4925@suse.com Fixes: 921ed4c7208e ("ktest: Add PRE/POST_KTEST and TEST options") Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-05-23ktest: Honor empty per-test option overridesRicardo B. Marlière1-2/+4
[ Upstream commit a2de57a3c8192dcd67cccaff6c341b93748d799b ] A per-test override can clear an inherited default option by assigning an empty value, but __set_test_option() still used option_defined() to decide whether a per-test key existed. That turned an empty per-test assignment back into "fall back to the default", so tests still could not clear inherited settings. For example: DEFAULTS (...) LOG_FILE = /tmp/ktest-empty-override.log CLEAR_LOG = 1 ADD_CONFIG = /tmp/.config TEST_START TEST_TYPE = build BUILD_TYPE = nobuild ADD_CONFIG = This would run the test with ADD_CONFIG[1] = /tmp/.config Fix by checking whether the per-test key exists before falling back. If it does exist but is empty, treat it as unset for that test and stop the fallback chain there. Cc: John Hawley <warthog9@eaglescrag.net> Cc: Andrea Righi <arighi@nvidia.com> Cc: Marcos Paulo de Souza <mpdesouza@suse.com> Cc: Matthieu Baerts <matttbe@kernel.org> Cc: Fernando Fernandez Mancera <fmancera@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Link: https://patch.msgid.link/20260307-ktest-fixes-v1-4-565d412f4925@suse.com Fixes: 22c37a9ac49d ("ktest: Allow tests to undefine default options") Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>