summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2022-11-19llist: avoid extra memory read in llist_add_batchUros Bizjak1-2/+2
try_cmpxchg implicitly assigns old head->first value to "first" when cmpxchg fails. There is no need to re-read the value in the loop. Link: https://lkml.kernel.org/r/20221017145226.4044-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-19lib/oid_registry.c: remove redundant assignment to variable numColin Ian King1-1/+0
The variable num is being assigned a value that is never read, it is being re-assigned a new value in both paths if an if-statement. The assignment is redundant and can be removed. Cleans up clang scan build warning: lib/oid_registry.c:149:3: warning: Value stored to 'num' is never read [deadcode.DeadStores] Link: https://lkml.kernel.org/r/20221017214556.863357-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-18treewide: use get_random_u32_inclusive() when possibleJason A. Donenfeld3-8/+8
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_{above,below}() instead of manual loopJason A. Donenfeld2-8/+2
These cases were done with this Coccinelle: @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I > E); + I = get_random_u32_below(E + 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I >= E); + I = get_random_u32_below(E); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I < E); + I = get_random_u32_above(E - 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I <= E); + I = get_random_u32_above(E); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (!I); + I = get_random_u32_above(0); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I == 0); + I = get_random_u32_above(0); @@ expression E; @@ - E + 1 + get_random_u32_below(U32_MAX - E) + get_random_u32_above(E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld11-24/+24
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-16sbitmap: Try each queue to wake up at least one waiterGabriel Krisman Bertazi1-16/+12
Jan reported the new algorithm as merged might be problematic if the queue being awaken becomes empty between the waitqueue_active inside sbq_wake_ptr check and the wake up. If that happens, wake_up_nr will not wake up any waiter and we loose too many wake ups. In order to guarantee progress, we need to wake up at least one waiter here, if there are any. This now requires trying to wake up from every queue. Instead of walking through all the queues with sbq_wake_ptr, this call moves the wake up inside that function. In a previous version of the patch, I found that updating wake_index several times when walking through queues had a measurable overhead. This ensures we only update it once, at the end. Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-16sbitmap: Advance the queue index before waking up a queueGabriel Krisman Bertazi1-2/+8
When a queue is awaken, the wake_index written by sbq_wake_ptr currently keeps pointing to the same queue. On the next wake up, it will thus retry the same queue, which is unfair to other queues, and can lead to starvation. This patch, moves the index update to happen before the queue is returned, such that it will now try a different queue first on the next wake up, improving fairness. Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-16lib/test_linear_ranges: Use LINEAR_RANGE()Matti Vaittinen1-11/+2
New initialization macro for linear ranges was added. Slightly simplify the test code by using this macro - and at the same time also verify the macro is working as intended. Use the newly added LINEAR_RANGE() initialization macro for linear range test. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Link: https://lore.kernel.org/r/Y3R13IRrs+x5PcZ4@dc75zzyyyyyyyyyyyyydt-3.rev.dnainternet.fi Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-16lib/debugobjects: fix stat count and optimize debug_objects_mem_initwuchi1-0/+10
1. Var debug_objects_allocated tracks valid kmem_cache_alloc calls, so track it in debug_objects_replace_static_objects. Do similar things in object_cpu_offline. 2. In debug_objects_mem_init, there is no need to call function cpuhp_setup_state_nocalls when debug_objects_enabled = 0 (out of memory). Link: https://lkml.kernel.org/r/20220611130634.99741-1-wuchi.zero@gmail.com Fixes: 634d61f45d6f ("debugobjects: Percpu pool lookahead freeing/allocation") Fixes: c4b73aabd098 ("debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done") Signed-off-by: wuchi <wuchi.zero@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-14memregion: Add cpu_cache_invalidate_memregion() interfaceDavidlohr Bueso1-0/+3
With CXL security features, and CXL dynamic provisioning, global CPU cache flushing nvdimm requirements are no longer specific to that subsystem, even beyond the scope of security_ops. CXL will need such semantics for features not necessarily limited to persistent memory. The functionality this is enabling is to be able to instantaneously secure erase potentially terabytes of memory at once and the kernel needs to be sure that none of the data from before the erase is still present in the cache. It is also used when unlocking a memory device where speculative reads and firmware accesses could have cached poison from before the device was unlocked. Lastly this facility is used when mapping new devices, or new capacity into an established physical address range. I.e. when the driver switches DeviceA mapping AddressX to DeviceB mapping AddressX then any cached data from DeviceA:AddressX needs to be invalidated. This capability is typically only used once per-boot (for unlock), or once per bare metal provisioning event (secure erase), like when handing off the system to another tenant or decommissioning a device. It may also be used for dynamic CXL region provisioning. Users must first call cpu_cache_has_invalidate_memregion() to know whether this functionality is available on the architecture. On x86 this respects the constraints of when wbinvd() is tolerable. It is already the case that wbinvd() is problematic to allow in VMs due its global performance impact and KVM, for example, has been known to just trap and ignore the call. With confidential computing guest execution of wbinvd() may even trigger an exception. Given guests should not be messing with the bare metal address map via CXL configuration changes cpu_cache_has_invalidate_memregion() returns false in VMs. While this global cache invalidation facility, is exported to modules, since NVDIMM and CXL support can be built as a module, it is not for general use. The intent is that this facility is not available outside of specific "device-memory" use cases. To make that expectation as clear as possible the API is scoped to a new "DEVMEM" module namespace that only the NVDIMM and CXL subsystems are expected to import. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-14lib/raid6: drop RAID6_USE_EMPTY_ZERO_PAGEGiulio Benetti1-2/+0
RAID6_USE_EMPTY_ZERO_PAGE is unused and hardcoded to 0, so let's drop it. Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
2022-11-11sbitmap: Use single per-bitmap counting to wake up queued tagsGabriel Krisman Bertazi1-96/+26
sbitmap suffers from code complexity, as demonstrated by recent fixes, and eventual lost wake ups on nested I/O completion. The later happens, from what I understand, due to the non-atomic nature of the updates to wait_cnt, which needs to be subtracted and eventually reset when equal to zero. This two step process can eventually miss an update when a nested completion happens to interrupt the CPU in between the wait_cnt updates. This is very hard to fix, as shown by the recent changes to this code. The code complexity arises mostly from the corner cases to avoid missed wakes in this scenario. In addition, the handling of wake_batch recalculation plus the synchronization with sbq_queue_wake_up is non-trivial. This patchset implements the idea originally proposed by Jan [1], which removes the need for the two-step updates of wait_cnt. This is done by tracking the number of completions and wakeups in always increasing, per-bitmap counters. Instead of having to reset the wait_cnt when it reaches zero, we simply keep counting, and attempt to wake up N threads in a single wait queue whenever there is enough space for a batch. Waking up less than batch_wake shouldn't be a problem, because we haven't changed the conditions for wake up, and the existing batch calculation guarantees at least enough remaining completions to wake up a batch for each queue at any time. Performance-wise, one should expect very similar performance to the original algorithm for the case where there is no queueing. In both the old algorithm and this implementation, the first thing is to check ws_active, which bails out if there is no queueing to be managed. In the new code, we took care to avoid accounting completions and wakeups when there is no queueing, to not pay the cost of atomic operations unnecessarily, since it doesn't skew the numbers. For more interesting cases, where there is queueing, we need to take into account the cross-communication of the atomic operations. I've been benchmarking by running parallel fio jobs against a single hctx nullb in different hardware queue depth scenarios, and verifying both IOPS and queueing. Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel jobs. fio was issuing fixed-size randwrites with qd=64 against nullb, varying only the hardware queue length per test. queue size 2 4 8 16 32 64 6.1-rc2 1681.1K (1.6K) 2633.0K (12.7K) 6940.8K (16.3K) 8172.3K (617.5K) 8391.7K (367.1K) 8606.1K (351.2K) patched 1721.8K (15.1K) 3016.7K (3.8K) 7543.0K (89.4K) 8132.5K (303.4K) 8324.2K (230.6K) 8401.8K (284.7K) The following is a similar experiment, ran against a nullb with a single bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40 parallel fio jobs operating on the same device queue size 2 4 8 16 32 64 6.1-rc2 1081.0K (2.3K) 957.2K (1.5K) 1699.1K (5.7K) 6178.2K (124.6K) 12227.9K (37.7K) 13286.6K (92.9K) patched 1081.8K (2.8K) 1316.5K (5.4K) 2364.4K (1.8K) 6151.4K (20.0K) 11893.6K (17.5K) 12385.6K (18.4K) It has also survived blktests and a 12h-stress run against nullb. I also ran the code against nvme and a scsi SSD, and I didn't observe performance regression in those. If there are other tests you think I should run, please let me know and I will follow up with results. [1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/ Cc: Hugh Dickins <hughd@google.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Liu Song <liusong@linux.alibaba.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-09lib/scatterlist: add check when merging zone device pagesLogan Gunthorpe1-10/+15
Consecutive zone device pages should not be merged into the same sgl or bvec segment with other types of pages or if they belong to different pgmaps. Otherwise getting the pgmap of a given segment is not possible without scanning the entire segment. This helper returns true either if both pages are not zone device pages or both pages are zone device pages with the same pgmap. Factor out the check for page mergability into a pages_are_mergable() helper and add a check with zone_device_pages_are_mergeable(). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221021174116.7200-6-logang@deltatee.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-09iov_iter: introduce iov_iter_get_pages_[alloc_]flags()Logan Gunthorpe1-8/+24
Add iov_iter_get_pages_flags() and iov_iter_get_pages_alloc_flags() which take a flags argument that is passed to get_user_pages_fast(). This is so that FOLL_PCI_P2PDMA can be passed when appropriate. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221021174116.7200-4-logang@deltatee.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-09maple_tree: don't set a new maximum on the node when not reusing nodesLiam Howlett1-2/+1
In RCU mode, the node limits were being updated to the last pivot which may not be correct and would cause the metadata to be set when it shouldn't. Fix this by not setting a new limit in this case. Link: https://lkml.kernel.org/r/20221107163857.867377-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: fix depth tracking in maple_stateLiam Howlett1-1/+2
It is possible to confuse the depth tracking in the maple state by searching the same node for values. Fix the depth tracking by moving where the depth is incremented closer to where the node changes level. Also change the initial depth setting when using the root node. Link: https://lkml.kernel.org/r/20221107163814.866612-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09kmsan: make sure PREEMPT_RT is offAlexander Potapenko1-0/+1
As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play well with IRQ code when PREEMPT_RT is on, because in that mode even GFP_ATOMIC allocations cannot be performed. Fixing this would require making stackdepot completely lockless, which is quite challenging and may be excessive for the time being. Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug configs are. Link: https://lkml.kernel.org/r/20221102110611.1085175-4-glider@google.com Link: https://lore.kernel.org/lkml/20221025221755.3810809-1-glider@google.com/ Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARNAlexander Potapenko1-1/+1
As pointed out by Masahiro Yamada, Kconfig picks up the first default entry which has true 'if' condition. Hence, the previously added check for KMSAN was never used, because it followed the checks for 64BIT and !64BIT. Put KMSAN check before others to ensure it is always applied. Link: https://lkml.kernel.org/r/20221102110611.1085175-3-glider@google.com Link: https://github.com/google/kmsan/issues/89 Link: https://lore.kernel.org/linux-mm/20221024212144.2852069-3-glider@google.com/ Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default") Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: reorganize testing to restore module testingLiam Howlett4-35741/+232
Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [akpm@linux-foundation.org: fix module export] [akpm@linux-foundation.org: fix it some more] [liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: mas_anode_descend() clang-analyzer cleanupLiam Howlett1-6/+4
clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon inspection, there were a few clean ups that would make the code cleaner: The count variable was set from the mt_slots array and then updated but never used again. Just use the array reference directly. Also stop updating the type since it isn't used after the update. Stop setting the gaps pointer to NULL at the start since it is always set before the loop begins. Link: https://lkml.kernel.org/r/20221026151413.4032730-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: remove pointer to pointer use in mas_alloc_nodes()Liam Howlett1-3/+1
There is a more direct and cleaner way of implementing the same functional code. Remove the confusing and unnecessary use of pointers here. Link: https://lkml.kernel.org/r/20221026151241.4031117-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-03Merge tag 'net-6.1-rc4' of ↵Linus Torvalds1-26/+15
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and netfilter. Current release - regressions: - net: several zerocopy flags fixes - netfilter: fix possible memory leak in nf_nat_init() - openvswitch: add missing .resv_start_op Previous releases - regressions: - neigh: fix null-ptr-deref in neigh_table_clear() - sched: fix use after free in red_enqueue() - dsa: fall back to default tagger if we can't load the one from DT - bluetooth: fix use-after-free in l2cap_conn_del() Previous releases - always broken: - netfilter: netlink notifier might race to release objects - nfc: fix potential memory leak of skb - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu - bluetooth: use skb_put to set length - eth: tun: fix bugs for oversize packet when napi frags enabled - eth: lan966x: fixes for when MTU is changed - eth: dwmac-loongson: fix invalid mdio_node" * tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) vsock: fix possible infinite sleep in vsock_connectible_wait_data() vsock: remove the unused 'wait' in vsock_connectible_recvmsg() ipv6: fix WARNING in ip6_route_net_exit_late() bridge: Fix flushing of dynamic FDB entries net, neigh: Fix null-ptr-deref in neigh_table_clear() net/smc: Fix possible leaked pernet namespace in smc_init() stmmac: dwmac-loongson: fix invalid mdio_node ibmvnic: Free rwi on reset success net: mdio: fix undefined behavior in bit shift for __mdiobus_register Bluetooth: L2CAP: Fix attempting to access uninitialized memory Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect Bluetooth: L2CAP: Fix memory leak in vhci_write Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del() Bluetooth: virtio_bt: Use skb_put to set length Bluetooth: hci_conn: Fix CIS connection dst_type handling Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu netfilter: ipset: enforce documented limit to prevent allocating huge memory isdn: mISDN: netjet: fix wrong check of device registration ...
2022-11-02netlink: introduce bigendian integer typesFlorian Westphal1-26/+15
Jakub reported that the addition of the "network_byte_order" member in struct nla_policy increases size of 32bit platforms. Instead of scraping the bit from elsewhere Johannes suggested to add explicit NLA_BE types instead, so do this here. NLA_POLICY_MAX_BE() macro is removed again, there is no need for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing. NLA_BE64 can be added later. Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE") Reported-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01test_printf: Refactor fwnode_pointer() to make it more readableAndy Shevchenko1-14/+12
Converting fwnode_pointer() to use better swnode API allows to make code more readable. While at it, rename full_name to full_name_third to show exact relation in the hierarchy. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220824170542.18263-1-andriy.shevchenko@linux.intel.com
2022-10-30Merge tag 'mm-hotfixes-stable-2022-10-28' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Eight fix pre-6.0 bugs and the remainder address issues which were introduced in the 6.1-rc merge cycle, or address issues which aren't considered sufficiently serious to warrant a -stable backport" * tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region lib: maple_tree: remove unneeded initialization in mtree_range_walk() mmap: fix remap_file_pages() regression mm/shmem: ensure proper fallback if page faults mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page() x86: fortify: kmsan: fix KMSAN fortify builds x86: asm: make sure __put_user_size() evaluates pointer once Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default x86/purgatory: disable KMSAN instrumentation mm: kmsan: export kmsan_copy_page_meta() mm: migrate: fix return value if all subpages of THPs are migrated successfully mm/uffd: fix vma check on userfault for wp mm: prep_compound_tail() clear page->private mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs mm/page_isolation: fix clang deadcode warning fs/ext4/super.c: remove unused `deprecated_msg' ipc/msg.c: fix percpu_counter use after free memory tier, sysfs: rename attribute "nodes" to "nodelist" MAINTAINERS: git://github.com -> https://github.com for nilfs2 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops ...
2022-10-28cgroup: Implement DEBUG_CGROUP_REFTejun Heo1-0/+10
It's really difficult to debug when cgroup or css refs leak. Let's add a debug option to force the refcnt function to not be inlined so that they can be kprobed for debugging. Signed-off-by: Tejun Heo <tj@kernel.org>
2022-10-28lib: maple_tree: remove unneeded initialization in mtree_range_walk()Lukas Bulwahn1-2/+2
Before the do-while loop in mtree_range_walk(), the variables next, min, max need to be initialized. The variables last, prev_min and prev_max are set within the loop body before they are eventually used after exiting the loop body. As it is a do-while loop, the loop body is executed at least once, so the variables last, prev_min and prev_max do not need to be initialized before the loop body. Remove unneeded initialization of last and prev_min. The needless initialization was reported by clang-analyzer as Dead Stores. As the compiler already identifies these assignments as unneeded, it optimizes the assignments away. Hence: No functional change. No change in object code. Link: https://lkml.kernel.org/r/20221026120029.12555-2-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by defaultAlexander Potapenko1-1/+2
KMSAN adds a lot of instrumentation to the code, which results in increased stack usage (up to 2048 bytes and more in some cases). It's hard to predict how big the stack frames can be, so we disable the warnings for KMSAN instead. Link: https://lkml.kernel.org/r/20221024212144.2852069-3-glider@google.com Link: https://github.com/google/kmsan/issues/89 Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-27Merge tag 'net-6.1-rc3-2' of ↵Linus Torvalds1-36/+22
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from 802.15.4 (Zigbee et al). Current release - regressions: - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1 Current release - new code bugs: - mptcp: fix abba deadlock on fastopen - eth: stmmac: rk3588: allow multiple gmac controllers in one system Previous releases - regressions: - ip: rework the fix for dflt addr selection for connected nexthop - net: couple more fixes for misinterpreting bits in struct page after the signature was added Previous releases - always broken: - ipv6: ensure sane device mtu in tunnels - openvswitch: switch from WARN to pr_warn on a user-triggerable path - ethtool: eeprom: fix null-deref on genl_info in dump - ieee802154: more return code fixes for corner cases in dgram_sendmsg - mac802154: fix link-quality-indicator recording - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload - eth: fec: limit register access on i.MX6UL - eth: bcm4908_enet: update TX stats after actual transmission - can: rcar_canfd: improve IRQ handling for RZ/G2L Misc: - genetlink: piggy back on the newly added resv_op_start to enforce more sanity checks on new commands" * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) net: enetc: survive memory pressure without crashing kcm: do not sense pfmemalloc status in kcm_sendpage() net: do not sense pfmemalloc status in skb_append_pagefrags() net/mlx5e: Fix macsec sci endianness at rx sa update net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function net/mlx5e: Fix macsec rx security association (SA) update/delete net/mlx5e: Fix macsec coverity issue at rx sa update net/mlx5: Fix crash during sync firmware reset net/mlx5: Update fw fatal reporter state on PCI handlers successful recover net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed net/mlx5e: TC, Reject forwarding from internal port to internal port net/mlx5: Fix possible use-after-free in async command interface net/mlx5: ASO, Create the ASO SQ with the correct timestamp format net/mlx5e: Update restore chain id for slow path packets net/mlx5e: Extend SKB room check to include PTP-SQ net/mlx5: DR, Fix matcher disconnect error flow net/mlx5: Wait for firmware to enable CRS before pci_restore_state net/mlx5e: Do not increment ESN when updating IPsec ESN state netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed ...
2022-10-27Merge tag 'hardening-v6.1-rc3' of ↵Linus Torvalds1-11/+36
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Fix older Clang vs recent overflow KUnit test additions (Nick Desaulniers, Kees Cook) - Fix kern-doc visibility for overflow helpers (Kees Cook) * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: overflow: Refactor test skips for Clang-specific issues overflow: disable failing tests for older clang versions overflow: Fix kern-doc markup for functions
2022-10-27kunit: remove unused structure definitionYoungJun.park1-5/+0
remove unused string_stream_alloc_context structure definition. Signed-off-by: YoungJun.park <her0gyugyu@gmail.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-27kunit: Add KUnit memory block assertions to the example_all_expect_macros_testMaíra Canal1-0/+7
Augment the example_all_expect_macros_test with the KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ macros by creating a test with memory block assertions. Signed-off-by: Maíra Canal <mairacanal@riseup.net> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-27kunit: Introduce KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ macrosMaíra Canal1-0/+56
Currently, in order to compare memory blocks in KUnit, the KUNIT_EXPECT_EQ or KUNIT_EXPECT_FALSE macros are used in conjunction with the memcmp function, such as: KUNIT_EXPECT_EQ(test, memcmp(foo, bar, size), 0); Although this usage produces correct results for the test cases, when the expectation fails, the error message is not very helpful, indicating only the return of the memcmp function. Therefore, create a new set of macros KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ that compare memory blocks until a specified size. In case of expectation failure, those macros print the hex dump of the memory blocks, making it easier to debug test failures for memory blocks. That said, the expectation KUNIT_EXPECT_EQ(test, memcmp(foo, bar, size), 0); would translate to the expectation KUNIT_EXPECT_MEMEQ(test, foo, bar, size); Signed-off-by: Maíra Canal <mairacanal@riseup.net> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-27kunit: log numbers in decimal and hexMark Rutland1-2/+4
When KUNIT_EXPECT_EQ() or KUNIT_ASSERT_EQ() log a failure, they log the two values being compared, with numerical values logged in decimal. In some cases, decimal output is painful to consume, and hexadecimal output would be more helpful. For example, this is the case for tests I'm currently developing for the arm64 insn encoding/decoding code, where comparing two 32-bit instruction opcodes results in output such as: | # test_insn_add_shifted_reg: EXPECTATION FAILED at arch/arm64/lib/test_insn.c:2791 | Expected obj_insn == gen_insn, but | obj_insn == 2332164128 | gen_insn == 1258422304 To make this easier to consume, this patch logs the values in both decimal and hexadecimal: | # test_insn_add_shifted_reg: EXPECTATION FAILED at arch/arm64/lib/test_insn.c:2791 | Expected obj_insn == gen_insn, but | obj_insn == 2332164128 (0x8b020020) | gen_insn == 1258422304 (0x4b020020) As can be seen from the example, having hexadecimal makes it significantly easier for a human to spot which specific bits are incorrect. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: linux-kselftest@vger.kernel.org Cc: kunit-dev@googlegroups.com Acked-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-26rhashtable: make test actually randomRolf Eike Beer1-36/+22
The "random rhlist add/delete operations" actually wasn't very random, as all cases tested the same bit. Since the later parts of this loop depend on the first case execute this unconditionally, and then test on different bits for the remaining tests. While at it only request as much random bits as are actually used. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-26overflow: Refactor test skips for Clang-specific issuesKees Cook1-17/+35
Convert test exclusion into test skipping. This brings the logic for why a test is being skipped into the test itself, instead of having to spread ifdefs around the code. This will make cleanup easier as minimum tests get raised. Drop __maybe_unused so missed tests will be noticed again and clean up whitespace. For example, clang-11 on i386: [15:52:32] ================== overflow (18 subtests) ================== [15:52:32] [PASSED] u8_u8__u8_overflow_test [15:52:32] [PASSED] s8_s8__s8_overflow_test [15:52:32] [PASSED] u16_u16__u16_overflow_test [15:52:32] [PASSED] s16_s16__s16_overflow_test [15:52:32] [PASSED] u32_u32__u32_overflow_test [15:52:32] [PASSED] s32_s32__s32_overflow_test [15:52:32] [SKIPPED] u64_u64__u64_overflow_test [15:52:32] [SKIPPED] s64_s64__s64_overflow_test [15:52:32] [SKIPPED] u32_u32__int_overflow_test [15:52:32] [PASSED] u32_u32__u8_overflow_test [15:52:32] [PASSED] u8_u8__int_overflow_test [15:52:32] [PASSED] int_int__u8_overflow_test [15:52:32] [PASSED] shift_sane_test [15:52:32] [PASSED] shift_overflow_test [15:52:32] [PASSED] shift_truncate_test [15:52:32] [PASSED] shift_nonsense_test [15:52:32] [PASSED] overflow_allocation_test [15:52:32] [PASSED] overflow_size_helpers_test [15:52:32] ==================== [PASSED] overflow ===================== [15:52:32] ============================================================ [15:52:32] Testing complete. Ran 18 tests: passed: 15, skipped: 3 Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20221006230017.1833458-1-keescook@chromium.org
2022-10-26overflow: disable failing tests for older clang versionsNick Desaulniers1-1/+8
Building the overflow kunit tests with clang-11 fails with: $ ./tools/testing/kunit/kunit.py run --arch=arm --make_options LLVM=1 \ overflow ... ld.lld: error: undefined symbol: __mulodi4 ... Clang 11 and earlier generate unwanted libcalls for signed output, unsigned input. Disable these tests for now, but should these become used in the kernel we might consider that as justification for dropping clang-11 support. Keep the clang-11 build alive a little bit longer. Avoid -Wunused-function warnings via __maybe_unused. To test W=1: $ make LLVM=1 -j128 defconfig $ ./scripts/config -e KUNIT -e KUNIT_ALL $ make LLVM=1 -j128 olddefconfig lib/overflow_kunit.o W=1 Link: https://github.com/ClangBuiltLinux/linux/issues/1711 Link: https://github.com/llvm/llvm-project/commit/3203143f1356a4e4e3ada231156fc6da6e1a9f9d Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221006171751.3444575-1-ndesaulniers@google.com
2022-10-19kcsan: Fix trivial typo in Kconfig help commentsRyosuke Yasuoka1-3/+3
Fix trivial typo in Kconfig help comments in KCSAN_SKIP_WATCH and KCSAN_SKIP_WATCH_RANDOMIZE Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-19kunit: update NULL vs IS_ERR() testsDan Carpenter2-3/+3
The alloc_string_stream() functions were changed from returning NULL on error to returning error pointers so these caller needs to be updated as well. Fixes: 78b1c6584fce ("kunit: string-stream: Simplify resource use") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-18vsprintf: replace in_irq() with in_hardirq()ye xingchen1-1/+1
Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20221011024831.322799-1-ye.xingchen@zte.com.cn
2022-10-17Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds17-47/+38
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-16Merge tag 'kbuild-fixes-v6.1' of ↵Linus Torvalds1-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the combination of Clang >= 14 and GAS <= 2.35. - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased the package size. - Fix modpost error under build environments using musl. - Make *.ll files keep value names for easier debugging - Fix single directory build - Prevent RISC-V from selecting the broken DWARF5 support when Clang and GAS are used together. * tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 kbuild: fix single directory build kbuild: add -fno-discard-value-names to cmd_cc_ll_c scripts/clang-tools: Convert clang-tidy args to list modpost: put modpost options before argument kbuild: Stop including vmlinux.bz2 in the rpm's Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
2022-10-16lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5Nathan Chancellor1-2/+7
When building with a RISC-V kernel with DWARF5 debug info using clang and the GNU assembler, several instances of the following error appear: /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported Dumping the .s file reveals these .uleb128 directives come from .debug_loc and .debug_ranges: .Ldebug_loc0: .byte 4 # DW_LLE_offset_pair .uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset .uleb128 .Ltmp1-.Lfunc_begin0 # ending offset .byte 1 # Loc expr size .byte 90 # DW_OP_reg10 .byte 0 # DW_LLE_end_of_list .Ldebug_ranges0: .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp6-.Lfunc_begin0 # starting offset .uleb128 .Ltmp27-.Lfunc_begin0 # ending offset .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp28-.Lfunc_begin0 # starting offset .uleb128 .Ltmp30-.Lfunc_begin0 # ending offset .byte 0 # DW_RLE_end_of_list There is an outstanding binutils issue to support a non-constant operand to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to be any movement on it, due to concerns over how it would work with linker relaxation. To avoid these build errors, prevent DWARF5 from being selected when using clang and an assembler that does not have support for these symbol deltas, which can be easily checked in Kconfig with as-instr plus the small test program from the dwz test suite from the binutils issue. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215 Link: https://github.com/ClangBuiltLinux/linux/issues/1719 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-14Merge tag 'mm-stable-2022-10-13' of ↵Linus Torvalds3-25/+126
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - fix a race which causes page refcounting errors in ZONE_DEVICE pages (Alistair Popple) - fix userfaultfd test harness instability (Peter Xu) - various other patches in MM, mainly fixes * tag 'mm-stable-2022-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (29 commits) highmem: fix kmap_to_page() for kmap_local_page() addresses mm/page_alloc: fix incorrect PGFREE and PGALLOC for high-order page mm/selftest: uffd: explain the write missing fault check mm/hugetlb: use hugetlb_pte_stable in migration race check mm/hugetlb: fix race condition of uffd missing/minor handling zram: always expose rw_page LoongArch: update local TLB if PTE entry exists mm: use update_mmu_tlb() on the second thread kasan: fix array-bounds warnings in tests hmm-tests: add test for migrate_device_range() nouveau/dmem: evict device private memory during release nouveau/dmem: refactor nouveau_dmem_fault_copy_one() mm/migrate_device.c: add migrate_device_range() mm/migrate_device.c: refactor migrate_vma and migrate_deivce_coherent_page() mm/memremap.c: take a pgmap reference on page allocation mm: free device private pages have zero refcount mm/memory.c: fix race when faulting a device private page mm/damon: use damon_sz_region() in appropriate place mm/damon: move sz_damon_region to damon_sz_region lib/test_meminit: add checks for the allocation functions ...
2022-10-14Merge tag 'parisc-for-6.1-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Fixes: - When we added basic vDSO support in kernel 5.18 we introduced a bug which prevented a mmap() of graphic card memory. This is because we used the DMB (data memory break trap bit) page flag as special-bit, but missed to clear that bit when loading the TLB. - Graphics card memory size was not correctly aligned - Spelling fixes (from Colin Ian King) Enhancements: - PDC console (which uses firmware calls) now rewritten as early console - Reduced size of alternative tables" * tag 'parisc-for-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix spelling mistake "mis-match" -> "mismatch" in eisa driver parisc: Fix userspace graphics card breakage due to pgtable special bit parisc: fbdev/stifb: Align graphics memory size to 4MB parisc: Convert PDC console to an early console parisc: Reduce kernel size by packing alternative tables
2022-10-13hmm-tests: add test for migrate_device_range()Alistair Popple2-21/+100
Link: https://lkml.kernel.org/r/a73cf109de0224cfd118d22be58ddebac3ae2897.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-13mm: free device private pages have zero refcountAlistair Popple1-1/+1
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page refcount") device private pages have no longer had an extra reference count when the page is in use. However before handing them back to the owning device driver we add an extra reference count such that free pages have a reference count of one. This makes it difficult to tell if a page is free or not because both free and in use pages will have a non-zero refcount. Instead we should return pages to the drivers page allocator with a zero reference count. Kernel code can then safely use kernel functions such as get_page_unless_zero(). Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-13mm/memory.c: fix race when faulting a device private pageAlistair Popple1-3/+4
Patch series "Fix several device private page reference counting issues", v2 This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed. During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory. This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those so any help there would be appreciated. The changes mimic what is done in for both Nouveau and hmm-tests though so I doubt they will cause problems. This patch (of 8): When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages(). Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not. [mpe@ellerman.id.au: fix build] Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Lyude Paul <lyude@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-13lib/test_meminit: add checks for the allocation functionsXiaoke Wang1-0/+21
alloc_pages(), kmalloc() and vmalloc() are all memory allocation functions which can return NULL when some internal memory failures happen. So it is better to check the return of them to catch the failure in time for better test them. Link: https://lkml.kernel.org/r/tencent_D44A49FFB420EDCCBFB9221C8D14DFE12908@qq.com Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-13Merge tag 'linux-kselftest-kunit-6.1-rc1-2' of ↵Linus Torvalds4-95/+43
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more KUnit updates from Shuah Khan: "Features and fixes: - simplify resource use - make kunit_malloc() and kunit_free() allocations and frees consistent. kunit_free() frees only the memory allocated by kunit_malloc() - stop downloading risc-v opensbi binaries using wget - other fixes and improvements to tool and KUnit framework" * tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: Documentation: kunit: Update description of --alltests option kunit: declare kunit_assert structs as const kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED kunit: remove format func from struct kunit_assert, get it to 0 bytes kunit: tool: Don't download risc-v opensbi firmware with wget kunit: make kunit_kfree(NULL) a no-op to match kfree() kunit: make kunit_kfree() not segfault on invalid inputs kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends kunit: drop test pointer in string_stream_fragment kunit: string-stream: Simplify resource use