summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2023-03-10sbitmap: Try each queue to wake up at least one waiterGabriel Krisman Bertazi1-16/+12
commit 26edb30dd1c0c9be11fa676b4f330ada7b794ba6 upstream. Jan reported the new algorithm as merged might be problematic if the queue being awaken becomes empty between the waitqueue_active inside sbq_wake_ptr check and the wake up. If that happens, wake_up_nr will not wake up any waiter and we loose too many wake ups. In order to guarantee progress, we need to wake up at least one waiter here, if there are any. This now requires trying to wake up from every queue. Instead of walking through all the queues with sbq_wake_ptr, this call moves the wake up inside that function. In a previous version of the patch, I found that updating wake_index several times when walking through queues had a measurable overhead. This ensures we only update it once, at the end. Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10sbitmap: Advance the queue index before waking up a queueGabriel Krisman Bertazi1-2/+8
commit 976570b4ecd30d3ec6e1b0910da8e5edc591f2b6 upstream. When a queue is awaken, the wake_index written by sbq_wake_ptr currently keeps pointing to the same queue. On the next wake up, it will thus retry the same queue, which is unfair to other queues, and can lead to starvation. This patch, moves the index update to happen before the queue is returned, such that it will now try a different queue first on the next wake up, improving fairness. Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags") Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUGPeter Zijlstra1-1/+14
[ Upstream commit 5a5d7e9badd2cb8065db171961bd30bd3595e4b6 ] In order to avoid WARN/BUG from generating nested or even recursive warnings, force rcu_is_watching() true during WARN/lockdep_rcu_suspicious(). Notably things like unwinding the stack can trigger rcu_dereference() warnings, which then triggers more unwinding which then triggers more warnings etc.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10kobject: Fix slab-out-of-bounds in fill_kobj_path()Wang Hai1-2/+10
[ Upstream commit 3bb2a01caa813d3a1845d378bbe4169ef280d394 ] In kobject_get_path(), if kobj->name is changed between calls get_kobj_path_length() and fill_kobj_path() and the length becomes longer, then fill_kobj_path() will have an out-of-bounds bug. The actual current problem occurs when the ixgbe probe. In ixgbe_mii_bus_init(), if the length of netdev->dev.kobj.name length becomes longer, out-of-bounds will occur. cpu0 cpu1 ixgbe_probe register_netdev(netdev) netdev_register_kobject device_add kobject_uevent // Sending ADD events systemd-udevd // rename netdev dev_change_name device_rename kobject_rename ixgbe_mii_bus_init | mdiobus_register | __mdiobus_register | device_register | device_add | kobject_uevent | kobject_get_path | len = get_kobj_path_length // old name | path = kzalloc(len, gfp_mask); | kobj->name = name; /* name length becomes * longer */ fill_kobj_path /* kobj path length is * longer than path, * resulting in out of * bounds when filling path */ This is the kasan report: ================================================================== BUG: KASAN: slab-out-of-bounds in fill_kobj_path+0x50/0xc0 Write of size 7 at addr ff1100090573d1fd by task kworker/28:1/673 Workqueue: events work_for_cpu_fn Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_address_description.constprop.0+0x86/0x1e7 print_report+0x36/0x4f kasan_report+0xad/0x130 kasan_check_range+0x35/0x1c0 memcpy+0x39/0x60 fill_kobj_path+0x50/0xc0 kobject_get_path+0x5a/0xc0 kobject_uevent_env+0x140/0x460 device_add+0x5c7/0x910 __mdiobus_register+0x14e/0x490 ixgbe_probe.cold+0x441/0x574 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 process_one_work+0x3b6/0x6a0 worker_thread+0x368/0x520 kthread+0x165/0x1a0 ret_from_fork+0x1f/0x30 This reproducer triggers that bug: while: do rmmod ixgbe sleep 0.5 modprobe ixgbe sleep 0.5 When calling fill_kobj_path() to fill path, if the name length of kobj becomes longer, return failure and retry. This fixes the problem. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20221220012143.52141-1-wanghai38@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10kobject: modify kobject_get_path() to take a const *Greg Kroah-Hartman1-5/+5
[ Upstream commit 33a0a1e3b3d17445832177981dc7a1c6a5b009f8 ] kobject_get_path() does not modify the kobject passed to it, so make the pointer constant. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20221001165315.2690141-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 3bb2a01caa81 ("kobject: Fix slab-out-of-bounds in fill_kobj_path()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10printf: fix errname.c listArnd Bergmann1-8/+14
[ Upstream commit 0c2baf6509af1d11310ae4c1c839481a6e9a4bc4 ] On most architectures, gcc -Wextra warns about the list of error numbers containing both EDEADLK and EDEADLOCK: lib/errname.c:15:67: warning: initialized field overwritten [-Woverride-init] 15 | #define E(err) [err + BUILD_BUG_ON_ZERO(err <= 0 || err > 300)] = "-" #err | ^~~ lib/errname.c:172:2: note: in expansion of macro 'E' 172 | E(EDEADLK), /* EDEADLOCK */ | ^ On parisc, a similar error happens with -ECANCELLED, which is an alias for ECANCELED. Make the EDEADLK printing conditional on the number being distinct from EDEADLOCK, and remove the -ECANCELLED bit completely as it can never be hit. To ensure these are correct, add static_assert lines that verify all the remaining aliases are in fact identical to the canonical name. Fixes: 57f5677e535b ("printf: add support for printing symbolic error names") Cc: Petr Mladek <pmladek@suse.com> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/all/20210514213456.745039-1-arnd@kernel.org/ Link: https://lore.kernel.org/all/20210927123409.1109737-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230206194126.380350-1-arnd@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10lib/mpi: Fix buffer overrun when SG is too longHerbert Xu1-1/+2
[ Upstream commit 7361d1bc307b926cbca214ab67b641123c2d6357 ] The helper mpi_read_raw_from_sgl sets the number of entries in the SG list according to nbytes. However, if the last entry in the SG list contains more data than nbytes, then it may overrun the buffer because it only allocates enough memory for nbytes. Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Reported-by: Roberto Sassu <roberto.sassu@huaweicloud.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10sbitmap: correct wake_batch recalculation to avoid potential IO hungKemeng Shi1-4/+1
[ Upstream commit b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a ] Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened") mentioned that in case of shared tags, there could be just one real active hctx(queue) because of lazy detection of tag idle. Then driver tag allocation may wait forever on this real active hctx(queue) if wake_batch is > hctx_max_depth where hctx_max_depth is available tags depth for the actve hctx(queue). However, the condition wake_batch > hctx_max_depth is not strong enough to avoid IO hung as the sbitmap_queue_wake_up will only wake up one wait queue for each wake_batch even though there is only one waiter in the woken wait queue. After this, there is only one tag to free and wake_batch may not be reached anymore. Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened") methioned that driver tag allocation may wait forever. Actually, the inactive hctx(queue) will be truely idle after at most 30 seconds and will call blk_mq_tag_wakeup_all to wake one waiter per wait queue to break the hung. But IO hung for 30 seconds is also not acceptable. Set batch size to small enough that depth of the shared hctx(queue) is enough to wake up all of the queues like sbq_calc_wake_batch do to fix this potential IO hung. Although hctx_max_depth will be clamped to at least 4 while wake_batch recalculation does not do the clamp, the wake_batch will be always recalculated to 1 when hctx_max_depth <= 4. Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10sbitmap: Use single per-bitmap counting to wake up queued tagsGabriel Krisman Bertazi1-96/+26
[ Upstream commit 4f8126bb2308066b877859e4b5923ffb54143630 ] sbitmap suffers from code complexity, as demonstrated by recent fixes, and eventual lost wake ups on nested I/O completion. The later happens, from what I understand, due to the non-atomic nature of the updates to wait_cnt, which needs to be subtracted and eventually reset when equal to zero. This two step process can eventually miss an update when a nested completion happens to interrupt the CPU in between the wait_cnt updates. This is very hard to fix, as shown by the recent changes to this code. The code complexity arises mostly from the corner cases to avoid missed wakes in this scenario. In addition, the handling of wake_batch recalculation plus the synchronization with sbq_queue_wake_up is non-trivial. This patchset implements the idea originally proposed by Jan [1], which removes the need for the two-step updates of wait_cnt. This is done by tracking the number of completions and wakeups in always increasing, per-bitmap counters. Instead of having to reset the wait_cnt when it reaches zero, we simply keep counting, and attempt to wake up N threads in a single wait queue whenever there is enough space for a batch. Waking up less than batch_wake shouldn't be a problem, because we haven't changed the conditions for wake up, and the existing batch calculation guarantees at least enough remaining completions to wake up a batch for each queue at any time. Performance-wise, one should expect very similar performance to the original algorithm for the case where there is no queueing. In both the old algorithm and this implementation, the first thing is to check ws_active, which bails out if there is no queueing to be managed. In the new code, we took care to avoid accounting completions and wakeups when there is no queueing, to not pay the cost of atomic operations unnecessarily, since it doesn't skew the numbers. For more interesting cases, where there is queueing, we need to take into account the cross-communication of the atomic operations. I've been benchmarking by running parallel fio jobs against a single hctx nullb in different hardware queue depth scenarios, and verifying both IOPS and queueing. Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel jobs. fio was issuing fixed-size randwrites with qd=64 against nullb, varying only the hardware queue length per test. queue size 2 4 8 16 32 64 6.1-rc2 1681.1K (1.6K) 2633.0K (12.7K) 6940.8K (16.3K) 8172.3K (617.5K) 8391.7K (367.1K) 8606.1K (351.2K) patched 1721.8K (15.1K) 3016.7K (3.8K) 7543.0K (89.4K) 8132.5K (303.4K) 8324.2K (230.6K) 8401.8K (284.7K) The following is a similar experiment, ran against a nullb with a single bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40 parallel fio jobs operating on the same device queue size 2 4 8 16 32 64 6.1-rc2 1081.0K (2.3K) 957.2K (1.5K) 1699.1K (5.7K) 6178.2K (124.6K) 12227.9K (37.7K) 13286.6K (92.9K) patched 1081.8K (2.8K) 1316.5K (5.4K) 2364.4K (1.8K) 6151.4K (20.0K) 11893.6K (17.5K) 12385.6K (18.4K) It has also survived blktests and a 12h-stress run against nullb. I also ran the code against nvme and a scsi SSD, and I didn't observe performance regression in those. If there are other tests you think I should run, please let me know and I will follow up with results. [1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/ Cc: Hugh Dickins <hughd@google.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Liu Song <liusong@linux.alibaba.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: b5fcf7871acb ("sbitmap: correct wake_batch recalculation to avoid potential IO hung") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10sbitmap: remove redundant check in __sbitmap_queue_get_batchKemeng Shi1-5/+3
[ Upstream commit 903e86f3a64d9573352bbab2f211fdbbaa5772b7 ] Commit fbb564a557809 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") mentioned that "Checking free bits when setting the target bits. Otherwise, it may reuse the busying bits." This commit add check to make sure all masked bits in word before cmpxchg is zero. Then the existing check after cmpxchg to check any zero bit is existing in masked bits in word is redundant. Actually, old value of word before cmpxchg is stored in val and we will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg. So we will not reuse busy bits methioned in commit fbb564a557809 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert new-added check to remove redundant check. Fixes: fbb564a55780 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-25uaccess: Add speculation barrier to copy_from_user()Dave Hansen1-0/+7
commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream. The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by: Jordy Zomer <jordyzomer@google.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09maple_tree: fix mas_empty_area_rev() lower bound validationLiam Howlett2-9/+97
commit 7327e8111adb315423035fb5233533016dfd3f2e upstream. mas_empty_area_rev() was not correctly validating the start of a gap against the lower limit. This could lead to the range starting lower than the requested minimum. Fix the issue by better validating a gap once one is found. This commit also adds tests to the maple tree test suite for this issue and tests the mas_empty_area() function for similar bound checking. Link: https://lkml.kernel.org/r/20230111200136.1851322-1-Liam.Howlett@oracle.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216911 Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: <amanieu@gmail.com> Link: https://lore.kernel.org/linux-mm/0b9f5425-08d4-8013-aa4c-e620c3b10bb2@leemhuis.info/ Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-09maple_tree: should get pivots boundary by typeWei Yang1-2/+3
[ Upstream commit ab6ef70a8b0d314c2160af70b0de984664d675e0 ] We should get pivots boundary by type. Fixes a potential overindexing of mt_pivots[]. Link: https://lkml.kernel.org/r/20221112234308.23823-1-richard.weiyang@gmail.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01netlink: prevent potential spectre v1 gadgetsEric Dumazet1-0/+3
[ Upstream commit f0950402e8c76e7dcb08563f1b4e8000fbc62455 ] Most netlink attributes are parsed and validated from __nla_validate_parse() or validate_nla() u16 type = nla_type(nla); if (type == 0 || type > maxtype) { /* error or continue */ } @type is then used as an array index and can be used as a Spectre v1 gadget. array_index_nospec() can be used to prevent leaking content of kernel memory to malicious users. This should take care of vast majority of netlink uses, but an audit is needed to take care of others where validation is not yet centralized in core netlink functions. Fixes: bfa83a9e03cf ("[NETLINK]: Type-safe netlink messages/attributes interface") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230119110150.2678537-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01lockref: stop doing cpu_relax in the cmpxchg loopMateusz Guzik1-1/+0
[ Upstream commit f5fe24ef17b5fbe6db49534163e77499fb10ae8c ] On the x86-64 architecture even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction. To illustrate the impact, below are benchmark results obtained by running various will-it-scale tests on top of the 6.2-rc3 kernel and Cascade Lake (2 sockets * 24 cores * 2 threads) CPU. All results in ops/s. Note there is some variance in re-runs, but the code is consistently faster when contention is present. open3 ("Same file open/close"): proc stock no-pause 1 805603 814942 (+%1) 2 1054980 1054781 (-0%) 8 1544802 1822858 (+18%) 24 1191064 2199665 (+84%) 48 851582 1469860 (+72%) 96 609481 1427170 (+134%) fstat2 ("Same file fstat"): proc stock no-pause 1 3013872 3047636 (+1%) 2 4284687 4400421 (+2%) 8 3257721 5530156 (+69%) 24 2239819 5466127 (+144%) 48 1701072 5256609 (+209%) 96 1269157 6649326 (+423%) Additionally, a kernel with a private patch to help access() scalability: access2 ("Same file access"): proc stock patched patched +nopause 24 2378041 2005501 5370335 (-15% / +125%) That is, fixing the problems in access itself *reduces* scalability after the cacheline ping-pong only happens in lockref with the pause instruction. Note that fstat and access benchmarks are not currently integrated into will-it-scale, but interested parties can find them in pull requests to said project. Code at hand has a rather tortured history. First modification showed up in commit d472d9d98b46 ("lockref: Relax in cmpxchg loop"), written with Itanium in mind. Later it got patched up to use an arch-dependent macro to stop doing it on s390 where it caused a significant regression. Said macro had undergone revisions and was ultimately eliminated later, going back to cpu_relax. While I intended to only remove cpu_relax for x86-64, I got the following comment from Linus: I would actually prefer just removing it entirely and see if somebody else hollers. You have the numbers to prove it hurts on real hardware, and I don't think we have any numbers to the contrary. So I think it's better to trust the numbers and remove it as a failure, than say "let's just remove it on x86-64 and leave everybody else with the potentially broken code" Additionally, Will Deacon (maintainer of the arm64 port, one of the architectures previously benchmarked): So, from the arm64 side of the fence, I'm perfectly happy just removing the cpu_relax() calls from lockref. As such, come back full circle in history and whack it altogether. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-24panic: Consolidate open-coded panic_on_warn checksKees Cook1-2/+1
commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream. Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll their own warnings, and each check "panic_on_warn". Consolidate this into a single function so that future instrumentation can be added in a single location. Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Gow <davidgow@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12kunit: alloc_string_stream_fragment error handling bug fixYoungJun.park1-1/+3
[ Upstream commit 93ef83050e597634d2c7dc838a28caf5137b9404 ] When it fails to allocate fragment, it does not free and return error. And check the pointer inappropriately. Fixed merge conflicts with commit 618887768bb7 ("kunit: update NULL vs IS_ERR() tests") Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: YoungJun.park <her0gyugyu@gmail.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-07test_kprobes: Fix implicit declaration error of test_kprobesLi Hua1-0/+1
commit 63a4dc0a0bb0e9bfeb2c88ccda81abdde4cdd6b8 upstream. If KPROBES_SANITY_TEST and ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled, but STACKTRACE is not set. Build failed as below: lib/test_kprobes.c: In function ‘stacktrace_return_handler’: lib/test_kprobes.c:228:8: error: implicit declaration of function ‘stack_trace_save’; did you mean ‘stacktrace_driver’? [-Werror=implicit-function-declaration] ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0); ^~~~~~~~~~~~~~~~ stacktrace_driver cc1: all warnings being treated as errors scripts/Makefile.build:250: recipe for target 'lib/test_kprobes.o' failed make[2]: *** [lib/test_kprobes.o] Error 1 To fix this error, Select STACKTRACE if ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled. Link: https://lore.kernel.org/all/20221121030620.63181-1-hucool.lihua@huawei.com/ Fixes: 1f6d3a8f5e39 ("kprobes: Add a test case for stacktrace from kretprobe handler") Cc: stable@vger.kernel.org Signed-off-by: Li Hua <hucool.lihua@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-31maple_tree: fix mas_spanning_rebalance() on insufficient dataLiam Howlett1-1/+3
commit 0abb964aae3da746ea2fd4301599a6fa26da58db upstream. Mike Rapoport contacted me off-list with a regression in running criu. Periodic tests fail with an RCU stall during execution. Although rare, it is possible to hit this with other uses so this patch should be backported to fix the regression. This patchset adds the fix and a test case to the maple tree test suite. This patch (of 2): An insufficient node was causing an out-of-bounds access on the node in mas_leaf_max_gap(). The cause was the faulty detection of the new node being a root node when overwriting many entries at the end of the tree. Fix the detection of a new root and ensure there is sufficient data prior to entering the spanning rebalance loop. Link: https://lkml.kernel.org/r/20221219161922.2708732-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221219161922.2708732-2-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Mike Rapoport <rppt@kernel.org> Tested-by: Mike Rapoport <rppt@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-31test_maple_tree: add test for mas_spanning_rebalance() on insufficient dataLiam Howlett1-0/+23
commit c5651b31f51584bd1199b3a552c8211a8523d6e1 upstream. Add a test to the maple tree test suite for the spanning rebalance insufficient node issue does not go undetected again. Link: https://lkml.kernel.org/r/20221219161922.2708732-3-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-31test_firmware: fix memory leak in test_firmware_init()Zhengchao Shao1-0/+1
[ Upstream commit 7610615e8cdb3f6f5bbd9d8e7a5d8a63e3cabf2e ] When misc_register() failed in test_firmware_init(), the memory pointed by test_fw_config->name is not released. The memory leak information is as follows: unreferenced object 0xffff88810a34cb00 (size 32): comm "insmod", pid 7952, jiffies 4294948236 (age 49.060s) hex dump (first 32 bytes): 74 65 73 74 2d 66 69 72 6d 77 61 72 65 2e 62 69 test-firmware.bi 6e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 n............... backtrace: [<ffffffff81b21fcb>] __kmalloc_node_track_caller+0x4b/0xc0 [<ffffffff81affb96>] kstrndup+0x46/0xc0 [<ffffffffa0403a49>] __test_firmware_config_init+0x29/0x380 [test_firmware] [<ffffffffa040f068>] 0xffffffffa040f068 [<ffffffff81002c41>] do_one_initcall+0x141/0x780 [<ffffffff816a72c3>] do_init_module+0x1c3/0x630 [<ffffffff816adb9e>] load_module+0x623e/0x76a0 [<ffffffff816af471>] __do_sys_finit_module+0x181/0x240 [<ffffffff89978f99>] do_syscall_64+0x39/0xb0 [<ffffffff89a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20221119035721.18268-1-shaozhengchao@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31lib/notifier-error-inject: fix error when writing -errno to debugfs fileAkinobu Mita1-1/+1
[ Upstream commit f883c3edd2c432a2931ec8773c70a570115a50fe ] The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"). This restores the previous behaviour by using newly introduced DEFINE_SIMPLE_ATTRIBUTE_SIGNED instead of DEFINE_SIMPLE_ATTRIBUTE. Link: https://lkml.kernel.org/r/20220919172418.45257-3-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31lib/fonts: fix undefined behavior in bit shift for get_default_fontGaosheng Cui1-2/+2
[ Upstream commit 6fe888c4d2fb174408e4540bb2d5602b9f507f90 ] Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below: UBSAN: shift-out-of-bounds in lib/fonts/fonts.c:139:20 left shift of 1 by 31 places cannot be represented in type 'int' <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c get_default_font+0x1c7/0x1f0 fbcon_startup+0x347/0x3a0 do_take_over_console+0xce/0x270 do_fbcon_takeover+0xa1/0x170 do_fb_registered+0x2a8/0x340 fbcon_fb_registered+0x47/0xe0 register_framebuffer+0x294/0x4a0 __drm_fb_helper_initial_config_and_unlock+0x43c/0x880 [drm_kms_helper] drm_fb_helper_initial_config+0x52/0x80 [drm_kms_helper] drm_fbdev_client_hotplug+0x156/0x1b0 [drm_kms_helper] drm_fbdev_generic_setup+0xfc/0x290 [drm_kms_helper] bochs_pci_probe+0x6ca/0x772 [bochs] local_pci_probe+0x4d/0xb0 pci_device_probe+0x119/0x320 really_probe+0x181/0x550 __driver_probe_device+0xc6/0x220 driver_probe_device+0x32/0x100 __driver_attach+0x195/0x200 bus_for_each_dev+0xbb/0x120 driver_attach+0x27/0x30 bus_add_driver+0x22e/0x2f0 driver_register+0xa9/0x190 __pci_register_driver+0x90/0xa0 bochs_pci_driver_init+0x52/0x1000 [bochs] do_one_initcall+0x76/0x430 do_init_module+0x61/0x28a load_module+0x1f82/0x2e50 __do_sys_finit_module+0xf8/0x190 __x64_sys_finit_module+0x23/0x30 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Link: https://lkml.kernel.org/r/20221031113829.4183153-1-cuigaosheng1@huawei.com Fixes: c81f717cb9e0 ("fbcon: Fix typo and bogus logic in get_default_font") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31lib/debugobjects: fix stat count and optimize debug_objects_mem_initwuchi1-0/+10
[ Upstream commit eabb7f1ace53e127309407b2b5e74e8199e85270 ] 1. Var debug_objects_allocated tracks valid kmem_cache_alloc calls, so track it in debug_objects_replace_static_objects. Do similar things in object_cpu_offline. 2. In debug_objects_mem_init, there is no need to call function cpuhp_setup_state_nocalls when debug_objects_enabled = 0 (out of memory). Link: https://lkml.kernel.org/r/20220611130634.99741-1-wuchi.zero@gmail.com Fixes: 634d61f45d6f ("debugobjects: Percpu pool lookahead freeing/allocation") Fixes: c4b73aabd098 ("debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done") Signed-off-by: wuchi <wuchi.zero@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-03Merge tag 'mm-hotfixes-stable-2022-12-02' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes, 11 marked cc:stable. Only three or four of the latter address post-6.0 issues, which is hopefully a sign that things are converging" * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: take the right locks for page table retraction mm: migrate: fix THP's mapcount on isolation mm: introduce arch_has_hw_nonleaf_pmd_young() mm: add dummy pmd_young() for architectures not having it mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes() tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise: use zap_page_range_single for madvise dontneed mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02error-injection: Add prompt for function error injectionSteven Rostedt (Google)1-1/+7
The config to be able to inject error codes into any function annotated with ALLOW_ERROR_INJECTION() is enabled when FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is always enabled on x86 when KPROBES is enabled, and there's no way to turn it off. As kprobes is useful for observability of the kernel, it is useful to have it enabled in production environments. But error injection should be avoided. Add a prompt to the config to allow it to be disabled even when kprobes is enabled, and get rid of the "def_bool y". This is a kernel debug feature (it's in Kconfig.debug), and should have never been something enabled by default. Cc: stable@vger.kernel.org Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-01Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabledLee Jones1-0/+1
When enabled, KASAN enlarges function's stack-frames. Pushing quite a few over the current threshold. This can mainly be seen on 32-bit architectures where the present limit (when !GCC) is a lowly 1024-Bytes. Link: https://lkml.kernel.org/r/20221125120750.3537134-3-lee@kernel.org Signed-off-by: Lee Jones <lee@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-27Merge tag 'char-misc-6.1-rc7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 6.1-rc7, they include: - build warning fix for the vdso when using new versions of grep - iio driver fixes for reported issues - small nvmem driver fixes - fpga Kconfig fix - interconnect dt binding fix All of these have been in linux-next with no reported issues" * tag 'char-misc-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: lib/vdso: use "grep -E" instead of "egrep" nvmem: lan9662-otp: Change return type of lan9662_otp_wait_flag_clear() nvmem: rmem: Fix return value check in rmem_read() fpga: m10bmc-sec: Fix kconfig dependencies dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid" iio: adc: aspeed: Remove the trim valid dts property. iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails iio: accel: bma400: Fix memory leak in bma400_get_steps_reg() iio: light: rpr0521: add missing Kconfig dependencies iio: health: afe4404: Fix oob read in afe4404_[read|write]_raw iio: health: afe4403: Fix oob read in afe4403_read_raw iio: light: apds9960: fix wrong register for gesture gain dt-bindings: interconnect: qcom,msm8998-bwmon: Correct SC7280 CPU compatible
2022-11-23lib/vdso: use "grep -E" instead of "egrep"Greg Kroah-Hartman1-1/+1
The latest version of grep claims the egrep is now obsolete so the build now contains warnings that look like: egrep: warning: egrep is obsolescent; using grep -E fix this up by moving the vdso Makefile to use "grep -E" instead. Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20220920170633.3133829-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-23test_kprobes: fix implicit declaration error of test_kprobesLi Hua1-0/+1
If KPROBES_SANITY_TEST and ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled, but STACKTRACE is not set. Build failed as below: lib/test_kprobes.c: In function `stacktrace_return_handler': lib/test_kprobes.c:228:8: error: implicit declaration of function `stack_trace_save'; did you mean `stacktrace_driver'? [-Werror=implicit-function-declaration] ret = stack_trace_save(stack_buf, STACK_BUF_SIZE, 0); ^~~~~~~~~~~~~~~~ stacktrace_driver cc1: all warnings being treated as errors scripts/Makefile.build:250: recipe for target 'lib/test_kprobes.o' failed make[2]: *** [lib/test_kprobes.o] Error 1 To fix this error, Select STACKTRACE if ARCH_CORRECT_STACKTRACE_ON_KRETPROBE is enabled. Link: https://lkml.kernel.org/r/20221121030620.63181-1-hucool.lihua@huawei.com Fixes: 1f6d3a8f5e39 ("kprobes: Add a test case for stacktrace from kretprobe handler") Signed-off-by: Li Hua <hucool.lihua@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-23mm: fix unexpected changes to {failslab|fail_page_alloc}.attrQi Zheng1-5/+8
When we specify __GFP_NOWARN, we only expect that no warnings will be issued for current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it. [akpm@linux-foundation.org: unexport should_fail_ex()] Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com Fixes: 3f913fc5f974 ("mm: fix missing handler for __GFP_NOWARN") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: don't set a new maximum on the node when not reusing nodesLiam Howlett1-2/+1
In RCU mode, the node limits were being updated to the last pivot which may not be correct and would cause the metadata to be set when it shouldn't. Fix this by not setting a new limit in this case. Link: https://lkml.kernel.org/r/20221107163857.867377-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: fix depth tracking in maple_stateLiam Howlett1-1/+2
It is possible to confuse the depth tracking in the maple state by searching the same node for values. Fix the depth tracking by moving where the depth is incremented closer to where the node changes level. Also change the initial depth setting when using the root node. Link: https://lkml.kernel.org/r/20221107163814.866612-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09kmsan: make sure PREEMPT_RT is offAlexander Potapenko1-0/+1
As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play well with IRQ code when PREEMPT_RT is on, because in that mode even GFP_ATOMIC allocations cannot be performed. Fixing this would require making stackdepot completely lockless, which is quite challenging and may be excessive for the time being. Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug configs are. Link: https://lkml.kernel.org/r/20221102110611.1085175-4-glider@google.com Link: https://lore.kernel.org/lkml/20221025221755.3810809-1-glider@google.com/ Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARNAlexander Potapenko1-1/+1
As pointed out by Masahiro Yamada, Kconfig picks up the first default entry which has true 'if' condition. Hence, the previously added check for KMSAN was never used, because it followed the checks for 64BIT and !64BIT. Put KMSAN check before others to ensure it is always applied. Link: https://lkml.kernel.org/r/20221102110611.1085175-3-glider@google.com Link: https://github.com/google/kmsan/issues/89 Link: https://lore.kernel.org/linux-mm/20221024212144.2852069-3-glider@google.com/ Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default") Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: reorganize testing to restore module testingLiam Howlett4-35741/+232
Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [akpm@linux-foundation.org: fix module export] [akpm@linux-foundation.org: fix it some more] [liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: mas_anode_descend() clang-analyzer cleanupLiam Howlett1-6/+4
clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon inspection, there were a few clean ups that would make the code cleaner: The count variable was set from the mt_slots array and then updated but never used again. Just use the array reference directly. Also stop updating the type since it isn't used after the update. Stop setting the gaps pointer to NULL at the start since it is always set before the loop begins. Link: https://lkml.kernel.org/r/20221026151413.4032730-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-09maple_tree: remove pointer to pointer use in mas_alloc_nodes()Liam Howlett1-3/+1
There is a more direct and cleaner way of implementing the same functional code. Remove the confusing and unnecessary use of pointers here. Link: https://lkml.kernel.org/r/20221026151241.4031117-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-03Merge tag 'net-6.1-rc4' of ↵Linus Torvalds1-26/+15
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and netfilter. Current release - regressions: - net: several zerocopy flags fixes - netfilter: fix possible memory leak in nf_nat_init() - openvswitch: add missing .resv_start_op Previous releases - regressions: - neigh: fix null-ptr-deref in neigh_table_clear() - sched: fix use after free in red_enqueue() - dsa: fall back to default tagger if we can't load the one from DT - bluetooth: fix use-after-free in l2cap_conn_del() Previous releases - always broken: - netfilter: netlink notifier might race to release objects - nfc: fix potential memory leak of skb - bluetooth: fix use-after-free caused by l2cap_reassemble_sdu - bluetooth: use skb_put to set length - eth: tun: fix bugs for oversize packet when napi frags enabled - eth: lan966x: fixes for when MTU is changed - eth: dwmac-loongson: fix invalid mdio_node" * tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) vsock: fix possible infinite sleep in vsock_connectible_wait_data() vsock: remove the unused 'wait' in vsock_connectible_recvmsg() ipv6: fix WARNING in ip6_route_net_exit_late() bridge: Fix flushing of dynamic FDB entries net, neigh: Fix null-ptr-deref in neigh_table_clear() net/smc: Fix possible leaked pernet namespace in smc_init() stmmac: dwmac-loongson: fix invalid mdio_node ibmvnic: Free rwi on reset success net: mdio: fix undefined behavior in bit shift for __mdiobus_register Bluetooth: L2CAP: Fix attempting to access uninitialized memory Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect Bluetooth: L2CAP: Fix memory leak in vhci_write Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del() Bluetooth: virtio_bt: Use skb_put to set length Bluetooth: hci_conn: Fix CIS connection dst_type handling Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu netfilter: ipset: enforce documented limit to prevent allocating huge memory isdn: mISDN: netjet: fix wrong check of device registration ...
2022-11-02netlink: introduce bigendian integer typesFlorian Westphal1-26/+15
Jakub reported that the addition of the "network_byte_order" member in struct nla_policy increases size of 32bit platforms. Instead of scraping the bit from elsewhere Johannes suggested to add explicit NLA_BE types instead, so do this here. NLA_POLICY_MAX_BE() macro is removed again, there is no need for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing. NLA_BE64 can be added later. Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE") Reported-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-30Merge tag 'mm-hotfixes-stable-2022-10-28' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "Eight fix pre-6.0 bugs and the remainder address issues which were introduced in the 6.1-rc merge cycle, or address issues which aren't considered sufficiently serious to warrant a -stable backport" * tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region lib: maple_tree: remove unneeded initialization in mtree_range_walk() mmap: fix remap_file_pages() regression mm/shmem: ensure proper fallback if page faults mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page() x86: fortify: kmsan: fix KMSAN fortify builds x86: asm: make sure __put_user_size() evaluates pointer once Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default x86/purgatory: disable KMSAN instrumentation mm: kmsan: export kmsan_copy_page_meta() mm: migrate: fix return value if all subpages of THPs are migrated successfully mm/uffd: fix vma check on userfault for wp mm: prep_compound_tail() clear page->private mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs mm/page_isolation: fix clang deadcode warning fs/ext4/super.c: remove unused `deprecated_msg' ipc/msg.c: fix percpu_counter use after free memory tier, sysfs: rename attribute "nodes" to "nodelist" MAINTAINERS: git://github.com -> https://github.com for nilfs2 mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops ...
2022-10-28lib: maple_tree: remove unneeded initialization in mtree_range_walk()Lukas Bulwahn1-2/+2
Before the do-while loop in mtree_range_walk(), the variables next, min, max need to be initialized. The variables last, prev_min and prev_max are set within the loop body before they are eventually used after exiting the loop body. As it is a do-while loop, the loop body is executed at least once, so the variables last, prev_min and prev_max do not need to be initialized before the loop body. Remove unneeded initialization of last and prev_min. The needless initialization was reported by clang-analyzer as Dead Stores. As the compiler already identifies these assignments as unneeded, it optimizes the assignments away. Hence: No functional change. No change in object code. Link: https://lkml.kernel.org/r/20221026120029.12555-2-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by defaultAlexander Potapenko1-1/+2
KMSAN adds a lot of instrumentation to the code, which results in increased stack usage (up to 2048 bytes and more in some cases). It's hard to predict how big the stack frames can be, so we disable the warnings for KMSAN instead. Link: https://lkml.kernel.org/r/20221024212144.2852069-3-glider@google.com Link: https://github.com/google/kmsan/issues/89 Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-27Merge tag 'net-6.1-rc3-2' of ↵Linus Torvalds1-36/+22
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from 802.15.4 (Zigbee et al). Current release - regressions: - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1 Current release - new code bugs: - mptcp: fix abba deadlock on fastopen - eth: stmmac: rk3588: allow multiple gmac controllers in one system Previous releases - regressions: - ip: rework the fix for dflt addr selection for connected nexthop - net: couple more fixes for misinterpreting bits in struct page after the signature was added Previous releases - always broken: - ipv6: ensure sane device mtu in tunnels - openvswitch: switch from WARN to pr_warn on a user-triggerable path - ethtool: eeprom: fix null-deref on genl_info in dump - ieee802154: more return code fixes for corner cases in dgram_sendmsg - mac802154: fix link-quality-indicator recording - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload - eth: fec: limit register access on i.MX6UL - eth: bcm4908_enet: update TX stats after actual transmission - can: rcar_canfd: improve IRQ handling for RZ/G2L Misc: - genetlink: piggy back on the newly added resv_op_start to enforce more sanity checks on new commands" * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) net: enetc: survive memory pressure without crashing kcm: do not sense pfmemalloc status in kcm_sendpage() net: do not sense pfmemalloc status in skb_append_pagefrags() net/mlx5e: Fix macsec sci endianness at rx sa update net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function net/mlx5e: Fix macsec rx security association (SA) update/delete net/mlx5e: Fix macsec coverity issue at rx sa update net/mlx5: Fix crash during sync firmware reset net/mlx5: Update fw fatal reporter state on PCI handlers successful recover net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed net/mlx5e: TC, Reject forwarding from internal port to internal port net/mlx5: Fix possible use-after-free in async command interface net/mlx5: ASO, Create the ASO SQ with the correct timestamp format net/mlx5e: Update restore chain id for slow path packets net/mlx5e: Extend SKB room check to include PTP-SQ net/mlx5: DR, Fix matcher disconnect error flow net/mlx5: Wait for firmware to enable CRS before pci_restore_state net/mlx5e: Do not increment ESN when updating IPsec ESN state netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed ...
2022-10-27Merge tag 'hardening-v6.1-rc3' of ↵Linus Torvalds1-11/+36
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Fix older Clang vs recent overflow KUnit test additions (Nick Desaulniers, Kees Cook) - Fix kern-doc visibility for overflow helpers (Kees Cook) * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: overflow: Refactor test skips for Clang-specific issues overflow: disable failing tests for older clang versions overflow: Fix kern-doc markup for functions
2022-10-26rhashtable: make test actually randomRolf Eike Beer1-36/+22
The "random rhlist add/delete operations" actually wasn't very random, as all cases tested the same bit. Since the later parts of this loop depend on the first case execute this unconditionally, and then test on different bits for the remaining tests. While at it only request as much random bits as are actually used. Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-26overflow: Refactor test skips for Clang-specific issuesKees Cook1-17/+35
Convert test exclusion into test skipping. This brings the logic for why a test is being skipped into the test itself, instead of having to spread ifdefs around the code. This will make cleanup easier as minimum tests get raised. Drop __maybe_unused so missed tests will be noticed again and clean up whitespace. For example, clang-11 on i386: [15:52:32] ================== overflow (18 subtests) ================== [15:52:32] [PASSED] u8_u8__u8_overflow_test [15:52:32] [PASSED] s8_s8__s8_overflow_test [15:52:32] [PASSED] u16_u16__u16_overflow_test [15:52:32] [PASSED] s16_s16__s16_overflow_test [15:52:32] [PASSED] u32_u32__u32_overflow_test [15:52:32] [PASSED] s32_s32__s32_overflow_test [15:52:32] [SKIPPED] u64_u64__u64_overflow_test [15:52:32] [SKIPPED] s64_s64__s64_overflow_test [15:52:32] [SKIPPED] u32_u32__int_overflow_test [15:52:32] [PASSED] u32_u32__u8_overflow_test [15:52:32] [PASSED] u8_u8__int_overflow_test [15:52:32] [PASSED] int_int__u8_overflow_test [15:52:32] [PASSED] shift_sane_test [15:52:32] [PASSED] shift_overflow_test [15:52:32] [PASSED] shift_truncate_test [15:52:32] [PASSED] shift_nonsense_test [15:52:32] [PASSED] overflow_allocation_test [15:52:32] [PASSED] overflow_size_helpers_test [15:52:32] ==================== [PASSED] overflow ===================== [15:52:32] ============================================================ [15:52:32] Testing complete. Ran 18 tests: passed: 15, skipped: 3 Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20221006230017.1833458-1-keescook@chromium.org
2022-10-26overflow: disable failing tests for older clang versionsNick Desaulniers1-1/+8
Building the overflow kunit tests with clang-11 fails with: $ ./tools/testing/kunit/kunit.py run --arch=arm --make_options LLVM=1 \ overflow ... ld.lld: error: undefined symbol: __mulodi4 ... Clang 11 and earlier generate unwanted libcalls for signed output, unsigned input. Disable these tests for now, but should these become used in the kernel we might consider that as justification for dropping clang-11 support. Keep the clang-11 build alive a little bit longer. Avoid -Wunused-function warnings via __maybe_unused. To test W=1: $ make LLVM=1 -j128 defconfig $ ./scripts/config -e KUNIT -e KUNIT_ALL $ make LLVM=1 -j128 olddefconfig lib/overflow_kunit.o W=1 Link: https://github.com/ClangBuiltLinux/linux/issues/1711 Link: https://github.com/llvm/llvm-project/commit/3203143f1356a4e4e3ada231156fc6da6e1a9f9d Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221006171751.3444575-1-ndesaulniers@google.com
2022-10-19kunit: update NULL vs IS_ERR() testsDan Carpenter2-3/+3
The alloc_string_stream() functions were changed from returning NULL on error to returning error pointers so these caller needs to be updated as well. Fixes: 78b1c6584fce ("kunit: string-stream: Simplify resource use") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-17Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds17-47/+38
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1