summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-10-19io_uring: split slow path from io_queue_sqePavel Begunkov1-4/+11
We don't want the slow path of io_queue_sqe to be inlined, so extract a function from it. text data bss dec hex filename 91950 13986 8 105944 19dd8 ./fs/io_uring.o 91758 13986 8 105752 19d18 ./fs/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fb01253911f8fb374268f65b1ba939b54ca6583f.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: remove drain_active check from hot pathPavel Begunkov1-24/+29
req->ctx->active_drain is a bit too expensive, partially because of two dereferences. Do a trick, if we see it set in io_init_req(), set REQ_F_FORCE_ASYNC and it automatically goes through a slower path where we can catch it. It's nearly free to do in io_init_req() because there is already ->restricted check and it's in the same byte of a bitmask. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7e7ddc63c15e8a300833132abb3eb8fd3918aef.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: deduplicate io_queue_sqe() call sitesPavel Begunkov1-11/+9
There are two call sites of io_queue_sqe() in io_submit_sqe(), combine them into one, because io_queue_sqe() is inline and we don't want to bloat binary, and will become even bigger text data bss dec hex filename 92126 13986 8 106120 19e88 ./fs/io_uring.o 91966 13986 8 105960 19de8 ./fs/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/506124b8e767f0a4576f7a459f6aea3d13fb4dda.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: don't pass state to io_submit_state_endPavel Begunkov1-3/+5
Submission state and ctx and coupled together, no need to passs Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e22d77a5786ef77e0c49b933ad74bae55cfb6ca6.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: don't pass tail into io_free_batch_listPavel Begunkov1-8/+4
io_free_batch_list() iterates all requests in the passed in list, so we don't really need to know the tail but can keep iterating until meet NULL. Just pass the first node into it and it will be enough. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4a12c84b6d887d980e05f417ba4172d04c64acae.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: inline completion batching helpersPavel Begunkov1-44/+22
We now have a single function for batched put of requests, just inline struct req_batch and all related helpers into it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/595a2917f80dd94288cd7203052c7934f5446580.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: optimise batch completionPavel Begunkov1-28/+10
First, convert rest of iopoll bits to single linked lists, and also replace per-request list_add_tail() with splicing a part of slist. With that, use io_free_batch_list() to put/free requests. The main advantage of it is that it's now the only user of struct req_batch and friends, and so they can be inlined. The main overhead there was per-request call to not-inlined io_req_free_batch(), which is expensive enough. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b37fc6d5954b241e025eead7ab92c6f44a42f229.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: convert iopoll_completed to store_releasePavel Begunkov1-11/+8
Convert explicit barrier around iopoll_completed to smp_load_acquire() and smp_store_release(). Similar on the callback side, but replaces a single smp_rmb() with per-request smp_load_acquire(), neither imply any extra CPU ordering for x86. Use READ_ONCE as usual where it doesn't matter. Use it to move filling CQEs by iopoll earlier, that will be necessary to avoid traversing the list one extra time in the future. Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8bd663cb15efdc72d6247c38ee810964e744a450.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: add a helper for batch freePavel Begunkov1-13/+21
Add a helper io_free_batch_list(), which takes a single linked list and puts/frees all requests from it in an efficient manner. Will be reused later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4fc8306b542c6b1dd1d08e8021ef3bdb0ad15010.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: use single linked list for iopollPavel Begunkov2-27/+30
Use single linked lists for keeping iopoll requests, takes less space, may be faster, but mostly will be of benefit for further patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/314033676b100cd485518c3bc55e1b95a0dcd71f.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: split iopoll loopPavel Begunkov1-13/+19
The main loop of io_do_iopoll() iterates and does ->iopoll() until it meets a first completed request, then it continues from that position and splices requests to pass them through io_iopoll_complete(). Split the loop in two for clearness, iopolling and reaping completed requests from the list. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a7f6fd27a94845e5dc925a47a4a9765a92e514fb.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: replace list with stack for req cachesPavel Begunkov1-27/+24
Replace struct list_head free_list serving for caching requests with singly linked stack, which is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1bc942b82422fb2624b8353bd93aca183a022846.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io-wq: add io_wq_work_node based stackPavel Begunkov1-7/+50
Apart from just using lists (i.e. io_wq_work_list), we also want to have stacks, which are a bit faster, and have some interoperability between them. Add a stack implementation based on io_wq_work_node and some helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5d3a412a5ac0d47e0f0499d70d2207d70a68925e.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: remove allocation cache arrayPavel Begunkov1-43/+17
We have several of request allocation layers, remove the last one, which is the submit->reqs array, and always use submit->free_reqs instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8547095c35f7a87bab14f6447ecd30a273ed7500.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: use slist for completion batchingPavel Begunkov1-27/+25
Currently we collect requests for completion batching in an array. Replace them with a singly linked list. It's as fast as arrays but doesn't take some much space in ctx, and will be used in future patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a666826f2854d17e9fb9417fb302edfeb750f425.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: make io_do_iopoll return number of reqsPavel Begunkov1-17/+17
Don't pass nr_events pointer around but return directly, it's less expensive than pointer increments. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f771a8153a86f16f12ff4272524e9e549c5de40b.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: force_nonspinPavel Begunkov1-6/+6
We don't really need to pass the number of requests to complete into io_do_iopoll(), a flag whether to enforce non-spin mode is enough. Should be straightforward, maybe except io_iopoll_check(). We pass !min there, because we do never enter with the number of already reaped requests is larger than the specified @min, apart from the first iteration, where nr_events is 0 and so the final check should be identical. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/782b39d1d8ec584eae15bca0a1feb6f0571fe5b8.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: mark having different creds unlikelyPavel Begunkov1-1/+1
Hint the compiler that it's not as likely to have creds different from current attached to a request. The current code generation is far from ideal, hopefully it can help to some compilers to remove duplicated jump tables and so. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e7815251ac4bf5a4a23d298c752f029ae19f3837.1632516769.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: return boolean value for io_alloc_async_dataHao Xu1-1/+1
boolean value is good enough for io_alloc_async_data. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: optimise io_req_init() sqe flags checksPavel Begunkov1-12/+17
IOSQE_IO_DRAIN is quite marginal and we don't care too much about IOSQE_BUFFER_SELECT. Save to ifs and hide both of them under SQE_VALID_FLAGS check. Now we first check whether it uses a "safe" subset, i.e. without DRAIN and BUFFER_SELECT, and only if it's not true we test the rest of the flags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: remove ctx referencing from complete_postPavel Begunkov1-8/+2
Now completions are done from task context, that means that it's either the task itself, task_work or io-wq worker. In all those cases the ctx will be staying alive by mutexing, explicit referencing or req references by iowq. Remove extra ctx pinning from io_req_complete_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: add more uring info to fdinfo for debugHao Xu1-4/+55
Developers may need some uring info to help themselves debug and address issues in production. This includes sqring/cqring head/tail and the detailed sqe/cqe info, which is very useful when an application is hung on a ring. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: kill extra wake_up_process in tw addPavel Begunkov1-2/+3
TWA_SIGNAL already wakes the thread, no need in wake_up_process() after it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e90cf643f633e857443e0c9e72471b221735c50.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: dedup CQE flushing non-empty checksPavel Begunkov1-9/+12
We don't do io_submit_flush_completions() when there is no requests enqueued, and every single caller checks for it. Hide that check into the function not forgetting about inlining. That will make it much easier for changing the empty check condition in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7ff8cef5da1b38e8ea648f5aad9a315ddfc7b57.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: inline linked part of io_req_find_nextPavel Begunkov1-20/+19
Inline part of __io_req_find_next() that returns a request but doesn't need io_disarm_next(). It's just two places, but makes links a bit faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4126d13f23d0e91b39b3558e16bd86cafa7fcef2.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: inline io_dismantle_reqPavel Begunkov1-1/+1
io_dismantle_req() is hot, and not _too_ huge. Inline it, there are 3 call sites, which hopefully will turn into 2 in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: kill off ios_leftPavel Begunkov1-7/+4
->ios_left is only used to decide whether to plug or not, kill it to avoid this extra accounting, just use the initial submission number. There is no much difference in regards of enabling plugging, where this one does it in a few more cases, but all major ones should be covered well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io-wq: Remove duplicate code in io_workqueue_create()Bixuan Cui1-5/+4
While task_work_add() in io_workqueue_create() is true, then duplicate code is executed: -> clear_bit_unlock(0, &worker->create_state); -> io_worker_release(worker); -> atomic_dec(&acct->nr_running); -> io_worker_ref_put(wq); -> return false; -> clear_bit_unlock(0, &worker->create_state); // back to io_workqueue_create() -> io_worker_release(worker); -> kfree(worker); The io_worker_release() and clear_bit_unlock() are executed twice. Fixes: 3146cba99aa2 ("io-wq: make worker creation resilient against signals") Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Link: https://lore.kernel.org/r/20210911085847.34849-1-cuibixuan@huawei.com Reviwed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: dump sqe contents if issue failsJens Axboe2-0/+63
I recently had to look at a production problem where a request ended up getting the dreaded -EINVAL error on submit. The most used and hence useless of error codes, as it just tells you that something was wrong with your request, but not more than that. Let's dump the full sqe contents if we run into an issue failure, that'll allow easier diagnosing of a wide variety of issues. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: fix too broad elevator check in blk_mq_free_request()Jens Axboe1-1/+1
We added RQF_ELV to tell whether there's an IO scheduler attached, and RQF_ELVPRIV tells us whether there's an IO scheduler with private data attached. Don't check RQF_ELV in blk_mq_free_request(), what we care about here is just if we have scheduler private data attached. This fixes a boot crash Fixes: 2ff0682da6e0 ("block: store elevator state in request") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19mmc: winbond: don't build on M68KRandy Dunlap1-1/+1
The Winbond MMC driver fails to build on ARCH=m68k so prevent that build config. Silences these build errors: ../drivers/mmc/host/wbsd.c: In function 'wbsd_request_end': ../drivers/mmc/host/wbsd.c:212:28: error: implicit declaration of function 'claim_dma_lock' [-Werror=implicit-function-declaration] 212 | dmaflags = claim_dma_lock(); ../drivers/mmc/host/wbsd.c:215:17: error: implicit declaration of function 'release_dma_lock'; did you mean 'release_task'? [-Werror=implicit-function-declaration] 215 | release_dma_lock(dmaflags); Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pierre Ossman <pierre@ossman.eu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211017175949.23838-1-rdunlap@infradead.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-19docs: fs: locks.rst: update comment about mandatory file lockingMauro Carvalho Chehab1-12/+5
The mandatory file locking got removed due to its problems, but the fs locks documentation still points to it. Update the text there, informing that it was removed on Kernel 5.14. Fixes: f7e33bdbd6d1 ("fs: remove mandatory file locking support") Reported-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-10-19mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning ↵Haibo Chen1-0/+16
circuit To reset standard tuning circuit completely, after clear ESDHC_MIX_CTRL_EXE_TUNE, also need to clear bit buffer_read_ready, this operation will finally clear the USDHC IP internal logic flag execute_tuning_with_clr_buf, make sure the following normal data transfer will not be impacted by standard tuning logic used before. Find this issue when do quick SD card insert/remove stress test. During standard tuning prodedure, if remove SD card, USDHC standard tuning logic can't clear the internal flag execute_tuning_with_clr_buf. Next time when insert SD card, all data related commands can't get any data related interrupts, include data transfer complete interrupt, data timeout interrupt, data CRC interrupt, data end bit interrupt. Always trigger software timeout issue. Even reset the USDHC through bits in register SYS_CTRL (0x2C, bit28 reset tuning, bit26 reset data, bit 25 reset command, bit 24 reset all) can't recover this. From the user's point of view, USDHC stuck, SD can't be recognized any more. Fixes: d9370424c948 ("mmc: sdhci-esdhc-imx: reset tuning circuit when power on mmc card") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1634263236-6111-1-git-send-email-haibo.chen@nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-19ARM: 9141/1: only warn about XIP address when not compile testingArnd Bergmann1-1/+1
In randconfig builds, we sometimes come across this warning: arm-linux-gnueabi-ld: XIP start address may cause MPU programming issues While this is helpful for actual systems to figure out why it fails, the warning does not provide any benefit for build testing, so guard it in a check for CONFIG_COMPILE_TEST, which is usually set on randconfig builds. Fixes: 216218308cfb ("ARM: 8713/1: NOMMU: Support MPU in XIP configuration") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9139/1: kprobes: fix arch_init_kprobes() prototypeArnd Bergmann1-1/+1
With extra warnings enabled, gcc complains about this function definition: arch/arm/probes/kprobes/core.c: In function 'arch_init_kprobes': arch/arm/probes/kprobes/core.c:465:12: warning: old-style function definition [-Wold-style-definition] 465 | int __init arch_init_kprobes() Link: https://lore.kernel.org/all/20201027093057.c685a14b386acacb3c449e3d@kernel.org/ Fixes: 24ba613c9d6c ("ARM kprobes: core code") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9138/1: fix link warning with XIP + frame-pointerArnd Bergmann1-0/+4
When frame pointers are used instead of the ARM unwinder, and the kernel is built using clang with an external assembler and CONFIG_XIP_KERNEL, every file produces two warnings like: arm-linux-gnueabi-ld: warning: orphan section `.ARM.extab' from `net/mac802154/util.o' being placed in section `.ARM.extab' arm-linux-gnueabi-ld: warning: orphan section `.ARM.exidx' from `net/mac802154/util.o' being placed in section `.ARM.exidx' The same fix was already merged for the normal (non-XIP) linker script, with a longer description. Fixes: c39866f268f8 ("arm/build: Always handle .ARM.exidx and .ARM.extab sections") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9134/1: remove duplicate memcpy() definitionArnd Bergmann1-0/+3
Both the decompressor code and the kasan logic try to override the memcpy() and memmove() definitions, which leading to a clash in a KASAN-enabled kernel with XZ decompression: arch/arm/boot/compressed/decompress.c:50:9: error: 'memmove' macro redefined [-Werror,-Wmacro-redefined] #define memmove memmove ^ arch/arm/include/asm/string.h:59:9: note: previous definition is here #define memmove(dst, src, len) __memmove(dst, src, len) ^ arch/arm/boot/compressed/decompress.c:51:9: error: 'memcpy' macro redefined [-Werror,-Wmacro-redefined] #define memcpy memcpy ^ arch/arm/include/asm/string.h:58:9: note: previous definition is here #define memcpy(dst, src, len) __memcpy(dst, src, len) ^ Here we want the set of functions from the decompressor, so undefine the other macros before the override. Link: https://lore.kernel.org/linux-arm-kernel/CACRpkdZYJogU_SN3H9oeVq=zJkRgRT1gDz3xp59gdqWXxw-B=w@mail.gmail.com/ Link: https://lore.kernel.org/lkml/202105091112.F5rmd4By-lkp@intel.com/ Fixes: d6d51a96c7d6 ("ARM: 9014/2: Replace string mem* functions for KASan") Fixes: a7f464f3db93 ("ARM: 7001/2: Wire up support for the XZ decompressor") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B alignedNick Desaulniers1-0/+1
A kernel built with CONFIG_THUMB2_KERNEL=y and using clang as the assembler could generate non-naturally-aligned v7wbi_tlb_fns which results in a boot failure. The original commit adding the macro missed the .align directive on this data. Link: https://github.com/ClangBuiltLinux/linux/issues/1447 Link: https://lore.kernel.org/all/0699da7b-354f-aecc-a62f-e25693209af4@linaro.org/ Debugged-by: Ard Biesheuvel <ardb@kernel.org> Debugged-by: Nathan Chancellor <nathan@kernel.org> Debugged-by: Richard Henderson <richard.henderson@linaro.org> Fixes: 66a625a88174 ("ARM: mm: proc-macros: Add generic proc/cache/tlb struct definition macros") Suggested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9132/1: Fix __get_user_check failure with ARM KASAN imagesLexi Shao1-1/+3
ARM: kasan: Fix __get_user_check failure with kasan In macro __get_user_check defined in arch/arm/include/asm/uaccess.h, error code is store in register int __e(r0). When kasan is enabled, assigning value to kernel address might trigger kasan check, which unexpectedly overwrites r0 and causes undefined behavior on arm kasan images. One example is failure in do_futex and results in process soft lockup. Log: watchdog: BUG: soft lockup - CPU#0 stuck for 62946ms! [rs:main Q:Reg:1151] ... (__asan_store4) from (futex_wait_setup+0xf8/0x2b4) (futex_wait_setup) from (futex_wait+0x138/0x394) (futex_wait) from (do_futex+0x164/0xe40) (do_futex) from (sys_futex_time32+0x178/0x230) (sys_futex_time32) from (ret_fast_syscall+0x0/0x50) The soft lockup happens in function futex_wait_setup. The reason is function get_futex_value_locked always return EINVAL, thus pc jump back to retry label and causes looping. This line in function get_futex_value_locked ret = __get_user(*dest, from); is expanded to *dest = (typeof(*(p))) __r2; , in macro __get_user_check. Writing to pointer dest triggers kasan check and overwrites the return value of __get_user_x function. The assembly code of get_futex_value_locked in kernel/futex.c: ... c01f6dc8: eb0b020e bl c04b7608 <__get_user_4> // "x = (typeof(*(p))) __r2;" triggers kasan check and r0 is overwritten c01f6dCc: e1a00007 mov r0, r7 c01f6dd0: e1a05002 mov r5, r2 c01f6dd4: eb04f1e6 bl c0333574 <__asan_store4> c01f6dd8: e5875000 str r5, [r7] // save ret value of __get_user(*dest, from), which is dest address now c01f6ddc: e1a05000 mov r5, r0 ... // checking return value of __get_user failed c01f6e00: e3550000 cmp r5, #0 ... c01f6e0c: 01a00005 moveq r0, r5 // assign return value to EINVAL c01f6e10: 13e0000d mvnne r0, #13 Return value is the destination address of get_user thus certainly non-zero, so get_futex_value_locked always return EINVAL. Fix it by using a tmp vairable to store the error code before the assignment. This fix has no effects to non-kasan images thanks to compiler optimization. It only affects cases that overwrite r0 due to kasan check. This should fix bug discussed in Link: [1] https://lore.kernel.org/linux-arm-kernel/0ef7c2a5-5d8b-c5e0-63fa-31693fd4495c@gmail.com/ Fixes: 421015713b30 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: Lexi Shao <shaolexi@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9125/1: fix incorrect use of get_kernel_nofault()Ard Biesheuvel1-1/+1
Commit 344179fc7ef4 ("ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()") replaced an occurrence of __get_user() with get_kernel_nofault(), but inverted the sense of the conditional in the process, resulting in no values to be printed at all. I.e., every exception stack now looks like this: Exception stack(0xc18d1fb0 to 0xc18d1ff8) 1fa0: ???????? ???????? ???????? ???????? 1fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 1fe0: ???????? ???????? ???????? ???????? ???????? ???????? which is rather unhelpful. Fixes: 344179fc7ef4 ("ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19ARM: 9122/1: select HAVE_FUTEX_CMPXCHGNick Desaulniers1-0/+1
tglx notes: This function [futex_detect_cmpxchg] is only needed when an architecture has to runtime discover whether the CPU supports it or not. ARM has unconditional support for this, so the obvious thing to do is the below. Fixes linkage failure from Clang randconfigs: kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text' and boot failures for CONFIG_THUMB2_KERNEL. Link: https://github.com/ClangBuiltLinux/linux/issues/325 Comments from Nick Desaulniers: See-also: 03b8c7b623c8 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: stable@vger.kernel.org # v3.14+ Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19drm/i915: Remove memory frequency calculationJosé Roberto de Souza2-36/+2
This memory frequency calculated is only used to check if it is zero, what is not useful as it will never actually be zero. Also the calculation is wrong, we should be checking other bit to select the appropriate frequency multiplier while this code is stuck with a fixed multiplier. So here dropping it as whole. v2: - Also remove memory frequency calculation for gen9 LP platforms Cc: Yakui Zhao <yakui.zhao@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Fixes: 5d0c938ec9cc ("drm/i915/gen11+: Only load DRAM information from pcode") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211013010046.91858-1-jose.souza@intel.com (cherry picked from commit 83f52364b15265aec47d07e02b0fbf4093ab8554) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-10-19ceph: fix handling of "meta" errorsJeff Layton5-27/+8
Currently, we check the wb_err too early for directories, before all of the unsafe child requests have been waited on. In order to fix that we need to check the mapping->wb_err later nearer to the end of ceph_fsync. We also have an overly-complex method for tracking errors after blocklisting. The errors recorded in cleanup_session_requests go to a completely separate field in the inode, but we end up reporting them the same way we would for any other error (in fsync). There's no real benefit to tracking these errors in two different places, since the only reporting mechanism for them is in fsync, and we'd need to advance them both every time. Given that, we can just remove i_meta_err, and convert the places that used it to instead just use mapping->wb_err instead. That also fixes the original problem by ensuring that we do a check_and_advance of the wb_err at the end of the fsync op. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52864 Reported-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-19ceph: skip existing superblocks that are blocklisted or shut down when mountingJeff Layton1-3/+14
Currently when mounting, we may end up finding an existing superblock that corresponds to a blocklisted MDS client. This means that the new mount ends up being unusable. If we've found an existing superblock with a client that is already blocklisted, and the client is not configured to recover on its own, fail the match. Ditto if the superblock has been forcibly unmounted. While we're in here, also rename "other" to the more conventional "fsc". Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-19can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX pathMarc Kleine-Budde1-0/+3
When the a large chunk of data send and the receiver does not send a Flow Control frame back in time, the sendmsg() does not return a error code, but the number of bytes sent corresponding to the size of the packet. If a timeout occurs the isotp_tx_timer_handler() is fired, sets sk->sk_err and calls the sk->sk_error_report() function. It was wrongly expected that the error would be propagated to user space in every case. For isotp_sendmsg() blocking on wait_event_interruptible() this is not the case. This patch fixes the problem by checking if sk->sk_err is set and returning the error to user space. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/hartkopp/can-isotp/issues/42 Link: https://github.com/hartkopp/can-isotp/pull/43 Link: https://lore.kernel.org/all/20210507091839.1366379-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Reported-by: Sottas Guillaume (LMB) <Guillaume.Sottas@liebherr.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-19mailmap: add Andrej ShaduraAndrej Shadura1-0/+2
Add a mapping for my old work email for BelDisplayTech to the personal email, and make sure the Collabora email has the correct spelling of the first name. Link: https://lkml.kernel.org/r/20210917091016.30232-1-andrew.shadura@collabora.co.uk Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19mm/thp: decrease nr_thps in file's mapping on THP splitMarek Szyprowski1-2/+4
Decrease nr_thps counter in file's mapping to ensure that the page cache won't be dropped excessively on file write access if page has been already split. I've tried a test scenario running a big binary, kernel remaps it with THPs, then force a THP split with /sys/kernel/debug/split_huge_pages. During any further open of that binary with O_RDWR or O_WRITEONLY kernel drops page cache for it, because of non-zero thps counter. Link: https://lkml.kernel.org/r/20211012120237.2600-1-m.szyprowski@samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache") Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()") Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: <sfoon.kim@samsung.com> Cc: Song Liu <songliubraving@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()Sean Christopherson1-1/+1
Check for a NULL page->mapping before dereferencing the mapping in page_is_secretmem(), as the page's mapping can be nullified while gup() is running, e.g. by reclaim or truncation. BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G W RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0 Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046 RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900 ... CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0 Call Trace: get_user_pages_fast_only+0x13/0x20 hva_to_pfn+0xa9/0x3e0 try_async_pf+0xa1/0x270 direct_page_fault+0x113/0xad0 kvm_mmu_page_fault+0x69/0x680 vmx_handle_exit+0xe1/0x5d0 kvm_arch_vcpu_ioctl_run+0xd81/0x1c70 kvm_vcpu_ioctl+0x267/0x670 __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/20211007231502.3552715-1-seanjc@google.com Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Sean Christopherson <seanjc@google.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Stephen <stephenackerman16@gmail.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19vfs: check fd has read access in kernel_read_file_from_fd()Matthew Wilcox (Oracle)1-1/+1
If we open a file without read access and then pass the fd to a syscall whose implementation calls kernel_read_file_from_fd(), we get a warning from __kernel_read(): if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ))) This currently affects both finit_module() and kexec_file_load(), but it could affect other syscalls in the future. Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Hao Sun <sunhao.th@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19elfcore: correct reference to CONFIG_UMLLukas Bulwahn1-1/+1
Commit 6e7b64b9dd6d ("elfcore: fix building with clang") introduces special handling for two architectures, ia64 and User Mode Linux. However, the wrong name, i.e., CONFIG_UM, for the intended Kconfig symbol for User-Mode Linux was used. Although the directory for User Mode Linux is ./arch/um; the Kconfig symbol for this architecture is called CONFIG_UML. Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs: UM Referencing files: include/linux/elfcore.h Similar symbols: UML, NUMA Correct the name of the config to the intended one. [akpm@linux-foundation.org: fix um/x86_64, per Catalin] Link: https://lkml.kernel.org/r/20211006181119.2851441-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/YV6pejGzLy5ppEpt@arm.com Link: https://lkml.kernel.org/r/20211006082209.417-1-lukas.bulwahn@gmail.com Fixes: 6e7b64b9dd6d ("elfcore: fix building with clang") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Barret Rhoden <brho@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>