summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-05-08xfs: hoist setting of XFS_LI_RECOVERED to callerDarrick J. Wong5-19/+2
The only purpose of XFS_LI_RECOVERED is to prevent log recovery from trying to replay recovered intents more than once. Therefore, we can move the bit setting up to the ->iop_recover caller. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor intent item iop_recover callsDarrick J. Wong5-137/+54
Now that we've made the recovered item tests all the same, we can hoist the test and the ail locking code to the ->iop_recover caller and call the recovery function directly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor intent item RECOVERED flag into the log itemDarrick J. Wong9-42/+20
Rename XFS_{EFI,BUI,RUI,CUI}_RECOVERED to XFS_LI_RECOVERED so that we track recovery status in the log item, then get rid of the now unused flags fields in each of those log item types. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor adding recovered intent items to the logDarrick J. Wong6-28/+26
During recovery, every intent that we recover from the log has to be added to the AIL. Replace the open-coded addition with a helper. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-05-08xfs: refactor releasing finished intents during log recoveryDarrick J. Wong7-129/+78
Replace the open-coded AIL item walking with a proper helper when we're trying to release an intent item that has been finished. We add a new ->iop_match method to decide if an intent item matches a supplied ID. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor xlog_item_is_intent now that we're done convertingDarrick J. Wong1-14/+6
Now that we've finished converting all types of log intent items to provide an ->iop_recover function, we can convert the "is this an intent item?" predicate to look for a non-null iop_recover pointer. Move the predicate closer to the functions that use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor recovered BUI log item playbackDarrick J. Wong3-73/+41
Move the code that processes the log items created from the recovered log items into the per-item source code files and use dispatch functions to call them. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor recovered CUI log item playbackDarrick J. Wong3-58/+35
Move the code that processes the log items created from the recovered log items into the per-item source code files and use dispatch functions to call them. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor recovered RUI log item playbackDarrick J. Wong3-58/+35
Move the code that processes the log items created from the recovered log items into the per-item source code files and use dispatch functions to call them. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor recovered EFI log item playbackDarrick J. Wong4-61/+45
Move the code that processes the log items created from the recovered log items into the per-item source code files and use dispatch functions to call them. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: remove log recovery quotaoff item dispatch for pass2 commit functionsDarrick J. Wong2-27/+7
Quotaoff doesn't actually do anything, so take advantage of the commit_pass2 pointer being optional and get rid of the switch statement clause. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery BUI item dispatch for pass2 commit functionsDarrick J. Wong3-132/+131
Move the bmap update intent and intent-done pass2 commit code into the per-item source code files and use dispatch functions to call them. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery CUI item dispatch for pass2 commit functionsDarrick J. Wong3-128/+127
Move the refcount update intent and intent-done pass2 commit code into the per-item source code files and use dispatch functions to call them. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery RUI item dispatch for pass2 commit functionsDarrick J. Wong3-104/+101
Move the rmap update intent and intent-done pass2 commit code into the per-item source code files and use dispatch functions to call them. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery EFI item dispatch for pass2 commit functionsDarrick J. Wong3-107/+104
Move the extent free intent and intent-done pass2 commit code into the per-item source code files and use dispatch functions to call them. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery icreate item dispatch for pass2 commit functionsDarrick J. Wong2-126/+132
Move the log icreate item pass2 commit code into the per-item source code files and use the dispatch function to call it. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery dquot item dispatch for pass2 commit functionsDarrick J. Wong2-112/+109
Move the log dquot item pass2 commit code into the per-item source code files and use the dispatch function to call it. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery inode item dispatch for pass2 commit functionsDarrick J. Wong2-355/+355
Move the log inode item pass2 commit code into the per-item source code files and use the dispatch function to call it. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery buffer item dispatch for pass2 commit functionsDarrick J. Wong3-791/+820
Move the log buffer item pass2 commit code into the per-item source code files and use the dispatch function to call it. We do these one at a time because there's a lot of code to move. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery item dispatch for pass1 commit functionsDarrick J. Wong4-89/+64
Move the pass1 commit code into the per-item source code files and use the dispatch function to call them. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery item dispatch for pass2 readhead functionsDarrick J. Wong5-92/+73
Move the pass2 readhead code into the per-item source code files and use the dispatch function to call them. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: refactor log recovery item sorting into a generic dispatch structureDarrick J. Wong11-39/+265
Create a generic dispatch structure to delegate recovery of different log item types into various code modules. This will enable us to move code specific to a particular log item type out of xfs_log_recover.c and into the log item source. The first operation we virtualize is the log item sorting. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08xfs: convert xfs_log_recover_item_t to struct xfs_log_recover_itemDarrick J. Wong2-14/+16
Remove the old typedefs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-28/+33
Merge misc fixes from Andrew Morton: "14 fixes and one selftest to verify the ipc fixes herein" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: limit boost_watermark on small zones ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST mm/vmscan: remove unnecessary argument description of isolate_lru_pages() epoll: atomically remove wait entry on wake up kselftests: introduce new epoll60 testcase for catching lost wakeups percpu: make pcpu_alloc() aware of current gfp context mm/slub: fix incorrect interpretation of s->offset scripts/gdb: repair rb_first() and rb_last() eventpoll: fix missing wakeup for ovflist in ep_poll_callback arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() scripts/decodecode: fix trapping instruction formatting kernel/kcov.c: fix typos in kcov_remote_start documentation mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() mm, memcg: fix error return value of mem_cgroup_css_alloc() ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08f2fs: introduce f2fs_bmap_compress()Chao Yu1-0/+34
to support bmap() on compressed inode: if queried block locates in non-compressed cluster, return its physical block address. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08f2fs: support fiemap on compressed inodeChao Yu1-2/+53
Map normal/compressed cluster of compressed inode correctly, and give the right fiemap flag FIEMAP_EXTENT_ENCODED on mapped compressed extent. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08f2fs: support partial truncation on compressed inodeChao Yu3-5/+65
Supports to truncate compressed/normal cluster partially on compressed inode. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08f2fs: remove redundant compress inode checkChao Yu1-2/+0
due to f2fs_post_read_required() has did that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08f2fs: use strcmp() in parse_options()Eric Biggers1-28/+16
Remove the pointless string length checks. Just use strcmp(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08f2fs: Use the correct style for SPDX License IdentifierNishad Kamdar7-7/+7
This patch corrects the SPDX License Identifier style in header files related to F2FS File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08gfs2: More gfs2_find_jhead fixesAndreas Gruenbacher1-7/+12
It turns out that when extending an existing bio, gfs2_find_jhead fails to check if the block number is consecutive, which leads to incorrect reads for fragmented journals. In addition, limit the maximum bio size to an arbitrary value of 2 megabytes: since commit 07173c3ec276 ("block: enable multipage bvecs"), if we just keep adding pages until bio_add_page fails, bios will grow much larger than useful, which pins more memory than necessary with barely any additional performance gains. Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-05-08gfs2: Another gfs2_walk_metadata fixAndreas Gruenbacher1-7/+9
Make sure we don't walk past the end of the metadata in gfs2_walk_metadata: the inode holds fewer pointers than indirect blocks. Slightly clean up gfs2_iomap_get. Fixes: a27a0c9b6a20 ("gfs2: gfs2_walk_metadata fix") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-05-08gfs2: Fix use-after-free in gfs2_logd after withdrawBob Peterson1-0/+5
When the gfs2_logd daemon withdrew, the withdraw sequence called into make_fs_ro() to make the file system read-only. That caused the journal descriptors to be freed. However, those journal descriptors were used by gfs2_logd's call to gfs2_ail_flush_reqd(). This caused a use-after free and NULL pointer dereference. This patch changes function gfs2_logd() so that it stops all logd work until the thread is told to stop. Once a withdraw is done, it only does an interruptible sleep. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08gfs2: Fix BUG during unmount after file system withdrawBob Peterson1-4/+6
Before this patch, when the logd daemon was forced to withdraw, it would try to request its journal be recovered by another cluster node. However, in single-user cases with lock_nolock, there are no other nodes to recover the journal. Function signal_our_withdraw() was recognizing the lock_nolock situation, but not until after it had evicted its journal inode. Since the journal descriptor that points to the inode was never removed from the master list, when the unmount occurred, it did another iput on the evicted inode, which resulted in a BUG_ON(inode->i_state & I_CLEAR). This patch moves the check for this situation earlier in function signal_our_withdraw(), which avoids the extra iput, so the unmount may happen normally. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08gfs2: Fix error exit in do_xmoteBob Peterson1-1/+1
Before this patch, if an error was detected from glock function go_sync by function do_xmote, it would return. But the function had temporarily unlocked the gl_lockref spin_lock, and it never re-locked it. When the caller of do_xmote tried to unlock it again, it was already unlocked, which resulted in a corrupted spin_lock value. This patch makes sure the gl_lockref spin_lock is re-locked after it is unlocked. Thanks to Wu Bo <wubo40@huawei.com> for reporting this problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08fanotify: Replace zero-length array with flexible-arrayGustavo A. R. Silva1-1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Link: https://lore.kernel.org/r/20200507185230.GA14229@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-08crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.hEric Biggers1-1/+0
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08ubifs: use crypto_shash_tfm_digest()Eric Biggers3-34/+9
Instead of manually allocating a 'struct shash_desc' on the stack and calling crypto_shash_digest(), switch to using the new helper function crypto_shash_tfm_digest() which does this for us. Cc: linux-mtd@lists.infradead.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08nfsd: use crypto_shash_tfm_digest()Eric Biggers1-20/+6
Instead of manually allocating a 'struct shash_desc' on the stack and calling crypto_shash_digest(), switch to using the new helper function crypto_shash_tfm_digest() which does this for us. Cc: linux-nfs@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08ecryptfs: use crypto_shash_tfm_digest()Eric Biggers1-16/+1
Instead of manually allocating a 'struct shash_desc' on the stack and calling crypto_shash_digest(), switch to using the new helper function crypto_shash_tfm_digest() which does this for us. Cc: ecryptfs@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08fscrypt: use crypto_shash_tfm_digest()Eric Biggers2-11/+2
Instead of manually allocating a 'struct shash_desc' on the stack and calling crypto_shash_digest(), switch to using the new helper function crypto_shash_tfm_digest() which does this for us. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08epoll: atomically remove wait entry on wake upRoman Penyaev1-19/+24
This patch does two things: - fixes a lost wakeup introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") - improves performance for events delivery. The description of the problem is the following: if N (>1) threads are waiting on ep->wq for new events and M (>1) events come, it is quite likely that >1 wakeups hit the same wait queue entry, because there is quite a big window between __add_wait_queue_exclusive() and the following __remove_wait_queue() calls in ep_poll() function. This can lead to lost wakeups, because thread, which was woken up, can handle not all the events in ->rdllist. (in better words the problem is described here: https://lkml.org/lkml/2019/10/7/905) The idea of the current patch is to use init_wait() instead of init_waitqueue_entry(). Internally init_wait() sets autoremove_wake_function as a callback, which removes the wait entry atomically (under the wq locks) from the list, thus the next coming wakeup hits the next wait entry in the wait queue, thus preventing lost wakeups. Problem is very well reproduced by the epoll60 test case [1]. Wait entry removal on wakeup has also performance benefits, because there is no need to take a ep->lock and remove wait entry from the queue after the successful wakeup. Here is the timing output of the epoll60 test case: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): real 0m6.970s user 0m49.786s sys 0m0.113s After this patch: real 0m5.220s user 0m36.879s sys 0m0.019s The other testcase is the stress-epoll [2], where one thread consumes all the events and other threads produce many events: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): threads events/ms run-time ms 8 5427 1474 16 6163 2596 32 6824 4689 64 7060 9064 128 6991 18309 After this patch: threads events/ms run-time ms 8 5598 1429 16 7073 2262 32 7502 4265 64 7640 8376 128 7634 16767 (number of "events/ms" represents event bandwidth, thus higher is better; number of "run-time ms" represents overall time spent doing the benchmark, thus lower is better) [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-08eventpoll: fix missing wakeup for ovflist in ep_poll_callbackKhazhismel Kumykov1-9/+9
In the event that we add to ovflist, before commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") we would be woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback. With that wakeup removed, if we add to ovflist here, we may never wake up. Rather than adding back the ep_scan_ready_list wakeup - which was resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback. We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me. With this patch added, we no longer see missing wakeups. I haven't yet tried to make a small reproducer, but the existing kselftests in filesystem/epoll passed for me with this patch. [khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman] Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Penyaev <rpenyaev@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-08exec: Rename flush_old_exec begin_new_execEric W. Biederman5-6/+6
There is and has been for a very long time been a lot more going on in flush_old_exec than just flushing the old state. After the movement of code from setup_new_exec there is a whole lot more going on than just flushing the old executables state. Rename flush_old_exec to begin_new_exec to more accurately reflect what this function does. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08exec: Move most of setup_new_exec into flush_old_execEric W. Biederman1-41/+44
The current idiom for the callers is: flush_old_exec(bprm); set_personality(...); setup_new_exec(bprm); In 2010 Linus split flush_old_exec into flush_old_exec and setup_new_exec. With the intention that setup_new_exec be what is called after the processes new personality is set. Move the code that doesn't depend upon the personality from setup_new_exec into flush_old_exec. This is to facilitate future changes by having as much code together in one function as possible. To see why it is safe to move this code please note that effectively this change moves the personality setting in the binfmt and the following three lines of code after everything except unlocking the mutexes: arch_pick_mmap_layout arch_setup_new_exec mm->task_size = TASK_SIZE The function arch_pick_mmap_layout at most sets: mm->get_unmapped_area mm->mmap_base mm->mmap_legacy_base mm->mmap_compat_base mm->mmap_compat_legacy_base which nothing in flush_old_exec or setup_new_exec depends on. The function arch_setup_new_exec only sets architecture specific state and the rest of the functions only deal in state that applies to all architectures. The last line just sets mm->task_size and again nothing in flush_old_exec or setup_new_exec depend on task_size. Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08exec: In setup_new_exec cache current in the local variable meEric W. Biederman1-11/+12
At least gcc 8.3 when generating code for x86_64 has a hard time consolidating multiple calls to current aka get_current(), and winds up unnecessarily rereading %gs:current_task several times in setup_new_exec. Caching the value of current in the local variable of me generates slightly better and shorter assembly. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08exec: Merge install_exec_creds into setup_new_execEric W. Biederman5-34/+26
The two functions are now always called one right after the other so merge them together to make future maintenance easier. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08exec: Rename the flag called_exec_mmap point_of_no_returnEric W. Biederman1-6/+6
Update the comments and make the code easier to understand by renaming this flag. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08exec: Make unlocking exec_update_mutex explictEric W. Biederman1-3/+3
With install_exec_creds updated to follow immediately after setup_new_exec, the failure of unshare_sighand is the only code path where exec_update_mutex is held but not explicitly unlocked. Update that code path to explicitly unlock exec_update_mutex. Remove the unlocking of exec_update_mutex from free_bprm. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-08binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elfEric W. Biederman3-4/+3
In 2016 Linus moved install_exec_creds immediately after setup_new_exec, in binfmt_elf as a cleanup and as part of closing a potential information leak. Perform the same cleanup for the other binary formats. Different binary formats doing the same things the same way makes exec easier to reason about and easier to maintain. Greg Ungerer reports: > I tested the the whole series on non-MMU m68k and non-MMU arm > (exercising binfmt_flat) and it all tested out with no problems, > so for the binfmt_flat changes: Tested-by: Greg Ungerer <gerg@linux-m68k.org> Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>