summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-11-19dax: Avoid losing wakeup in dax_lock_mapping_entryMatthew Wilcox1-0/+1
After calling get_unlocked_entry(), you have to call put_unlocked_entry() to avoid subsequent waiters losing wakeups. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-19exec: make de_thread() freezableChanho Min1-2/+3
Suspend fails due to the exec family of functions blocking the freezer. The casue is that de_thread() sleeps in TASK_UNINTERRUPTIBLE waiting for all sub-threads to die, and we have the deadlock if one of them is frozen. This also can occur with the schedule() waiting for the group thread leader to exit if it is frozen. In our machine, it causes freeze timeout as bellows. Freezing of tasks failed after 20.010 seconds (1 tasks refusing to freeze, wq_busy=0): setcpushares-ls D ffffffc00008ed70 0 5817 1483 0x0040000d Call trace: [<ffffffc00008ed70>] __switch_to+0x88/0xa0 [<ffffffc000d1c30c>] __schedule+0x1bc/0x720 [<ffffffc000d1ca90>] schedule+0x40/0xa8 [<ffffffc0001cd784>] flush_old_exec+0xdc/0x640 [<ffffffc000220360>] load_elf_binary+0x2a8/0x1090 [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240 [<ffffffc00021c584>] load_script+0x20c/0x228 [<ffffffc0001ccff4>] search_binary_handler+0x9c/0x240 [<ffffffc0001ce8e0>] do_execveat_common.isra.14+0x4f8/0x6e8 [<ffffffc0001cedd0>] compat_SyS_execve+0x38/0x48 [<ffffffc00008de30>] el0_svc_naked+0x24/0x28 To fix this, make de_thread() freezable. It looks safe and works fine. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Chanho Min <chanho.min@lge.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-11-19udf: Allow mounting volumes with incorrect identification stringsJan Kara2-9/+21
Commit c26f6c615788 ("udf: Fix conversion of 'dstring' fields to UTF8") started to be more strict when checking whether converted strings are properly formatted. Sudip reports that there are DVDs where the volume identification string is actually too long - UDF reports: [ 632.309320] UDF-fs: incorrect dstring lengths (32/32) during mount and fails the mount. This is mostly harmless failure as we don't need volume identification (and even less volume set identification) for anything. So just truncate the volume identification string if it is too long and replace it with 'Invalid' if we just cannot convert it for other reasons. This keeps slightly incorrect media still mountable. CC: stable@vger.kernel.org Fixes: c26f6c615788 ("udf: Fix conversion of 'dstring' fields to UTF8") Reported-and-tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-19exportfs: fix 'passing zero to ERR_PTR()' warningYueHaibing1-0/+1
Fix a static code checker warning: fs/exportfs/expfs.c:171 reconnect_one() warn: passing zero to 'ERR_PTR' The error path for lookup_one_len_unlocked failure should set err to PTR_ERR. Fixes: bbf7a8a3562f ("exportfs: move most of reconnect_path to helper function") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-19Merge tag 'v4.20-rc3' into for-4.21/blockJens Axboe33-164/+301
Merge in -rc3 to resolve a few conflicts, but also to get a few important fixes that have gone into mainline since the block 4.21 branch was forked off (most notably the SCSI queue issue, which is both a conflict AND needed fix). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19fs/openpromfs: Use of_node_name_eq for node name comparisonsRob Herring1-1/+1
Convert string compares of DT node names to use of_node_name_eq helper instead. This removes direct access to the node name pointer. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19fs/openpromfs: use full_name instead of path_component_nameRob Herring1-4/+5
In preparation to remove struct device_node.path_component_name, use full_name instead. kbasename is used so full_name can be used whether it is the full path or just the node's name and unit-address. Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18ocfs2: free up write context when direct IO failedWengang Wang2-2/+19
The write context should also be freed even when direct IO failed. Otherwise a memory leak is introduced and entries remain in oi->ip_unwritten_list causing the following BUG later in unlink path: ERROR: bug expression: !list_empty(&oi->ip_unwritten_list) ERROR: Clear inode of 215043, inode has unwritten extents ... Call Trace: ? __set_current_blocked+0x42/0x68 ocfs2_evict_inode+0x91/0x6a0 [ocfs2] ? bit_waitqueue+0x40/0x33 evict+0xdb/0x1af iput+0x1a2/0x1f7 do_unlinkat+0x194/0x28f SyS_unlinkat+0x1b/0x2f do_syscall_64+0x79/0x1ae entry_SYSCALL_64_after_hwframe+0x151/0x0 This patch also logs, with frequency limit, direct IO failures. Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-18mm: don't reclaim inodes with many attached pagesRoman Gushchin1-2/+5
Spock reported that commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects") leads to a regression on his setup: periodically the majority of the pagecache is evicted without an obvious reason, while before the change the amount of free memory was balancing around the watermark. The reason behind is that the mentioned above change created some minimal background pressure on the inode cache. The problem is that if an inode is considered to be reclaimed, all belonging pagecache page are stripped, no matter how many of them are there. So, if a huge multi-gigabyte file is cached in the memory, and the goal is to reclaim only few slab objects (unused inodes), we still can eventually evict all gigabytes of the pagecache at once. The workload described by Spock has few large non-mapped files in the pagecache, so it's especially noticeable. To solve the problem let's postpone the reclaim of inodes, which have more than 1 attached page. Let's wait until the pagecache pages will be evicted naturally by scanning the corresponding LRU lists, and only then reclaim the inode structure. Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Spock <dairinin@gmail.com> Tested-by: Spock <dairinin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [4.19.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-17dax: Fix huge page faultsMatthew Wilcox1-8/+4
Using xas_load() with a PMD-sized xa_state would work if either a PMD-sized entry was present or a PTE sized entry was present in the first 64 entries (of the 512 PTEs in a PMD on x86). If there was no PTE in the first 64 entries, grab_mapping_entry() would believe there were no entries present, allocate a PMD-sized entry and overwrite the PTE in the page cache. Use xas_find_conflict() instead which turns out to simplify both get_unlocked_entry() and grab_mapping_entry(). Also remove a WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered in get_unlocked_entry(). Fixes: cfc93c6c6c96 ("dax: Convert dax_insert_pfn_mkwrite to XArray") Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-17dax: Fix dax_unlock_mapping_entry for PMD pagesMatthew Wilcox1-9/+8
Device DAX PMD pages do not set the PageHead bit for compound pages. Fix for now by retrieving the PMD bit from the entry, but eventually we will be passed the page size by the caller. Reported-by: Dan Williams <dan.j.williams@intel.com> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-17aio: fix failure to put the file pointerJens Axboe1-0/+1
If the ioprio capability check fails, we return without putting the file pointer. Fixes: d9a08a9e616b ("fs: Add aio iopriority support") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-17dax: Reinstate RCU protection of inodeMatthew Wilcox1-3/+19
For the device-dax case, it is possible that the inode can go away underneath us. The rcu_read_lock() was there to prevent it from being freed, and not (as I thought) to protect the tree. Bring back the rcu_read_lock() protection. Also add a little kernel-doc; while this function is not exported to modules, it is used from outside dax.c Reported-by: Dan Williams <dan.j.williams@intel.com> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-17dax: Make sure the unlocking entry isn't lockedMatthew Wilcox1-0/+1
I wrote the semantics in the commit message, but didn't document it in the source code. Use a BUG_ON instead (if any code does do this, it's really buggy; we can't recover and it's worth taking the machine down). Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-17dax: Remove optimisation from dax_lock_mapping_entryMatthew Wilcox1-5/+2
Skipping some of the revalidation after we sleep can lead to returning a mapping which has already been freed. Just drop this optimisation. Reported-by: Dan Williams <dan.j.williams@intel.com> Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-16Merge tag 'fsnotify_for_v4.20-rc3' of ↵Linus Torvalds2-7/+10
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "One small fsnotify fix for duplicate events" * tag 'fsnotify_for_v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix handling of events on child sub-directory
2018-11-16Merge tag 'gfs2-4.20.fixes3' of ↵Linus Torvalds2-28/+29
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull bfs2 fixes from Andreas Gruenbacher: "Fix two bugs leading to leaked buffer head references: - gfs2: Put bitmap buffers in put_super - gfs2: Fix iomap buffer head reference counting bug And one bug leading to significant slow-downs when deleting large files: - gfs2: Fix metadata read-ahead during truncate (2)" * tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix iomap buffer head reference counting bug gfs2: Fix metadata read-ahead during truncate (2) gfs2: Put bitmap buffers in put_super
2018-11-16gfs2: Fix iomap buffer head reference counting bugAndreas Gruenbacher1-23/+17
GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to gfs2_iomap_end in iomap->private. It sets that private pointer in gfs2_iomap_get. Users of gfs2_iomap_get other than gfs2_iomap_begin would have to release iomap->private, but this isn't done correctly, leading to a leak of buffer head references. To fix this, move the code for setting iomap->private from gfs2_iomap_get to gfs2_iomap_begin. Fixes: 64bc06bb32 ("gfs2: iomap buffered write support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-16block: don't plug for aio/O_DIRECT HIPRI IOJens Axboe1-2/+11
Those will go straight to issue inside blk-mq, so don't bother setting up a block plug for them. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16block: for async O_DIRECT, mark us as polling if asked toJens Axboe1-0/+3
Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for those kinds of requests. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16block: add polled wakeup task helperJens Axboe2-3/+3
If we're polling for IO on a device that doesn't use interrupts, then IO completion loop (and wake of task) is done by submitting task itself. If that is the case, then we don't need to enter the wake_up_process() function, we can simply mark ourselves as TASK_RUNNING. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-16Merge tag 'fuse-fixes-4.20-rc3' of ↵Linus Torvalds2-5/+15
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "A couple of fixes, all bound for -stable (i.e. not regressions in this cycle)" * tag 'fuse-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix use-after-free in fuse_direct_IO() fuse: fix possibly missed wake-up after abort fuse: fix leaked notify reply
2018-11-15rxrpc: Fix life checkDavid Howells1-1/+10
The life-checking function, which is used by kAFS to make sure that a call is still live in the event of a pending signal, only samples the received packet serial number counter; it doesn't actually provoke a change in the counter, rather relying on the server to happen to give us a packet in the time window. Fix this by adding a function to force a ping to be transmitted. kAFS then keeps track of whether there's been a stall, and if so, uses the new function to ping the server, resetting the timeout to allow the reply to come back. If there's a stall, a ping and the call is *still* stalled in the same place after another period, then the call will be aborted. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Fixes: f4d15fb6f99a ("rxrpc: Provide functions for allowing cleaner handling of signals") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15dlm: fix missing idr_destroy for recover_idrDavid Teigland1-0/+1
Which would leak memory for the idr internals. Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15Merge tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds3-8/+17
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Don't exit the NFSv4 state manager without clearing NFS4CLNT_MANAGER_RUNNING Bugfixes: - Fix an Oops when destroying the RPCSEC_GSS credential cache - Fix an Oops during delegation callbacks - Ensure that the NFSv4 state manager exits the loop on SIGKILL - Fix a bogus get/put in generic_key_to_expire()" * tag 'nfs-for-4.20-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix an Oops during delegation callbacks SUNRPC: Fix a bogus get/put in generic_key_to_expire() SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache NFSv4: Ensure that the state manager exits the loop on SIGKILL NFSv4: Don't exit the state manager without clearing NFS4CLNT_MANAGER_RUNNING
2018-11-15fsnotify/fdinfo: include fdinfo.h for inotify_show_fdinfo()Eric Biggers1-0/+1
inotify_show_fdinfo() is defined in fs/notify/fdinfo.c and declared in fs/notify/fdinfo.h, but the declaration isn't included at the point of the definition. Include the header to enforce that the definition matches the declaration. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-15ext2: change reusable parameter to true when calling mb_cache_entry_create()Chengguang Xu1-1/+2
Reusable parameter of mb_cache_entry_create() is bool type, so it's better to set true instead of 1. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-15dlm: memory leaks on error path in dlm_user_request()Vasily Averin1-7/+7
According to comment in dlm_user_request() ua should be freed in dlm_free_lkb() after successful attach to lkb. However ua is attached to lkb not in set_lock_args() but later, inside request_lock(). Fixes 597d0cae0f99 ("[DLM] dlm: user locks") Cc: stable@kernel.org # 2.6.19 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: lost put_lkb on error path in receive_convert() and receive_unlock()Vasily Averin1-0/+2
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages") Cc: stable@kernel.org # 3.5 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: possible memory leak on error path in create_lkb()Vasily Averin1-0/+1
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr") Cc: stable@kernel.org # 3.1 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15dlm: fixed memory leaks after failed ls_remove_names allocationVasily Averin1-1/+1
If allocation fails on last elements of array need to free already allocated elements. v2: just move existing out_rsbtbl label to right place Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: stable@kernel.org # 3.6 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15Merge tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-0/+3
Pull nfsd fixes from Bruce Fields: "Three nfsd bugfixes. None are new bugs, but they all take a little effort to hit, which might explain why they weren't found sooner" * tag 'nfsd-4.20-1' of git://linux-nfs.org/~bfields/linux: SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer() nfsd: COPY and CLONE operations require the saved filehandle to be set sunrpc: correct the computation for page_ptr when truncating
2018-11-14Merge branch 'for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fix from Eric Biederman: "Benjamin Coddington noticed an unkillable busy loop in the kernel that anyone who is sufficiently motivated can trigger. This bug did not exist in earlier kernels making this bug a regression. I have tested the change personally and confirmed that the bug exists and that the fix works. This fix has been picked up by linux-next and hopefully the automated testing bots and no problems have been reported from those sources. Ordinarily I would let something like this sit a little longer but I am going to be away at Linux Plumbers the rest of this week and I am afraid if I don't send the pull request now this fix will get lost" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: fix __detach_mounts infinite loop
2018-11-14Btrfs: ensure path name is null terminated at btrfs_control_ioctlFilipe Manana1-0/+1
We were using the path name received from user space without checking that it is null terminated. While btrfs-progs is well behaved and does proper validation and null termination, someone could call the ioctl and pass a non-null terminated patch, leading to buffer overrun problems in the kernel. The ioctl is protected by CAP_SYS_ADMIN. So just set the last byte of the path to a null character, similar to what we do in other ioctls (add/remove/resize device, snapshot creation, etc). CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-14ext2: remove redundant condition checkChengguang Xu1-6/+4
ext2_xattr_destroy_cache() can handle NULL pointer correctly, so there is no need to check NULL pointer before calling ext2_xattr_destroy_cache(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-14NFSv4: Fix an Oops during delegation callbacksTrond Myklebust2-4/+11
If the server sends a CB_GETATTR or a CB_RECALL while the filesystem is being unmounted, then we can Oops when releasing the inode in nfs4_callback_getattr() and nfs4_callback_recall(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-13dlm: fix possible call to kfree() for non-initialized pointerDenis V. Lunev1-1/+1
Technically dlm_config_nodes() could return error and keep nodes uninitialized. After that on the fail path of we'll call kfree() for that uninitialized value. The patch is simple - we should just initialize nodes with NULL. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-13fanotify: introduce new event mask FAN_OPEN_EXEC_PERMMatthew Bobrowski2-2/+3
A new event mask FAN_OPEN_EXEC_PERM has been defined. This allows users to receive events and grant access to files that are intending to be opened for execution. Events of FAN_OPEN_EXEC_PERM type will be generated when a file has been opened by using either execve(), execveat() or uselib() system calls. This acts in the same manner as previous permission event mask, meaning that an access response is required from the user application in order to permit any further operations on the file. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13fanotify: introduce new event mask FAN_OPEN_EXECMatthew Bobrowski2-2/+3
A new event mask FAN_OPEN_EXEC has been defined so that users have the ability to receive events specifically when a file has been opened with the intent to be executed. Events of FAN_OPEN_EXEC type will be generated when a file has been opened using either execve(), execveat() or uselib() system calls. The feature is implemented within fsnotify_open() by generating the FAN_OPEN_EXEC event type if __FMODE_EXEC is set within file->f_flags. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13fanotify: return only user requested event types in event maskMatthew Bobrowski1-12/+16
Modify fanotify_should_send_event() so that it now returns a mask for an event that contains ONLY flags for the event types that have been specifically requested by the user. Flags that may have been included within the event mask, but have not been explicitly requested by the user will not be present in the returned value. As an example, given the situation where a user requests events of type FAN_OPEN. Traditionally, the event mask returned within an event that occurred on a filesystem object that has been marked for monitoring and is opened, will only ever have the FAN_OPEN bit set. With the introduction of the new flags like FAN_OPEN_EXEC, and perhaps any other future event flags, there is a possibility of the returned event mask containing more than a single bit set, despite having only requested the single event type. Prior to these modifications performed to fanotify_should_send_event(), a user would have received a bundled event mask containing flags FAN_OPEN and FAN_OPEN_EXEC in the instance that a file was opened for execution via execve(), for example. This means that a user would receive event types in the returned event mask that have not been requested. This runs the possibility of breaking existing systems and causing other unforeseen issues. To mitigate this possibility, fanotify_should_send_event() has been modified to return the event mask containing ONLY event types explicitly requested by the user. This means that we will NOT report events that the user did no set a mask for, and we will NOT report events that the user has set an ignore mask for. The function name fanotify_should_send_event() has also been updated so that it's more relevant to what it has been designed to do. Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-11-13Btrfs: fix rare chances for data loss when doing a fast fsyncFilipe Manana1-0/+24
After the simplification of the fast fsync patch done recently by commit b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and commit e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path"), we got a very short time window where we can get extents logged without writeback completing first or extents logged without logging the respective data checksums. Both issues can only happen when doing a non-full (fast) fsync. As soon as we enter btrfs_sync_file() we trigger writeback, then lock the inode and then wait for the writeback to complete before starting to log the inode. However before we acquire the inode's lock and after we started writeback, it's possible that more writes happened and dirtied more pages. If that happened and those pages get writeback triggered while we are logging the inode (for example, the VM subsystem triggering it due to memory pressure, or another concurrent fsync), we end up seeing the respective extent maps in the inode's list of modified extents and will log matching file extent items without waiting for the respective ordered extents to complete, meaning that either of the following will happen: 1) We log an extent after its writeback finishes but before its checksums are added to the csum tree, leading to -EIO errors when attempting to read the extent after a log replay. 2) We log an extent before its writeback finishes. Therefore after the log replay we will have a file extent item pointing to an unwritten extent (and without the respective data checksums as well). This could not happen before the fast fsync patch simplification, because for any extent we found in the list of modified extents, we would wait for its respective ordered extent to finish writeback or collect its checksums for logging if it did not complete yet. Fix this by triggering writeback again after acquiring the inode's lock and before waiting for ordered extents to complete. Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path") Fixes: b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-13btrfs: Always try all copies when reading extent buffersNikolay Borisov1-10/+1
When a metadata read is served the endio routine btree_readpage_end_io_hook is called which eventually runs the tree-checker. If tree-checker fails to validate the read eb then it sets EXTENT_BUFFER_CORRUPT flag. This leads to btree_read_extent_buffer_pages wrongly assuming that all available copies of this extent buffer are wrong and failing prematurely. Fix this modify btree_read_extent_buffer_pages to read all copies of the data. This failure was exhibitted in xfstests btrfs/124 which would spuriously fail its balance operations. The reason was that when balance was run following re-introduction of the missing raid1 disk __btrfs_map_block would map the read request to stripe 0, which corresponded to devid 2 (the disk which is being removed in the test): item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 3553624064) itemoff 15975 itemsize 112 length 1073741824 owner 2 stripe_len 65536 type DATA|RAID1 io_align 65536 io_width 65536 sector_size 4096 num_stripes 2 sub_stripes 1 stripe 0 devid 2 offset 2156920832 dev_uuid 8466c350-ed0c-4c3b-b17d-6379b445d5c8 stripe 1 devid 1 offset 3553624064 dev_uuid 1265d8db-5596-477e-af03-df08eb38d2ca This caused read requests for a checksum item that to be routed to the stale disk which triggered the aforementioned logic involving EXTENT_BUFFER_CORRUPT flag. This then triggered cascading failures of the balance operation. Fixes: a826d6dcb32d ("Btrfs: check items for correctness as we search") CC: stable@vger.kernel.org # 4.4+ Suggested-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-11-13NFSv4: Ensure that the state manager exits the loop on SIGKILLTrond Myklebust1-1/+1
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-13NFSv4: Don't exit the state manager without clearing NFS4CLNT_MANAGER_RUNNINGTrond Myklebust1-3/+5
If we exit the NFSv4 state manager due to a umount, then we can end up leaving the NFS4CLNT_MANAGER_RUNNING flag set. If another mount causes the nfs4_client to be rereferenced before it is destroyed, then we end up never being able to recover state. Fixes: 47c2199b6eb5 ("NFSv4.1: Ensure state manager thread dies on last ...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.15+
2018-11-12userfaultfd: Replace spin_is_locked() with lockdepLance Roy1-1/+1
lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2018-11-12mnt: fix __detach_mounts infinite loopBenjamin Coddington1-3/+3
Since commit ff17fa561a04 ("d_invalidate(): unhash immediately") immediately unhashes the dentry, we'll never return the mountpoint in lookup_mountpoint(), which can lead to an unbreakable loop in d_invalidate(). I have reports of NFS clients getting into this condition after the server removes an export of an existing mount created through follow_automount(), but I suspect there are various other ways to produce this problem if we hunt down users of d_invalidate(). For example, it is possible to get into this state by using XFS' d_invalidate() call in xfs_vn_unlink(): truncate -s 100m img{1,2} mkfs.xfs -q -n version=ci img1 mkfs.xfs -q -n version=ci img2 mkdir -p /mnt/xfs mount img1 /mnt/xfs mkdir /mnt/xfs/sub1 mount img2 /mnt/xfs/sub1 cat > /mnt/xfs/sub1/foo & umount -l /mnt/xfs/sub1 mount img2 /mnt/xfs/sub1 mount --make-private /mnt/xfs mkdir /mnt/xfs/sub2 mount --move /mnt/xfs/sub1 /mnt/xfs/sub2 rmdir /mnt/xfs/sub1 Fix this by moving the check for an unlinked dentry out of the detach_mounts() path. Fixes: ff17fa561a04 ("d_invalidate(): unhash immediately") Cc: stable@vger.kernel.org Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2018-11-12Merge tag 'for-4.20-rc1-tag' of ↵Linus Torvalds8-57/+107
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Several fixes to recent release (4.19, fixes tagged for stable) and other fixes" * tag 'for-4.20-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix missing delayed iputs on unmount Btrfs: fix data corruption due to cloning of eof block Btrfs: fix infinite loop on inode eviction after deduplication of eof block Btrfs: fix deadlock on tree root leaf when finding free extent btrfs: avoid link error with CONFIG_NO_AUTO_INLINE btrfs: tree-checker: Fix misleading group system information Btrfs: fix missing data checksums after a ranged fsync (msync) btrfs: fix pinned underflow after transaction aborted Btrfs: fix cur_offset in the error case for nocow
2018-11-12Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds5-31/+51
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A large number of ext4 bug fixes, mostly buffer and memory leaks on error return cleanup paths" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: missing !bh check in ext4_xattr_inode_write() ext4: fix buffer leak in __ext4_read_dirblock() on error path ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error path ext4: fix buffer leak in ext4_xattr_move_to_block() on error path ext4: release bs.bh before re-using in ext4_xattr_block_find() ext4: fix buffer leak in ext4_xattr_get_block() on error path ext4: fix possible leak of s_journal_flag_rwsem in error path ext4: fix possible leak of sbi->s_group_desc_leak in error path ext4: remove unneeded brelse call in ext4_xattr_inode_update_ref() ext4: avoid possible double brelse() in add_new_gdb() on error path ext4: avoid buffer leak in ext4_orphan_add() after prior errors ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty() ext4: fix possible inode leak in the retry loop of ext4_resize_fs() ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing ext4: add missing brelse() update_backups()'s error path ext4: add missing brelse() add_new_gdb_meta_bg()'s error path ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path ext4: avoid potential extra brelse in setup_new_flex_group_blocks()
2018-11-10Merge branch 'for-linus' of ↵Linus Torvalds1-5/+17
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "I believe all of these are simple obviously correct bug fixes. These fall into two groups: - Fixing the implementation of MNT_LOCKED which prevents lesser privileged users from seeing unders mounts created by more privileged users. - Fixing the extended uid and group mapping in user namespaces. As well as ensuring the code looks correct I have spot tested these changes as well and in my testing the fixes are working" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mount: Prevent MNT_DETACH from disconnecting locked mounts mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts mount: Retest MNT_LOCKED in do_umount userns: also map extents in the reverse map to kernel IDs
2018-11-10sysv: return 'err' instead of 0 in __sysv_write_inodeYueHaibing1-1/+1
Fixes gcc '-Wunused-but-set-variable' warning: fs/sysv/inode.c: In function '__sysv_write_inode': fs/sysv/inode.c:239:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] __sysv_write_inode should return 'err' instead of 0 Fixes: 05459ca81ac3 ("repair sysv_write_inode(), switch sysv to simple_fsync()") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>