summaryrefslogtreecommitdiff
path: root/fs/notify/mark.c
AgeCommit message (Collapse)AuthorFilesLines
2016-01-15fsnotify: destroy marks with call_srcu instead of dedicated threadJeff Layton1-52/+14
At the time that this code was originally written, call_srcu didn't exist, so this thread was required to ensure that we waited for that SRCU grace period to settle before finally freeing the object. It does exist now however and we can much more efficiently use call_srcu to handle this. That also allows us to potentially use srcu_barrier to ensure that they are all of the callbacks have run before proceeding. In order to conserve space, we union the rcu_head with the g_list. This will be necessary for nfsd which will allocate marks from a dedicated slabcache. We have to be able to ensure that all of the objects are destroyed before destroying the cache. That's fairly Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: Eric Paris <eparis@parisplace.org> Reviewed-by: Jan Kara <jack@suse.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05fsnotify: get rid of fsnotify_destroy_mark_locked()Jan Kara1-33/+40
fsnotify_destroy_mark_locked() is subtle to use because it temporarily releases group->mark_mutex. To avoid future problems with this function, split it into two. fsnotify_detach_mark() is the part that needs group->mark_mutex and fsnotify_free_mark() is the part that must be called outside of group->mark_mutex. This way it's much clearer what's going on and we also avoid some pointless acquisitions of group->mark_mutex. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-05fsnotify: remove mark->free_listJan Kara1-15/+25
Free list is used when all marks on given inode / mount should be destroyed when inode / mount is going away. However we can free all of the marks without using a special list with some care. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()Jan Kara1-5/+25
fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and thus the next entry pointer we have cached may become stale and we dereference free memory. Fix the problem by first moving marks to free to a special private list and then always free the first entry in the special list. This method is safe even when entries from the list can disappear once we drop the lock. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com> Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-22Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"Linus Torvalds1-14/+20
This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf. Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms: "Thanks for report! You are right that my patch introduces a race between fsnotify kthread and fsnotify_destroy_group() which can result in leaking inotify event on group destruction. I haven't yet decided whether the right fix is not to queue events for dying notification group (as that is pointless anyway) or whether we should just fix the original problem differently... Whenever I look at fsnotify code mark handling I get lost in the maze of locks, lists, and subtle differences between how different notification systems handle notification marks :( I'll think about it over night" and after thinking about it, Jan says: "OK, I have looked into the code some more and I found another relatively simple way of fixing the original oops. It will be IMHO better than trying to fixup this issue which has more potential for breakage. I'll ask Linus to revert the fsnotify fix he already merged and send a new fix" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Requested-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-18fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()Jan Kara1-20/+14
fsnotify_clear_marks_by_group_flags() can race with fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops mark_mutex, a mark from the list iterated by fsnotify_clear_marks_by_group_flags() can be freed and we dereference free memory in the loop there. Fix the problem by keeping mark_mutex held in fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that we need to call a ->freeing_mark() callback which may acquire mark_mutex again. To avoid this and similar lock inversion issues, we move the call to ->freeing_mark() callback to the kthread destroying the mark. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Ashish Sangwan <a.sangwan@samsung.com> Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: remove destroy_list from fsnotify_markJan Kara1-4/+4
destroy_list is used to track marks which still need waiting for srcu period end before they can be freed. However by the time mark is added to destroy_list it isn't in group's list of marks anymore and thus we can reuse fsnotify_mark->g_list for queueing into destroy_list. This saves two pointers for each fsnotify_mark. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13fsnotify: unify inode and mount marks handlingJan Kara1-3/+86
There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-14fanotify: fix notification of groups with inode & mount marksJan Kara1-0/+36
fsnotify() needs to merge inode and mount marks lists when notifying groups about events so that ignore masks from inode marks are reflected in mount mark notifications and groups are notified in proper order (according to priorities). Currently the sorting of the lists done by fsnotify_add_inode_mark() / fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted ignore masks not being used in some cases. Fix the problem by always using the same comparison function when sorting / merging the mark lists. Thanks to Heinrich Schuchardt for improvements of my patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721 Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05fs/notify/mark.c: trivial cleanupDavid Cohen1-1/+1
Do not initialize private_destroy_list twice. list_replace_init() already takes care of initializing private_destroy_list. We don't need to initialize it with LIST_HEAD() beforehand. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fsnotify: update comments concerning locking schemeLino Sanfilippo1-28/+22
There have been changes in the locking scheme of fsnotify but the comments in the source code have not been updated yet. This patch corrects this. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11fsnotify: change locking orderLino Sanfilippo1-10/+10
On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote: > > I finally built and tested a v3.0 kernel with these patches (I know I'm > SOOOOOO far behind). Not what I hoped for: > > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 > > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 150.946012] CPU 0 > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan] > > [ 150.946012] > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM > > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282 > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438 > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240 > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000 > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428 > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610 > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0 > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760) > > [ 150.946012] Stack: > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76 > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250 > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438 > > [ 150.946012] Call Trace: > > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130 > > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170 > > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190 > > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40 > > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160 > > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50 > > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0 > > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a > > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP <ffff88002c2e5df8> > > [ 150.946012] CR2: 0000000000000070 > > Looks at aweful lot like the problem from: > http://www.spinics.net/lists/linux-fsdevel/msg46101.html > I tried to reproduce this bug with your test program, but without success. However, if I understand correctly, this occurs since we dont hold any locks when we call iput() in mark_destroy(), right? With the patches you tested, iput() is also not called within any lock, since the groups mark_mutex is released temporarily before iput() is called. This is, since the original codes behaviour is similar. However since we now have a mutex as the biggest lock, we can do what you suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and call iput() with the mutex held to avoid the race. The patch below implements this. It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: dont put marks on temporary list when clearing marks by groupLino Sanfilippo1-8/+2
In clear_marks_by_group_flags() the mark list of a group is iterated and the marks are put on a temporary list. Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list any more and are able to remove the marks while the mark list is iterated and the mark list mutex is held. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: introduce locked versions of fsnotify_add_mark() and ↵Lino Sanfilippo1-12/+30
fsnotify_remove_mark() This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked() which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but assume that the caller has already taken the groups mark mutex. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo1-17/+4
In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: use a mutex instead of a spinlock to protect a groups mark listLino Sanfilippo1-9/+9
Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead of a spinlock results in more flexibility (i.e it allows to sleep while the lock is held). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: take groups mark_lock before mark lockLino Sanfilippo1-10/+16
Race-free addition and removal of a mark to a groups mark list would be easier if we could lock the mark list of group before we lock the specific mark. This patch changes the order used to add/remove marks to/from mark lists from 1. mark->lock 2. group->mark_lock 3. inode->i_lock to 1. group->mark_lock 2. mark->lock 3. inode->i_lock Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-12-11fsnotify: use reference counting for groupsLino Sanfilippo1-10/+14
Get a group ref for each mark that is added to the groups list and release that ref when the mark is freed in fsnotify_put_mark(). We also use get a group reference for duplicated marks and for private event data. Now we dont free a group any more when the number of marks becomes 0 but when the groups ref count does. Since this will only happen when all marks are removed from a groups mark list, we dont have to set the groups number of marks to 1 at group creation. Beside clearing all marks in fsnotify_destroy_group() we do also flush the groups event queue. This is since events may hold references to groups (due to private event data) and we have to put those references first before we get a chance to put the final ref, which will result in a call to fsnotify_final_destroy_group(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-15fsnotify: don't BUG in fsnotify_destroy_mark()Miklos Szeredi1-3/+5
Removing the parent of a watched file results in "kernel BUG at fs/notify/mark.c:139". To reproduce add "-w /tmp/audit/dir/watched_file" to audit.rules rm -rf /tmp/audit/dir This is caused by fsnotify_destroy_mark() being called without an extra reference taken by the caller. Reported by Francesco Cosoleto here: https://bugzilla.novell.com/show_bug.cgi?id=689860 Fix by removing the BUG_ON and adding a comment about not accessing mark after the iput. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-27atomic: use <linux/atomic.h>Arun Sharma1-1/+1
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-25fs: rename inode_lock to inode_hash_lockDave Chinner1-1/+0
All that remains of the inode_lock is protecting the inode hash list manipulation and traversals. Rename the inode_lock to inode_hash_lock to reflect it's actual function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-07-28fsnotify: remove global fsnotify groups listsEric Paris1-9/+0
The global fsnotify groups lists were invented as a way to increase the performance of fsnotify by shortcutting events which were not interesting. With the changes to walk the object lists rather than global groups lists these shortcuts are not useful. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: Exchange list heads instead of moving elementsAndreas Gruenbacher1-4/+2
Instead of moving list elements from destroy_list to &private_destroy_list, exchange the list heads. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: srcu to protect read side of inode and vfsmount locksEric Paris1-3/+57
Currently reading the inode->i_fsnotify_marks or vfsmount->mnt_fsnotify_marks lists are protected by a spinlock on both the read and the write side. This patch protects the read side of those lists with a new single srcu. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been calledEric Paris1-4/+7
Currently fsnotify check is mark->group is NULL to decide if fsnotify_destroy_mark() has already been called or not. With the upcoming rcu work it is a heck of a lot easier to use an explicit flag than worry about group being set to NULL. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: call iput on inodes when no longer markedEric Paris1-1/+1
fsnotify takes an igrab on an inode when it adds a mark. The code was supposed to drop the reference when the mark was removed but didn't. This caused problems when an fs was unmounted because those inodes would clearly not be gone. Thus resulting in the most devistating of messages: VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds. >>> Have a nice day... Jiri Slaby bisected the problem to a patch in the fsnotify tree. The code snippets below show my stupidity quite clearly. void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark) { ... mark->inode = NULL; ... } void fsnotify_destroy_mark(struct fsnotify_mark *mark) { struct inode *inode = NULL; ... if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) { fsnotify_destroy_inode_mark(mark); inode = mark->i.inode; } ... if (inode) iput(inode); ... } Obviously the intent was to capture the inode before it was set to NULL in fsnotify_destory_inode_mark() so we wouldn't be leaking inodes forever. Instead we leaked them (and exploded on umount) Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fanotify: clear all fanotify marksEric Paris1-5/+16
fanotify listeners may want to clear all marks. They may want to do this to destroy all of their inode marks which have nothing but ignores. Realistically this is useful for av vendors who update policy and want to clear all of their cached allows. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: ignored_mask - excluding notificationEric Paris1-0/+6
The ignored_mask is a new mask which is part of fsnotify marks. A group's should_send_event() function can use the ignored mask to determine that certain events are not of interest. In particular if a group registers a mask including FS_OPEN on a vfsmount they could add FS_OPEN to the ignored_mask for individual inodes and not send open events for those inodes. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: allow marks to not pin inodes in coreEric Paris1-1/+16
inotify marks must pin inodes in core. dnotify doesn't technically need to since they are closed when the directory is closed. fanotify also need to pin inodes in core as it works today. But the next step is to introduce the concept of 'ignored masks' which is actually a mask of events for an inode of no interest. I claim that these should be liberally sent to the kernel and should not pin the inode in core. If the inode is brought back in the listener will get an event it may have thought excluded, but this is not a serious situation and one any listener should deal with. This patch lays the ground work for non-pinning inode marks by using lazy inode pinning. We do not pin a mark until it has a non-zero mask entry. If a listener new sets a mask we never pin the inode. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: vfsmount marks generic functionsEric Paris1-9/+11
Much like inode-mark.c has all of the code dealing with marks on inodes this patch adds a vfsmount-mark.c which has similar code but is intended for marks on vfsmounts. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: clear marks to 0 in fsnotify_init_markEric Paris1-4/+1
Currently fsnotify_init_mark sets some fields to 0/NULL. Some users already used some sorts of zalloc, some didn't. This patch uses memset to explicitly zero everything in the fsnotify_mark when it is initialized so we don't have to be careful if fields are later added to marks. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-07-28fsnotify: split generic and inode specific mark codeEric Paris1-0/+294
currently all marking is done by functions in inode-mark.c. Some of this is pretty generic and should be instead done in a generic function and we should only put the inode specific code in inode-mark.c Signed-off-by: Eric Paris <eparis@redhat.com>