summaryrefslogtreecommitdiff
path: root/fs/autofs4
AgeCommit message (Collapse)AuthorFilesLines
2012-10-21autofs4 - fix reset pending flag on mount failIan Kent1-2/+4
commit 49999ab27eab6289a8e4f450e148bdab521361b2 upstream. In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING mount pending flag is not cleared. One effect of this is when using the "browse" option, directory entry attributes show up with all "?"s due to the incorrect callback and subsequent failure return (when in fact no callback should be made). Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-07autofs: make the autofsv5 packet file descriptor use a packetized pipeLinus Torvalds3-2/+13
commit 64f371bc3107e69efce563a3d0f0e6880de0d537 upstream. The autofs packet size has had a very unfortunate size problem on x86: because the alignment of 'u64' differs in 32-bit and 64-bit modes, and because the packet data was not 8-byte aligned, the size of the autofsv5 packet structure differed between 32-bit and 64-bit modes despite looking otherwise identical (300 vs 304 bytes respectively). We first fixed that up by making the 64-bit compat mode know about this problem in commit a32744d4abae ("autofs: work around unhappy compat problem on x86-64"), and that made a 32-bit 'systemd' work happily on a 64-bit kernel because everything then worked the same way as on a 32-bit kernel. But it turned out that 'automount' had actually known and worked around this problem in user space, so fixing the kernel to do the proper 32-bit compatibility handling actually *broke* 32-bit automount on a 64-bit kernel, because it knew that the packet sizes were wrong and expected those incorrect sizes. As a result, we ended up reverting that compatibility mode fix, and thus breaking systemd again, in commit fcbf94b9dedd. With both automount and systemd doing a single read() system call, and verifying that they get *exactly* the size they expect but using different sizes, it seemed that fixing one of them inevitably seemed to break the other. At one point, a patch I seriously considered applying from Michael Tokarev did a "strcmp()" to see if it was automount that was doing the operation. Ugly, ugly. However, a prettier solution exists now thanks to the packetized pipe mode. By marking the communication pipe as being packetized (by simply setting the O_DIRECT flag), we can always just write the bigger packet size, and if user-space does a smaller read, it will just get that partial end result and the extra alignment padding will simply be thrown away. This makes both automount and systemd happy, since they now get the size they asked for, and the kernel side of autofs simply no longer needs to care - it could pad out the packet arbitrarily. Of course, if there is some *other* user of autofs (please, please, please tell me it ain't so - and we haven't heard of any) that tries to read the packets with multiple writes, that other user will now be broken - the whole point of the packetized mode is that one system call gets exactly one packet, and you cannot read a packet in pieces. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-07Revert "autofs: work around unhappy compat problem on x86-64"Linus Torvalds4-22/+2
commit fcbf94b9dedd2ce08e798a99aafc94fec8668161 upstream. This reverts commit a32744d4abae24572eff7269bc17895c41bd0085. While that commit was technically the right thing to do, and made the x86-64 compat mode work identically to native 32-bit mode (and thus fixing the problem with a 32-bit systemd install on a 64-bit kernel), it turns out that the automount binaries had workarounds for this compat problem. Now, the workarounds are disgusting: doing an "uname()" to find out the architecture of the kernel, and then comparing it for the 64-bit cases and fixing up the size of the read() in automount for those. And they were confused: it's not actually a generic 64-bit issue at all, it's very much tied to just x86-64, which has different alignment for an 'u64' in 64-bit mode than in 32-bit mode. But the end result is that fixing the compat layer actually breaks the case of a 32-bit automount on a x86-64 kernel. There are various approaches to fix this (including just doing a "strcmp()" on current->comm and comparing it to "automount"), but I think that I will do the one that teaches pipes about a special "packet mode", which will allow user space to not have to care too deeply about the padding at the end of the autofs packet. That change will make the compat workaround unnecessary, so let's revert it first, and get automount working again in compat mode. The packetized pipes will then fix autofs for systemd. Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12autofs: work around unhappy compat problem on x86-64Ian Kent4-3/+23
commit a32744d4abae24572eff7269bc17895c41bd0085 upstream. When the autofs protocol version 5 packet type was added in commit 5c0a32fc2cd0 ("autofs4: add new packet type for v5 communications"), it obvously tried quite hard to be word-size agnostic, and uses explicitly sized fields that are all correctly aligned. However, with the final "char name[NAME_MAX+1]" array at the end, the actual size of the structure ends up being not very well defined: because the struct isn't marked 'packed', doing a "sizeof()" on it will align the size of the struct up to the biggest alignment of the members it has. And despite all the members being the same, the alignment of them is different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round number (256), the name[] array starts out a 4-byte aligned. End result: the "packed" size of the structure is 300 bytes: 4-byte, but not 8-byte aligned. As a result, despite all the fields being in the same place on all architectures, sizeof() will round up that size to 304 bytes on architectures that have 8-byte alignment for u64. Note that this is *not* a problem for 32-bit compat mode on POWER, since there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit and 64-bit alignment is different for 64-bit entities, and as a result the structure that has exactly the same layout has different sizes. So on x86-64, but no other architecture, we will just subtract 4 from the size of the structure when running in a compat task. That way we will write the properly sized packet that user mode expects. Not pretty. Sadly, this very subtle, and unnecessary, size difference has been encoded in user space that wants to read packets of *exactly* the right size, and will refuse to touch anything else. Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2011-05-30autofs4: bogus dentry_unhash() added in ->unlink()Al Viro1-2/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26vfs: push dentry_unhash on rmdir into file systemsSage Weil1-0/+2
Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-31Fix common misspellingsLucas De Marchi1-1/+1
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-24autofs4: Do not potentially dereference NULL pointer returned by fget() in ↵Jesper Juhl1-0/+4
autofs_dev_ioctl_setpipefd() In fs/autofs4/dev-ioctl.c::autofs_dev_ioctl_setpipefd() we call fget(), which may return NULL, but we do not explicitly test for that NULL return so we may end up dereferencing a NULL pointer - bad. When I originally submitted this patch I had chosen EBUSY as the return value to use if this happens. Ian Kent was kind enough to explain why that would most likely be wrong and why EBADF should most likely be used instead. This version of the patch uses EBADF. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - remove autofs4_lockIan Kent4-31/+20
The autofs4_lock introduced by the rcu-walk changes has unnecessarily broad scope. The locking is better handled by the per-autofs super block lookup_lock. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix d_manage() return on rcu-walkIan Kent1-0/+2
The daemon never needs to block and, in the rcu-walk case an error return isn't used, so always return zero. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix autofs4_expire_indirect() traversalIan Kent1-1/+51
The vfs-scale changes changed the traversal used in autofs4_expire_indirect() from a list to a depth first tree traversal which isn't right. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - fix dentry leak in autofs4_expire_direct()Ian Kent1-4/+3
There is a missing dput() when returning from autofs4_expire_direct() when we see that the dentry is already a pending mount. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24autofs4 - reinstate last used update on accessIan Kent2-34/+14
When direct (and offset) mounts were introduced the the last used timeout could no longer be updated in ->d_revalidate(). This is because covered direct mounts would be followed over without calling the autofs file system. As a result the definition of the busyness check for all entries was changed to be "actually busy" being an open file or working directory within the automount. But now we have a call back in the follow so the last used update on any access can be re-instated. This requires DCACHE_MANAGE_TRANSIT to always be set. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-18lose 'mounting_here' argument in ->d_manage()Al Viro1-3/+3
it's always false... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: clean ->d_release() and autofs4_free_ino() upAl Viro3-19/+16
The latter is called only when both ino and dentry are about to be freed, so cleaning ->d_fsdata and ->dentry is pointless. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: split autofs4_init_ino()Al Viro3-26/+15
split init_ino into new_ino and clean_ino; the former is what used to be init_ino(NULL, sbi), the latter is for cases where we passed non-NULL ino. Lose unused arguments. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: mkdir and symlink always get a dentry that had passed lookupAl Viro1-18/+10
... so ->d_fsdata will have been set up before we get there Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: autofs4_get_inode() doesn't need autofs_info * argument anymoreAl Viro3-7/+5
Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: kill ->size in autofs_infoAl Viro3-6/+3
It's used only to pass the length of symlink body to autofs4_get_inode() in autofs4_dir_symlink(). We can bloody well set inode->i_size in autofs4_dir_symlink() directly and be done with that. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: pass mode to autofs4_get_inode() explicitlyAl Viro3-16/+15
In all cases we'd set inf->mode to know value just before passing it to autofs4_get_inode(). That kills the need to store it in autofs_info and pass it to autofs_init_ino() Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: autofs4_mkroot() is not different from autofs4_init_ino()Al Viro1-12/+1
Kill it. Mind you, it's been an obfuscated call of autofs4_init_ino() ever since 2.3.99pre6-4... Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4: keep symlink body in inode->i_privateAl Viro4-28/+9
gets rid of all ->free()/->u.symlink machinery in autofs; we simply keep symlink bodies in inode->i_private and free them in ->evict_inode(). Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4 - fix debug print in autofs4_lookup()Ian Kent1-1/+2
oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18autofs4 - fix get_next_positive_dentry()Ian Kent1-2/+2
The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry() appears to be incorrect. If prev == NULL I believe that root should be returned. Further down, at the current dentry check for it being simple_positive() it looks like the d_lock for dentry p should be dropped instead of dentry ret, otherwise when p is assinged to ret we end up with no lock on p and a lost lock on ret, which leads to a deadlock. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Merge the remaining dentry ops tablesDavid Howells3-20/+4
Merge the remaining autofs4 dentry ops tables. It doesn't matter if d_automount and d_manage are present on something that's not mountable or holdable as these ops are only used if the appropriate flags are set in dentry->d_flags. [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the same dentry_operations. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16Allow d_manage() to be used in RCU-walk modeDavid Howells1-2/+6
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead). autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Add v4 pseudo direct mount supportIan Kent1-0/+58
Version 4 of autofs provides a pseudo direct mount implementation that relies on directories at the leaves of a directory tree under an indirect mount to trigger mounts. This patch adds support for that functionality. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Fix wait validationIan Kent1-1/+16
It is possible for the check in wait.c:validate_request() to return an incorrect result if the dentry that was mounted upon has changed during the callback. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Clean up autofs4_free_ino()Ian Kent2-22/+0
When this function is called the local reference count does't need to be updated since the dentry is going away and dput definitely must not be called here. Also the autofs info struct field inode isn't used so remove it. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Clean up dentry operationsIan Kent3-29/+26
There are now two distinct dentry operations uses. One for dentrys that trigger mounts and one for dentrys that do not. Rationalize the use of these dentry operations and rename them to reflect their function. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Clean up inode operationsIan Kent3-21/+1
Since the use of ->follow_link() has been eliminated there is no need to separate the indirect and direct inode operations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Remove unused codeIan Kent2-250/+0
Remove code that is not used due to the use of ->d_automount() and ->d_manage(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Add d_manage() dentry operationIan Kent4-40/+159
This patch required a previous patch to add the ->d_automount() dentry operation. Add a function to use the newly defined ->d_manage() dentry operation for blocking during mount and expire. Whether the VFS calls the dentry operations d_automount() and d_manage() is controled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs uses the d_automount() operation to callback to user space to request mount operations and the d_manage() operation to block walks into mounts that are under construction or destruction. In order to prevent these functions from being called unnecessarily the DMANAGED_* flags are cleared for cases which would cause this. In the common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is cleared upon successful mount request completion and set during expire runs, both during the dentry expire check, and if selected for expire, is left set until a subsequent successful mount request completes. The exception to this is the so-called rootless multi-mount which has no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag is cleared upon successful mount request completion as well and set again after a successful expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16autofs4: Add d_automount() dentry operationIan Kent4-112/+189
Add a function to use the newly defined ->d_automount() dentry operation for triggering mounts instead of doing the user space callback in ->lookup() and ->d_revalidate(). Note, to be useful the subsequent patch to add the ->d_manage() dentry operation is also needed so the discussion of functionality is deferred to that patch. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16Add a dentry op to allow processes to be held during pathwalk transitDavid Howells4-21/+7
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (*d_manage)(struct path *path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-07fs: rcu-walk aware d_revalidate methodNick Piggin1-3/+10
Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache reduce branches in lookup pathNick Piggin2-6/+6
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove d_mountedNick Piggin1-2/+11
Rather than keep a d_mounted count in the dentry, set a dentry flag instead. The flag can be cleared by checking the hash table to see if there are any mounts left, which is not time critical because it is performed at detach time. The mounted state of a dentry is only used to speculatively take a look in the mount hash table if it is set -- before following the mount, vfsmount lock is taken and mount re-checked without races. This saves 4 bytes on 32-bit, nothing on 64-bit but it does provide a hole I might use later (and some configs have larger than 32-bit spinlocks which might make use of the hole). Autofs4 conversion and changelog by Ian Kent <raven@themaw.net>: In autofs4, when expring direct (or offset) mounts we need to ensure that we block user path walks into the autofs mount, which is covered by another mount. To do this we clear the mounted status so that follows stop before walking into the mount and are essentially blocked until the expire is completed. The automount daemon still finds the correct dentry for the umount due to the follow mount logic in fs/autofs4/root.c:autofs4_follow_link(), which is set as an inode operation for direct and offset mounts only and is called following the lookup that stopped at the covered mount. At the end of the expire the covering mount probably has gone away so the mounted status need not be restored. But we need to check this and only restore the mounted status if the expire failed. XXX: autofs may not work right if we have other mounts go over the top of it? Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache remove dcache_lockNick Piggin4-29/+35
dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: Use rename lock and RCU for multi-step operationsNick Piggin1-3/+15
The remaining usages for dcache_lock is to allow atomic, multi-step read-side operations over the directory tree by excluding modifications to the tree. Also, to walk in the leaf->root direction in the tree where we don't have a natural d_lock ordering. This could be accomplished by taking every d_lock, but this would mean a huge number of locks and actually gets very tricky. Solve this instead by using the rename seqlock for multi-step read-side operations, retry in case of a rename so we don't walk up the wrong parent. Concurrent dentry insertions are not serialised against. Concurrent deletes are tricky when walking up the directory: our parent might have been deleted when dropping locks so also need to check and retry for that. We can also use the rename lock in cases where livelock is a worry (and it is introduced in subsequent patch). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale subdirsNick Piggin3-69/+87
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale d_unhashedNick Piggin2-18/+16
Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07fs: dcache scale dentry refcountNick Piggin2-7/+7
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2010-12-07autofs4 - remove ioctl mutex (bz23142)Ian Kent1-11/+1
With the recent changes to remove the BKL a mutex was added to the ioctl entry point for calls to the old ioctl interface. This mutex needs to be removed because of the need for the expire ioctl to call back to the daemon to perform a umount and receive a completion status (via another ioctl). This should be fine as the new ioctl interface uses much of the same code and it has been used without a mutex for around a year without issue, as was the original intention. Ref: Bugzilla bug 23142 Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-29convert get_sb_nodev() usersAl Viro1-4/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-26fs: do not assign default i_ino in new_inodeChristoph Hellwig1-0/+1
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-22Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bklLinus Torvalds1-0/+1
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek
2010-10-15llseek: automatically add .llseek fopArnd Bergmann1-0/+1
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-10-05autofs4: Only declare function when CONFIG_COMPAT is definedFelipe Contreras1-0/+2
The patch solves the following warnings message when CONFIG_COMPAT is not defined: fs/autofs4/root.c:31: warning: ‘autofs4_root_compat_ioctl’ declared ‘static’ but never defined Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Cc: Ian Kent <raven@themaw.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-04BKL: Remove BKL from autofs4Arnd Bergmann1-5/+7
autofs4 uses the BKL only to guard its ioctl operations. This can be trivially converted to use a mutex, as we have done with most device drivers before. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ian Kent <raven@themaw.net>