BMC/Intel-BMC/linux.git - Intel OpenBMC Linux kernel source tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2012-03-30	Merge branch 'x86-x32-for-linus' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-21	autofs: set things up before registering fs type	Al Viro	1	-3/+3
	it's not a serious race, but we really want misc device before anybody gets to mount this sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-21	switch open-coded instances of d_make_root() to new helper	Al Viro	1	-8/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-02-26	autofs: work around unhappy compat problem on x86-64	Ian Kent	4	-3/+23
	When the autofs protocol version 5 packet type was added in commit 5c0a32fc2cd0 ("autofs4: add new packet type for v5 communications"), it obvously tried quite hard to be word-size agnostic, and uses explicitly sized fields that are all correctly aligned. However, with the final "char name[NAME_MAX+1]" array at the end, the actual size of the structure ends up being not very well defined: because the struct isn't marked 'packed', doing a "sizeof()" on it will align the size of the struct up to the biggest alignment of the members it has. And despite all the members being the same, the alignment of them is different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round number (256), the name[] array starts out a 4-byte aligned. End result: the "packed" size of the structure is 300 bytes: 4-byte, but not 8-byte aligned. As a result, despite all the fields being in the same place on all architectures, sizeof() will round up that size to 304 bytes on architectures that have 8-byte alignment for u64. Note that this is not a problem for 32-bit compat mode on POWER, since there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit and 64-bit alignment is different for 64-bit entities, and as a result the structure that has exactly the same layout has different sizes. So on x86-64, but no other architecture, we will just subtract 4 from the size of the structure when running in a compat task. That way we will write the properly sized packet that user mode expects. Not pretty. Sadly, this very subtle, and unnecessary, size difference has been encoded in user space that wants to read packets of exactly the right size, and will refuse to touch anything else. Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-19	Wrap accesses to the fd_sets in struct fdtable	David Howells	1	-1/+1
	Wrap accesses to the fd_sets in struct fdtable (for recording open files and close-on-exec flags) so that we can move away from using fd_sets since we abuse the fd_set structs by not allocating the full-sized structure under normal circumstances and by non-core code looking at the internals of the fd_sets. The first abuse means that use of FD_ZERO() on these fd_sets is not permitted, since that cannot be told about their abnormal lengths. This introduces six wrapper functions for setting, clearing and testing close-on-exec flags and fd-is-open flags: void __set_close_on_exec(int fd, struct fdtable fdt); void __clear_close_on_exec(int fd, struct fdtable fdt); bool close_on_exec(int fd, const struct fdtable fdt); void __set_open_fd(int fd, struct fdtable fdt); void __clear_open_fd(int fd, struct fdtable fdt); bool fd_is_open(int fd, const struct fdtable fdt); Note that I've prepended '__' to the names of the set/clear functions because they require the caller to hold a lock to use them. Note also that I haven't added wrappers for looking behind the scenes at the the array. Possibly that should exist too. Signed-off-by: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-14	autofs4 - fix lockdep splat in autofs	Steven Rostedt	1	-0/+2
	When recursing down the locks when traversing a tree/list in get_next_positive_dentry() or get_next_positive_subdir() a lock can change from being nested to being a parent which breaks lockdep. This patch tells lockdep about what we did. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-13	autofs4 - fix deal with autofs4_write races	Ian Kent	1	-1/+1
	I don't know how I missed this obvious mistake when I reviewed Als' patches, sorry. [ Quoting Al: Grr... Note to self: do git status and git stash show -p before git push. Nothing like "WTF? I'd fixed that braino" feeling ;-/ Al sent the same patch - it got broken in commit d668dc56631d: "autofs4: deal with autofs4_write/autofs4_write races". ] Reported-and-tested-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-11	autofs4: deal with autofs4_write/autofs4_write races	Al Viro	3	-4/+7
	Just serialize the actual writing of packets into pipe on a new mutex, independent from everything else in the locking hierarchy. As soon as something has started feeding a piece of packet into the pipe to daemon, we want everything else about to try the same to wait until we are done. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-11	autofs4: catatonic_mode vs. notify_daemon race	Al Viro	1	-11/+14
	we need to hold ->wq_mutex while we are forming the packet to send, lest we have autofs4_catatonic_mode() setting wq->name.name to NULL just as autofs4_notify_daemon() decides to memcpy() from it... We do have check for catatonic mode immediately after that (under ->wq_mutex, as it ought to be) and packet won't be actually sent, but it'll be too late for us if we oops on that memcpy() from NULL... Fix is obvious - just extend the area covered by ->wq_mutex over that switch and check whether it's catatonic before doing anything else. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-11	autofs4: autofs4_wait() vs. autofs4_catatonic_mode() race	Al Viro	1	-1/+7
	We need to recheck ->catatonic after autofs4_wait() got ->wq_mutex for good, or we might end up with wq inserted into queue after autofs4_catatonic_mode() had done its thing. It will stick there forever, since there won't be anything to clear its ->name.name. A bit of a complication: validate_request() drops and regains ->wq_mutex. It actually ends up the most convenient place to stick the check into... Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-07	vfs: switch ->show_options() to struct dentry *	Al Viro	1	-3/+3
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-07	vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb	Al Viro	1	-5/+5
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04	autofs4: propagate umode_t	Al Viro	2	-2/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04	switch vfs_mkdir() and ->mkdir() to umode_t	Al Viro	1	-2/+2
	vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-02	filesystems: add set_nlink()	Miklos Szeredi	1	-1/+1
	Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-08-08	autofs4: fix debug printk warning uncovered by cleanup	Linus Torvalds	1	-1/+1
	The previous comit made the autofs4 debug printouts check types against the printout format, and uncovered this bug: fs/autofs4/waitq.c:106:2: warning: format ‘%08lx’ expects type ‘long unsigned int’, but argument 4 has type ‘autofs_wqt_t’ which is due to the insane type for wait_queue_token. That thing should be some fixed well-defined size (preferably just 'unsigned int' or 'u32') but for unexplained reasons it is randomly either 'unsigned long' or 'unsigned int' depending on the architecture. For now, cast it to 'unsigned long' for printing, the way we do elsewhere. Somebody else can try to explain the typedef mess. (There's a reason we don't support excessive use of typedefs in the kernel: it's usually just a good way of confusing yourself). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-08	autofs4: clean up uaotfs use of debug/info/warning printouts	Linus Torvalds	1	-18/+8
	Use 'pr_debug()' for DPRINTK, which will do the proper type checking on the arguments (without generating code) even when DEBUG isn't #defined. Also, use the standard __VA_ARGS__ for the macros, and stop the pointless abuse of 'do { xyz } while (0)' when the macro is already a perfectly well-formed single statement. Reported-by: David Howells <dhowells@redhat.com> Suggested-by: Joe Perches <joe@perches.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-30	autofs4: bogus dentry_unhash() added in ->unlink()	Al Viro	1	-2/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-26	vfs: push dentry_unhash on rmdir into file systems	Sage Weil	1	-0/+2
	Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-31	Fix common misspellings	Lucas De Marchi	1	-1/+1
	Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-24	autofs4: Do not potentially dereference NULL pointer returned by fget() in ↵	Jesper Juhl	1	-0/+4
	autofs_dev_ioctl_setpipefd() In fs/autofs4/dev-ioctl.c::autofs_dev_ioctl_setpipefd() we call fget(), which may return NULL, but we do not explicitly test for that NULL return so we may end up dereferencing a NULL pointer - bad. When I originally submitted this patch I had chosen EBUSY as the return value to use if this happens. Ian Kent was kind enough to explain why that would most likely be wrong and why EBADF should most likely be used instead. This version of the patch uses EBADF. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24	autofs4 - remove autofs4_lock	Ian Kent	4	-31/+20
	The autofs4_lock introduced by the rcu-walk changes has unnecessarily broad scope. The locking is better handled by the per-autofs super block lookup_lock. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24	autofs4 - fix d_manage() return on rcu-walk	Ian Kent	1	-0/+2
	The daemon never needs to block and, in the rcu-walk case an error return isn't used, so always return zero. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24	autofs4 - fix autofs4_expire_indirect() traversal	Ian Kent	1	-1/+51
	The vfs-scale changes changed the traversal used in autofs4_expire_indirect() from a list to a depth first tree traversal which isn't right. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24	autofs4 - fix dentry leak in autofs4_expire_direct()	Ian Kent	1	-4/+3
	There is a missing dput() when returning from autofs4_expire_direct() when we see that the dentry is already a pending mount. Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24	autofs4 - reinstate last used update on access	Ian Kent	2	-34/+14
	When direct (and offset) mounts were introduced the the last used timeout could no longer be updated in ->d_revalidate(). This is because covered direct mounts would be followed over without calling the autofs file system. As a result the definition of the busyness check for all entries was changed to be "actually busy" being an open file or working directory within the automount. But now we have a call back in the follow so the last used update on any access can be re-instated. This requires DCACHE_MANAGE_TRANSIT to always be set. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-18	lose 'mounting_here' argument in ->d_manage()	Al Viro	1	-3/+3
	it's always false... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: clean ->d_release() and autofs4_free_ino() up	Al Viro	3	-19/+16
	The latter is called only when both ino and dentry are about to be freed, so cleaning ->d_fsdata and ->dentry is pointless. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: split autofs4_init_ino()	Al Viro	3	-26/+15
	split init_ino into new_ino and clean_ino; the former is what used to be init_ino(NULL, sbi), the latter is for cases where we passed non-NULL ino. Lose unused arguments. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: mkdir and symlink always get a dentry that had passed lookup	Al Viro	1	-18/+10
	... so ->d_fsdata will have been set up before we get there Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: autofs4_get_inode() doesn't need autofs_info * argument anymore	Al Viro	3	-7/+5
	Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: kill ->size in autofs_info	Al Viro	3	-6/+3
	It's used only to pass the length of symlink body to autofs4_get_inode() in autofs4_dir_symlink(). We can bloody well set inode->i_size in autofs4_dir_symlink() directly and be done with that. Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: pass mode to autofs4_get_inode() explicitly	Al Viro	3	-16/+15
	In all cases we'd set inf->mode to know value just before passing it to autofs4_get_inode(). That kills the need to store it in autofs_info and pass it to autofs_init_ino() Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: autofs4_mkroot() is not different from autofs4_init_ino()	Al Viro	1	-12/+1
	Kill it. Mind you, it's been an obfuscated call of autofs4_init_ino() ever since 2.3.99pre6-4... Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4: keep symlink body in inode->i_private	Al Viro	4	-28/+9
	gets rid of all ->free()/->u.symlink machinery in autofs; we simply keep symlink bodies in inode->i_private and free them in ->evict_inode(). Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4 - fix debug print in autofs4_lookup()	Ian Kent	1	-1/+2
	oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18	autofs4 - fix get_next_positive_dentry()	Ian Kent	1	-2/+2
	The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry() appears to be incorrect. If prev == NULL I believe that root should be returned. Further down, at the current dentry check for it being simple_positive() it looks like the d_lock for dentry p should be dropped instead of dentry ret, otherwise when p is assinged to ret we end up with no lock on p and a lost lock on ret, which leads to a deadlock. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Merge the remaining dentry ops tables	David Howells	3	-20/+4
	Merge the remaining autofs4 dentry ops tables. It doesn't matter if d_automount and d_manage are present on something that's not mountable or holdable as these ops are only used if the appropriate flags are set in dentry->d_flags. [AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the same dentry_operations. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	Allow d_manage() to be used in RCU-walk mode	David Howells	1	-2/+6
	Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well as when it is in Ref-walk mode. This permits __follow_mount_rcu() to call d_manage() directly. d_manage() needs a parameter to indicate that it is in RCU-walk mode as it isn't allowed to sleep if in that mode (but should return -ECHILD instead). autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon accesses it and otherwise request dropping back to ref-walk mode. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Add v4 pseudo direct mount support	Ian Kent	1	-0/+58
	Version 4 of autofs provides a pseudo direct mount implementation that relies on directories at the leaves of a directory tree under an indirect mount to trigger mounts. This patch adds support for that functionality. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Fix wait validation	Ian Kent	1	-1/+16
	It is possible for the check in wait.c:validate_request() to return an incorrect result if the dentry that was mounted upon has changed during the callback. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Clean up autofs4_free_ino()	Ian Kent	2	-22/+0
	When this function is called the local reference count does't need to be updated since the dentry is going away and dput definitely must not be called here. Also the autofs info struct field inode isn't used so remove it. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Clean up dentry operations	Ian Kent	3	-29/+26
	There are now two distinct dentry operations uses. One for dentrys that trigger mounts and one for dentrys that do not. Rationalize the use of these dentry operations and rename them to reflect their function. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Clean up inode operations	Ian Kent	3	-21/+1
	Since the use of ->follow_link() has been eliminated there is no need to separate the indirect and direct inode operations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Remove unused code	Ian Kent	2	-250/+0
	Remove code that is not used due to the use of ->d_automount() and ->d_manage(). Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Add d_manage() dentry operation	Ian Kent	4	-40/+159
	This patch required a previous patch to add the ->d_automount() dentry operation. Add a function to use the newly defined ->d_manage() dentry operation for blocking during mount and expire. Whether the VFS calls the dentry operations d_automount() and d_manage() is controled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs uses the d_automount() operation to callback to user space to request mount operations and the d_manage() operation to block walks into mounts that are under construction or destruction. In order to prevent these functions from being called unnecessarily the DMANAGED_* flags are cleared for cases which would cause this. In the common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both set for dentrys waiting to be mounted. The DMANAGED_TRANSIT flag is cleared upon successful mount request completion and set during expire runs, both during the dentry expire check, and if selected for expire, is left set until a subsequent successful mount request completes. The exception to this is the so-called rootless multi-mount which has no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag is cleared upon successful mount request completion as well and set again after a successful expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	autofs4: Add d_automount() dentry operation	Ian Kent	4	-112/+189
	Add a function to use the newly defined ->d_automount() dentry operation for triggering mounts instead of doing the user space callback in ->lookup() and ->d_revalidate(). Note, to be useful the subsequent patch to add the ->d_manage() dentry operation is also needed so the discussion of functionality is deferred to that patch. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16	Add a dentry op to allow processes to be held during pathwalk transit	David Howells	4	-21/+7
	Add a dentry op (d_manage) to permit a filesystem to hold a process and make it sleep when it tries to transit away from one of that filesystem's directories during a pathwalk. The operation is keyed off a new dentry flag (DCACHE_MANAGE_TRANSIT). The filesystem is allowed to be selective about which processes it holds and which it permits to continue on or prohibits from transiting from each flagged directory. This will allow autofs to hold up client processes whilst letting its userspace daemon through to maintain the directory or the stuff behind it or mounted upon it. The ->d_manage() dentry operation: int (d_manage)(struct path path, bool mounting_here); takes a pointer to the directory about to be transited away from and a flag indicating whether the transit is undertaken by do_add_mount() or do_move_mount() skipping through a pile of filesystems mounted on a mountpoint. It should return 0 if successful and to let the process continue on its way; -EISDIR to prohibit the caller from skipping to overmounted filesystems or automounting, and to use this directory; or some other error code to return to the user. ->d_manage() is called with namespace_sem writelocked if mounting_here is true and no other locks held, so it may sleep. However, if mounting_here is true, it may not initiate or wait for a mount or unmount upon the parameter directory, even if the act is actually performed by userspace. Within fs/namei.c, follow_managed() is extended to check with d_manage() first on each managed directory, before transiting away from it or attempting to automount upon it. follow_down() is renamed follow_down_one() and should only be used where the filesystem deliberately intends to avoid management steps (e.g. autofs). A new follow_down() is added that incorporates the loop done by all other callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS and CIFS do use it, their use is removed by converting them to use d_automount()). The new follow_down() calls d_manage() as appropriate. It also takes an extra parameter to indicate if it is being called from mount code (with namespace_sem writelocked) which it passes to d_manage(). follow_down() ignores automount points so that it can be used to mount on them. __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to sleep. It would be possible to enter d_manage() in rcu-walk mode too, and have that determine whether to abort or not itself. That would allow the autofs daemon to continue on in rcu-walk mode. Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't required as every tranist from that directory will cause d_manage() to be invoked. It can always be set again when necessary. ========================== WHAT THIS MEANS FOR AUTOFS ========================== Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to trigger the automounting of indirect mounts, and both of these can be called with i_mutex held. autofs knows that the i_mutex will be held by the caller in lookup(), and so can drop it before invoking the daemon - but this isn't so for d_revalidate(), since the lock is only held on _some_ of the code paths that call it. This means that autofs can't risk dropping i_mutex from its d_revalidate() function before it calls the daemon. The bug could manifest itself as, for example, a process that's trying to validate an automount dentry that gets made to wait because that dentry is expired and needs cleaning up: mkdir S ffffffff8014e05a 0 32580 24956 Call Trace: [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f [<ffffffff80057a2f>] lookup_create+0x46/0x80 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4 versus the automount daemon which wants to remove that dentry, but can't because the normal process is holding the i_mutex lock: automount D ffffffff8014e05a 0 32581 1 32561 Call Trace: [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14 [<ffffffff800e6d55>] do_rmdir+0x77/0xde [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 which means that the system is deadlocked. This patch allows autofs to hold up normal processes whilst the daemon goes ahead and does things to the dentry tree behind the automouter point without risking a deadlock as almost no locks are held in d_manage() and none in d_automount(). Signed-off-by: David Howells <dhowells@redhat.com> Was-Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-07	fs: rcu-walk aware d_revalidate method	Nick Piggin	1	-3/+10
	Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache reduce branches in lookup path	Nick Piggin	2	-6/+6
	Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/\([^\t ]\)->d_op = \(.\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]\)\.d_op = \(.\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>