summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2012-01-26Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds1-1/+2
Quoth Ben Myers: "Please pull in the following bugfix for xfs. We forgot to drop a lock on error in xfs_readlink. It hasn't been through -next yet, but there is no -next tree tomorrow. The fix is clear so I'm sending this request today." * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()
2012-01-26eCryptfs: move misleading function commentsLi Wang1-4/+4
The data encryption was moved from ecryptfs_write_end into ecryptfs_writepage, this patch moves the corresponding function comments to be consistent with the modification. Signed-off-by: Li Wang <liwang@nudt.edu.cn> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-26Merge branch 'for-linus' of ↵Linus Torvalds6-194/+154
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Says Tyler: "Tim's logging message update will be really helpful to users when they're trying to locate a problematic file in the lower filesystem with filename encryption enabled. You'll recognize the fix from Li, as you commented on that. You should also be familiar with my setattr/truncate improvements, since you were the one that pointed them out to us (thanks again!). Andrew noted the /dev/ecryptfs write count sanitization needed to be improved, so I've got a fix in there for that along with some other less important cleanups of the /dev/ecryptfs read/write code." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: Fix oops when printing debug info in extent crypto functions eCryptfs: Remove unused ecryptfs_read() eCryptfs: Check inode changes in setattr eCryptfs: Make truncate path killable eCryptfs: Infinite loop due to overflow in ecryptfs_write() eCryptfs: Replace miscdev read/write magic numbers eCryptfs: Report errors in writes to /dev/ecryptfs eCryptfs: Sanitize write counts of /dev/ecryptfs ecryptfs: Remove unnecessary variable initialization ecryptfs: Improve metadata read failure logging MAINTAINERS: Update eCryptfs maintainer address
2012-01-26eCryptfs: Fix oops when printing debug info in extent crypto functionsTyler Hicks1-40/+0
If pages passed to the eCryptfs extent-based crypto functions are not mapped and the module parameter ecryptfs_verbosity=1 was specified at loading time, a NULL pointer dereference will occur. Note that this wouldn't happen on a production system, as you wouldn't pass ecryptfs_verbosity=1 on a production system. It leaks private information to the system logs and is for debugging only. The debugging info printed in these messages is no longer very useful and rather than doing a kmap() in these debugging paths, it will be better to simply remove the debugging paths completely. https://launchpad.net/bugs/913651 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Daniel DeFreez Cc: <stable@vger.kernel.org>
2012-01-26eCryptfs: Remove unused ecryptfs_read()Tyler Hicks1-73/+0
ecryptfs_read() has been ifdef'ed out for years now and it was apparently unused before then. It is time to get rid of it for good. Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-26eCryptfs: Check inode changes in setattrTyler Hicks1-12/+36
Most filesystems call inode_change_ok() very early in ->setattr(), but eCryptfs didn't call it at all. It allowed the lower filesystem to make the call in its ->setattr() function. Then, eCryptfs would copy the appropriate inode attributes from the lower inode to the eCryptfs inode. This patch changes that and actually calls inode_change_ok() on the eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call would happen earlier in ecryptfs_setattr(), but there are some possible inode initialization steps that must happen first. Since the call was already being made on the lower inode, the change in functionality should be minimal, except for the case of a file extending truncate call. In that case, inode_newsize_ok() was never being called on the eCryptfs inode. Rather than inode_newsize_ok() catching maximum file size errors early on, eCryptfs would encrypt zeroed pages and write them to the lower filesystem until the lower filesystem's write path caught the error in generic_write_checks(). This patch introduces a new function, called ecryptfs_inode_newsize_ok(), which checks if the new lower file size is within the appropriate limits when the truncate operation will be growing the lower file. In summary this change prevents eCryptfs truncate operations (and the resulting page encryptions), which would exceed the lower filesystem limits or FSIZE rlimits, from ever starting. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Li Wang <liwang@nudt.edu.cn> Cc: <stable@vger.kernel.org>
2012-01-26eCryptfs: Make truncate path killableTyler Hicks1-5/+14
ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a page, zeroes out the appropriate portions, and then encrypts the page before writing it to the lower filesystem. It was unkillable and due to the lack of sparse file support could result in tying up a large portion of system resources, while encrypting pages of zeros, with no way for the truncate operation to be stopped from userspace. This patch adds the ability for ecryptfs_write() to detect a pending fatal signal and return as gracefully as possible. The intent is to leave the lower file in a useable state, while still allowing a user to break out of the encryption loop. If a pending fatal signal is detected, the eCryptfs inode size is updated to reflect the modified inode size and then -EINTR is returned. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Cc: <stable@vger.kernel.org>
2012-01-26eCryptfs: Infinite loop due to overflow in ecryptfs_write()Li Wang1-2/+2
ecryptfs_write() can enter an infinite loop when truncating a file to a size larger than 4G. This only happens on architectures where size_t is represented by 32 bits. This was caused by a size_t overflow due to it incorrectly being used to store the result of a calculation which uses potentially large values of type loff_t. [tyhicks@canonical.com: rewrite subject and commit message] Signed-off-by: Li Wang <liwang@nudt.edu.cn> Signed-off-by: Yunchuan Wen <wenyunchuan@kylinos.com.cn> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-26eCryptfs: Replace miscdev read/write magic numbersTyler Hicks3-41/+55
ecryptfs_miscdev_read() and ecryptfs_miscdev_write() contained many magic numbers for specifying packet header field sizes and offsets. This patch defines those values and replaces the magic values. Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-26eCryptfs: Report errors in writes to /dev/ecryptfsTyler Hicks1-11/+13
Errors in writes to /dev/ecryptfs were being incorrectly reported by returning 0 or the value of the original write count. This patch clears up the return code assignment in error paths. Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-26eCryptfs: Sanitize write counts of /dev/ecryptfsTyler Hicks1-18/+38
A malicious count value specified when writing to /dev/ecryptfs may result in a a very large kernel memory allocation. This patch peeks at the specified packet payload size, adds that to the size of the packet headers and compares the result with the write count value. The resulting maximum memory allocation size is approximately 532 bytes. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Cc: <stable@vger.kernel.org>
2012-01-26ecryptfs: Remove unnecessary variable initializationTim Gardner1-2/+3
Removes unneeded variable initialization in ecryptfs_read_metadata(). Also adds a small comment to help explain metadata reading logic. [tyhicks@canonical.com: Pulled out of for-stable patch and wrote commit msg] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-26ecryptfs: Improve metadata read failure loggingTim Gardner1-3/+6
Print inode on metadata read failure. The only real way of dealing with metadata read failures is to delete the underlying file system file. Having the inode allows one to 'find . -inum INODE`. [tyhicks@canonical.com: Removed some minor not-for-stable parts] Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-01-25xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()Jan Kara1-1/+2
Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted symplink and bailed out. Fix it by jumping to 'out' instead of doing return. CC: stable@kernel.org CC: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Alex Elder <elder@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-25Merge branch 'for_linus' of ↵Linus Torvalds4-14/+47
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Pass information that quota is stored in system file to userspace ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls jbd: Issue cache flush after checkpointing
2012-01-23Merge branch 'kernel-doc' from Randy DunlapLinus Torvalds1-1/+1
The usual kernel-doc fixups from Randy. Some of them David acked as merged in his tree, this is the random left-overs. * kernel-doc: docbook: fix sched source file names in device-drivers book docbook: change iomap source filename in deviceiobook docbook: don't use serial_core.h in device-drivers book kernel-doc: fix kernel-doc warnings in sched kernel-doc: fix new warnings in cfg80211.h kernel-doc: fix new warning in usb.h kernel-doc: fix new warnings in device.h kernel-doc: fix new warnings in debugfs kernel-doc: fix new warning in regulator core kernel-doc: fix new warnings in pci kernel-doc: fix new warnings in driver-core kernel-doc: fix new warnings in auditsc.c scripts/kernel-doc: fix fatal error caused by cfg80211.h
2012-01-23Merge branch 'akpm'Linus Torvalds1-0/+3
Quoth Andrew: "Random fixes. And a simple new LED driver which I'm trying to sneak in while you're not looking." Sneaking successful. * akpm: score: fix off-by-one index into syscall table mm: fix rss count leakage during migration SHM_UNLOCK: fix Unevictable pages stranded after swap SHM_UNLOCK: fix long unpreemptible section kdump: define KEXEC_NOTE_BYTES arch specific for s390x mm/hugetlb.c: undo change to page mapcount in fault handler mm: memcg: update the correct soft limit tree during migration proc: clear_refs: do not clear reserved pages drivers/video/backlight/l4f00242t03.c: return proper error in l4f00242t03_probe if regulator_get() fails drivers/video/backlight/adp88x0_bl.c: fix bit testing logic kprobes: initialize before using a hlist ipc/mqueue: simplify reading msgqueue limit leds: add led driver for Bachmann's ot200 mm: __count_immobile_pages(): make sure the node is online mm: fix NULL ptr dereference in __count_immobile_pages mm: fix warnings regarding enum migrate_mode
2012-01-23Merge git://git.samba.org/sfrench/cifs-2.6Linus Torvalds13-201/+421
* git://git.samba.org/sfrench/cifs-2.6: CIFS: Rename *UCS* functions to *UTF16* [CIFS] ACL and FSCACHE support no longer EXPERIMENTAL [CIFS] Fix build break with multiuser patch when LANMAN disabled cifs: warn about impending deprecation of legacy MultiuserMount code cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts cifs: sanitize username handling keys: add a "logon" key type cifs: lower default wsize when unix extensions are not used cifs: better instrumentation for coalesce_t2 cifs: integer overflow in parse_dacl() cifs: Fix sparse warning when calling cifs_strtoUCS CIFS: Add descriptions to the brlock cache functions
2012-01-23kernel-doc: fix new warnings in debugfsRandy Dunlap1-1/+1
Fix new kernel-doc warnings: Warning(fs/debugfs/file.c:556): No description found for parameter 'nregs' Warning(fs/debugfs/file.c:556): Excess function parameter 'mregs' description in 'debugfs_print_regs32' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-23proc: clear_refs: do not clear reserved pagesWill Deacon1-0/+3
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for pages and corresponding page table entries of the task with PID pid, which includes any special mappings inserted into the page tables in order to provide things like vDSOs and user helper functions. On ARM this causes a problem because the vectors page is mapped as a global mapping and since ec706dab ("ARM: add a vma entry for the user accessible vector page"), a VMA is also inserted into each task for this page to aid unwinding through signals and syscall restarts. Since the vectors page is required for handling faults, clearing the YOUNG bit (and subsequently writing a faulting pte) means that we lose the vectors page *globally* and cannot fault it back in. This results in a system deadlock on the next exception. To see this problem in action, just run: $ echo 1 > /proc/self/clear_refs on an ARM platform (as any user) and watch your system hang. I think this has been the case since 2.6.37 This patch avoids clearing the aforementioned bits for reserved pages, therefore leaving the vectors page intact on ARM. Since reserved pages are not candidates for swap, this change should not have any impact on the usefulness of clear_refs. Signed-off-by: Will Deacon <will.deacon@arm.com> Reported-by: Moussa Ba <moussaba@micron.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Nicolas Pitre <nico@linaro.org> Cc: Matt Mackall <mpm@selenic.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-20Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds1-0/+2
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/accounting, proc: Fix /proc/stat interrupts sum * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore x86/kprobes: Fix typo transferred from Intel manual * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits x86, tsc: Fix SMI induced variation in quick_pit_calibrate() x86, opcode: ANDN and Group 17 in x86-opcode-map.txt x86/kconfig: Move the ZONE_DMA entry under a menu x86/UV2: Add accounting for BAU strong nacks x86/UV2: Ack BAU interrupt earlier x86/UV2: Remove stale no-resources test for UV2 BAU x86/UV2: Work around BAU bug x86/UV2: Fix BAU destination timeout initialization x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode x86: Get rid of dubious one-bit signed bitfield
2012-01-20Merge branch 'for-linus' of ↵Linus Torvalds3-40/+27
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qnx4: don't leak ->BitMap on late failure exits qnx4: reduce the insane nesting in qnx4_checkroot() qnx4: di_fname is an array, for crying out loud... vfs: remove printk from set_nlink() wake up s_wait_unfrozen when ->freeze_fs fails
2012-01-19qnx4: don't leak ->BitMap on late failure exitsAl Viro1-1/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-19qnx4: reduce the insane nesting in qnx4_checkroot()Al Viro1-34/+22
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-19qnx4: di_fname is an array, for crying out loud...Al Viro1-14/+12
(struct qnx4_inode_entry *)(bh->b_data + some_offset)->di_fname is not going to be NULL, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-19CIFS: Rename *UCS* functions to *UTF16*Steve French8-138/+146
to reflect the unicode encoding used by CIFS protocol. Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Acked-by: Jeff Layton <jlayton@samba.org> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
2012-01-19[CIFS] ACL and FSCACHE support no longer EXPERIMENTALSteve French1-2/+1
CIFS ACL support and FSCACHE support have been in long enough to be no longer considered experimental. Remove obsolete Kconfig dependency. Signed-off-by: Steve French <sfrench@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com>
2012-01-19[CIFS] Fix build break with multiuser patch when LANMAN disabledSteve French1-0/+2
CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2012-01-18cifs: warn about impending deprecation of legacy MultiuserMount codeJeff Layton1-1/+10
We'll allow a grace period of 2 releases (3.3 and 3.4) and then remove the legacy code in 3.5. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-18cifs: fetch credentials out of keyring for non-krb5 auth multiuser mountsJeff Layton1-10/+165
Fix up multiuser mounts to set the secType and set the username and password from the key payload in the vol info for non-krb5 auth types. Look for a key of type "secret" with a description of "cifs:a:<server address>" or "cifs:d:<domainname>". If that's found, then scrape the username and password out of the key payload and use that to create a new user session. Finally, don't have the code enforce krb5 auth on multiuser mounts, but do require a kernel with keys support. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-18cifs: sanitize username handlingJeff Layton3-13/+27
Currently, it's not very clear whether you're allowed to have a NULL vol->username or ses->user_name. Some places check for it and some don't. Make it clear that a NULL pointer is OK in these fields, and ensure that all the callers check for that. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-18cifs: lower default wsize when unix extensions are not usedJeff Layton1-4/+19
We've had some reports of servers (namely, the Solaris in-kernel CIFS server) that don't deal properly with writes that are "too large" even though they set CAP_LARGE_WRITE_ANDX. Change the default to better mirror what windows clients do. Cc: stable@vger.kernel.org Cc: Pavel Shilovsky <piastry@etersoft.ru> Reported-by: Nick Davis <phireph0x@yahoo.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-18cifs: better instrumentation for coalesce_t2Jeff Layton1-34/+50
When coalesce_t2 returns an error, have it throw a cFYI message that explains the reason. Also rename some variables to clarify what they represent. Reported-and-Tested-by: Konstantinos Skarlatos <k.skarlatos@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2012-01-18Merge branch 'for-linus' of ↵Linus Torvalds2-19/+14
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits) audit: no leading space in audit_log_d_path prefix audit: treat s_id as an untrusted string audit: fix signedness bug in audit_log_execve_info() audit: comparison on interprocess fields audit: implement all object interfield comparisons audit: allow interfield comparison between gid and ogid audit: complex interfield comparison helper audit: allow interfield comparison in audit rules Kernel: Audit Support For The ARM Platform audit: do not call audit_getname on error audit: only allow tasks to set their loginuid if it is -1 audit: remove task argument to audit_set_loginuid audit: allow audit matching on inode gid audit: allow matching on obj_uid audit: remove audit_finish_fork as it can't be called audit: reject entry,always rules audit: inline audit_free to simplify the look of generic code audit: drop audit_set_macxattr as it doesn't do anything audit: inline checks for not needing to collect aux records audit: drop some potentially inadvisable likely notations ... Use evil merge to fix up grammar mistakes in Kconfig file. Bad speling and horrible grammar (and copious swearing) is to be expected, but let's keep it to commit messages and comments, rather than expose it to users in config help texts or printouts.
2012-01-18Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds18-542/+374
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: cleanup xfs_file_aio_write xfs: always return with the iolock held from xfs_file_aio_write_checks xfs: remove the i_new_size field in struct xfs_inode xfs: remove the i_size field in struct xfs_inode xfs: replace i_pin_wait with a bit waitqueue xfs: replace i_flock with a sleeping bitlock xfs: make i_flags an unsigned long xfs: remove the if_ext_max field in struct xfs_ifork xfs: remove the unused dm_attrs structure xfs: cleanup xfs_iomap_eof_align_last_fsb xfs: remove xfs_itruncate_data
2012-01-18Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds7-125/+110
* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: take allocation of ->tree_root into open_ctree() btrfs: let ->s_fs_info point to fs_info, not root... btrfs: consolidate failure exits in btrfs_mount() a bit btrfs: make free_fs_info() call ->kill_sb() unconditional btrfs: merge free_fs_info() calls on fill_super failures btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() btrfs: make open_ctree() return int btrfs: sanitizing ->fs_info, part 5 btrfs: sanitizing ->fs_info, part 4 btrfs: sanitizing ->fs_info, part 3 btrfs: sanitizing ->fs_info, part 2 btrfs: sanitizing ->fs_info, part 1 btrfs: fix a deadlock in btrfs_scan_one_device() btrfs: fix mount/umount race btrfs: get ->kill_sb() of its own btrfs: preparation to fixing mount/umount race
2012-01-18Merge branch 'for-linus' of ↵Linus Torvalds33-921/+6746
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.
2012-01-18proc: clean up and fix /proc/<pid>/mem handlingLinus Torvalds1-106/+39
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very robust, and it also doesn't match the permission checking of any of the other related files. This changes it to do the permission checks at open time, and instead of tracking the process, it tracks the VM at the time of the open. That simplifies the code a lot, but does mean that if you hold the file descriptor open over an execve(), you'll continue to read from the _old_ VM. That is different from our previous behavior, but much simpler. If somebody actually finds a load where this matters, we'll need to revert this commit. I suspect that nobody will ever notice - because the process mapping addresses will also have changed as part of the execve. So you cannot actually usefully access the fd across a VM change simply because all the offsets for IO would have changed too. Reported-by: Jüri Aedla <asd@ut.ee> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-18vfs: remove printk from set_nlink()Miklos Szeredi1-3/+0
Don't log a message for set_nlink(0). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-18wake up s_wait_unfrozen when ->freeze_fs failsKazuya Mio1-0/+2
dd slept infinitely when fsfeeze failed because of EIO. To fix this problem, if ->freeze_fs fails, freeze_super() wakes up the tasks waiting for the filesystem to become unfrozen. When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(), the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen. However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then freeze_super() returns an error number. In this case, FITHAW ioctl returns EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-18audit: do not call audit_getname on errorEric Paris1-15/+13
Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-18audit: only allow tasks to set their loginuid if it is -1Eric Paris1-3/+0
At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in and it be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP) because on update and admin might have to restart sshd. Now sshd would get his loginuid and the next user which logged in using ssh would not be able to set his loginuid. Systemd has changed how userspace works and allowed us to make the kernel work the way it should. With systemd users (even admins) are not supposed to restart services directly. The system will restart the service for them. Thus since systemd is going to loginuid==-1, sshd would get -1, and sshd would be allowed to set a new loginuid without special permissions. If an admin in this system were to manually start an sshd he is inserting himself into the system chain of trust and thus, logically, it's his loginuid that should be used! Since we have old systems I make this a Kconfig option. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-18audit: remove task argument to audit_set_loginuidEric Paris1-1/+1
The function always deals with current. Don't expose an option pretending one can use it for something. You can't. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-18xfs: cleanup xfs_file_aio_writeChristoph Hellwig1-45/+37
With all the size field updates out of the way xfs_file_aio_write can be further simplified by pushing all iolock handling into xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using the generic generic_write_sync helper for synchronous writes. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: always return with the iolock held from xfs_file_aio_write_checksChristoph Hellwig1-3/+4
While xfs_iunlock is fine with 0 lockflags the calling conventions are much cleaner if xfs_file_aio_write_checks never returns without the iolock held. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: remove the i_new_size field in struct xfs_inodeChristoph Hellwig5-92/+30
Now that we use the VFS i_size field throughout XFS there is no need for the i_new_size field any more given that the VFS i_size field gets updated in ->write_end before unlocking the page, and thus is always uptodate when writeback could see a page. Removing i_new_size also has the advantage that we will never have to trim back di_size during a failed buffered write, given that it never gets updated past i_size. Note that currently the generic direct I/O code only updates i_size after calling our end_io handler, which requires a small workaround to make sure di_size actually makes it to disk. I hope to fix this properly in the generic code. A downside is that we lose the support for parallel non-overlapping O_DIRECT appending writes that recently was added. I don't think keeping the complex and fragile i_new_size infrastructure for this is a good tradeoff - if we really care about parallel appending writers we should investigate turning the iolock into a range lock, which would also allow for parallel non-overlapping buffered writers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: remove the i_size field in struct xfs_inodeChristoph Hellwig12-82/+56
There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: replace i_pin_wait with a bit waitqueueChristoph Hellwig4-9/+24
Replace i_pin_wait, which is only used during synchronous inode flushing with a bit waitqueue. This trades off a much smaller inode against slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit) bytes in the XFS inode. Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: replace i_flock with a sleeping bitlockChristoph Hellwig6-46/+76
We almost never block on i_flock, the exception is synchronous inode flushing. Instead of bloating the inode with a 16/24-byte completion that we abuse as a semaphore just implement it as a bitlock that uses a bit waitqueue for the rare sleeping path. This primarily is a tradeoff between a much smaller inode and a faster non-blocking path vs faster wakeups, and we are much better off with the former. A small downside is that we will lose lockdep checking for i_flock, but given that it's always taken inside the ilock that should be acceptable. Note that for example the inode writeback locking is implemented in a very similar way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-18xfs: make i_flags an unsigned longChristoph Hellwig1-1/+1
To be used for bit wakeup i_flags needs to be an unsigned long or we'll run into trouble on big endian systems. Because of the 1-byte i_update field right after it this actually causes a fairly large size increase on its own (4 or 8 bytes), but that increase will be more than offset by the next two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>