summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-08-23nfsd: Remove callback_credChuck Lever3-30/+2
Clean up: The global callback_cred is no longer used, so it can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23nfsd: Use correct credential for NFSv4.0 callback with GSSChuck Lever1-1/+8
I've had trouble when operating a multi-homed Linux NFS server with Kerberos using NFSv4.0. Lately, I've seen my clients reporting this (and then hanging): May 9 11:43:26 manet kernel: NFS: NFSv4 callback contains invalid cred The client-side commit f11b2a1cfbf5 ("nfs4: copy acceptor name from context to nfs_client") appears to be related, but I suspect this problem has been going on for some time before that. RFC 7530 Section 3.3.3 says: > For Kerberos V5, nfs/hostname would be a server principal in the > Kerberos Key Distribution Center database. This is the same > principal the client acquired a GSS-API context for when it issued > the SETCLIENTID operation ... In other words, an NFSv4.0 client expects that the server will use the same GSS principal for callback that the client used to establish its lease. For example, if the client used the service principal "nfs@server.domain" to establish its lease, the server is required to use "nfs@server.domain" when performing NFSv4.0 callback operations. The Linux NFS server currently does not. It uses a common service principal for all callback connections. Sometimes this works as expected, and other times -- for example, when the server is accessible via multiple hostnames -- it won't work at all. This patch scrapes the target name from the client credential, and uses that for the NFSv4.0 callback credential. That should be correct much more often. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Extract target name into svc_credChuck Lever1-2/+5
NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred. Note this will also give us access to the real target service principal name (which is typically "nfs", but spec does not require that). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09nfsd: use true and false for boolean valuesGustavo A. R. Silva1-3/+3
Return statements in functions returning bool should use true or false instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09nfsd: constify write_op[]Eric Biggers1-1/+1
write_op[] is never modified, so make it 'const'. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_idnixiaoming1-3/+1
READ_BUF(8); dummy = be32_to_cpup(p++); dummy = be32_to_cpup(p++); ... READ_BUF(4); dummy = be32_to_cpup(p++); Assigning value to "dummy" here, but that stored value is overwritten before it can be used. At the same time READ_BUF() will re-update the pointer p. delete invalid assignment statements Signed-off-by: nixiaoming <nixiaoming@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09NFSD: Handle full-length symlinksChuck Lever2-0/+4
I've given up on the idea of zero-copy handling of SYMLINK on the server side. This is because the Linux VFS symlink API requires the symlink pathname to be in a NUL-terminated kmalloc'd buffer. The NUL-termination is going to be problematic (watching out for landing on a page boundary and dealing with a 4096-byte pathname). I don't believe that SYMLINK creation is on a performance path or is requested frequently enough that it will cause noticeable CPU cache pollution due to data copies. There will be two places where a transport callout will be necessary to fill in the rqstp: one will be in the svc_fill_symlink_pathname() helper that is used by NFSv2 and NFSv3, and the other will be in nfsd4_decode_create(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09NFSD: Refactor the generic write vector fill helperChuck Lever3-21/+8
fill_in_write_vector() is nearly the same logic as svc_fill_write_vector(), but there are a few differences so that the former can handle multiple WRITE payloads in a single COMPOUND. svc_fill_write_vector() can be adjusted so that it can be used in the NFSv4 WRITE code path too. Instead of assuming the pages are coming from rq_args.pages, have the caller pass in the page list. The immediate benefit is a reduction of code duplication. It also prevents the NFSv4 WRITE decoder from passing an empty vector element when the transport has provided the payload in the xdr_buf's page array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09nfsd: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 2 was used: -Wimplicit-fallthrough=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09nfsd: fix leaked file lock with nfs exported overlayfsAmir Goldstein5-13/+13
nfsd and lockd call vfs_lock_file() to lock/unlock the inode returned by locks_inode(file). Many places in nfsd/lockd code use the inode returned by file_inode(file) for lock manipulation. With Overlayfs, file_inode() (the underlying inode) is not the same object as locks_inode() (the overlay inode). This can result in "Leaked POSIX lock" messages and eventually to a kernel crash as reported by Eddie Horng: https://marc.info/?l=linux-unionfs&m=153086643202072&w=2 Fix all the call sites in nfsd/lockd that should use locks_inode(). This is a correctness bug that manifested when overlayfs gained NFS export support in v4.16. Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-19nfsd: don't advertise a SCSI layout for an unsupported request_queueBenjamin Coddington1-9/+2
Commit 30181faae37f ("nfsd: Check queue type before submitting a SCSI request") did the work of ensuring that we don't send SCSI requests to a request queue that won't support them, but that check is in the GETDEVICEINFO path. Let's not set the SCSI layout in fs_layout_type in the first place, and then we'll have less clients sending GETDEVICEINFO for non-SCSI request queues and less unnecessary WARN_ONs. While we're in here, remove some outdated comments that refer to "overwriting" layout seletion because commit 8a4c3926889e ("nfsd: allow nfsd to advertise multiple layout types") changed things to no longer overwrite the layout type. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd: fix corrupted reply to badly ordered compoundJ. Bruce Fields1-0/+1
We're encoding a single op in the reply but leaving the number of ops zero, so the reply makes no sense. Somewhat academic as this isn't a case any real client will hit, though in theory perhaps that could change in a future protocol extension. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd: clarify check_op_orderingJ. Bruce Fields1-4/+9
Document a couple things that confused me on a recent reading. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd: update obselete comment referencing the BKLJ. Bruce Fields1-3/+3
It's inode->i_lock that's now taken in setlease and break_lease, instead of the big kernel lock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: cleanup sessionid in nfsd4_destroy_sessionJ. Bruce Fields1-4/+4
The name of this variable doesn't fit the type. And we only ever use one field of it. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: less confusing nfsd4_compound_in_sessionJ. Bruce Fields1-4/+4
Make the function prototype match the name a little better. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: support change_attr_type attributeJ. Bruce Fields2-0/+11
The change attribute is what is used by clients to revalidate their caches. Our server may use i_version or ctime for that purpose. Those choices behave slightly differently, and it may be useful to the client to know which we're using. This attribute tells the client that. The Linux client doesn't yet use this attribute yet, though. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd: fix NFSv4 time_delta attributeJ. Bruce Fields1-3/+26
Currently we return the worst-case value of 1 second in the time delta attribute. That's not terribly useful. Instead, return a value calculated from the time granularity supported by the filesystem and the system clock. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: return default lease periodJ. Bruce Fields1-2/+2
I don't have a good rationale for the lease period, but 90 seconds seems long, and as long as we're allowing the server to extend the grace period up to double the lease period, let's half the default to 45. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-17nfsd4: extend reclaim period for reclaiming clientsJ. Bruce Fields4-1/+36
If the client is only renewing state a little sooner than once a lease period, then it might not discover the server has restarted till close to the end of the grace period, and might run out of time to do the actual reclaim. Extend the grace period by a second each time we notice there are clients still trying to reclaim, up to a limit of another whole lease period. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-06-16Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimentalLinus Torvalds4-4/+4
Pull documentation fixes from Mauro Carvalho Chehab: "This solves a series of broken links for files under Documentation, and improves a script meant to detect such broken links (see scripts/documentation-file-ref-check). The changes on this series are: - can.rst: fix a footnote reference; - crypto_engine.rst: Fix two parsing warnings; - Fix a lot of broken references to Documentation/*; - improve the scripts/documentation-file-ref-check script, in order to help detecting/fixing broken references, preventing false-positives. After this patch series, only 33 broken references to doc files are detected by scripts/documentation-file-ref-check" * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits) fix a series of Documentation/ broken file name references Documentation: rstFlatTable.py: fix a broken reference ABI: sysfs-devices-system-cpu: remove a broken reference devicetree: fix a series of wrong file references devicetree: fix name of pinctrl-bindings.txt devicetree: fix some bindings file names MAINTAINERS: fix location of DT npcm files MAINTAINERS: fix location of some display DT bindings kernel-parameters.txt: fix pointers to sound parameters bindings: nvmem/zii: Fix location of nvmem.txt docs: Fix more broken references scripts/documentation-file-ref-check: check tools/*/Documentation scripts/documentation-file-ref-check: get rid of false-positives scripts/documentation-file-ref-check: hint: dash or underline scripts/documentation-file-ref-check: add a fix logic for DT scripts/documentation-file-ref-check: accept more wildcards at filenames scripts/documentation-file-ref-check: fix help message media: max2175: fix location of driver's companion documentation media: v4l: fix broken video4linux docs locations media: dvb: point to the location of the old README.dvb-usb file ...
2018-06-16Merge tag 'fsnotify_for_v4.18-rc1' of ↵Linus Torvalds10-124/+152
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups unifying handling of different watch types. This is the shortened fsnotify series from Amir with the last five patches pulled out. Amir has modified those patches to not change struct inode but obviously it's too late for those to go into this merge window" * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: add fsnotify_add_inode_mark() wrappers fanotify: generalize fanotify_should_send_event() fsnotify: generalize send_to_group() fsnotify: generalize iteration of marks by object type fsnotify: introduce marks iteration helpers fsnotify: remove redundant arguments to handle_event() fsnotify: use type id to identify connector object type
2018-06-16Merge branch 'afs-proc' of ↵Linus Torvalds20-645/+879
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "Assorted AFS stuff - ended up in vfs.git since most of that consists of David's AFS-related followups to Christoph's procfs series" * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Optimise callback breaking by not repeating volume lookup afs: Display manually added cells in dynamic root mount afs: Enable IPv6 DNS lookups afs: Show all of a server's addresses in /proc/fs/afs/servers afs: Handle CONFIG_PROC_FS=n proc: Make inline name size calculation automatic afs: Implement network namespacing afs: Mark afs_net::ws_cell as __rcu and set using rcu functions afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus() proc: Add a way to make network proc files writable afs: Rearrange fs/afs/proc.c to remove remaining predeclarations. afs: Rearrange fs/afs/proc.c to move the show routines up afs: Rearrange fs/afs/proc.c by moving fops and open functions down afs: Move /proc management functions to the end of the file
2018-06-16Merge branch 'work.compat' of ↵Linus Torvalds3-136/+112
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat updates from Al Viro: "Some biarch patches - getting rid of assorted (mis)uses of compat_alloc_user_space(). Not much in that area this cycle..." * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs: simplify compat ioctl handling signalfd: lift sigmask copyin and size checks to callers of do_signalfd4() vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
2018-06-16Merge branch 'work.aio' of ↵Linus Torvalds3-9/+14
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio fixes from Al Viro: "Assorted AIO followups and fixes" * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: eventpoll: switch to ->poll_mask aio: only return events requested in poll_mask() for IOCB_CMD_POLL eventfd: only return events requested in poll_mask() aio: mark __aio_sigset::sigmask const
2018-06-16fix a series of Documentation/ broken file name referencesMauro Carvalho Chehab2-2/+2
As files move around, their previous links break. Fix the references for them. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-16docs: Fix more broken referencesMauro Carvalho Chehab2-2/+2
As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked that produced results are valid. Acked-by: Matthias Brugger <matthias.bgg@gmail.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-15afs: Optimise callback breaking by not repeating volume lookupDavid Howells3-20/+107
At the moment, afs_break_callbacks calls afs_break_one_callback() for each separate FID it was given, and the latter looks up the volume individually for each one. However, this is inefficient if two or more FIDs have the same vid as we could reuse the volume. This is complicated by cell aliasing whereby we may have multiple cells sharing a volume and can therefore have multiple callback interests for any particular volume ID. At the moment afs_break_one_callback() scans the entire list of volumes we're getting from a server and breaks the appropriate callback in every matching volume, regardless of cell. This scan is done for every FID. Optimise callback breaking by the following means: (1) Sort the FID list by vid so that all FIDs belonging to the same volume are clumped together. This is done through the use of an indirection table as we cannot do an insertion sort on the afs_callback_break array as we decode FIDs into it as we subsequently also have to decode callback info into it that corresponds by array index only. We also don't really want to bubblesort afterwards if we can avoid it. (2) Sort the server->cb_interests array by vid so that all the matching volumes are grouped together. This permits the scan to stop after finding a record that has a higher vid. (3) When breaking FIDs, we try to keep server->cb_break_lock as long as possible, caching the start point in the array for that volume group as long as possible. It might make sense to add another layer in that list and have a refcounted volume ID anchor that has the matching interests attached to it rather than being in the list. This would allow the lock to be dropped without losing the cursor. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15afs: Display manually added cells in dynamic root mountDavid Howells6-27/+199
Alter the dynroot mount so that cells created by manipulation of /proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root cell as a module parameter will cause directories for those cells to be created in the dynamic root superblock for the network namespace[*]. To this end: (1) Only one dynamic root superblock is now created per network namespace and this is shared between all attempts to mount it. This makes it easier to find the superblock to modify. (2) When a dynamic root superblock is created, the list of cells is walked and directories created for each cell already defined. (3) When a new cell is added, if a dynamic root superblock exists, a directory is created for it. (4) When a cell is destroyed, the directory is removed. (5) These directories are created by calling lookup_one_len() on the root dir which automatically creates them if they don't exist. [*] Inasmuch as network namespaces are currently supported here. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15afs: Enable IPv6 DNS lookupsDavid Howells2-2/+2
Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses from being looked up. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15afs: Show all of a server's addresses in /proc/fs/afs/serversDavid Howells1-2/+8
Show all of a server's addresses in /proc/fs/afs/servers, placing the second plus addresses on padded lines of their own. The current address is marked with a star. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15afs: Handle CONFIG_PROC_FS=nDavid Howells2-2/+10
The AFS filesystem depends at the moment on /proc for configuration and also presents information that way - however, this causes a compilation failure if procfs is disabled. Fix it so that the procfs bits aren't compiled in if procfs is disabled. This means that you can't configure the AFS filesystem directly, but it is still usable provided that an up-to-date keyutils is installed to look up cells by SRV or AFSDB DNS records. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15proc: Make inline name size calculation automaticDavid Howells4-12/+16
Make calculation of the size of the inline name in struct proc_dir_entry automatic, rather than having to manually encode the numbers and failing to allow for lockdep. Require a minimum inline name size of 33+1 to allow for names that look like two hex numbers with a dash between. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15orangefs: simplify compat ioctl handlingAl Viro1-42/+12
no need to mess with copy_in_user(), etc... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()Al Viro1-25/+25
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15eventpoll: switch to ->poll_maskBen Noordhuis1-5/+10
Signed-off-by: Ben Noordhuis <info@bnoordhuis.nl> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15aio: only return events requested in poll_mask() for IOCB_CMD_POLLChristoph Hellwig1-2/+2
The ->poll_mask() operation has a mask of events that the caller is interested in, but not all implementations might take it into account. Mask the return value to only the requested events, similar to what the poll and epoll code does. Reported-by: Avi Kivity <avi@scylladb.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15eventfd: only return events requested in poll_mask()Avi Kivity1-2/+2
The ->poll_mask() operation has a mask of events that the caller is interested in, but we're returning all events regardless. Change to return only the events the caller is interested in. This fixes aio IO_CMD_POLL returning immediately when called with POLLIN on an eventfd, since an eventfd is almost always ready for a write. Signed-off-by: Avi Kivity <avi@scylladb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-82/+134
Merge more updates from Andrew Morton: - MM remainders - various misc things - kcov updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) lib/test_printf.c: call wait_for_random_bytes() before plain %p tests hexagon: drop the unused variable zero_page_mask hexagon: fix printk format warning in setup.c mm: fix oom_kill event handling treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX mm: use octal not symbolic permissions ipc: use new return type vm_fault_t sysvipc/sem: mitigate semnum index against spectre v1 fault-injection: reorder config entries arm: port KCOV to arm sched/core / kcov: avoid kcov_area during task switch kcov: prefault the kcov_area kcov: ensure irq code sees a valid area kernel/relay.c: change return type to vm_fault_t exofs: avoid VLA in structures coredump: fix spam with zero VMA process fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block() proc: skip branch in /proc/*/* lookup mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns mm/memblock: add missing include <linux/bootmem.h> ...
2018-06-15exofs: avoid VLA in structuresKees Cook3-67/+115
On the quest to remove all VLAs from the kernel[1] this adjusts several cases where allocation is made after an array of structures that points back into the allocation. The allocations are changed to perform explicit calculations instead of using a Variable Length Array in a structure. Additionally, this lets Clang compile this code now, since Clang does not support VLAIS[2]. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [2] https://lkml.kernel.org/r/CA+55aFy6h1c3_rP_bXFedsTXzwW+9Q9MfJaW7GUmMBrAp-fJ9A@mail.gmail.com [keescook@chromium.org: v2] Link: http://lkml.kernel.org/r/20180418163546.GA45794@beast Link: http://lkml.kernel.org/r/20180327203904.GA1151@beast Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Boaz Harrosh <ooo@electrozaur.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15coredump: fix spam with zero VMA processAlexey Dobriyan1-8/+9
Nobody ever tried to self destruct by unmapping whole address space at once: munmap((void *)0, (1ULL << 47) - 4096); Doing this produces 2 warnings for zero-length vmalloc allocations: a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14 a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null) ... Fix is to switch to kvmalloc(). Steps to reproduce: // vsyscall=none #include <sys/mman.h> #include <sys/resource.h> int main(void) { setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY}); munmap((void *)0, (1ULL << 47) - 4096); return 0; } Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()OGAWA Hirofumi1-1/+7
If file size and FAT cluster chain is not matched (corrupted image), we can hit BUG_ON(!phys) in __fat_get_block(). So, use fat_fs_error() instead. [hirofumi@mail.parknet.co.jp: fix printk warning] Link: http://lkml.kernel.org/r/87po12aq5p.fsf@mail.parknet.co.jp Link: http://lkml.kernel.org/r/874lilcu67.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15proc: skip branch in /proc/*/* lookupAlexey Dobriyan1-6/+3
Code is structured like this: for ( ... p < last; p++) { if (memcmp == 0) break; } if (p >= last) ERROR OK gcc doesn't see that if if lookup succeeds than post loop branch will never be taken and skip it. [akpm@linux-foundation.org: proc_pident_instantiate() no longer takes an inode*] Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds79-391/+494
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-15Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds7-124/+202
Pull ceph updates from Ilya Dryomov: "The main piece is a set of libceph changes that revamps how OSD requests are aborted, improving CephFS ENOSPC handling and making "umount -f" actually work (Zheng and myself). The rest is mostly mount option handling cleanups from Chengguang and assorted fixes from Zheng, Luis and Dongsheng. * tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits) rbd: flush rbd_dev->watch_dwork after watch is unregistered ceph: update description of some mount options ceph: show ino32 if the value is different with default ceph: strengthen rsize/wsize/readdir_max_bytes validation ceph: fix alignment of rasize ceph: fix use-after-free in ceph_statfs() ceph: prevent i_version from going back ceph: fix wrong check for the case of updating link count libceph: allocate the locator string with GFP_NOFAIL libceph: make abort_on_full a per-osdc setting libceph: don't abort reads in ceph_osdc_abort_on_full() libceph: avoid a use-after-free during map check libceph: don't warn if req->r_abort_on_full is set libceph: use for_each_request() in ceph_osdc_abort_on_full() libceph: defer __complete_request() to a workqueue libceph: move more code into __complete_request() libceph: no need to call flush_workqueue() before destruction ceph: flush pending works before shutdown super ceph: abort osd requests on force umount libceph: introduce ceph_osdc_abort_requests() ...
2018-06-15Merge tag 'for-4.18-part2-tag' of ↵Linus Torvalds5-19/+19
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - error handling fixup for one of the new ioctls from 1st pull - fix for device-replace that incorrectly uses inode pages and can mess up compressed extents in some cases - fiemap fix for reporting incorrect number of extents - vm_fault_t type conversion * tag 'for-4.18-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: Don't use inode pages for device replace btrfs: change return type of btrfs_page_mkwrite to vm_fault_t Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero btrfs: Check error of btrfs_iget in btrfs_search_path_in_tree_user
2018-06-14pstore: Remove bogus format string definitionArnd Bergmann1-11/+6
The pstore conversion to timespec64 introduces its own method of passing seconds into sscanf() and sprintf() type functions to work around the timespec64 definition on 64-bit systems that redefine it to 'timespec'. That hack is now finally getting removed, but that means we get a (harmless) warning once both patches are merged: fs/pstore/ram.c: In function 'ramoops_read_kmsg_hdr': fs/pstore/ram.c:39:29: error: format '%ld' expects argument of type 'long int *', but argument 3 has type 'time64_t *' {aka 'long long int *'} [-Werror=format=] #define RAMOOPS_KERNMSG_HDR "====" ^~~~~~ fs/pstore/ram.c:167:21: note: in expansion of macro 'RAMOOPS_KERNMSG_HDR' This removes the pstore specific workaround and uses the same method that we have in place for all other functions that print a timespec64. Related to this, I found that the kasprintf() output contains an incorrect nanosecond values for any number starting with zeroes, and I adapt the format string accordingly. Link: https://lkml.org/lkml/2018/5/19/115 Link: https://lkml.org/lkml/2018/5/16/1080 Fixes: 0f0d83b99ef7 ("pstore: Convert internal records to timespec64") Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-14Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into ↵Arnd Bergmann79-390/+498
vfs-timespec64 Pull the timespec64 conversion from Deepa Dinamani: "The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The flag patch applies cleanly. I've not seen the timestamps update logic change often. The series applies cleanly on 4.17-rc6 and linux-next tip (top commit: next-20180517). I'm not sure how to merge this kind of a series with a flag patch. We are targeting 4.18 for this. Let me know if you have other suggestions. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. I've tried to keep the conversions with the script simple, to aid in the reviews. I've kept all the internal filesystem data structures and function signatures the same. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions." I've pulled it into a branch based on top of the NFS changes that are now in mainline, so I could resolve the non-obvious conflict between the two while merging. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-13Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent"Linus Torvalds1-3/+1
This reverts commit 95cde3c59966f6371b6bcd9e4e2da2ba64ee9775. The commit had good intentions, but it breaks kvm-tool and qemu-kvm. With it in place, "lkvm run" just fails with Error: KVM_CREATE_VM ioctl Warning: Failed init: kvm__init which isn't a wonderful error message, but bisection pinpointed the problematic commit. The problem is almost certainly due to the special kvm debugfs entries created dynamically by kvm under /sys/kernel/debug/kvm/. See kvm_create_vm_debugfs() Bisected-and-reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-13Merge tag 'overflow-v4.18-rc1-part2' of ↵Linus Torvalds75-158/+225
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...