summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-02-20gfs2: Allow some glocks to be used during withdrawBob Peterson4-10/+45
We need to allow some glocks to be enqueued, dequeued, promoted, and demoted when we're withdrawn. For example, to maintain metadata integrity, we should disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like iopen or the transaction glocks may be safely used because none of their metadata goes through the journal. So in general, we should disallow all glocks with an address space, and allow all the others. One exception is: we need to allow our active journal to be demoted so others may recover it. Allowing glocks after withdraw gives us the ability to take appropriate action (in a following patch) to have our journal properly replayed by another node rather than just abandoning the current transactions and pretending nothing bad happened, leaving the other nodes free to modify the blocks we had in our journal, which may result in file system corruption. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: move check_journal_clean to util.c for future useBob Peterson3-42/+46
Before this patch function check_journal_clean was in ops_fstype.c. This patch moves it to util.c so we can make use of it elsewhere in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Ignore dlm recovery requests if gfs2 is withdrawnBob Peterson2-0/+23
When a node fails, user space informs dlm of the node failure, and dlm instructs gfs2 on the surviving nodes to perform journal recovery. It does this by calling various callback functions in lock_dlm.c. To mark its progress, it keeps generation numbers and recover bits in a dlm "control" lock lvb, which is seen by all nodes to determine which journals need to be replayed. The gfs2 on all nodes get the same recovery requests from dlm, so they all try to do the recovery, but only one will be granted the exclusive lock on the journal. The others fail with a "Busy" message on their "try lock." However, when a node is withdrawn, it cannot safely do any recovery or replay any journals. To make matters worse, gfs2 might withdraw as a result of attempting recovery. For example, this might happen if the device goes offline, or if an hba fails. But in today's gfs2 code, it doesn't check for being withdrawn at any step in the recovery process. What's worse is that these callbacks from dlm have no return code, so there is no way to indicate failure back to dlm. We can send a "Recovery failed" uevent eventually, but that tells user space what happened, not dlm's kernel code. Before this patch, lock_dlm would perform its recovery steps but ignore the result, and eventually it would still update its generation number in the lvb, despite the fact that it may have withdrawn or encountered an error. The other nodes would then see the newer generation number in the lvb and conclude that they don't need to do recovery because the generation number is newer than the last one they saw. They think a different node has already recovered the journal. This patch adds checks to several of the callbacks used by dlm in its recovery state machine so that the functions are ignored and skipped if an io error has occurred or if the file system is withdrawn. That prevents the lvb bits from being updated, and therefore dlm and user space still see the need for recovery to take place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Only complain the first time an io error occurs in quota or logBob Peterson2-4/+5
Before this patch, all io errors received by the quota daemon or the logd daemon would cause a complaint message to be issued, such as: gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0 This patch changes it so that the error message is only issued the first time the error is encountered. Also, before this patch function gfs2_end_log_write did not set the sd_log_error value, so log errors would not cause the file system to be withdrawn. This patch sets the error code so the file system is properly withdrawn if an io error is encountered writing to the journal. WARNING: This change in function breaks check xfstests generic/441 and causes it to fail: io errors writing to the log should cause a file system to be withdrawn, and no further operations are tolerated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: log error reformBob Peterson3-9/+16
Before this patch, gfs2 kept track of journal io errors in two places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags. This patch consolidates the two into sd_log_error so that it reflects the first error encountered writing to the journal. In future patches, we will take advantage of this by checking this value rather than having to check both when reacting to io errors. In addition, this fixes a tight loop in unmount: If buffers get on the ail1 list and an io error occurs elsewhere, the ail1 list would never be cleared because they were always busy. So unmount would hang, waiting for the ail1 list to empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Rework how rgrp buffer_heads are managedBob Peterson5-36/+6
Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: clear ail1 list when gfs2 withdrawsBob Peterson1-4/+13
This patch fixes a bug in which function gfs2_log_flush can get into an infinite loop when a gfs2 file system is withdrawn. The problem is the infinite loop "for (;;)" in gfs2_log_flush which would never finish because the io error and subsequent withdraw prevented the items from being taken off the ail list. This patch tries to clean up the mess by allowing withdraw situations to move not-in-flight buffer_heads to the ail2 list, where they will be dealt with later. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Introduce concept of a pending withdrawBob Peterson4-21/+27
File system withdraws can be delayed when inconsistencies are discovered when we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be dequeued and demoted while there are still buffers that haven't been properly revoked, due to io errors writing to the journal. This patch introduces a new concept of a pending withdraw, which means an inconsistency has been discovered and we need to withdraw at the earliest possible opportunity. In these cases, we aren't quite withdrawn yet, but we still need to not dequeue glocks and other critical things. If we dequeue the glocks and the withdraw results in our journal being replayed, the replay could overwrite data that's been modified by a different node that acquired the glock in the meantime. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Return bool from gfs2_assert functionsAndreas Gruenbacher2-25/+24
The gfs2_assert functions only print messages when the filesystem hasn't been withdrawn yet, and they indicate whether or not they've printed something in their return value. However, none of the callers use that information, so simply return whether or not the assert has failed. (The gfs2_assert functions are still backwards; they return false when an assertion is true.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Turn gfs2_consist into void functionsAndreas Gruenbacher2-21/+15
Change the various gfs2_consist functions to return void. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Remove usused cluster_wide arguments of gfs2_consist functionsAndreas Gruenbacher2-9/+9
These arguments are always passed as 0, and they are never evaluated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Report errors before withdrawAndreas Gruenbacher1-25/+23
In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Split gfs2_lm_withdraw into two functionsAndreas Gruenbacher5-70/+82
Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-06gfs2: fix O_SYNC write handlingAndreas Gruenbacher1-30/+21
In gfs2_file_write_iter, for direct writes, the error checking in the buffered write fallback case is incomplete. This can cause inode write errors to go undetected. Fix and clean up gfs2_file_write_iter along the way. Based on a proposed fix by Christoph Hellwig <hch@lst.de>. Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06gfs2: move setting current->backing_dev_infoChristoph Hellwig1-11/+10
Set current->backing_dev_info just around the buffered write calls to prepare for the next fix. Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-06gfs2: fix gfs2_find_jhead that returns uninitialized jhead with seq 0Abhi Das1-1/+1
When the first log header in a journal happens to have a sequence number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit, and return an uninitialized jhead with seq 0. This can cause failures in the caller. For instance, a mount fails in one test case. The correct behavior is for it to continue searching through the journal to find the correct journal head with the highest sequence number. Fixes: f4686c26ecc3 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-01Merge tag 'gfs2-for-5.6' of ↵Linus Torvalds12-70/+73
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix some corner cases on filesystems with a block size < page size. - Fix a corner case that could expose incorrect access times over nfs. - Revert an otherwise sensible revoke accounting cleanup that causes assertion failures. The revoke accounting is whacky and needs to be fixed properly before we can add back this cleanup. - Various other minor cleanups. In addition, please expect to see another pull request from Bob Peterson about his gfs2 recovery patch queue shortly. * tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: eliminate tr_num_revoke_rm" gfs2: remove unused LBIT macros fs/gfs2: remove unused IS_DINODE and IS_LEAF macros gfs2: Remove GFS2_MIN_LVB_SIZE define gfs2: Fix incorrect variable name gfs2: Avoid access time thrashing in gfs2_inode_lookup gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage gfs2: eliminate ssize parameter from gfs2_struct2blk gfs2: Another gfs2_find_jhead fix
2020-01-31Merge tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-13/+5
Pull iomap fix from Darrick Wong: "A single patch fixing an off-by-one error when we're checking to see how far we're gotten into an EOF page" * tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: Fix page_mkwrite off-by-one errors
2020-01-31Merge branch 'akpm' (patches from Andrew)Linus Torvalds25-170/+236
Pull updates from Andrew Morton: "Most of -mm and quite a number of other subsystems: hotfixes, scripts, ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov. MM is fairly quiet this time. Holidays, I assume" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) kcov: ignore fault-inject and stacktrace include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() execve: warn if process starts with executable stack reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() init/main.c: fix misleading "This architecture does not have kernel memory protection" message init/main.c: fix quoted value handling in unknown_bootoption init/main.c: remove unnecessary repair_env_string in do_initcall_level init/main.c: log arguments and environment passed to init fs/binfmt_elf.c: coredump: allow process with empty address space to coredump fs/binfmt_elf.c: coredump: delete duplicated overflow check fs/binfmt_elf.c: coredump: allocate core ELF header on stack fs/binfmt_elf.c: make BAD_ADDR() unlikely fs/binfmt_elf.c: better codegen around current->mm fs/binfmt_elf.c: don't copy ELF header around fs/binfmt_elf.c: fix ->start_code calculation fs/binfmt_elf.c: smaller code generation around auxv vector fill lib/find_bit.c: uninline helper _find_next_bit() lib/find_bit.c: join _find_next_bit{_le} uapi: rename ext2_swab() to swab() and share globally in swab.h lib/scatterlist.c: adjust indentation in __sg_alloc_table ...
2020-01-31execve: warn if process starts with executable stackAlexey Dobriyan1-0/+5
There were few episodes of silent downgrade to an executable stack over years: 1) linking innocent looking assembly file will silently add executable stack if proper linker options is not given as well: $ cat f.S .intel_syntax noprefix .text .globl f f: ret $ cat main.c void f(void); int main(void) { f(); return 0; } $ gcc main.c f.S $ readelf -l ./a.out GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 0x10 ^^^ 2) converting C99 nested function into a closure https://nullprogram.com/blog/2019/11/15/ void intsort2(int *base, size_t nmemb, _Bool invert) { int cmp(const void *a, const void *b) { int r = *(int *)a - *(int *)b; return invert ? -r : r; } qsort(base, nmemb, sizeof(*base), cmp); } will silently require stack trampolines while non-closure version will not. Without doubt this behaviour is documented somewhere, add a warning so that developers and users can at least notice. After so many years of x86_64 having proper executable stack support it should not cause too many problems. Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Will Deacon <will@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()Yunfeng Ye1-1/+2
The variable inode may be NULL in reiserfs_insert_item(), but there is no check before accessing the member of inode. Fix this by adding NULL pointer check before calling reiserfs_debug(). Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Cc: zhengbin <zhengbin13@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: allow process with empty address space to coredumpAlexey Dobriyan1-1/+9
Unmapping whole address space at once with munmap(0, (1ULL<<47) - 4096) or equivalent will create empty coredump. It is silly way to exit, however registers content may still be useful. The right to coredump is fundamental right of a process! Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: delete duplicated overflow checkAlexey Dobriyan1-2/+0
array_size() macro will do overflow check anyway. Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: coredump: allocate core ELF header on stackAlexey Dobriyan1-11/+5
Comment says ELF header is "too large to be on stack". 64 bytes on 64-bit is not large by any means. Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: make BAD_ADDR() unlikelyAlexey Dobriyan1-1/+1
If some mapping goes past TASK_SIZE it will be rejected by kernel which means no such userspace binaries exist. Mark every such check as unlikely. Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: better codegen around current->mmAlexey Dobriyan1-24/+28
"current->mm" pointer is stable in general except few cases one of which execve(2). Compiler can't treat is as stable but it _is_ stable most of the time. During ELF loading process ->mm becomes stable right after flush_old_exec(). Help compiler by caching current->mm, otherwise it continues to refetch it. add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141) Function old new delta elf_core_dump 5062 5039 -23 load_elf_binary 5426 5308 -118 Note: other cases are left as is because it is either pessimisation or no change in binary size. Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: don't copy ELF header aroundAlexey Dobriyan1-28/+27
ELF header is read into bprm->buf[] by generic execve code. Save a memcpy and allocate just one header for the interpreter instead of two headers (64 bytes instead of 128 on 64-bit). Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: fix ->start_code calculationAlexey Dobriyan1-1/+1
Only executable segments should be accounted to ->start_code just like they do to ->end_code (correctly). Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/binfmt_elf.c: smaller code generation around auxv vector fillAlexey Dobriyan1-7/+8
Filling auxv vector as array with index (auxv[i++] = ...) generates terrible code. "saved_auxv" should be reworked because it is the worst member of mm_struct by size/usefullness ratio but do it later. Meanwhile help gcc a little with *auxv++ idiom. Space savings on x86_64: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127) Function old new delta load_elf_binary 5470 5343 -127 Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31btrfs: use larger zlib buffer for s390 hardware compressionMikhail Zaslonko2-36/+101
In order to benefit from s390 zlib hardware compression support, increase the btrfs zlib workspace buffer size from 1 to 4 pages (if s390 zlib hardware support is enabled on the machine). This brings up to 60% better performance in hardware on s390 compared to the PAGE_SIZE buffer and much more compared to the software zlib processing in btrfs. In case of memory pressure, fall back to a single page buffer during workspace allocation. The data compressed with larger input buffers will still conform to zlib standard and thus can be decompressed also on a systems that uses only PAGE_SIZE buffer for btrfs zlib. Link: http://lkml.kernel.org/r/20200108105103.29028-1-zaslonko@linux.ibm.com Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Eduard Shishkin <edward6@linux.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard1-2/+2
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs/io_uring: set FOLL_PIN via pin_user_pages()John Hubbard1-1/+1
Convert fs/io_uring to use the new pin_user_pages() call, which sets FOLL_PIN. Setting FOLL_PIN is now required for code that requires tracking of pinned pages, and therefore for any code that calls put_user_page(). In partial anticipation of this work, the io_uring code was already calling put_user_page() instead of put_page(). Therefore, in order to convert from the get_user_pages()/put_page() model, to the pin_user_pages()/put_user_page() model, the only change required here is to change get_user_pages() to pin_user_pages(). Link: http://lkml.kernel.org/r/20200107224558.2362728-17-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in ↵wangyan1-2/+1
handle->h_transaction For the uniform format, we use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction Link: http://lkml.kernel.org/r/6ff9a312-5f7d-0e27-fb51-bc4e062fcd97@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()wangyan1-3/+5
I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(), handle->h_transaction may be NULL in this situation: ocfs2_file_write_iter ->__generic_file_write_iter ->generic_perform_write ->ocfs2_write_begin ->ocfs2_write_begin_nolock ->ocfs2_write_cluster_by_desc ->ocfs2_write_cluster ->ocfs2_mark_extent_written ->ocfs2_change_extent_flag ->ocfs2_split_extent ->ocfs2_try_to_merge_extent ->ocfs2_extend_rotate_transaction ->ocfs2_extend_trans ->jbd2_journal_restart ->jbd2__journal_restart // handle->h_transaction is NULL here ->handle->h_transaction = NULL; ->start_this_handle /* journal aborted due to storage network disconnection, return error */ ->return -EROFS; /* line 3806 in ocfs2_try_to_merge_extent (), it will ignore ret error. */ ->ret = 0; ->... ->ocfs2_write_end ->ocfs2_write_end_nolock ->ocfs2_update_inode_fsync_trans // NULL pointer dereference ->oi->i_sync_tid = handle->h_transaction->t_tid; The information of NULL pointer dereference as follows: JBD2: Detected IO errors while flushing file data on dm-11-45 Aborting journal on device dm-11-45. JBD2: Error -5 detected when updating journal superblock for dm-11-45. (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30 (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338 [0000000000000008] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Process dd (pid: 22081, stack limit = 0x00000000584f35a9) CPU: 3 PID: 22081 Comm: dd Kdump: loaded Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019 pstate: 60400009 (nZCv daif +PAN -UAO) pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2] lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2] sp : ffff0000459fba70 x29: ffff0000459fba70 x28: 0000000000000000 x27: ffff807ccf7f1000 x26: 0000000000000001 x25: ffff807bdff57970 x24: ffff807caf1d4000 x23: ffff807cc79e9000 x22: 0000000000001000 x21: 000000006c6cd000 x20: ffff0000091d9000 x19: ffff807ccb239db0 x18: ffffffffffffffff x17: 000000000000000e x16: 0000000000000007 x15: ffff807c5e15bd78 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000228 x8 : 000000000000000c x7 : 0000000000000fff x6 : ffff807a308ed6b0 x5 : ffff7e01f10967c0 x4 : 0000000000000018 x3 : d0bc661572445600 x2 : 0000000000000000 x1 : 000000001b2e0200 x0 : 0000000000000000 Call trace: ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2] ocfs2_write_end+0x4c/0x80 [ocfs2] generic_perform_write+0x108/0x1a8 __generic_file_write_iter+0x158/0x1c8 ocfs2_file_write_iter+0x668/0x950 [ocfs2] __vfs_write+0x11c/0x190 vfs_write+0xac/0x1c0 ksys_write+0x6c/0xd8 __arm64_sys_write+0x24/0x30 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc To prevent NULL pointer dereference in this situation, we use is_handle_aborted() before using handle->h_transaction->t_tid. Link: http://lkml.kernel.org/r/03e750ab-9ade-83aa-b000-b9e81e34e539@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider useAndy Shevchenko1-4/+0
There are users already and will be more of BITS_TO_BYTES() macro. Move it to bitops.h for wider use. In the case of ocfs2 the replacement is identical. As for bnx2x, there are two places where floor version is used. In the first case to calculate the amount of structures that can fit one memory page. In this case obviously the ceiling variant is correct and original code might have a potential bug, if amount of bits % 8 is not 0. In the second case the macro is used to calculate bytes transmitted in one microsecond. This will work for all speeds which is multiply of 1Gbps without any change, for the rest new code will give ceiling value, for instance 100Mbps will give 13 bytes, while old code gives 12 bytes and the arithmetically correct one is 12.5 bytes. Further the value is used to setup timer threshold which in any case has its own margins due to certain resolution. I don't see here an issue with slightly shifting thresholds for low speed connections, the card is supposed to utilize highest available rate, which is usually 10Gbps. Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2/dlm: remove redundant assignment to retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses Coverity ("Unused value") Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: make local header paths relative to C filesMasahiro Yamada13-45/+41
Gang He reports the failure of building fs/ocfs2/ as an external module of the kernel installed on the system: $ cd fs/ocfs2 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules If you want to make it work reliably, I'd recommend to remove ccflags-y from the Makefiles, and to make header paths relative to the C files. I think this is the correct usage of the #include "..." directive. Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Gang He <ghe@suse.com> Reported-by: Gang He <ghe@suse.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: remove unneeded semicolonszhengbin2-2/+2
Fixes coccicheck warnings: fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com Signed-off-by: zhengbin <zhengbin13@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31fs: ocfs: remove unnecessary assertion in dlm_migrate_lockresAditya Pakki1-2/+0
In the only caller of dlm_migrate_lockres() - dlm_empty_lockres(), target is checked for O2NM_MAX_NODES. Thus, the assertion in dlm_migrate_lockres() is unnecessary and can be removed. The patch eliminates such a check. Link: http://lkml.kernel.org/r/20191218194111.26041-1-pakki001@umn.edu Signed-off-by: Aditya Pakki <pakki001@umn.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o1-1/+1
Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31Merge tag 'mpx-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx Pull x86 MPX removal from Dave Hansen: "MPX requires recompiling applications, which requires compiler support. Unfortunately, GCC 9.1 is expected to be be released without support for MPX. This means that there was only a relatively small window where folks could have ever used MPX. It failed to gain wide adoption in the industry, and Linux was the only mainstream OS to ever support it widely. Support for the feature may also disappear on future processors. This set completes the process that we started during the 5.4 merge window when the MPX prctl()s were removed. XSAVE support is left in place, which allows MPX-using KVM guests to continue to function" * tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx: x86/mpx: remove MPX from arch/x86 mm: remove arch_bprm_mm_init() hook x86/mpx: remove bounds exception code x86/mpx: remove build infrastructure x86/alternatives: add missing insn.h include
2020-01-31Merge tag 'upstream-5.6-rc1' of ↵Linus Torvalds5-7/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI/UBIFS updates from Miquel Raynal: "This pull request contains mostly fixes for UBI and UBIFS: UBI: - Fixes for memory leaks in error paths - Fix for an logic error in a fastmap selfcheck UBIFS: - Fix for FS_IOC_SETFLAGS related to fscrypt flag - Support for FS_ENCRYPT_FL - Fix for a dead lock in bulk-read mode" Sent on behalf of Richard Weinberger who is traveling. * tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix an error pointer dereference in error handling code ubifs: Fix memory leak from c->sup_node ubifs: Fix ino_t format warnings in orphan_delete() ubifs: Fix deadlock in concurrent bulk-read and writepage ubifs: Fix wrong memory allocation ubi: Free the normal volumes in error paths of ubi_attach_mtd_dev() ubi: Check the presence of volume before call ubi_fastmap_destroy_checkmap() ubifs: Add support for FS_ENCRYPT_FL ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag ubi: wl: Remove set but not used variable 'prev_e' ubi: fastmap: Fix inverted logic in seen selfcheck
2020-01-31Merge tag 'f2fs-for-5.6' of ↵Linus Torvalds18-370/+3137
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancements: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fixes: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs" * tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits) f2fs: fix race conditions in ->d_compare() and ->d_hash() f2fs: fix dcache lookup of !casefolded directories f2fs: Add f2fs stats to sysfs f2fs: delete duplicate information on sysfs nodes f2fs: change to use rwsem for gc_mutex f2fs: update f2fs document regarding to fsync_mode f2fs: add a way to turn off ipu bio cache f2fs: code cleanup for f2fs_statfs_project() f2fs: fix miscounted block limit in f2fs_statfs_project() f2fs: show the CP_PAUSE reason in checkpoint traces f2fs: fix deadlock allocating bio_post_read_ctx from mempool f2fs: remove unneeded check for error allocating bio_post_read_ctx f2fs: convert inline_dir early before starting rename f2fs: fix memleak of kobject f2fs: fix to add swap extent correctly f2fs: run fsck when getting bad inode during GC f2fs: support data compression f2fs: free sysfs kobject f2fs: declare nested quota_sem and remove unnecessary sems f2fs: don't put new_page twice in f2fs_rename ...
2020-01-31Merge tag 'for_v5.6-rc1' of ↵Linus Torvalds12-85/+137
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF, quota, reiserfs, ext2 fixes and cleanups from Jan Kara: "A few assorted fixes and cleanups for udf, quota, reiserfs, and ext2" * tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/reiserfs: remove unused macros fs/quota: remove unused macro udf: Clarify meaning of f_files in udf_statfs udf: Allow writing to 'Rewritable' partitions udf: Disallow R/W mode for disk with Metadata partition udf: Fix meaning of ENTITYID_FLAGS_* macros to be really bitwise-or flags udf: Fix free space reporting for metadata and virtual partitions udf: Update header files to UDF 2.60 udf: Move OSTA Identifier Suffix macros from ecma_167.h to osta_udf.h udf: Fix spelling in EXT_NEXT_EXTENT_ALLOCDESCS ext2: Adjust indentation in ext2_fill_super quota: avoid time_t in v1_disk_dqblk definition reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling reiserfs: Fix memory leak of journal device string ext2: set proper errno in error case of ext2_fill_super()
2020-01-31Merge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds33-253/+300
Pull xfs updates from Darrick Wong: "In this release we clean out the last of the old 32-bit timestamp code, fix a number of bugs and memory corruptions on 32-bit platforms, and a refactoring of some of the extended attribute code. I think I'll be back next week with some refactoring of how the XFS buffer code returns error codes, however I prefer to hold onto that for another week to let it soak a while longer Summary: - Get rid of compat_time_t - Convert time_t to time64_t in quota code - Remove shadow variables - Prevent ATTR_ flag misuse in the attrmulti ioctls - Clean out strlen in the attr code - Remove some bogus asserts - Fix various file size limit calculation errors with 32-bit kernels - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi - Fix nowait inode locking calls for directio aio reads - Fix memory corruption bugs when invalidating remote xattr value buffers - Streamline remote attr value removal - Make the buffer log format size consistent across platforms - Strengthen buffer log format size checking - Fix messed up return types of xfs_inode_need_cow - Fix some unused variable warnings" * tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (24 commits) xfs: remove unused variable 'done' xfs: fix uninitialized variable in xfs_attr3_leaf_inactive xfs: change return value of xfs_inode_need_cow to int xfs: check log iovec size to make sure it's plausibly a buffer log format xfs: make struct xfs_buf_log_format have a consistent size xfs: complain if anyone tries to create a too-large buffer log item xfs: clean up xfs_buf_item_get_format return value xfs: streamline xfs_attr3_leaf_inactive xfs: fix memory corruption during remote attr value buffer invalidation xfs: refactor remote attr value buffer invalidation xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read xfs: Add __packed to xfs_dir2_sf_entry_t definition xfs: fix s_maxbytes computation on 32-bit kernels xfs: truncate should remove all blocks, not just to the end of the page cache xfs: introduce XFS_MAX_FILEOFF xfs: remove bogus assertion when online repair isn't enabled xfs: Remove all strlen in all xfs_attr_* functions for attr names. xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag xfs: also remove cached ACLs when removing the underlying attr xfs: reject invalid flags combinations in XFS_IOC_ATTRMULTI_BY_HANDLE ...
2020-01-31Merge tag 'ext4_for_linus' of ↵Linus Torvalds28-423/+682
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window, we've added some performance improvements in how we handle inode locking in the read/write paths, and improving the performance of Direct I/O overwrites. We also now record the error code which caused the first and most recent ext4_error() report in the superblock, to make it easier to root cause problems in production systems. There are also many of the usual cleanups and miscellaneous bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits) jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() jbd2: make sure ESHUTDOWN to be recorded in the journal superblock ext4, jbd2: ensure panic when aborting with zero errno jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record jbd2_seq_info_next should increase position index jbd2: remove pointless assertion in __journal_remove_journal_head ext4,jbd2: fix comment and code style jbd2: delete the duplicated words in the comments ext4: fix extent_status trace points ext4: fix symbolic enum printing in trace output ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project() ext4: fix race conditions in ->d_compare() and ->d_hash() ext4: make dioread_nolock the default ext4: fix extent_status fragmentation for plain files jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal ext4: drop ext4_kvmalloc() ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl ext4: remove unused macro MPAGE_DA_EXTENT_TAIL ext4: add missing braces in ext4_ext_drop_refs() ext4: fix some nonstandard indentation in extents.c ...
2020-01-30Merge tag 'threads-v5.6' of ↵Linus Torvalds1-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper
2020-01-30Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-blockLinus Torvalds8-459/+1995
Pull io_uring updates from Jens Axboe: - Support for various new opcodes (fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write, send/recv, and epoll_ctl) - Faster ring quiesce for fileset updates - Optimizations for overflow condition checking - Support for max-sized clamping - Support for probing what opcodes are supported - Support for io-wq backend sharing between "sibling" rings - Support for registering personalities - Lots of little fixes and improvements * tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits) io_uring: add support for epoll_ctl(2) eventpoll: support non-blocking do_epoll_ctl() calls eventpoll: abstract out epoll_ctl() handler io_uring: fix linked command file table usage io_uring: support using a registered personality for commands io_uring: allow registering credentials io_uring: add io-wq workqueue sharing io-wq: allow grabbing existing io-wq io_uring/io-wq: don't use static creds/mm assignments io-wq: make the io_wq ref counted io_uring: fix refcounting with batched allocations at OOM io_uring: add comment for drain_next io_uring: don't attempt to copy iovec for READ/WRITE io_uring: honor IOSQE_ASYNC for linked reqs io_uring: prep req when do IOSQE_ASYNC io_uring: use labeled array init in io_op_defs io_uring: optimise sqe-to-req flags translation io_uring: remove REQ_F_IO_DRAINED io_uring: file switch work needs to get flushed on exit io_uring: hide uring_fd in ctx ...
2020-01-30Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds4-303/+97
Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ...
2020-01-30Merge tag 'linux-kselftest-5.6-rc1-kunit' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack