summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-11-01Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds5-848/+1045
Pull io_uring updates from Jens Axboe: "Light on new features - basically just the hybrid mode support. Outside of that it's just fixes, cleanups, and performance improvements. In detail: - Add ring related information to the fdinfo output (Hao) - Hybrid async mode (Hao) - Support for batched issue on block (me) - sqe error trace improvement (me) - IOPOLL efficiency improvements (Pavel) - submit state cleanups and improvements (Pavel) - Completion side improvements (Pavel) - Drain improvements (Pavel) - Buffer selection cleanups (Pavel) - Fixed file node improvements (Pavel) - io-wq setup cancelation fix (Pavel) - Various other performance improvements and cleanups (Pavel) - Misc fixes (Arnd, Bixuan, Changcheng, Hao, me, Noah)" * tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block: (97 commits) io-wq: remove worker to owner tw dependency io_uring: harder fdinfo sq/cq ring iterating io_uring: don't assign write hint in the read path io_uring: clusterise ki_flags access in rw_prep io_uring: kill unused param from io_file_supports_nowait io_uring: clean up timeout async_data allocation io_uring: don't try io-wq polling if not supported io_uring: check if opcode needs poll first on arming io_uring: clean iowq submit work cancellation io_uring: clean io_wq_submit_work()'s main loop io-wq: use helper for worker refcounting io_uring: implement async hybrid mode for pollable requests io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR()) io_uring: split logic of force_nonblock io_uring: warning about unused-but-set parameter io_uring: inform block layer of how many requests we are submitting io_uring: simplify io_file_supports_nowait() io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags io_uring: arm poll for non-nowait files fs/io_uring: Prioritise checking faster conditions first in io_write ...
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds80-1524/+1521
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-11-01Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds185-4065/+4986
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-11-01Merge tag 'locks-v5.16' of ↵Linus Torvalds9-169/+36
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Most of this is just follow-on cleanup work of documentation and comments from the mandatory locking removal in v5.15. The only real functional change is that LOCK_MAND flock() support is also being removed, as it has basically been non-functional since the v2.5 days" * tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove leftover comments from mandatory locking removal locks: remove changelog comments docs: fs: locks.rst: update comment about mandatory file locking Documentation: remove reference to now removed mandatory-locking doc locks: remove LOCK_MAND flock lock support
2021-11-01Merge tag 'tpmdd-next-v5.16' of ↵Linus Torvalds6-9/+28
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Only bug fixes" * tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm_tis_spi: Add missing SPI ID tpm: fix Atmel TPM crash caused by too frequent queries tpm: Check for integer overflow in tpm2_map_response_body() tpm: tis: Kconfig: Add helper dependency on COMPILE_TEST
2021-11-01Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds74-1703/+2914
Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload). Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...
2021-10-31Linux 5.15v5.15Linus Torvalds1-2/+2
2021-10-31Merge tag 'perf-tools-fixes-for-v5.15-2021-10-31' of ↵Linus Torvalds3-6/+12
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix compilation of callchain related code on powerpc with gcc11+ - Fix PERF_SAMPLE_WEIGHT_STRUCT support in 'perf script' - Check session->header.env.arch before using it, fixing a segmentation fault - Suppress 'rm dlfilter' build messages * tag 'perf-tools-fixes-for-v5.15-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT support perf callchain: Fix compilation on powerpc with gcc11+ perf script: Check session->header.env.arch before using it perf build: Suppress 'rm dlfilter' build message
2021-10-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-25/+61
Pull kvm fixes from Paolo Bonzini: - Fixes for s390 interrupt delivery - Fixes for Xen emulator bugs showing up as debug kernel WARNs - Fix another issue with SEV/ES string I/O VMGEXITs * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Take srcu lock in post_kvm_run_save() KVM: SEV-ES: fix another issue with string I/O VMGEXITs KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu KVM: s390: clear kicked_mask before sleeping again
2021-10-31perf script: Fix PERF_SAMPLE_WEIGHT_STRUCT supportKan Liang1-1/+1
-F weight in perf script is broken. # ./perf mem record # ./perf script -F weight Samples for 'dummy:HG' event do not have WEIGHT attribute set. Cannot print 'weight' field. The sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the PERF_SAMPLE_WEIGHT sample type. They share the same space, weight. The lower 32 bits are exactly the same for both sample type. The higher 32 bits may be different for different architecture. For a new kernel on x86, the PERF_SAMPLE_WEIGHT_STRUCT is used. For an old kernel or other ARCHs, the PERF_SAMPLE_WEIGHT is used. With -F weight, current perf script will only check the input string "weight" with the PERF_SAMPLE_WEIGHT sample type. Because the commit ea8d0ed6eae3 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") didn't update the PERF_SAMPLE_WEIGHT_STRUCT sample type for perf script. For a new kernel on x86, the check fails. Use PERF_SAMPLE_WEIGHT_TYPE, which supports both sample types, to replace PERF_SAMPLE_WEIGHT Fixes: ea8d0ed6eae37b01 ("perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT") Reported-by: Joe Mario <jmario@redhat.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Joe Mario <jmario@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/1632929894-102778-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31perf callchain: Fix compilation on powerpc with gcc11+Jiri Olsa1-1/+1
Got following build fail on powerpc: CC arch/powerpc/util/skip-callchain-idx.o In function ‘check_return_reg’, inlined from ‘check_return_addr’ at arch/powerpc/util/skip-callchain-idx.c:213:7, inlined from ‘arch_skip_callchain_idx’ at arch/powerpc/util/skip-callchain-idx.c:265:7: arch/powerpc/util/skip-callchain-idx.c:54:18: error: ‘dwarf_frame_register’ accessing 96 bytes \ in a region of size 64 [-Werror=stringop-overflow=] 54 | result = dwarf_frame_register(frame, ra_regno, ops_mem, &ops, &nops); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/util/skip-callchain-idx.c: In function ‘arch_skip_callchain_idx’: arch/powerpc/util/skip-callchain-idx.c:54:18: note: referencing argument 3 of type ‘Dwarf_Op *’ In file included from /usr/include/elfutils/libdwfl.h:32, from arch/powerpc/util/skip-callchain-idx.c:10: /usr/include/elfutils/libdw.h:1069:12: note: in a call to function ‘dwarf_frame_register’ 1069 | extern int dwarf_frame_register (Dwarf_Frame *frame, int regno, | ^~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors The dwarf_frame_register args changed with [1], Updating ops_mem accordingly. [1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=5621fe5443da23112170235dd5cac161e5c75e65 Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Mark Wieelard <mjw@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20210928195253.1267023-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31perf script: Check session->header.env.arch before using itSong Liu1-4/+8
When perf.data is not written cleanly, we would like to process existing data as much as possible (please see f_header.data.size == 0 condition in perf_session__read_header). However, perf.data with partial data may crash perf. Specifically, we see crash in 'perf script' for NULL session->header.env.arch. Fix this by checking session->header.env.arch before using it to determine native_arch. Also split the if condition so it is easier to read. Committer notes: If it is a pipe, we already assume is a native arch, so no need to check session->header.env.arch. Signed-off-by: Song Liu <songliubraving@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211004053238.514936-1-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31perf build: Suppress 'rm dlfilter' build messageAdrian Hunter1-0/+2
The following build message: rm dlfilters/dlfilter-test-api-v0.o is unwanted. The object file is being treated as an intermediate file and being automatically removed. Mark the object file as .SECONDARY to prevent removal and hence the message. Requested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210930062849.110416-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-31Merge tag 'scsi-fixes' of ↵Linus Torvalds6-295/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small fixes, all in drivers, and one sizeable update to the UFS driver to remove the HPB 2.0 feature that has been objected to by Jens and Christoph. Although the UFS patch is large and last minute, it's essentially the least intrusive way of resolving the objections in time for the 5.15 release" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: ufshpb: Remove HPB2.0 flows scsi: mpt3sas: Fix reference tag handling for WRITE_INSERT scsi: ufs: ufs-exynos: Correct timeout value setting registers scsi: ibmvfc: Fix up duplicate response detection
2021-10-30Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One fix for the composite clk that broke when we changed this clk type to use the determine_rate instead of round_rate clk op by default. This caused lots of problems on Rockchip SoCs because they heavily use the composite clk code to model the clk tree" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: composite: Also consider .determine_rate for rate + mux composites
2021-10-30Merge tag 'riscv-for-linus-5.15-rc8' of ↵Linus Torvalds4-13/+11
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "These are pretty late, but they do fix concrete issues. - ensure the trap vector's address is aligned. - avoid re-populating the KASAN shadow memory. - allow kasan to build without warnings, which have recently become errors" * tag 'riscv-for-linus-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix asan-stack clang build riscv: Do not re-populate shadow memory with kasan_populate_early_shadow riscv: fix misalgned trap vector base address
2021-10-30scsi: ufs: ufshpb: Remove HPB2.0 flowsAvri Altman3-288/+4
The Host Performance Buffer feature allows UFS read commands to carry the physical media addresses along with the LBAs, thus allowing less internal L2P-table switches in the device. HPB1.0 allowed a single LBA, while HPB2.0 increases this capacity up to 255 blocks. Carrying more than a single record, the read operation is no longer purely of type "read" but a "hybrid" command: Writing the physical address to the device in one operation and reading back the required payload in another. The JEDEC HPB spec defines two commands for this operation: HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and HPB-READ to read the payload. With the current HPB design the UFS driver has no alternative but to divide the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ. This causes a great deal of aggravation to the block layer guys who demanded that we completely revert the entire HPB driver regardless of the huge amount of corporate effort already invested in it. As a compromise, remove only the pieces that implement the 2.0 specification. This is done as a matter of urgency for the final 5.15 release. Link: https://lore.kernel.org/r/20211030062301.248-1-avri.altman@wdc.com Tested-by: Avri Altman <avri.altman@wdc.com> Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bean Huo <beanhuo@micron.com> Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-30Merge tag 'powerpc-5.15-6' of ↵Linus Torvalds1-13/+14
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Three commits fixing some issues introduced with the recent IOMMU changes we merged. Thanks to Alexey Kardashevskiy" * tag 'powerpc-5.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries/iommu: Create huge DMA window if no MMIO32 is present powerpc/pseries/iommu: Check if the default window in use before removing it powerpc/pseries/iommu: Use correct vfree for it_map
2021-10-30Merge tag 'gpio-fixes-for-v5.15' of ↵Linus Torvalds2-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix the return value check when parsing the ngpios property in gpio-xgs-iproc - check the return value of bgpio_init() in gpio-mlxbf2 * tag 'gpio-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: mlxbf2.c: Add check for bgpio_init failure gpio: xgs-iproc: fix parsing of ngpios property
2021-10-29Merge tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds5-20/+51
Pull block fixes from Jens Axboe: - NVMe pull request: - fix nvmet-tcp header digest verification (Amit Engel) - fix a memory leak in nvmet-tcp when releasing a queue (Maurizio Lombardi) - fix nvme-tcp H2CData PDU send accounting again (Sagi Grimberg) - fix digest pointer calculation in nvme-tcp and nvmet-tcp (Varun Prakash) - fix possible nvme-tcp req->offset corruption (Varun Prakash) - Queue drain ordering fix (Ming) - Partition check regression for zoned devices (Shin'ichiro) - Zone queue restart fix (Naohiro) * tag 'block-5.15-2021-10-29' of git://git.kernel.dk/linux-block: block: Fix partition check for host-aware zoned block devices nvmet-tcp: fix header digest verification nvmet-tcp: fix data digest pointer calculation nvme-tcp: fix data digest pointer calculation nvme-tcp: fix possible req->offset corruption block: schedule queue restart after BLK_STS_ZONE_RESOURCE block: drain queue after disk is removed from sysfs nvme-tcp: fix H2CData PDU send accounting (again) nvmet-tcp: fix a memory leak when releasing a queue
2021-10-29scsi: mpt3sas: Fix reference tag handling for WRITE_INSERTMartin K. Petersen1-3/+6
Testing revealed a problem with how the reference tag was handled for a WRITE_INSERT operation. The SCSI_PROT_REF_CHECK flag is not set when the controller is asked to generate the protection information (i.e. not DIX). And as a result the initial reference tag would not be set in the WRITE_INSERT case. Separate handling of the REF_CHECK and REF_INCREMENT flags to align with both the DIX spec and the MPI implementation. Link: https://lore.kernel.org/r/20211028034202.24225-1-martin.petersen@oracle.com Fixes: b3e2c72af1d5 ("scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-29Merge tag 'mmc-v5.15-rc5' of ↵Linus Torvalds9-39/+104
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - tmio: Re-enable card irqs after a reset - mtk-sd: Fixup probing of cqhci for crypto - cqhci: Fix support for suspend/resume - vub300: Fix control-message timeouts - dw_mmc-exynos: Fix support for tuning - winbond: Silences build errors on M68K - sdhci-esdhc-imx: Fix support for tuning - sdhci-pci: Read card detect from ACPI for Intel Merrifield - sdhci: Fix eMMC support for Thundercomm TurboX CM2290 * tag 'mmc-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: tmio: reenable card irqs after the reset callback mmc: mediatek: Move cqhci init behind ungate clock mmc: cqhci: clear HALT state after CQE enable mmc: vub300: fix control-message timeouts mmc: dw_mmc: exynos: fix the finding clock sample value mmc: winbond: don't build on M68K mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit mmc: sdhci-pci: Read card detect from ACPI for Intel Merrifield mmc: sdhci: Map more voltage level to SDHCI_POWER_330
2021-10-29Merge tag 'for-5.15-rc7-tag' of ↵Linus Torvalds5-33/+72
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Last minute fixes for crash on 32bit architectures when compression is in use. It's a regression introduced in 5.15-rc and I'd really like not let this into the final release, fixes via stable trees would add unnecessary delay. The problem is on 32bit architectures with highmem enabled, the pages for compression may need to be kmapped, while the patches removed that as we don't use GFP_HIGHMEM allocations anymore. The pages that don't come from local allocation still may be from highmem. Despite being on 32bit there's enough such ARM machines in use so it's not a marginal issue. I did full reverts of the patches one by one instead of a huge one. There's one exception for the "lzo" revert as there was an intermediate patch touching the same code to make it compatible with subpage. I can't revert that one too, so the revert in lzo.c is manual. Qu Wenruo has worked on that with me and verified the changes" * tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Revert "btrfs: compression: drop kmap/kunmap from lzo" Revert "btrfs: compression: drop kmap/kunmap from zlib" Revert "btrfs: compression: drop kmap/kunmap from zstd" Revert "btrfs: compression: drop kmap/kunmap from generic helpers"
2021-10-29Merge tag 'trace-v5.15-rc6-3' of ↵Linus Torvalds2-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing comment fixes from Steven Rostedt: - Some bots have informed me that some of the ftrace functions kernel-doc has formatting issues. - Also, fix my snake instinct. * tag 'trace-v5.15-rc6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix misspelling of "missing" ftrace: Fix kernel-doc formatting issues
2021-10-29Merge branch 'linus' of ↵Linus Torvalds2-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a build-time warning in x86/sm4" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sm4 - Fix invalid section entry size
2021-10-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds12-54/+110
Merge misc fixes from Andrew Morton: "11 patches. Subsystems affected by this patch series: mm (memcg, memory-failure, oom-kill, secretmem, vmalloc, hugetlb, damon, and tools), and ocfs2" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof to pointer mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()' mm: khugepaged: skip huge page collapse for special files mm, thp: bail out early in collapse_file for writeback page mm/vmalloc: fix numa spreading for large hash tables mm/secretmem: avoid letting secretmem_users drop to zero ocfs2: fix race between searching chunks and release journal_head from buffer_head mm/oom_kill.c: prevent a race between process_mrelease and exit_mmap mm: filemap: check if THP has hwpoisoned subpage for PMD page fault mm: hwpoison: remove the unnecessary THP check memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT
2021-10-29riscv: Fix asan-stack clang buildAlexandre Ghiti3-2/+10
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in Kconfig, it prevents asan-stack from getting disabled with clang even when CONFIG_KASAN_STACK is disabled: fix this by defining the corresponding config. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Fixes: 8ad8b72721d0 ("riscv: Add KASAN support") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-29riscv: Do not re-populate shadow memory with kasan_populate_early_shadowAlexandre Ghiti1-11/+0
When calling this function, all the shadow memory is already populated with kasan_early_shadow_pte which has PAGE_KERNEL protection. kasan_populate_early_shadow write-protects the mapping of the range of addresses passed in argument in zero_pte_populate, which actually write-protects all the shadow memory mapping since kasan_early_shadow_pte is used for all the shadow memory at this point. And then when using memblock API to populate the shadow memory, the first write access to the kernel stack triggers a trap. This becomes visible with the next commit that contains a fix for asan-stack. We already manually populate all the shadow memory in kasan_early_init and we write-protect kasan_early_shadow_pte at the end of kasan_init which makes the calls to kasan_populate_early_shadow superfluous so we can remove them. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support") Fixes: 8ad8b72721d0 ("riscv: Add KASAN support") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-10-29io-wq: remove worker to owner tw dependencyPavel Begunkov1-9/+37
INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds. Not tainted 5.15.0-rc5-syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:iou-wrk-6609 state:D stack:27944 pid: 6612 ppid: 6526 flags:0x00004006 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xb44/0x5960 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857 do_wait_for_common kernel/sched/completion.c:85 [inline] __wait_for_common kernel/sched/completion.c:106 [inline] wait_for_common kernel/sched/completion.c:117 [inline] wait_for_completion+0x176/0x280 kernel/sched/completion.c:138 io_worker_exit fs/io-wq.c:183 [inline] io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 io-wq worker may submit a task_work to the master task and upon io_worker_exit() wait for the tw to get executed. The problem appears when the master task is waiting in coredump.c: 468 freezer_do_not_count(); 469 wait_for_completion(&core_state->startup); 470 freezer_count(); Apparently having some dependency on children threads getting everything stuck. Workaround it by cancelling the taks_work callback that causes it before going into io_worker_exit() waiting. p.s. probably a better option is to not submit tw elevating the refcount in the first place, but let's leave this excercise for the future. Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29io_uring: harder fdinfo sq/cq ring iteratingJens Axboe1-22/+29
The ring iteration is racy, which isn't necessarily a problem except it can cause us to iterate the whole thing. That isn't desired or ideal, and it can lead to excessive runtimes of reading fdinfo. Cap the iteration at tail - head OR the ring size. While in there, clean up the ring masking and just dump the raw values along with the masks. That provides more useful debug info. Fixes: 83f84356bc8f ("io_uring: add more uring info to fdinfo for debug") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29tracing: Fix misspelling of "missing"Steven Rostedt (VMware)1-1/+1
My snake instinct was on and I wrote "misssing" instead of "missing". Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29ftrace: Fix kernel-doc formatting issuesSteven Rostedt (VMware)1-8/+10
Some functions had kernel-doc that used a comma instead of a hash to separate the function name from the one line description. Also, the "ftrace_is_dead()" had an incomplete description. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29null_blk: Fix handling of submit_queues and poll_queues attributesShin'ichiro Kawasaki2-17/+87
Commit 0a593fbbc245 ("null_blk: poll queue support") introduced the poll queue feature to null_blk. After this change, null_blk device has both submit queues and poll queues, and null_map_queues() callback maps the both queues for corresponding hardware contexts. The commit also added the device configuration attribute 'poll_queues' in same manner as the existing attribute 'submit_queues'. These attributes allow to modify the numbers of queues. However, when the new values are stored to these attributes, the values are just handled only for the corresponding queue. When number of submit_queue is updated, number of poll_queue is not counted, or vice versa. This caused inconsistent number of queues and queue mapping and resulted in null-ptr-dereference. This failure was observed in blktests block/029 and block/030. To avoid the inconsistency, fix the attribute updates to care both submit_queues and poll_queues. Introduce the helper function nullb_update_nr_hw_queues() to handle stores to the both two attributes. Add poll_queues field to the struct nullb_device to track the number in same manner as submit_queues. Add two more fields prev_submit_queues and prev_poll_queues to keep the previous values before change. In case the block layer failed to update the nr_hw_queues, refer the previous values in null_map_queues() to map queues in same manner as before change. Also add poll_queues value checks in nullb_update_nr_hw_queues() and null_validate_conf(). They ensure the poll_queues value of each device is within the range from 1 to module parameter value of poll_queues. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20211029103926.845635-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29block: ataflop: Fix warning comparing pointer to 0Jiapeng Chong1-2/+1
Fix the following coccicheck warning: ./drivers/block/ataflop.c:1464:20-21: WARNING comparing pointer to 0. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/1635501029-81391-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29blk-mq-debugfs: Show active requests per queue for shared tagsJohn Garry1-1/+1
Currently we show the hctx.active value for the per-hctx "active" file. However this is not maintained for shared tags, and we instead keep a record of the number active requests per request queue - see commit f1b49fdc1c64 ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap). Change for the case of shared tags to show the active requests per request queue by using __blk_mq_active_requests() helper. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1635496823-33515-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29bcache: replace snprintf in show functions with sysfs_emitQing Wang2-19/+16
coccicheck complains about the use of snprintf() in sysfs show functions. Fix the following coccicheck warning: drivers/md/bcache/sysfs.h:54:12-20: WARNING: use scnprintf or sprintf. Implement sysfs_print() by sysfs_emit() and remove snprint() since no one uses it any more. Suggested-by: Coly Li <colyli@suse.de> Signed-off-by: Qing Wang <wangqing@vivo.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211029060930.119923-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29bcache: move uapi header bcache.h to bcache code directoryColy Li5-4/+5
The header file include/uapi/linux/bcache.h is not really a user space API heaer. This file defines the ondisk format of bcache internal meta data but no one includes it from user space, bcache-tools has its own copy of this header with minor modification. Therefore, this patch moves include/uapi/linux/bcache.h to bcache code directory as drivers/md/bcache/bcache_ondisk.h. Suggested-by: Arnd Bergmann <arnd@kernel.org> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from lzo"David Sterba1-11/+25
This reverts commit 8c945d32e60427cbc0859cf7045bbe6196bb03d8. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. The revert does not apply cleanly due to changes in a6e66e6f8c1b ("btrfs: rework lzo_decompress_bio() to make it subpage compatible") that reworked the page iteration so the revert is done to be equivalent to the original code. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Tested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from zlib"David Sterba1-11/+25
This reverts commit 696ab562e6df9fbafd6052d8ce4aafcb2ed16069. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29Revert "btrfs: compression: drop kmap/kunmap from zstd"David Sterba1-9/+18
This reverts commit bbaf9715f3f5b5ff0de71da91fcc34ee9c198ed8. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Example stacktrace with ZSTD on a 32bit ARM machine: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c4159ed3 [00000000] *pgd=00000000 Internal error: Oops: 5 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12 Hardware name: Allwinner sun4i/sun5i Families Workqueue: btrfs-delalloc btrfs_work_helper PC is at mmiocpy+0x48/0x330 LR is at ZSTD_compressStream_generic+0x15c/0x28c (mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c) (ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0) (ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488) (zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c) (btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834) (compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28) (async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230) (btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418) (process_one_work) from [<c014f210>] (worker_thread+0x44/0x524) (worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0) (kthread) from [<c0100150>] Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29tools/testing/selftests/vm/split_huge_page_test.c: fix application of sizeof ↵David Yang1-1/+1
to pointer The coccinelle check report: ./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42: ERROR: application of sizeof to pointer Use "strlen" to fix it. Link: https://lkml.kernel.org/r/20211012030116.184027-1-davidcomponentone@gmail.com Signed-off-by: David Yang <davidcomponentone@gmail.com> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Zi Yan <ziy@nvidia.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm/damon/core-test: fix wrong expectations for 'damon_split_regions_of()'SeongJae Park1-2/+2
Kunit test cases for 'damon_split_regions_of()' expects the number of regions after calling the function will be same to their request ('nr_sub'). However, the requested number is just an upper-limit, because the function randomly decides the size of each sub-region. This fixes the wrong expectation. Link: https://lkml.kernel.org/r/20211028090628.14948-1-sj@kernel.org Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm: khugepaged: skip huge page collapse for special filesYang Shi1-8/+11
The read-only THP for filesystems will collapse THP for files opened readonly and mapped with VM_EXEC. The intended usecase is to avoid TLB misses for large text segments. But it doesn't restrict the file types so a THP could be collapsed for a non-regular file, for example, block device, if it is opened readonly and mapped with EXEC permission. This may cause bugs, like [1] and [2]. This is definitely not the intended usecase, so just collapse THP for regular files in order to close the attack surface. [shy828301@gmail.com: fix vm_file check [3]] Link: https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/ [1] Link: https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/ [2] Link: https://lkml.kernel.org/r/CAHbLzkqTW9U3VvTu1Ki5v_cLRC9gHW+znBukg_ycergE0JWj-A@mail.gmail.com [3] Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Reported-by: Hao Sun <sunhao.th@gmail.com> Reported-by: syzbot+aae069be1de40fb11825@syzkaller.appspotmail.com Cc: Matthew Wilcox <willy@infradead.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Song Liu <songliubraving@fb.com> Cc: Andrea Righi <andrea.righi@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm, thp: bail out early in collapse_file for writeback pageRongwei Wang1-1/+6
Currently collapse_file does not explicitly check PG_writeback, instead, page_has_private and try_to_release_page are used to filter writeback pages. This does not work for xfs with blocksize equal to or larger than pagesize, because in such case xfs has no page->private. This makes collapse_file bail out early for writeback page. Otherwise, xfs end_page_writeback will panic as follows. page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:ffff0003f88c86a8 index:0x0 pfn:0x84ef32 aops:xfs_address_space_operations [xfs] ino:30000b7 dentry name:"libtest.so" flags: 0x57fffe0000008027(locked|referenced|uptodate|active|writeback) raw: 57fffe0000008027 ffff80001b48bc28 ffff80001b48bc28 ffff0003f88c86a8 raw: 0000000000000000 0000000000000000 00000000ffffffff ffff0000c3e9a000 page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u)) page->mem_cgroup:ffff0000c3e9a000 ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1212! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: BUG: Bad page state in process khugepaged pfn:84ef32 xfs(E) page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:0 index:0x0 pfn:0x84ef32 libcrc32c(E) rfkill(E) aes_ce_blk(E) crypto_simd(E) ... CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Tainted: ... pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) Call trace: end_page_writeback+0x1c0/0x214 iomap_finish_page_writeback+0x13c/0x204 iomap_finish_ioend+0xe8/0x19c iomap_writepage_end_bio+0x38/0x50 bio_endio+0x168/0x1ec blk_update_request+0x278/0x3f0 blk_mq_end_request+0x34/0x15c virtblk_request_done+0x38/0x74 [virtio_blk] blk_done_softirq+0xc4/0x110 __do_softirq+0x128/0x38c __irq_exit_rcu+0x118/0x150 irq_exit+0x1c/0x30 __handle_domain_irq+0x8c/0xf0 gic_handle_irq+0x84/0x108 el1_irq+0xcc/0x180 arch_cpu_idle+0x18/0x40 default_idle_call+0x4c/0x1a0 cpuidle_idle_call+0x168/0x1e0 do_idle+0xb4/0x104 cpu_startup_entry+0x30/0x9c secondary_start_kernel+0x104/0x180 Code: d4210000 b0006161 910c8021 94013f4d (d4210000) ---[ end trace 4a88c6a074082f8c ]--- Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt Link: https://lkml.kernel.org/r/20211022023052.33114-1-rongwei.wang@linux.alibaba.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Suggested-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Song Liu <song@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm/vmalloc: fix numa spreading for large hash tablesChen Wandun1-6/+9
Eric Dumazet reported a strange numa spreading info in [1], and found commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") introduced this issue [2]. Dig into the difference before and after this patch, page allocation has some difference: before: alloc_large_system_hash __vmalloc __vmalloc_node(..., NUMA_NO_NODE, ...) __vmalloc_node_range __vmalloc_area_node alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */ alloc_pages_current alloc_page_interleave /* can be proved by print policy mode */ after: alloc_large_system_hash __vmalloc __vmalloc_node(..., NUMA_NO_NODE, ...) __vmalloc_node_range __vmalloc_area_node alloc_pages_node /* choose nid by nuam_mem_id() */ __alloc_pages_node(nid, ....) So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"), it will allocate memory in current node instead of interleaving allocate memory. Link: https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@mail.gmail.com/ [1] Link: https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@mail.gmail.com/ [2] Link: https://lkml.kernel.org/r/20211021080744.874701-2-chenwandun@huawei.com Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Reported-by: Eric Dumazet <edumazet@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm/secretmem: avoid letting secretmem_users drop to zeroKees Cook1-1/+1
Quoting Dmitry: "refcount_inc() needs to be done before fd_install(). After fd_install() finishes, the fd can be used by userspace and we can have secret data in memory before the refcount_inc(). A straightforward misuse where a user will predict the returned fd in another thread before the syscall returns and will use it to store secret data is somewhat dubious because such a user just shoots themself in the foot. But a more interesting misuse would be to close the predicted fd and decrement the refcount before the corresponding refcount_inc, this way one can briefly drop the refcount to zero while there are other users of secretmem." Move fd_install() after refcount_inc(). Link: https://lkml.kernel.org/r/20211021154046.880251-1-keescook@chromium.org Link: https://lore.kernel.org/lkml/CACT4Y+b1sW6-Hkn8HQYw_SsT7X3tp-CJNh2ci0wG3ZnQz9jjig@mail.gmail.com Fixes: 9a436f8ff631 ("PM: hibernate: disable when there are active secretmem users") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jordy Zomer <jordy@pwning.systems> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29ocfs2: fix race between searching chunks and release journal_head from ↵Gautham Ananthakrishna1-9/+13
buffer_head Encountered a race between ocfs2_test_bg_bit_allocatable() and jbd2_journal_put_journal_head() resulting in the below vmcore. PID: 106879 TASK: ffff880244ba9c00 CPU: 2 COMMAND: "loop3" Call trace: panic oops_end no_context __bad_area_nosemaphore bad_area_nosemaphore __do_page_fault do_page_fault page_fault [exception RIP: ocfs2_block_group_find_clear_bits+316] ocfs2_block_group_find_clear_bits [ocfs2] ocfs2_cluster_group_search [ocfs2] ocfs2_search_chain [ocfs2] ocfs2_claim_suballoc_bits [ocfs2] __ocfs2_claim_clusters [ocfs2] ocfs2_claim_clusters [ocfs2] ocfs2_local_alloc_slide_window [ocfs2] ocfs2_reserve_local_alloc_bits [ocfs2] ocfs2_reserve_clusters_with_limit [ocfs2] ocfs2_reserve_clusters [ocfs2] ocfs2_lock_refcount_allocators [ocfs2] ocfs2_make_clusters_writable [ocfs2] ocfs2_replace_cow [ocfs2] ocfs2_refcount_cow [ocfs2] ocfs2_file_write_iter [ocfs2] lo_rw_aio loop_queue_work kthread_worker_fn kthread ret_from_fork When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and released the jounal head from the buffer head. Needed to take bit lock for the bit 'BH_JournalHead' to fix this race. Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <rajesh.sivaramasubramaniom@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm/oom_kill.c: prevent a race between process_mrelease and exit_mmapSuren Baghdasaryan1-11/+12
Race between process_mrelease and exit_mmap, where free_pgtables is called while __oom_reap_task_mm is in progress, leads to kernel crash during pte_offset_map_lock call. oom-reaper avoids this race by setting MMF_OOM_VICTIM flag and causing exit_mmap to take and release mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock. Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to fix this race, however that would be considered a hack. Fix this race by elevating mm->mm_users and preventing exit_mmap from executing until process_mrelease is finished. Patch slightly refactors the code to adapt for a possible mmget_not_zero failure. This fix has considerable negative impact on process_mrelease performance and will likely need later optimization. Link: https://lkml.kernel.org/r/20211022014658.263508-1-surenb@google.com Fixes: 884a7e5964e0 ("mm: introduce process_mrelease system call") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Christian Brauner <christian@brauner.io> Cc: Christoph Hellwig <hch@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm: filemap: check if THP has hwpoisoned subpage for PMD page faultYang Shi5-1/+51
When handling shmem page fault the THP with corrupted subpage could be PMD mapped if certain conditions are satisfied. But kernel is supposed to send SIGBUS when trying to map hwpoisoned page. There are two paths which may do PMD map: fault around and regular fault. Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") the thing was even worse in fault around path. The THP could be PMD mapped as long as the VMA fits regardless what subpage is accessed and corrupted. After this commit as long as head page is not corrupted the THP could be PMD mapped. In the regular fault path the THP could be PMD mapped as long as the corrupted page is not accessed and the VMA fits. This loophole could be fixed by iterating every subpage to check if any of them is hwpoisoned or not, but it is somewhat costly in page fault path. So introduce a new page flag called HasHWPoisoned on the first tail page. It indicates the THP has hwpoisoned subpage(s). It is set if any subpage of THP is found hwpoisoned by memory failure and after the refcount is bumped successfully, then cleared when the THP is freed or split. The soft offline path doesn't need this since soft offline handler just marks a subpage hwpoisoned when the subpage is migrated successfully. But shmem THP didn't get split then migrated at all. Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-29mm: hwpoison: remove the unnecessary THP checkYang Shi1-14/+0
When handling THP hwpoison checked if the THP is in allocation or free stage since hwpoison may mistreat it as hugetlb page. After commit 415c64c1453a ("mm/memory-failure: split thp earlier in memory error handling") the problem has been fixed, so this check is no longer needed. Remove it. The side effect of the removal is hwpoison may report unsplit THP instead of unknown error for shmem THP. It seems not like a big deal. The following patch "mm: filemap: check if THP has hwpoisoned subpage for PMD page fault" depends on this, which fixes shmem THP with hwpoisoned subpage(s) are mapped PMD wrongly. So this patch needs to be backported to -stable as well. Link: https://lkml.kernel.org/r/20211020210755.23964-2-shy828301@gmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>