summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-05-07Input: ili210x - add missing negation for touch indication on ili210xHansem Ro1-1/+1
This adds the negation needed for proper finger detection on Ilitek ili2107/ili210x. This fixes polling issues (on Amazon Kindle Fire) caused by returning false for the cooresponding finger on the touchscreen. Signed-off-by: Hansem Ro <hansemro@outlook.com> Fixes: e3559442afd2a ("ili210x - rework the touchscreen sample processing") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2021-05-07Merge tag 's390-5.13-2' of ↵Linus Torvalds16-80/+103
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - add support for system call stack randomization - handle stale PCI deconfiguration events - couple of defconfig updates - some fixes and cleanups * tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility s390/entry: add support for syscall stack randomization s390/configs: change CONFIG_VIRTIO_CONSOLE to "m" s390/cio: remove invalid condition on IO_SCH_UNREG s390/cpumf: remove call to perf_event_update_userpage s390/cpumf: move counter set size calculation to common place s390/cpumf: beautify if-then-else indentation s390/configs: enable CONFIG_PCI_IOV s390/pci: handle stale deconfiguration events s390/pci: rename zpci_configure_device()
2021-05-07Merge tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfioLinus Torvalds4-9/+44
Pull more VFIO updates from Alex Williamson: "A second small set of commits for this merge window, primarily to unbreak some deletions from our uAPI header. - Additional mdev sample driver cleanup (Dan Carpenter) - Doc fix (Alyssa Ross) - Unbreak uAPI from NVLink2 support removal (Alex Williamson)" * tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio: docs: vfio: fix typo vfio/pci: Revert nvlink removal uAPI breakage vfio/mdev: remove unnecessary NULL check in mbochs_create()
2021-05-06futex: Make syscall entry points less convolutedThomas Gleixner1-26/+37
The futex and the compat syscall entry points do pretty much the same except for the timespec data types and the corresponding copy from user function. Split out the rest into inline functions and share the functionality. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210422194705.244476369@linutronix.de
2021-05-06futex: Get rid of the val2 conditional danceThomas Gleixner1-14/+2
There is no point in checking which FUTEX operand treats the utime pointer as 'val2' argument because that argument to do_futex() is only used by exactly these operands. So just handing it in unconditionally is not making any difference, but removes a lot of pointless gunk. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210422194705.125957049@linutronix.de
2021-05-06futex: Do not apply time namespace adjustment on FUTEX_LOCK_PIThomas Gleixner1-2/+2
FUTEX_LOCK_PI does not require to have the FUTEX_CLOCK_REALTIME bit set because it has been using CLOCK_REALTIME based absolute timeouts forever. Due to that, the time namespace adjustment which is applied when FUTEX_CLOCK_REALTIME is not set, will wrongly take place for FUTEX_LOCK_PI and wreckage the timeout. Exclude it from that procedure. Fixes: c2f7d08cccf4 ("futex: Adjust absolute futex timeouts with per time namespace offset") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210422194704.984540159@linutronix.de
2021-05-06Revert 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")Thomas Gleixner1-2/+1
The FUTEX_WAIT operand has historically a relative timeout which means that the clock id is irrelevant as relative timeouts on CLOCK_REALTIME are not subject to wall clock changes and therefore are mapped by the kernel to CLOCK_MONOTONIC for simplicity. If a caller would set FUTEX_CLOCK_REALTIME for FUTEX_WAIT the timeout is still treated relative vs. CLOCK_MONOTONIC and then the wait arms that timeout based on CLOCK_REALTIME which is broken and obviously has never been used or even tested. Reject any attempt to use FUTEX_CLOCK_REALTIME with FUTEX_WAIT again. The desired functionality can be achieved with FUTEX_WAIT_BITSET and a FUTEX_BITSET_MATCH_ANY argument. Fixes: 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210422194704.834797921@linutronix.de
2021-05-06Merge branch 'pcmcia-next' of ↵Linus Torvalds5-32/+30
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull pcmcia updates from Dominik Brodowski: "A number of patches fixing W=1 kernel build warnings, and one patch removing an always-false NULL check" * 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: rsrc_nonstatic: Fix call-back function as reference formatting pcmcia: pcmcia_resource: Fix some kernel-doc formatting/disparities and demote others pcmcia: ds: Fix function name disparity in header pcmcia: pcmcia_cis: Demote non-conforming kernel-doc headers to standard kernel-doc pcmcia: cistpl: Demote non-conformant kernel-doc headers to standard comments pcmcia: rsrc_nonstatic: Demote kernel-doc abuses pcmcia: ds: Remove if with always false condition
2021-05-06Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds21-704/+562
Pull ceph updates from Ilya Dryomov: "Notable items here are - a series to take advantage of David Howells' netfs helper library from Jeff - three new filesystem client metrics from Xiubo - ceph.dir.rsnaps vxattr from Yanhu - two auth-related fixes from myself, marked for stable. Interspersed is a smattering of assorted fixes and cleanups across the filesystem" * tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits) libceph: allow addrvecs with a single NONE/blank address libceph: don't set global_id until we get an auth ticket libceph: bump CephXAuthenticate encoding version ceph: don't allow access to MDS-private inodes ceph: fix up some bare fetches of i_size ceph: convert some PAGE_SIZE invocations to thp_size() ceph: support getting ceph.dir.rsnaps vxattr ceph: drop pinned_page parameter from ceph_get_caps ceph: fix inode leak on getattr error in __fh_to_dentry ceph: only check pool permissions for regular files ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon ceph: avoid counting the same request twice or more ceph: rename the metric helpers ceph: fix kerneldoc copypasta over ceph_start_io_direct ceph: use attach/detach_page_private for tracking snap context ceph: don't use d_add in ceph_handle_snapdir ceph: don't clobber i_snap_caps on non-I_NEW inode ceph: fix fall-through warnings for Clang ceph: convert ceph_readpages to ceph_readahead ceph: convert ceph_write_begin to netfs_write_begin ...
2021-05-06Merge tag 'ecryptfs-5.13-rc1-updates' of ↵Linus Torvalds14-63/+73
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs updates from Tyler Hicks: "Code cleanups and a bug fix - W=1 compiler warning cleanups - Mutex initialization simplification - Protect against NULL pointer exception during mount" * tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: fix kernel panic with null dev_name ecryptfs: remove unused helpers ecryptfs: Fix typo in message eCryptfs: Use DEFINE_MUTEX() for mutex lock ecryptfs: keystore: Fix some kernel-doc issues and demote non-conformant headers ecryptfs: inode: Help out nearly-there header and demote non-conformant ones ecryptfs: mmap: Help out one function header and demote other abuses ecryptfs: crypto: Supply some missing param descriptions and demote abuses ecryptfs: miscdev: File headers are not good kernel-doc candidates ecryptfs: main: Demote a bunch of non-conformant kernel-doc headers ecryptfs: messaging: Add missing param descriptions and demote abuses ecryptfs: super: Fix formatting, naming and kernel-doc abuses ecryptfs: file: Demote kernel-doc abuses ecryptfs: kthread: Demote file header and provide description for 'cred' ecryptfs: dentry: File headers are not good candidates for kernel-doc ecryptfs: debug: Demote a couple of kernel-doc abuses ecryptfs: read_write: File headers do not make good candidates for kernel-doc ecryptfs: use DEFINE_MUTEX() for mutex lock eCryptfs: add a semicolon
2021-05-06Merge tag 'trace-v5.13-2' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix probes written to the set_ftrace_filter file Now that there's a library that accesses the tracefs file system (libtracefs), the way the files are interacted with is slightly different than the command line. For instance, the write() system call is used directly instead of an echo. This exposes some old bugs. If a probe is written to "set_ftrace_filter" without any white space after it, it will be ignored. This is because the write expects that a string written to it that does not end with white spaces thinks there is more to come. But if the file is closed, the release function needs to finish it. The "set_ftrace_filter" release function handles the filter part of the "set_ftrace_filter" file, but did not handle the probe part" * tag 'trace-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Handle commands when closing set_ftrace_filter file
2021-05-06Merge tag 'acpi-5.13-rc1-2' of ↵Linus Torvalds6-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These revert one recent commit that turned out to be problematic, address two issues in the ACPI "custom method" interface and update GPIO properties documentation. Specifics: - Revent recent commit related to the handling of ACPI power resources during initialization, because it turned out to cause problems to occur on some systems (Rafael Wysocki). - Fix potential use-after-free and potential memory leak in the ACPI "custom method" debugfs interface (Mark Langsdorf). - Update ACPI GPIO properties documentation to cover assumptions regarding GPIO polarity (Andy Shevchenko)" * tag 'acpi-5.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: scan: Turn off unused power resources during initialization" ACPI: custom_method: fix a possible memory leak ACPI: custom_method: fix potential use-after-free issue Documentation: firmware-guide: gpio-properties: Add note to SPI CS case
2021-05-06Merge tag 'devicetree-fixes-for-5.13-1' of ↵Linus Torvalds10-95/+49
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Several Renesas binding fixes to fix warnings - Remove duplicate compatibles in 8250 binding - Remove orphaned Sigma Designs Tango bindings - Fix bcm2711-hdmi binding to use 'additionalProperties' - Fix idt,32434-pic warning for missing 'interrupts' property - Fix 'stored but not read' warnings in DT overlay code * tag 'devicetree-fixes-for-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: net: renesas,etheravb: Fix optional second clock name dt-bindings: display: renesas,du: Add missing power-domains property dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1 dt-bindings: PCI: rcar-pci-host: Document missing R-Car H1 support of: overlay: Remove redundant assignment to ret dt-bindings: serial: 8250: Remove duplicated compatible strings dt-bindings: Remove unused Sigma Designs Tango bindings dt-bindings: bcm2711-hdmi: Fix broken schema dt-bindings: interrupt-controller: idt,32434-pic: Add missing interrupts property
2021-05-06riscv: remove unused handle_exception symbolRouven Czerwinski1-2/+0
Since commit 79b1feba5455 ("RISC-V: Setup exception vector early") exception vectors are setup early and the handle_exception symbol from the asm files is no longer referenced in traps.c. Remove it. Signed-off-by: Rouven Czerwinski <rouven@czerwinskis.de> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: Consistify protect_kernel_linear_mapping_text_rodata() useGeert Uytterhoeven3-4/+7
The various uses of protect_kernel_linear_mapping_text_rodata() are not consistent: - Its definition depends on "64BIT && !XIP_KERNEL", - Its forward declaration depends on MMU, - Its single caller depends on "STRICT_KERNEL_RWX && 64BIT && MMU && !XIP_KERNEL". Fix this by settling on the dependencies of the caller, which can be simplified as STRICT_KERNEL_RWX depends on "MMU && !XIP_KERNEL". Provide a dummy definition, as the caller is protected by "IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)" instead of "#ifdef CONFIG_STRICT_KERNEL_RWX". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=yVincent Chen1-2/+2
The corresponding hardware issues of CONFIG_ERRATA_SIFIVE_CIP_453 and CONFIG_ERRATA_SIFIVE_CIP_1200 only exist in the SiFive 64bit CPU cores. Therefore, these two errata are required only if CONFIG_64BIT=y Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Fixes: bff3ff525460 ("riscv: sifive: Apply errata "cip-1200" patch") Fixes: 800149a77c2c ("riscv: sifive: Apply errata "cip-453" patch") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: Only extend kernel reservation if mapped read-onlyGeert Uytterhoeven1-2/+7
When the kernel mapping was moved outside of the linear mapping, the kernel memory reservation was increased, to take into account mapping granularity. However, this is done unconditionally, regardless of whether the kernel memory is mapped read-only or not. If this extension is not needed, up to 2 MiB may be lost, which has a big impact on e.g. Canaan K210 (64-bit nommu) platforms with only 8 MiB of RAM. Reclaim the lost memory by only extending the reserved region when needed, i.e. depending on a simplified version of the conditional logic around the call to protect_kernel_linear_mapping_text_rodata(). Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds27-454/+246
Pull ARM updates from Russell King: - Fix BSS size calculation for LLVM - Improve robustness of kernel entry around v7_invalidate_l1 - Fix and update kprobes assembly - Correct breakpoint overflow handler check - Pause function graph tracer when suspending a CPU - Switch to generic syscallhdr.sh and syscalltbl.sh - Remove now unused set_kernel_text_r[wo] functions - Updates for ptdump (__init marking and using DEFINE_SHOW_ATTRIBUTE) - Fix for interrupted SMC (secure) calls - Remove Compaq Personal Server platform * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: footbridge: remove personal server platform ARM: 9075/1: kernel: Fix interrupted SMC calls ARM: 9074/1: ptdump: convert to DEFINE_SHOW_ATTRIBUTE ARM: 9073/1: ptdump: add __init section marker to three functions ARM: 9072/1: mm: remove set_kernel_text_r[ow]() ARM: 9067/1: syscalls: switch to generic syscallhdr.sh ARM: 9068/1: syscalls: switch to generic syscalltbl.sh ARM: 9066/1: ftrace: pause/unpause function graph tracer in cpu_suspend() ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook ARM: 9062/1: kprobes: rewrite test-arm.c in UAL ARM: 9061/1: kprobes: fix UNPREDICTABLE warnings ARM: 9060/1: kexec: Remove unused kexec_reinit callback ARM: 9059/1: cache-v7: get rid of mini-stack ARM: 9058/1: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6 ARM: 9057/1: cache-v7: add missing ISB after cache level selection ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld
2021-05-06Merge tag 'riscv-for-linus-5.13-mw0' of ↵Linus Torvalds71-216/+2635
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the memtest= kernel command-line argument. - Support for building the kernel with FORTIFY_SOURCE. - Support for generic clockevent broadcasts. - Support for the buildtar build target. - Some build system cleanups to pass more LLVM-friendly arguments. - Support for kprobes. - A rearranged kernel memory map, the first part of supporting sv48 systems. - Improvements to kexec, along with support for kdump and crash kernels. - An alternatives-based errata framework, along with support for handling a pair of errata that manifest on some SiFive designs (including the HiFive Unmatched). - Support for XIP. - A device tree for the Microchip PolarFire ICICLE SoC and associated dev board. ... along with a bunch of cleanups. There are already a handful of fixes on the list so there will likely be a part 2. * tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits) RISC-V: Always define XIP_FIXUP riscv: Remove 32b kernel mapping from page table dump riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y RISC-V: Fix error code returned by riscv_hartid_to_cpuid() RISC-V: Enable Microchip PolarFire ICICLE SoC RISC-V: Initial DTS for Microchip ICICLE board dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC RISC-V: Add Microchip PolarFire SoC kconfig option RISC-V: enable XIP RISC-V: Add crash kernel support RISC-V: Add kdump support RISC-V: Improve init_resources() RISC-V: Add kexec support RISC-V: Add EM_RISCV to kexec UAPI header riscv: vdso: fix and clean-up Makefile riscv/mm: Use BUG_ON instead of if condition followed by BUG. riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU riscv: module: Create module allocations without exec permissions riscv: bpf: Avoid breaking W^X ...
2021-05-06Merge tag 'hexagon-5.13-0' of ↵Linus Torvalds12-14/+258
git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux Pull Hexagon updates from Brian Cain: "Hexagon architecture build fixes + builtins Small build fixes applied: - use -mlong-calls to build - extend jumps in futex_atomic_* - etc Also, for convenience and portability, the hexagon compiler builtin functions like memcpy etc have been added to the kernel -- following the idiom used by other architectures" * tag 'hexagon-5.13-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux: Hexagon: add target builtins to kernel Hexagon: remove DEBUG from comet config Hexagon: change jumps to must-extend in futex_atomic_* Hexagon: fix build errors
2021-05-06Merge tag 'docs-5.13-2' of git://git.lwn.net/linuxLinus Torvalds11-68/+225
Pull documentation fixes from Jonathan Corbet: "A few late-arriving documentation fixes, including some oprofile cleanup, a kernel-doc fix, some regression-reporting updates, and the usual minor fixes" * tag 'docs-5.13-2' of git://git.lwn.net/linux: Enlisted oprofile version line removed oprofiled version output line removed from the list Removed the oprofiled version option docs: reporting-issues.rst: CC subsystem and maintainers on regressions docs: correct URL to bios and kernel developer's guide docs/core-api: Consistent code style docs/zh_CN: Adjust order and content of zh_CN/index.rst Documentation: input: joydev file corrections docs: Fix typo in Documentation/x86/x86_64/5level-paging.rst kernel-doc: Add support for __deprecated
2021-05-06block: reexpand iov_iter after read/writeyangerkun1-3/+17
We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-06Merge branches 'acpi-pm' and 'acpi-docs'Rafael J. Wysocki5-4/+8
* acpi-pm: Revert "ACPI: scan: Turn off unused power resources during initialization" * acpi-docs: Documentation: firmware-guide: gpio-properties: Add note to SPI CS case
2021-05-06locking/qrwlock: Cleanup queued_write_lock_slowpath()Waiman Long1-3/+3
Make the code more readable by replacing the atomic_cmpxchg_acquire() by an equivalent atomic_try_cmpxchg_acquire() and change atomic_add() to atomic_or(). For architectures that use qrwlock, I do not find one that has an atomic_add() defined but not an atomic_or(). I guess it should be fine by changing atomic_add() to atomic_or(). Note that the previous use of atomic_add() isn't wrong as only one writer that is the wait_lock owner can set the waiting flag and the flag will be cleared later on when acquiring the write lock. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20210426185017.19815-1-longman@redhat.com
2021-05-06smp: Fix smp_call_function_single_async prototypeArnd Bergmann3-15/+15
As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data"), the smp code prefers 32-byte aligned call_single_data objects for performance reasons, but the block layer includes an instance of this structure in the main 'struct request' that is more senstive to size than to performance here, see 4ccafe032005 ("block: unalign call_single_data in struct request"). The result is a violation of the calling conventions that clang correctly points out: block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch] smp_call_function_single_async(cpu, &rq->csd); It does seem that the usage of the call_single_data without cache line alignment should still be allowed by the smp code, so just change the function prototype so it accepts both, but leave the default alignment unchanged for the other users. This seems better to me than adding a local hack to shut up an otherwise correct warning in the caller. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lkml.kernel.org/r/20210505211300.3174456-1-arnd@kernel.org
2021-05-06x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gatingSuravee Suthikulpanit1-21/+26
On certain AMD platforms, when the IOMMU performance counter source (csource) field is zero, power-gating for the counter is enabled, which prevents write access and returns zero for read access. This can cause invalid perf result especially when event multiplexing is needed (i.e. more number of events than available counters) since the current logic keeps track of the previously read counter value, and subsequently re-program the counter to continue counting the event. With power-gating enabled, we cannot gurantee successful re-programming of the counter. Workaround this issue by : 1. Modifying the ordering of setting/reading counters and enabing/ disabling csources to only access the counter when the csource is set to non-zero. 2. Since AMD IOMMU PMU does not support interrupt mode, the logic can be simplified to always start counting with value zero, and accumulate the counter value when stopping without the need to keep track and reprogram the counter with the previously read counter value. This has been tested on systems with and without power-gating. Fixes: 994d6608efe4 ("iommu/amd: Remove performance counter pre-initialization test") Suggested-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210504065236.4415-1-suravee.suthikulpanit@amd.com
2021-05-06sched/fair: Fix unfairness caused by missing load decayOdin Ugedal1-3/+9
This fixes an issue where old load on a cfs_rq is not properly decayed, resulting in strange behavior where fairness can decrease drastically. Real workloads with equally weighted control groups have ended up getting a respective 99% and 1%(!!) of cpu time. When an idle task is attached to a cfs_rq by attaching a pid to a cgroup, the old load of the task is attached to the new cfs_rq and sched_entity by attach_entity_cfs_rq. If the task is then moved to another cpu (and therefore cfs_rq) before being enqueued/woken up, the load will be moved to cfs_rq->removed from the sched_entity. Such a move will happen when enforcing a cpuset on the task (eg. via a cgroup) that force it to move. The load will however not be removed from the task_group itself, making it look like there is a constant load on that cfs_rq. This causes the vruntime of tasks on other sibling cfs_rq's to increase faster than they are supposed to; causing severe fairness issues. If no other task is started on the given cfs_rq, and due to the cpuset it would not happen, this load would never be properly unloaded. With this patch the load will be properly removed inside update_blocked_averages. This also applies to tasks moved to the fair scheduling class and moved to another cpu, and this path will also fix that. For fork, the entity is queued right away, so this problem does not affect that. This applies to cases where the new process is the first in the cfs_rq, issue introduced 3d30544f0212 ("sched/fair: Apply more PELT fixes"), and when there has previously been load on the cgroup but the cgroup was removed from the leaflist due to having null PELT load, indroduced in 039ae8bcf7a5 ("sched/fair: Fix O(nr_cgroups) in the load balancing path"). For a simple cgroup hierarchy (as seen below) with two equally weighted groups, that in theory should get 50/50 of cpu time each, it often leads to a load of 60/40 or 70/30. parent/ cg-1/ cpu.weight: 100 cpuset.cpus: 1 cg-2/ cpu.weight: 100 cpuset.cpus: 1 If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2 equally weighted, they should still get a 50/50 balance of cpu time. This however sometimes results in a balance of 10/90 or 1/99(!!) between the task groups. $ ps u -C stress USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 18568 1.1 0.0 3684 100 pts/12 R+ 13:36 0:00 stress --cpu 1 root 18580 99.3 0.0 3684 100 pts/12 R+ 13:36 0:09 stress --cpu 1 parent/ cg-1/ cpu.weight: 100 sub-group/ cpu.weight: 1 cpuset.cpus: 1 cg-2/ cpu.weight: 100 sub-group/ cpu.weight: 10000 cpuset.cpus: 1 This can be reproduced by attaching an idle process to a cgroup and moving it to a given cpuset before it wakes up. The issue is evident in many (if not most) container runtimes, and has been reproduced with both crun and runc (and therefore docker and all its "derivatives"), and with both cgroup v1 and v2. Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes") Fixes: 039ae8bcf7a5 ("sched/fair: Fix O(nr_cgroups) in the load balancing path") Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210501141950.23622-2-odin@uged.al
2021-05-06sched: Fix out-of-bound access in uclampQuentin Perret1-1/+1
Util-clamp places tasks in different buckets based on their clamp values for performance reasons. However, the size of buckets is currently computed using a rounding division, which can lead to an off-by-one error in some configurations. For instance, with 20 buckets, the bucket size will be 1024/20=51. A task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly, correct indexes are in range [0,19], hence leading to an out of bound memory access. Clamp the bucket id to fix the issue. Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting") Suggested-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20210430151412.160913-1-qperret@google.com
2021-05-06psi: Fix psi state corruption when schedule() races with cgroup moveJohannes Weiner1-10/+26
4117cebf1a9f ("psi: Optimize task switch inside shared cgroups") introduced a race condition that corrupts internal psi state. This manifests as kernel warnings, sometimes followed by bogusly high IO pressure: psi: task underflow! cpu=1 t=2 tasks=[0 0 0 0] clear=c set=0 (schedule() decreasing RUNNING and ONCPU, both of which are 0) psi: incosistent task state! task=2412744:systemd cpu=17 psi_flags=e clear=3 set=0 (cgroup_move_task() clearing MEMSTALL and IOWAIT, but task is MEMSTALL | RUNNING | ONCPU) What the offending commit does is batch the two psi callbacks in schedule() to reduce the number of cgroup tree updates. When prev is deactivated and removed from the runqueue, nothing is done in psi at first; when the task switch completes, TSK_RUNNING and TSK_IOWAIT are updated along with TSK_ONCPU. However, the deactivation and the task switch inside schedule() aren't atomic: pick_next_task() may drop the rq lock for load balancing. When this happens, cgroup_move_task() can run after the task has been physically dequeued, but the psi updates are still pending. Since it looks at the task's scheduler state, it doesn't move everything to the new cgroup that the task switch that follows is about to clear from it. cgroup_move_task() will leak the TSK_RUNNING count in the old cgroup, and psi_sched_switch() will underflow it in the new cgroup. A similar thing can happen for iowait. TSK_IOWAIT is usually set when a p->in_iowait task is dequeued, but again this update is deferred to the switch. cgroup_move_task() can see an unqueued p->in_iowait task and move a non-existent TSK_IOWAIT. This results in the inconsistent task state warning, as well as a counter underflow that will result in permanent IO ghost pressure being reported. Fix this bug by making cgroup_move_task() use task->psi_flags instead of looking at the potentially mismatching scheduler state. [ We used the scheduler state historically in order to not rely on task->psi_flags for anything but debugging. But that ship has sailed anyway, and this is simpler and more robust. We previously already batched TSK_ONCPU clearing with the TSK_RUNNING update inside the deactivation call from schedule(). But that ordering was safe and didn't result in TSK_ONCPU corruption: unlike most places in the scheduler, cgroup_move_task() only checked task_current() and handled TSK_ONCPU if the task was still queued. ] Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210503174917.38579-1-hannes@cmpxchg.org
2021-05-06sched,doc: sched_debug_verbose cmdline should be sched_verboseBarry Song1-1/+1
The cmdline should include sched_verbose but not sched_debug_verbose as sched_debug_verbose is only the variant name in code. Fixes: 9406415f46 ("sched/debug: Rename the sched_debug parameter to sched_verbose") Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210504105344.31344-1-song.bao.hua@hisilicon.com
2021-05-06arm64: kernel: Update the stale commentShaokun Zhang1-1/+1
Commit af391b15f7b5 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm") has used @index instead of @arg, but the comment is stale, update it. Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/1620280462-21937-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-05-06ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HPHui Wang1-5/+11
Without this change, the DAC ctl's name could be changed only when the machine has both Speaker and Headphone, but we met some machines which only has Lineout and Headhpone, and the Lineout and Headphone share the Audio Mixer0 and DAC0, the ctl's name is set to "Front". On most of machines, the "Front" is used for Speaker only or Lineout only, but on this machine it is shared by Lineout and Headphone, This introduces an issue in the pipewire and pulseaudio, suppose users want the Headphone to be on and the Speaker/Lineout to be off, they could turn off the "Front", this works on most of the machines, but on this machine, the "Front" couldn't be turned off otherwise the headphone will be off too. Here we do some change to let the ctl's name change to "Headphone+LO" on this machine, and pipewire and pulseaudio already could handle "Headphone+LO" and "Speaker+LO". (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/747) BugLink: http://bugs.launchpad.net/bugs/804178 Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20210504073917.22406-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-06can: m_can: m_can_tx_work_queue(): fix tx_skb race conditionMarc Kleine-Budde1-1/+2
The m_can_start_xmit() function checks if the cdev->tx_skb is NULL and returns with NETDEV_TX_BUSY in case tx_sbk is not NULL. There is a race condition in the m_can_tx_work_queue(), where first the skb is send to the driver and then the case tx_sbk is set to NULL. A TX complete IRQ might come in between and wake the queue, which results in tx_skb not being cleared yet. Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework") Tested-by: Torin Cooper-Bennun <torin@maxiluxsystems.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-06can: mcp251x: fix resume from sleep before interface was brought upFrieder Schrempf1-17/+18
Since 8ce8c0abcba3 the driver queues work via priv->restart_work when resuming after suspend, even when the interface was not previously enabled. This causes a null dereference error as the workqueue is only allocated and initialized in mcp251x_open(). To fix this we move the workqueue init to mcp251x_can_probe() as there is no reason to do it later and repeat it whenever mcp251x_open() is called. Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required") Link: https://lore.kernel.org/r/17d5d714-b468-482f-f37a-482e3d6df84e@kontron.de Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [mkl: fix error handling in mcp251x_stop()] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-06can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error ↵Marc Kleine-Budde1-1/+3
path This patch adds the missing can_rx_offload_del(), that must be called if mcp251xfd_register() fails. Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Link: https://lore.kernel.org/r/20210504091838.1109047-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-06can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probeDan Carpenter1-2/+2
When we converted this code to use dev_err_probe() we accidentally removed a return. It means that if devm_clk_get() it will lead to an Oops when we call clk_get_rate() on the next line. Fixes: cf8ee6de2543 ("can: mcp251xfd: mcp251xfd_probe(): use dev_err_probe() to simplify error handling") Link: https://lore.kernel.org/r/YJANZf13Qxd5Mhr1@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-06drm/amdgpu: Use device specific BO size & stride check.Bas Nieuwenhuizen1-6/+175
The builtin size check isn't really the right thing for AMD modifiers due to a couple of reasons: 1) In the format structs we don't do set any of the tilesize / blocks etc. to avoid having format arrays per modifier/GPU 2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even for tiled ... 3) The pitch for the DCC planes is really the pixel pitch of the main surface that would be covered by it ... Note that we only handle GFX9+ case but we do this after converting the implicit modifier to an explicit modifier, so on GFX9+ all framebuffers should be checked here. There is a TODO about DCC alignment, but it isn't worse than before and I'd need to dig a bunch into the specifics. Getting this out in a reasonable timeframe to make sure it gets the appropriate testing seemed more important. Finally as I've found that debugging addfb2 failures is a pita I was generous adding explicit error messages to every failure case. Fixes: f258907fdd83 ("drm/amdgpu: Verify bo size can fit framebuffer size on init.") Tested-by: Simon Ser <contact@emersion.fr> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-06drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.Bas Nieuwenhuizen1-0/+4
Otherwise tiling modes that require the values form this field (In particular _*_X) would be corrupted upon video decode. Copied from the VCN v2 code. Fixes: 99541f392b4d ("drm/amdgpu: add mc resume DPG mode for VCN3.0") Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-05-06x86/process: setup io_threads more like normal user space threadsStefan Metzmacher1-1/+18
As io_threads are fully set up USER threads it's clearer to separate the code path from the KTHREAD logic. The only remaining difference to user space threads is that io_threads never return to user space again. Instead they loop within the given worker function. The fact that they never return to user space means they don't have an user space thread stack. In order to indicate that to tools like gdb we reset the stack and instruction pointers to 0. This allows gdb attach to user space processes using io-uring, which like means that they have io_threads, without printing worrying message like this: warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386 warning: Architecture rejected target-supplied description The output will be something like this: (gdb) info threads Id Target Id Frame * 1 LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 2 LWP 4864 "iou-mgr-4863" 0x0000000000000000 in ?? () 3 LWP 4865 "iou-wrk-4863" 0x0000000000000000 in ?? () (gdb) thread 3 [Switching to thread 3 (LWP 4865)] #0 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? () Backtrace stopped: Cannot access memory at address 0x0 Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD") Link: https://lore.kernel.org/io-uring/044d0bad-6888-a211-e1d3-159a4aeed52d@polymtl.ca/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267 Signed-off-by: Stefan Metzmacher <metze@samba.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: Jens Axboe <axboe@kernel.dk> cc: Andy Lutomirski <luto@kernel.org> cc: linux-kernel@vger.kernel.org cc: io-uring@vger.kernel.org cc: x86@kernel.org Link: https://lore.kernel.org/r/20210505110310.237537-1-metze@samba.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-06netfilter: nftables: Fix a memleak from userdata error path in new objectsPablo Neira Ayuso1-2/+2
Release object name if userdata allocation fails. Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-06netfilter: remove BUG_ON() after skb_header_pointer()Pablo Neira Ayuso6-7/+21
Several conntrack helpers and the TCP tracker assume that skb_header_pointer() never fails based on upfront header validation. Even if this should not ever happen, BUG_ON() is a too drastic measure, remove them. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-06MAINTAINERS: add io_uring tool to IO_URINGLukas Bulwahn1-0/+1
The files in ./tools/io_uring/ are maintained by the IO_URING maintainers. Reflect that fact in MAINTAINERS. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20210505053728.3868-1-lukas.bulwahn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-06io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffersThadeu Lima de Souza Cascardo1-2/+2
Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by: Billy Jheng Bing-Jhong (@st424204) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-05KVM: x86: Consolidate guest enter/exit logic to common helpersSean Christopherson3-74/+49
Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common helpers. Opportunistically update the somewhat stale comment about the updates needing to occur immediately after VM-Exit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
2021-05-05context_tracking: KVM: Move guest enter/exit wrappers to KVM's domainSean Christopherson2-45/+45
Move the guest enter/exit wrappers to kvm_host.h so that KVM can manage its context tracking vs. vtime accounting without bleeding too many KVM details into the context tracking code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-8-seanjc@google.com
2021-05-05context_tracking: Consolidate guest enter/exit wrappersSean Christopherson1-41/+24
Consolidate the guest enter/exit wrappers, providing and tweaking stubs as needed. This will allow moving the wrappers under KVM without having to bleed #ifdefs into the soon-to-be KVM code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-7-seanjc@google.com
2021-05-05sched/vtime: Move guest enter/exit vtime accounting to vtime.hSean Christopherson2-22/+41
Provide separate helpers for guest enter vtime accounting (in addition to the existing guest exit helpers), and move all vtime accounting helpers to vtime.h where the existing #ifdef infrastructure can be leveraged to better delineate the different types of accounting. This will also allow future cleanups via deduplication of context tracking code. Opportunstically delete the vtime_account_kernel() stub now that all callers are wrapped with CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-6-seanjc@google.com
2021-05-05sched/vtime: Move vtime accounting external declarations above inlinesSean Christopherson1-37/+37
Move the blob of external declarations (and their stubs) above the set of inline definitions (and their stubs) for vtime accounting. This will allow a future patch to bring in more inline definitions without also having to shuffle large chunks of code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20210505002735.1684165-5-seanjc@google.com
2021-05-05KVM: x86: Defer vtime accounting 'til after IRQ handlingWanpeng Li3-6/+15
Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest. This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831 Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
2021-05-05context_tracking: Move guest exit vtime accounting to separate helpersWanpeng Li1-6/+16
Provide separate vtime accounting functions for guest exit instead of open coding the logic within the context tracking code. This will allow KVM x86 to handle vtime accounting slightly differently when using tick-based accounting. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/r/20210505002735.1684165-3-seanjc@google.com