summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-03-04sparc32: fix a user-triggerable oops in clear_user()Al Viro1-0/+1
commit 7780918b36489f0b2f9a3749d7be00c2ceaec513 upstream. Back in 2.1.29 the clear_user() guts (__bzero()) had been merged with memset(). Unfortunately, while all exception handlers had been copied, one of the exception table entries got lost. As the result, clear_user() starting at 128*n bytes before the end of page and spanning between 8 and 127 bytes into the next page would oops when the second page is unmapped. It's trivial to reproduce - all it takes is main() { int fd = open("/dev/zero", O_RDONLY); char *p = mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); munmap(p + 8192, 8192); read(fd, p + 8192 - 128, 192); } which had been oopsing since March 1997. Says something about the quality of test coverage... ;-/ And while today sparc32 port is nearly dead, back in '97 it had been very much alive; in fact, sparc64 had only been in mainline for 3 months by that point... Cc: stable@kernel.org Fixes: v2.1.29 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04f2fs: flush data when enabling checkpoint backJaegeuk Kim1-0/+3
commit b0ff4fe746fd028eef920ddc8c7b0361c1ede6ec upstream. During checkpoint=disable period, f2fs bypasses all the synchronous IOs such as sync and fsync. So, when enabling it back, we must flush all of them in order to keep the data persistent. Otherwise, suddern power-cut right after enabling checkpoint will cause data loss. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04f2fs: enforce the immutable flag on open filesChao Yu1-0/+17
commit e0fcd01510ad025c9bbce704c5c2579294056141 upstream. This patch ports commit 02b016ca7f99 ("ext4: enforce the immutable flag on open files") to f2fs. According to the chattr man page, "a file with the 'i' attribute cannot be modified..." Historically, this was only enforced when the file was opened, per the rest of the description, "... and the file can not be opened in write mode". There is general agreement that we should standardize all file systems to prevent modifications even for files that were opened at the time the immutable flag is set. Eventually, a change to enforce this at the VFS layer should be landing in mainline. Cc: stable@kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04f2fs: fix out-of-repair __setattr_copy()Chao Yu1-1/+2
commit 2562515f0ad7342bde6456602c491b64c63fe950 upstream. __setattr_copy() was copied from setattr_copy() in fs/attr.c, there is two missing patches doesn't cover this inner function, fix it. Commit 7fa294c8991c ("userns: Allow chown and setgid preservation") Commit 23adbe12ef7d ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid") Fixes: fbfa2cc58d53 ("f2fs: add file operations") Cc: stable@vger.kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmapHuacai Chen1-1/+1
commit c1f664d2400e73d5ca0fcd067fa5847d2c789c11 upstream. Currently we use bitmap_alloc() to allocate msi bitmap which should be initialized with zero. This is obviously wrong but it works because msi can fallback to legacy interrupt mode. So use bitmap_zalloc() instead. Fixes: 632dcc2c75ef6de3272aa ("irqchip: Add Loongson PCH MSI controller") Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210209071051.2078435-1-chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04um: defer killing userspace on page table update failuresJohannes Berg3-4/+8
commit a7d48886cacf8b426e0079bca9639d2657cf2d38 upstream. In some cases we can get to fix_range_common() with mmap_sem held, and in others we get there without it being held. For example, we get there with it held from sys_mprotect(), and without it held from fork_handler(). Avoid any issues in this and simply defer killing the task until it runs the next time. Do it on the mm so that another task that shares the same mm can't continue running afterwards. Cc: stable@vger.kernel.org Fixes: 468f65976a8d ("um: Fix hung task in fix_range_common()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04um: mm: check more comprehensively for stub changesJohannes Berg1-1/+11
commit 47da29763ec9a153b9b685bff9db659e4e09e494 upstream. If userspace tries to change the stub, we need to kill it, because otherwise it can escape the virtual machine. In a few cases the stub checks weren't good, e.g. if userspace just tries to mmap(0x100000 - 0x1000, 0x3000, ...) it could succeed to get a new private/anonymous mapping replacing the stubs. Fix this by checking everywhere, and checking for _overlap_, not just direct changes. Cc: stable@vger.kernel.org Fixes: 3963333fe676 ("uml: cover stubs with a VMA") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04virtio/s390: implement virtio-ccw revision 2 correctlyCornelia Huck1-2/+2
commit 182f709c5cff683e6732d04c78e328de0532284f upstream. CCW_CMD_READ_STATUS was introduced with revision 2 of virtio-ccw, and drivers should only rely on it being implemented when they negotiated at least that revision with the device. However, virtio_ccw_get_status() issued READ_STATUS for any device operating at least at revision 1. If the device accepts READ_STATUS regardless of the negotiated revision (which some implementations like QEMU do, even though the spec currently does not allow it), everything works as intended. While a device rejecting the command should also be handled gracefully, we will not be able to see any changes the device makes to the status, such as setting NEEDS_RESET or setting the status to zero after a completed reset. We negotiated the revision to at most 1, as we never bumped the maximum revision; let's do that now and properly send READ_STATUS only if we are operating at least at revision 2. Cc: stable@vger.kernel.org Fixes: 7d3ce5ab9430 ("virtio/s390: support READ_STATUS command for virtio-ccw") Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20210216110645.1087321-1-cohuck@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04s390/vtime: fix inline assembly clobber listHeiko Carstens1-1/+2
commit b29c5093820d333eef22f58cd04ec0d089059c39 upstream. The stck/stckf instruction used within the inline assembly within do_account_vtime() changes the condition code. This is not reflected with the clobber list, and therefore might result in incorrect code generation. It seems unlikely that the compiler could generate incorrect code considering the surrounding C code, but it must still be fixed. Cc: <stable@vger.kernel.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04proc: don't allow async path resolution of /proc/thread-self componentsJens Axboe2-1/+8
commit 0d4370cfe36b7f1719123b621a4ec4d9c7a25f89 upstream. If this is attempted by an io-wq kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again. Use PF_IO_WORKER for this to speciically target the io_uring workers. Modify the /proc/self/ check to use PF_IO_WORKER as well. Cc: stable@vger.kernel.org Fixes: 8d4c3e76e3be ("proc: don't allow async path resolution of /proc/self components") Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if ↵Chen Yu1-2/+3
available commit 6f67e060083a84a4cc364eab6ae40c717165fb0c upstream. Currently, when turbo is disabled (either by BIOS or by the user), the intel_pstate driver reads the max non-turbo frequency from the package-wide MSR_PLATFORM_INFO(0xce) register. However, on asymmetric platforms it is possible in theory that small and big core with HWP enabled might have different max non-turbo CPU frequency, because MSR_HWP_CAPABILITIES is per-CPU scope according to Intel Software Developer Manual. The turbo max freq is already per-CPU in current code, so make similar change to the max non-turbo frequency as well. Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Subject and changelog edits ] Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e13b: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argumentRafael J. Wysocki1-8/+8
commit a45ee4d4e13b0e35a8ec7ea0bf9267243d57b302 upstream. All of the callers of intel_pstate_get_hwp_max() access the struct cpudata object that corresponds to the given CPU already and the function itself needs to access that object (in order to update hwp_cap_cached), so modify the code to pass a struct cpudata pointer to it instead of the CPU number. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooksShawn Guo1-8/+32
commit 67fc209b527d023db4d087c68e44e9790aa089ef upstream. Commit f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") introduces a regression on platforms using the driver, by failing to initialise a policy, when one is created post hotplug. When all the CPUs of a policy are hoptplugged out, the call to .exit() and later to devm_iounmap() does not release the memory region that was requested during devm_platform_ioremap_resource(). Therefore, a subsequent call to .init() will result in the following error, which will prevent a new policy to be initialised: [ 3395.915416] CPU4: shutdown [ 3395.938185] psci: CPU4 killed (polled 0 ms) [ 3399.071424] CPU5: shutdown [ 3399.094316] psci: CPU5 killed (polled 0 ms) [ 3402.139358] CPU6: shutdown [ 3402.161705] psci: CPU6 killed (polled 0 ms) [ 3404.742939] CPU7: shutdown [ 3404.765592] psci: CPU7 killed (polled 0 ms) [ 3411.492274] Detected VIPT I-cache on CPU4 [ 3411.492337] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000 [ 3411.492448] CPU4: Booted secondary processor 0x0000000400 [0x516f802d] [ 3411.503654] qcom-cpufreq-hw 17d43000.cpufreq: can't request region for resource [mem 0x17d45800-0x17d46bff] With that being said, the original code was tricky and skipping memory region request intentionally to hide this issue. The true cause is that those devm_xxx() device managed functions shouldn't be used for cpufreq init/exit hooks, because &pdev->dev is alive across the hooks and will not trigger auto resource free-up. Let's drop the use of device managed functions and manually allocate/free resources, so that the issue can be fixed properly. Cc: v5.10+ <stable@vger.kernel.org> # v5.10+ Fixes: f17b3e44320b ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code") Suggested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on errorViresh Kumar1-1/+1
commit a51afb13311cd85b2f638c691b2734622277d8f5 upstream. freq_qos_update_request() returns 1 if the effective constraint value has changed, 0 if the effective constraint value has not changed, or a negative error code on failures. The frequency constraints for CPUs can be set by different parts of the kernel. If the maximum frequency constraint set by other parts of the kernel are set at a lower value than the one corresponding to cooling state 0, then we will never be able to cool down the system as freq_qos_update_request() will keep on returning 0 and we will skip updating cpufreq_state and thermal pressure. Fix that by doing the updates even in the case where freq_qos_update_request() returns 0, as we have effectively set the constraint to a new value even if the consolidated value of the actual constraint is unchanged because of external factors. Cc: v5.7+ <stable@vger.kernel.org> # v5.7+ Reported-by: Thara Gopinath <thara.gopinath@linaro.org> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Thara Gopinath<thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/b2b7e84944937390256669df5a48ce5abba0c1ef.1613540713.git.viresh.kumar@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTOREChris Wilson6-5/+19
commit bfe3911a91047557eb0e620f95a370aee6a248c7 upstream. Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category. Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store. Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp. References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04zonefs: Fix file size of zones in full conditionShin'ichiro Kawasaki1-0/+3
commit 059c01039c0185dbee7ed080f1f2bd22cb1e4dab upstream. Per ZBC/ZAC/ZNS specifications, write pointers may not have valid values when zones are in full condition. However, when zonefs mounts a zoned block device, zonefs refers write pointers to set file size even when the zones are in full condition. This results in wrong file size. To fix this, refer maximum file size in place of write pointers for zones in full condition. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: <stable@vger.kernel.org> # 5.6+ Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04exfat: fix shift-out-of-bounds in exfat_fill_super()Namjae Jeon2-5/+30
commit 78c276f5495aa53a8beebb627e5bf6a54f0af34f upstream. syzbot reported a warning which could cause shift-out-of-bounds issue. Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 exfat_read_boot_sector fs/exfat/super.c:471 [inline] __exfat_fill_super fs/exfat/super.c:556 [inline] exfat_fill_super+0x2acb/0x2d00 fs/exfat/super.c:624 get_tree_bdev+0x406/0x630 fs/super.c:1291 vfs_get_tree+0x86/0x270 fs/super.c:1496 do_new_mount fs/namespace.c:2881 [inline] path_mount+0x1937/0x2c50 fs/namespace.c:3211 do_mount fs/namespace.c:3224 [inline] __do_sys_mount fs/namespace.c:3432 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3409 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 exfat specification describe sect_per_clus_bits field of boot sector could be at most 25 - sect_size_bits and at least 0. And sect_size_bits can also affect this calculation, It also needs validation. This patch add validation for sect_per_clus_bits and sect_size_bits field of boot sector. Fixes: 719c1e182916 ("exfat: add super block operations") Cc: stable@vger.kernel.org # v5.9+ Reported-by: syzbot+da4fe66aaadd3c2e2d1c@syzkaller.appspotmail.com Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04printk: fix deadlock when kernel panicMuchun Song1-4/+12
commit 8a8109f303e25a27f92c1d8edd67d7cbbc60a4eb upstream. printk_safe_flush_on_panic() caused the following deadlock on our server: CPU0: CPU1: panic rcu_dump_cpu_stacks kdump_nmi_shootdown_cpus nmi_trigger_cpumask_backtrace register_nmi_handler(crash_nmi_callback) printk_safe_flush __printk_safe_flush raw_spin_lock_irqsave(&read_lock) // send NMI to other processors apic_send_IPI_allbutself(NMI_VECTOR) // NMI interrupt, dead loop crash_nmi_callback printk_safe_flush_on_panic printk_safe_flush __printk_safe_flush // deadlock raw_spin_lock_irqsave(&read_lock) DEADLOCK: read_lock is taken on CPU1 and will never get released. It happens when panic() stops a CPU by NMI while it has been in the middle of printk_safe_flush(). Handle the lock the same way as logbuf_lock. The printk_safe buffers are flushed only when both locks can be safely taken. It can avoid the deadlock _in this particular case_ at expense of losing contents of printk_safe buffers. Note: It would actually be safe to re-init the locks when all CPUs were stopped by NMI. But it would require passing this information from arch-specific code. It is not worth the complexity. Especially because logbuf_lock and printk_safe buffers have been obsoleted by the lockless ring buffer. Fixes: cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: <stable@vger.kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210210034823.64867-1-songmuchun@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mfd: gateworks-gsc: Fix interrupt typeTim Harvey1-1/+1
commit 8d9bf3c3e1451fc8de7b590040a868ade26d6b22 upstream. The Gateworks System Controller has an active-low interrupt. Fix the interrupt request type. Cc: <stable@vger.kernel.org> Fixes: d85234994b2f ("mfd: Add Gateworks System Controller core driver") Signed-off-by: Tim Harvey <tharvey@gateworks.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04gpio: pcf857x: Fix missing first interruptMaxim Kiselev1-1/+1
commit a8002a35935aaefcd6a42ad3289f62bab947f2ca upstream. If no n_latch value will be provided at driver probe then all pins will be used as an input: gpio->out = ~n_latch; In that case initial state for all pins is "one": gpio->status = gpio->out; So if pcf857x IRQ happens with change pin value from "zero" to "one" then we miss it, because of "one" from IRQ and "one" from initial state leaves corresponding pin unchanged: change = (gpio->status ^ status) & gpio->irq_enabled; The right solution will be to read actual state at driver probe. Cc: stable@vger.kernel.org Fixes: 6e20a0a429bd ("gpio: pcf857x: enable gpio_to_irq() support") Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mei: me: add adler lake point LP DIDAlexander Usyskin2-0/+2
commit 930c922a987a02936000f15ea62988b7a39c27f5 upstream. Add Adler Lake LP device id. Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210129120752.850325-7-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mei: me: add adler lake point S DIDAlexander Usyskin2-0/+4
commit f7545efaf7950b240de6b8a20b9c3ffd7278538e upstream. Add Adler Lake S device id. Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210129120752.850325-6-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mei: me: emmitsburg workstation DIDTomas Winkler2-0/+4
commit 372726cb3957dbd69ded9a4e3419d5c6c3bc648e upstream. Add Emmitsburg workstation DID. Cc: <stable@vger.kernel.org> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210129120752.850325-5-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mei: fix transfer over dma with extended headerAlexander Usyskin1-3/+30
commit 1309ecc90f16ee9cc3077761e7f4474369747e6e upstream. The size in header field for packet transferred over DMA includes size of the extended header. Include extended header in size check. Add size and sanity checks on extended header. Cc: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20210129120752.850325-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04spmi: spmi-pmic-arb: Fix hw_irq overflowSubbaraman Narayanamurthy1-3/+2
commit d19db80a366576d3ffadf2508ed876b4c1faf959 upstream. Currently, when handling the SPMI summary interrupt, the hw_irq number is calculated based on SID, Peripheral ID, IRQ index and APID. This is then passed to irq_find_mapping() to see if a mapping exists for this hw_irq and if available, invoke the interrupt handler. Since the IRQ index uses an "int" type, hw_irq which is of unsigned long data type can take a large value when SID has its MSB set to 1 and the type conversion happens. Because of this, irq_find_mapping() returns 0 as there is no mapping for this hw_irq. This ends up invoking cleanup_irq() as if the interrupt is spurious whereas it is actually a valid interrupt. Fix this by using the proper data type (u32) for id. Cc: stable@vger.kernel.org Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org> Link: https://lore.kernel.org/r/1612812784-26369-1-git-send-email-subbaram@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lore.kernel.org/r/20210212031417.3148936-1-sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04powerpc/32s: Add missing call to kuep_lock on syscall entryChristophe Leroy1-0/+3
commit 57fdfbce89137ae85cd5cef48be168040a47dd13 upstream. Userspace Execution protection and fast syscall entry were implemented independently from each other and were both merged in kernel 5.2, leading to syscall entry missing userspace execution protection. On syscall entry, execution of user space memory must be locked in the same way as on exception entry. Fixes: b86fb88855ea ("powerpc/32: implement fast entry for syscalls on non BOOKE") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c65e105b63aaf74f91a14f845bc77192350b84a6.1612796617.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04powerpc/kexec_file: fix FDT size estimation for kdump kernelHari Bathini3-1/+37
commit 2377c92e37fe97bc5b365f55cf60f56dfc4849f5 upstream. On systems with large amount of memory, loading kdump kernel through kexec_file_load syscall may fail with the below error: "Failed to update fdt with linux,drconf-usable-memory property" This happens because the size estimation for kdump kernel's FDT does not account for the additional space needed to setup usable memory properties. Fix it by accounting for the space needed to include linux,usable-memory & linux,drconf-usable-memory properties while estimating kdump kernel's FDT size. Fixes: 6ecd0163d360 ("powerpc/kexec_file: Add appropriate regions for memory reserve map") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161243826811.119001.14083048209224609814.stgit@hbathini Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04powerpc/32: Preserve cr1 in exception prolog stack check to fix build errorChristophe Leroy2-7/+1
commit 3642eb21256a317ac14e9ed560242c6d20cf06d9 upstream. THREAD_ALIGN_SHIFT = THREAD_SHIFT + 1 = PAGE_SHIFT + 1 Maximum PAGE_SHIFT is 18 for 256k pages so THREAD_ALIGN_SHIFT is 19 at the maximum. No need to clobber cr1, it can be preserved when moving r1 into CR when we check stack overflow. This reduces the number of instructions in Machine Check Exception prolog and fixes a build failure reported by the kernel test robot on v5.10 stable when building with RTAS + VMAP_STACK + KVM. That build failure is due to too many instructions in the prolog hence not fitting between 0x200 and 0x300. Allthough the problem doesn't show up in mainline, it is still worth the change. Fixes: 98bf2d3f4970 ("powerpc/32s: Fix RTAS machine check with VMAP stack") Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5ae4d545e3ac58e133d2599e0deb88843cb494fc.1612768623.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mmc: sdhci-pci-o2micro: Bug fix for SDR104 HW tuning failureShirley Her1-0/+20
commit 1ad9f88014ae1d5abccb6fe930bc4c5c311bdc05 upstream. Force chip enter L0 power state during SDR104 HW tuning to avoid tuning failure Signed-off-by: Shirley Her <shirley.her@bayhubtech.com> Link: https://lore.kernel.org/r/20210206014051.3418-1-shirley.her@bayhubtech.com Fixes: 7b7d897e8898 ("mmc: sdhci-pci-o2micro: Add HW tuning for SDR104 mode") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mmc: sdhci-esdhc-imx: fix kernel panic when remove moduleFrank Li1-1/+2
commit a56f44138a2c57047f1ea94ea121af31c595132b upstream. In sdhci_esdhc_imx_remove() the SDHCI_INT_STATUS in read. Under some circumstances, this may be done while the device is runtime suspended, triggering the below splat. Fix the problem by adding a pm_runtime_get_sync(), before reading the register, which will turn on clocks etc making the device accessible again. [ 1811.323148] mmc1: card aaaa removed [ 1811.347483] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP [ 1811.354988] Modules linked in: sdhci_esdhc_imx(-) sdhci_pltfm sdhci cqhci mmc_block mmc_core [last unloaded: mmc_core] [ 1811.365726] CPU: 0 PID: 3464 Comm: rmmod Not tainted 5.10.1-sd-99871-g53835a2e8186 #5 [ 1811.373559] Hardware name: Freescale i.MX8DXL EVK (DT) [ 1811.378705] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 1811.384723] pc : sdhci_esdhc_imx_remove+0x28/0x15c [sdhci_esdhc_imx] [ 1811.391090] lr : platform_drv_remove+0x2c/0x50 [ 1811.395536] sp : ffff800012c7bcb0 [ 1811.398855] x29: ffff800012c7bcb0 x28: ffff00002c72b900 [ 1811.404181] x27: 0000000000000000 x26: 0000000000000000 [ 1811.409497] x25: 0000000000000000 x24: 0000000000000000 [ 1811.414814] x23: ffff0000042b3890 x22: ffff800009127120 [ 1811.420131] x21: ffff00002c4c9580 x20: ffff0000042d0810 [ 1811.425456] x19: ffff0000042d0800 x18: 0000000000000020 [ 1811.430773] x17: 0000000000000000 x16: 0000000000000000 [ 1811.436089] x15: 0000000000000004 x14: ffff000004019c10 [ 1811.441406] x13: 0000000000000000 x12: 0000000000000020 [ 1811.446723] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f [ 1811.452040] x9 : fefefeff6364626d x8 : 7f7f7f7f7f7f7f7f [ 1811.457356] x7 : 78725e6473607372 x6 : 0000000080808080 [ 1811.462673] x5 : 0000000000000000 x4 : 0000000000000000 [ 1811.467990] x3 : ffff800011ac1cb0 x2 : 0000000000000000 [ 1811.473307] x1 : ffff8000091214d4 x0 : ffff8000133a0030 [ 1811.478624] Call trace: [ 1811.481081] sdhci_esdhc_imx_remove+0x28/0x15c [sdhci_esdhc_imx] [ 1811.487098] platform_drv_remove+0x2c/0x50 [ 1811.491198] __device_release_driver+0x188/0x230 [ 1811.495818] driver_detach+0xc0/0x14c [ 1811.499487] bus_remove_driver+0x5c/0xb0 [ 1811.503413] driver_unregister+0x30/0x60 [ 1811.507341] platform_driver_unregister+0x14/0x20 [ 1811.512048] sdhci_esdhc_imx_driver_exit+0x1c/0x3a8 [sdhci_esdhc_imx] [ 1811.518495] __arm64_sys_delete_module+0x19c/0x230 [ 1811.523291] el0_svc_common.constprop.0+0x78/0x1a0 [ 1811.528086] do_el0_svc+0x24/0x90 [ 1811.531405] el0_svc+0x14/0x20 [ 1811.534461] el0_sync_handler+0x1a4/0x1b0 [ 1811.538474] el0_sync+0x174/0x180 [ 1811.541801] Code: a9025bf5 f9403e95 f9400ea0 9100c000 (b9400000) [ 1811.547902] ---[ end trace 3fb1a3bd48ff7be5 ]--- Signed-off-by: Frank Li <Frank.Li@nxp.com> Cc: stable@vger.kernel.org # v4.0+ Link: https://lore.kernel.org/r/20210210181933.29263-1-Frank.Li@nxp.com [Ulf: Clarified the commit message a bit] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbolsFangrui Song1-2/+19
commit ebfac7b778fac8b0e8e92ec91d0b055f046b4604 upstream. clang-12 -fno-pic (since https://github.com/llvm/llvm-project/commit/a084c0388e2a59b9556f2de0083333232da3f1d6) can emit `call __stack_chk_fail@PLT` instead of `call __stack_chk_fail` on x86. The two forms should have identical behaviors on x86-64 but the former causes GNU as<2.37 to produce an unreferenced undefined symbol _GLOBAL_OFFSET_TABLE_. (On x86-32, there is an R_386_PC32 vs R_386_PLT32 difference but the linker behavior is identical as far as Linux kernel is concerned.) Simply ignore _GLOBAL_OFFSET_TABLE_ for now, like what scripts/mod/modpost.c:ignore_undef_symbol does. This also fixes the problem for gcc/clang -fpie and -fpic, which may emit `call foo@PLT` for external function calls on x86. Note: ld -z defs and dynamic loaders do not error for unreferenced undefined symbols so the module loader is reading too much. If we ever need to ignore more symbols, the code should be refactored to ignore unreferenced symbols. Cc: <stable@vger.kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1250 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27178 Reported-by: Marco Elver <elver@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04nvmem: qcom-spmi-sdam: Fix uninitialized pdev pointerSubbaraman Narayanamurthy1-4/+3
commit e2057ee29973b9741d43d3f475a6b02fb46a0e61 upstream. "sdam->pdev" is uninitialized and it is used to print error logs. Fix it. Since device pointer can be used from sdam_config, use it directly thereby removing pdev pointer. Fixes: 40ce9798794f ("nvmem: add QTI SDAM driver") Cc: stable@vger.kernel.org Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20210205100853.32372-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04KVM: nSVM: fix running nested guests when npt=0Paolo Bonzini1-0/+20
commit a04aead144fd938c2d9869eb187e5b9ea0009bae upstream. In case of npt=0 on host, nSVM needs the same .inject_page_fault tweak as VMX has, to make sure that shadow mmu faults are injected as vmexits. It is not clear why this is needed at all, but for now keep the same code as VMX and we'll fix it for both. Based on a patch by Maxim Levitsky <mlevitsk@redhat.com>. Fixes: 7c86663b68ba ("KVM: nSVM: inject exceptions via svm_check_nested_events") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mm, compaction: make fast_isolate_freepages() stay within zoneVlastimil Babka1-5/+11
commit 6e2b7044c199229a3d20cefbd3184968238c4184 upstream. Compaction always operates on pages from a single given zone when isolating both pages to migrate and freepages. Pageblock boundaries are intersected with zone boundaries to be safe in case zone starts or ends in the middle of pageblock. The use of pageblock_pfn_to_page() protects against non-contiguous pageblocks. The functions fast_isolate_freepages() and fast_isolate_around() don't currently protect the fast freepage isolation thoroughly enough against these corner cases, and can result in freepage isolation operate outside of zone boundaries: - in fast_isolate_freepages() if we get a pfn from the first pageblock of a zone that starts in the middle of that pageblock, 'highest' can be a pfn outside of the zone. If we fail to isolate anything in this function, we may then call fast_isolate_around() on a pfn outside of the zone and there effectively do a set_pageblock_skip(page_to_pfn(highest)) which may currently hit a VM_BUG_ON() in some configurations - fast_isolate_around() checks only the zone end boundary and not beginning, nor that the pageblock is contiguous (with pageblock_pfn_to_page()) so it's possible that we end up calling isolate_freepages_block() on a range of pfn's from two different zones and end up e.g. isolating freepages under the wrong zone's lock. This patch should fix the above issues. Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mm/vmscan: restore zone_reclaim_mode ABIDave Hansen2-7/+12
commit 519983645a9f2ec339cabfa0c6ef7b09be985dd0 upstream. I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl. Like a good kernel developer, I also went to go update the documentation. I noticed that the bits in the documentation didn't match the bits in the #defines. The VM never explicitly checks the RECLAIM_ZONE bit. The bit is, however implicitly checked when checking 'node_reclaim_mode==0'. The RECLAIM_ZONE #define was removed in a cleanup. That, by itself is fine. But, when the bit was removed (bit 0) the _other_ bit locations also got changed. That's not OK because the bit values are documented to mean one specific thing. Users surely do not expect the meaning to change from kernel to kernel. The end result is that if someone had a script that did: sysctl vm.zone_reclaim_mode=1 it would have gone from enabling node reclaim for clean unmapped pages to writing out pages during node reclaim after the commit in question. That's not great. Put the bits back the way they were and add a comment so something like this is a bit harder to do again. Update the documentation to make it clear that the first bit is ignored. Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE") Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Daniel Wagner <dwagner@suse.de> Cc: "Tobin C. Harding" <tobin@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04hugetlb: fix copy_huge_page_from_user contig page struct assumptionMike Kravetz1-4/+6
commit 3272cfc2525b3a2810a59312d7a1e6f04a0ca3ef upstream. page structs are not guaranteed to be contiguous for gigantic pages. The routine copy_huge_page_from_user can encounter gigantic pages, yet it assumes page structs are contiguous when copying pages from user space. Since page structs for the target gigantic page are not contiguous, the data copied from user space could overwrite other pages not associated with the gigantic page and cause data corruption. Non-contiguous page structs are generally not an issue. However, they can exist with a specific kernel configuration and hotplug operations. For example: Configure the kernel with CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04hugetlb: fix update_and_free_page contig page struct assumptionMike Kravetz1-2/+4
commit dbfee5aee7e54f83d96ceb8e3e80717fac62ad63 upstream. page structs are not guaranteed to be contiguous for gigantic pages. The routine update_and_free_page can encounter a gigantic page, yet it assumes page structs are contiguous when setting page flags in subpages. If update_and_free_page encounters non-contiguous page structs, we can see “BUG: Bad page state in process …” errors. Non-contiguous page structs are generally not an issue. However, they can exist with a specific kernel configuration and hotplug operations. For example: Configure the kernel with CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Zi Yan outlined steps to reproduce here [1]. [1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/ Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") Signed-off-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mm: memcontrol: fix get_active_memcg return valueMuchun Song1-7/+3
commit 1685bde6b9af55923180a76152036c7fb7176db0 upstream. We use a global percpu int_active_memcg variable to store the remote memcg when we are in the interrupt context. But get_active_memcg always return the current->active_memcg or root_mem_cgroup. The remote memcg (set in the interrupt context) is ignored. This is not what we want. So fix it. Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mm: memcontrol: fix swap undercounting in cgroup2Muchun Song1-1/+13
commit cae3af62b33aa931427a0f211e04347b22180b36 upstream. When pages are swapped in, the VM may retain the swap copy to avoid repeated writes in the future. It's also retained if shared pages are faulted back in some processes, but not in others. During that time we have an in-memory copy of the page, as well as an on-swap copy. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly differently due to the nature of how they account memory and swap: Cgroup1 has a unified memory+swap counter that tracks a data page regardless whether it's in-core or swapped out. On swapin, we transfer the charge from the swap entry to the newly allocated swapcache page, even though the swap entry might stick around for a while. That's why we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). Cgroup2 tracks memory and swap as separate, independent resources and thus has split memory and swap counters. On swapin, we charge the newly allocated swapcache page as memory, while the swap slot in turn must remain charged to the swap counter as long as its allocated too. The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), because it accidentally removed the do_memsw_account() check in the branch inside mem_cgroup_uncharge() that was supposed to tell the difference between the charge transfer in cgroup1 and the separate counters in cgroup2. As a result, cgroup2 currently undercounts retained swap to varying degrees: swap slots are cached up to 50% of the configured limit or total available swap space; partially faulted back shared pages are only limited by physical capacity. This in turn allows cgroups to significantly overconsume their alloted swap space. Add the do_memsw_account() check back to fix this problem. Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04x86: fix seq_file iteration for pat/memtype.cNeilBrown1-2/+2
commit 3d2fc4c082448e9c05792f9b2a11c1d5db408b85 upstream. The memtype seq_file iterator allocates a buffer in the ->start and ->next functions and frees it in the ->show function. The preferred handling for such resources is to free them in the subsequent ->next or ->stop function call. Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak memory. So move the freeing of the buffer to ->next and ->stop. Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@noble1 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.de> Cc: Xin Long <lucien.xin@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04seq_file: document how per-entry resources are managed.NeilBrown1-0/+6
commit b3656d8227f4c45812c6b40815d8f4e446ed372a upstream. Patch series "Fix some seq_file users that were recently broken". A recent change to seq_file broke some users which were using seq_file in a non-"standard" way ... though the "standard" isn't documented, so they can be excused. The result is a possible leak - of memory in one case, of references to a 'transport' in the other. These three patches: 1/ document and explain the problem 2/ fix the problem user in x86 3/ fix the problem user in net/sctp This patch (of 3): Users of seq_file will sometimes find it convenient to take a resource, such as a lock or memory allocation, in the ->start or ->next operations. These are per-entry resources, distinct from per-session resources which are taken in ->start and released in ->stop. The preferred management of these is release the resource on the subsequent call to ->next or ->stop. However prior to Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") it happened that ->show would always be called after ->start or ->next, and a few users chose to release the resource in ->show. This is no longer reliable. Since the mentioned commit, ->next will always come after a successful ->show (to ensure m->index is updated correctly), so the original ordering cannot be maintained. This patch updates the documentation to clearly state the required behaviour. Other patches will fix the few problematic users. [akpm@linux-foundation.org: fix typo, per Willy] Link: https://lkml.kernel.org/r/161248518659.21478.2484341937387294998.stgit@noble1 Link: https://lkml.kernel.org/r/161248539020.21478.3147971477400875336.stgit@noble1 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Signed-off-by: NeilBrown <neilb@suse.de> Cc: Xin Long <lucien.xin@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04fs/affs: release old buffer head on error pathPan Bian1-1/+3
commit 70779b897395b330ba5a47bed84f94178da599f9 upstream. The reference count of the old buffer head should be decremented on path that fails to get the new buffer head. Fixes: 6b4657667ba0 ("fs/affs: add rename exchange") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mtd: spi-nor: hisi-sfc: Put child node np on error pathPan Bian1-1/+3
commit fe6653460ee7a7dbe0cd5fd322992af862ce5ab0 upstream. Put the child node np when it fails to get or register device. Fixes: e523f11141bd ("mtd: spi-nor: add hisilicon spi-nor flash controller driver") Cc: stable@vger.kernel.org Signed-off-by: Pan Bian <bianpan2016@163.com> [ta: Add Fixes tag and Cc stable] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20210121091847.85362-1-bianpan2016@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mtd: spi-nor: core: Add erase size check for erase command initializationTakahiro Kuwano1-0/+1
commit 58fa22f68fcaff20ce4d08a6adffa64f65ccd37d upstream. Even if erase type is same as previous region, erase size can be different if the previous region is overlaid region. Since 'region->size' is assigned to 'cmd->size' for overlaid region, comparing 'erase->size' and 'cmd->size' can detect previous overlaid region. Fixes: 5390a8df769e ("mtd: spi-nor: add support to non-uniform SFDP SPI NOR flash memories") Cc: stable@vger.kernel.org Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> [ta: Add Fixes tag and Cc to stable] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/13d47e8d8991b8a7fd8cc7b9e2a5319c56df35cc.1601612872.git.Takahiro.Kuwano@infineon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mtd: spi-nor: core: Fix erase type discovery for overlaid regionTakahiro Kuwano1-4/+5
commit 969b276718de37dfe66fce3a5633f611e8cd58fd upstream. In case of overlaid regions in which their biggest erase size command overpasses in size the region's size, only the non-overlaid portion of the sector gets erased. For example, if a Sector Erase command is applied to a 256-kB range that is overlaid by 4-kB sectors, the overlaid 4-kB sectors are not affected by the erase. For overlaid regions, 'region->size' is assigned to 'cmd->size' later in spi_nor_init_erase_cmd(), so 'erase->size' can be greater than 'len'. Fixes: 5390a8df769e ("mtd: spi-nor: add support to non-uniform SFDP SPI NOR flash memories") Cc: stable@vger.kernel.org Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> [ta: Update commit description, add Fixes tag and Cc to stable] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/fa5d8b944a5cca488ac54ba37c95e775ac2deb34.1601612872.git.Takahiro.Kuwano@infineon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mtd: spi-nor: sfdp: Fix wrong erase type bitmask for overlaid regionTakahiro Kuwano1-1/+1
commit abdf5a5ef9652bad4d58058bc22ddf23543ba3e1 upstream. At the time spi_nor_region_check_overlay() is called, the erase types are sorted in ascending order of erase size. The 'erase_type' should be masked with 'BIT(erase[i].idx)' instead of 'BIT(i)'. Fixes: b038e8e3be72 ("mtd: spi-nor: parse SFDP Sector Map Parameter Table") Cc: stable@vger.kernel.org Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> [ta: Add Fixes tag and Cc to stable] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/fd90c40d5b626a1319a78fc2bcee79a8871d4d57.1601612872.git.Takahiro.Kuwano@infineon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mtd: spi-nor: sfdp: Fix last erase region markingTakahiro Kuwano1-2/+1
commit 9166f4af32db74e1544a2149aef231ff24515ea3 upstream. The place of spi_nor_region_mark_end() must be moved, because 'i' is re-used for the index of erase[]. Fixes: b038e8e3be72 ("mtd: spi-nor: parse SFDP Sector Map Parameter Table") Cc: stable@vger.kernel.org Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano@infineon.com> [ta: Add Fixes tag and Cc to stable] Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/02ce8d84b7989ebee33382f6494df53778dd508e.1601612872.git.Takahiro.Kuwano@infineon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04coresight: etm4x: Handle accesses to TRCSTALLCTLRSuzuki K Poulose2-4/+7
commit f72896063396b0cb205cbf0fd76ec6ab3ca11c8a upstream. TRCSTALLCTLR register is only implemented if TRCIDR3.STALLCTL == 0b1 Make sure the driver touches the register only it is implemented. Link: https://lore.kernel.org/r/20210127184617.3684379-1-suzuki.poulose@arm.com Cc: stable@vger.kernel.org Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20210201181351.1475223-32-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04watchdog: mei_wdt: request stop on unregisterAlexander Usyskin1-0/+1
commit 740c0a57b8f1e36301218bf549f3c9cc833a60be upstream. The MEI bus has a special behavior on suspend it destroys all the attached devices, this is due to the fact that also firmware context is not persistent across power flows. If watchdog on MEI bus is ticking before suspending the firmware times out and reports that the OS is missing watchdog tick. Send the stop command to the firmware on watchdog unregistered to eliminate the false event on suspend. This does not make the things worse from the user-space perspective as a user-space should re-open watchdog device after suspending before this patch. Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210124114938.373885-1-tomas.winkler@intel.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04watchdog: qcom: Remove incorrect usage of QCOM_WDT_ENABLE_IRQSai Prakash Ranjan1-12/+1
commit a4f3407c41605d14f09e490045d0609990cd5d94 upstream. As per register documentation, QCOM_WDT_ENABLE_IRQ which is BIT(1) of watchdog control register is wakeup interrupt enable bit and not related to bark interrupt at all, BIT(0) is used for that. So remove incorrect usage of this bit when supporting bark irq for pre-timeout notification. Currently with this bit set and bark interrupt specified, pre-timeout notification and/or watchdog reset/bite does not occur. Fixes: 36375491a439 ("watchdog: qcom: support pre-timeout when the bark irq is available") Cc: stable@vger.kernel.org Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20210126150241.10009-1-saiprakash.ranjan@codeaurora.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>