summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2021-03-11misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eepromAswath Govindraju1-0/+2
[ Upstream commit f6f1f8e6e3eea25f539105d48166e91f0ab46dd1 ] A dummy zero bit is sent preceding the data during a read transfer by the Microchip 93LC46B eeprom (section 2.7 of[1]). This results in right shift of data during a read. In order to ignore this bit a quirk can be added to send an extra zero bit after the read address. Add a quirk to ignore the zero bit sent before data by adding a zero bit after the read address. [1] - https://www.mouser.com/datasheet/2/268/20001749K-277859.pdf Signed-off-by: Aswath Govindraju <a-govindraju@ti.com> Link: https://lore.kernel.org/r/20210105105817.17644-3-a-govindraju@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-09crypto - shash: reduce minimum alignment of shash_desc structureArd Biesheuvel1-3/+6
commit 660d2062190db131d2feaf19914e90f868fe285c upstream. Unlike many other structure types defined in the crypto API, the 'shash_desc' structure is permitted to live on the stack, which implies its contents may not be accessed by DMA masters. (This is due to the fact that the stack may be located in the vmalloc area, which requires a different virtual-to-physical translation than the one implemented by the DMA subsystem) Our definition of CRYPTO_MINALIGN_ATTR is based on ARCH_KMALLOC_MINALIGN, which may take DMA constraints into account on architectures that support non-cache coherent DMA such as ARM and arm64. In this case, the value is chosen to reflect the largest cacheline size in the system, in order to ensure that explicit cache maintenance as required by non-coherent DMA masters does not affect adjacent, unrelated slab allocations. On arm64, this value is currently set at 128 bytes. This means that applying CRYPTO_MINALIGN_ATTR to struct shash_desc is both unnecessary (as it is never used for DMA), and undesirable, given that it wastes stack space (on arm64, performing the alignment costs 112 bytes in the worst case, and the hole between the 'tfm' and '__ctx' members takes up another 120 bytes, resulting in an increased stack footprint of up to 232 bytes.) So instead, let's switch to the minimum SLAB alignment, which does not take DMA constraints into account. Note that this is a no-op for x86. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07swap: fix swapfile read/write offsetJens Axboe1-0/+1
commit caf6912f3f4af7232340d500a4a2008f81b93f14 upstream. We're not factoring in the start of the file for where to write and read the swapfile, which leads to very unfortunate side effects of writing where we should not be... Fixes: dd6bd0d9c7db ("swap: use bdev_read_page() / bdev_write_page()") Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07net: fix dev_ifsioc_locked() race conditionCong Wang1-0/+3
commit 3b23a32a63219f51a5298bc55a65ecee866e79d0 upstream. dev_ifsioc_locked() is called with only RCU read lock, so when there is a parallel writer changing the mac address, it could get a partially updated mac address, as shown below: Thread 1 Thread 2 // eth_commit_mac_addr_change() memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN); // dev_ifsioc_locked() memcpy(ifr->ifr_hwaddr.sa_data, dev->dev_addr,...); Close this race condition by guarding them with a RW semaphore, like netdev_get_name(). We can not use seqlock here as it does not allow blocking. The writers already take RTNL anyway, so this does not affect the slow path. To avoid bothering existing dev_set_mac_address() callers in drivers, introduce a new wrapper just for user-facing callers on ioctl and rtnetlink paths. Note, bonding also changes slave mac addresses but that requires a separate patch due to the complexity of bonding code. Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()") Reported-by: "Gong, Sishuai" <sishuai@purdue.edu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sendingJason A. Donenfeld2-7/+20
commit ee576c47db60432c37e54b1e2b43a8ca6d3a8dca upstream. The icmp{,v6}_send functions make all sorts of use of skb->cb, casting it with IPCB or IP6CB, assuming the skb to have come directly from the inet layer. But when the packet comes from the ndo layer, especially when forwarded, there's no telling what might be in skb->cb at that point. As a result, the icmp sending code risks reading bogus memory contents, which can result in nasty stack overflows such as this one reported by a user: panic+0x108/0x2ea __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read from it. The optlen parameter there is of particular note, as it can induce writes beyond bounds. There are quite a few ways that can happen in __ip_options_echo. For example: // sptr/skb are attacker-controlled skb bytes sptr = skb_network_header(skb); // dptr/dopt points to stack memory allocated by __icmp_send dptr = dopt->__data; // sopt is the corrupt skb->cb in question if (sopt->rr) { optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data // this now writes potentially attacker-controlled data, over // flowing the stack: memcpy(dptr, sptr+sopt->rr, optlen); } In the icmpv6_send case, the story is similar, but not as dire, as only IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is worse than the iif case, but it is passed to ipv6_find_tlv, which does a bit of bounds checking on the value. This is easy to simulate by doing a `memset(skb->cb, 0x41, sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by good fortune and the rarity of icmp sending from that context that we've avoided reports like this until now. For example, in KASAN: BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 Write of size 38 at addr ffff888006f1f80e by task ping/89 CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 Call Trace: dump_stack+0x9a/0xcc print_address_description.constprop.0+0x1a/0x160 __kasan_report.cold+0x20/0x38 kasan_report+0x32/0x40 check_memory_region+0x145/0x1a0 memcpy+0x39/0x60 __ip_options_echo+0xa0e/0x12b0 __icmp_send+0x744/0x1700 Actually, out of the 4 drivers that do this, only gtp zeroed the cb for the v4 case, while the rest did not. So this commit actually removes the gtp-specific zeroing, while putting the code where it belongs in the shared infrastructure of icmp{,v6}_ndo_send. This commit fixes the issue by passing an empty IPCB or IP6CB along to the functions that actually do the work. For the icmp_send, this was already trivial, thanks to __icmp_send providing the plumbing function. For icmpv6_send, this required a tiny bit of refactoring to make it behave like the v4 case, after which it was straight forward. Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") Reported-by: SinYu <liuxyon@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04ipv6: silence compilation warning for non-IPV6 buildsLeon Romanovsky1-1/+1
commit 1faba27f11c8da244e793546a1b35a9b1da8208e upstream. The W=1 compilation of allmodconfig generates the following warning: net/ipv6/icmp.c:448:6: warning: no previous prototype for 'icmp6_send' [-Wmissing-prototypes] 448 | void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, | ^~~~~~~~~~ Fix it by providing function declaration for builds with ipv6 as a module. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04kgdb: fix to kill breakpoints on initmem after bootSumit Garg1-0/+2
commit d54ce6158e354f5358a547b96299ecd7f3725393 upstream. Currently breakpoints in kernel .init.text section are not handled correctly while allowing to remove them even after corresponding pages have been freed. Fix it via killing .init.text section breakpoints just prior to initmem pages being freed. Doug: "HW breakpoints aren't handled by this patch but it's probably not such a big deal". Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Suggested-by: Doug Anderson <dianders@chromium.org> Acked-by: Doug Anderson <dianders@chromium.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04dm: fix deadlock when swapping to encrypted deviceMikulas Patocka1-0/+5
commit a666e5c05e7c4aaabb2c5d58117b0946803d03d2 upstream. The system would deadlock when swapping to a dm-crypt device. The reason is that for each incoming write bio, dm-crypt allocates memory that holds encrypted data. These excessive allocations exhaust all the memory and the result is either deadlock or OOM trigger. This patch limits the number of in-flight swap bios, so that the memory consumed by dm-crypt is limited. The limit is enforced if the target set the "limit_swap_bios" variable and if the bio has REQ_SWAP set. Non-swap bios are not affected becuase taking the semaphore would cause performance degradation. This is similar to request-based drivers - they will also block when the number of requests is over the limit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTOREChris Wilson1-1/+1
commit bfe3911a91047557eb0e620f95a370aee6a248c7 upstream. Userspace has discovered the functionality offered by SYS_kcmp and has started to depend upon it. In particular, Mesa uses SYS_kcmp for os_same_file_description() in order to identify when two fd (e.g. device or dmabuf) point to the same struct file. Since they depend on it for core functionality, lift SYS_kcmp out of the non-default CONFIG_CHECKPOINT_RESTORE into the selectable syscall category. Rasmus Villemoes also pointed out that systemd uses SYS_kcmp to deduplicate the per-service file descriptor store. Note that some distributions such as Ubuntu are already enabling CHECKPOINT_RESTORE in their configs and so, by extension, SYS_kcmp. References: https://gitlab.freedesktop.org/drm/intel/-/issues/3046 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: stable@vger.kernel.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> # DRM depends on kcmp Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> # systemd uses kcmp Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210205220012.1983-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling pointFrederic Weisbecker1-0/+14
commit 4ae7dc97f726ea95c58ac58af71cc034ad22d7de upstream. Following the idle loop model, cleanly check for pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid to do it from rcu_user_enter() with the last resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04rcu/nocb: Perform deferred wake up before last idle's need_resched() checkFrederic Weisbecker1-0/+2
commit 43789ef3f7d61aa7bed0cb2764e588fc990c30ef upstream. Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP kthread (rcuog) to be serviced. Usually a local wake up happening while running the idle task is handled in one of the need_resched() checks carefully placed within the idle loop that can break to the scheduler. Unfortunately the call to rcu_idle_enter() is already beyond the last generic need_resched() check and we may halt the CPU with a resched request unhandled, leaving the task hanging. Fix this with splitting the rcuog wakeup handling from rcu_idle_enter() and place it before the last generic need_resched() check in the idle loop. It is then assumed that no call to call_rcu() will be performed after that in the idle loop until the CPU is put in low power mode. Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf) Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04KEYS: trusted: Reserve TPM for seal and unseal operationsJarkko Sakkinen1-1/+4
commit 8c657a0590de585b1115847c17b34a58025f2f4b upstream. When TPM 2.0 trusted keys code was moved to the trusted keys subsystem, the operations were unwrapped from tpm_try_get_ops() and tpm_put_ops(), which are used to take temporarily the ownership of the TPM chip. The ownership is only taken inside tpm_send(), but this is not sufficient, as in the key load TPM2_CC_LOAD, TPM2_CC_UNSEAL and TPM2_FLUSH_CONTEXT need to be done as a one single atom. Take the TPM chip ownership before sending anything with tpm_try_get_ops() and tpm_put_ops(), and use tpm_transmit_cmd() to send TPM commands instead of tpm_send(), reverting back to the old behaviour. Fixes: 2e19e10131a0 ("KEYS: trusted: Move TPM2 trusted keys code") Reported-by: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: stable@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Sumit Garg <sumit.garg@linaro.org> Acked-by Sumit Garg <sumit.garg@linaro.org> Tested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04mm/rmap: fix potential pte_unmap on an not mapped pteMiaohe Lin1-1/+2
[ Upstream commit 5d5d19eda6b0ee790af89c45e3f678345be6f50f ] For PMD-mapped page (usually THP), pvmw->pte is NULL. For PTE-mapped THP, pvmw->pte is mapped. But for HugeTLB pages, pvmw->pte is not mapped and set to the relevant page table entry. So in page_vma_mapped_walk_done(), we may do pte_unmap() for HugeTLB pte which is not mapped. Fix this by checking pvmw->page against PageHuge before trying to do pte_unmap(). Link: https://lkml.kernel.org/r/20210127093349.39081-1-linmiaohe@huawei.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hongxiang Lou <louhongxiang@huawei.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michel Lespinasse <walken@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04mm: fix memory_failure() handling of dax-namespace metadataDan Williams1-0/+6
[ Upstream commit 34dc45be4563f344d59ba0428416d0d265aa4f4d ] Given 'struct dev_pagemap' spans both data pages and metadata pages be careful to consult the altmap if present to delineate metadata. In fact the pfn_first() helper already identifies the first valid data pfn, so export that helper for other code paths via pgmap_pfn_valid(). Other usage of get_dev_pagemap() are not a concern because those are operating on known data pfns having been looked up by get_user_pages(). I.e. metadata pfns are never user mapped. Link: https://lkml.kernel.org/r/161058501758.1840162.4239831989762604527.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 6100e34b2526 ("mm, memory_failure: Teach memory_failure() about dev_pagemap pages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04mm,thp,shmem: make khugepaged obey tmpfs mount flagsRik van Riel1-0/+2
[ Upstream commit cd89fb06509903f942a0ffe97ffa63034671ed0c ] Currently if thp enabled=[madvise], mounting a tmpfs filesystem with huge=always and mmapping files from that tmpfs does not result in khugepaged collapsing those mappings, despite the mount flag indicating that it should. Fix that by breaking up the blocks of tests in hugepage_vma_check a little bit, and testing things in the correct order. Link: https://lkml.kernel.org/r/20201124194925.623931-4-riel@surriel.com Fixes: c2231020ea7b ("mm: thp: register mm for khugepaged when merging vma for shmem") Signed-off-by: Rik van Riel <riel@surriel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04soundwire: export sdw_write/read_no_pm functionsBard Liao1-0/+2
[ Upstream commit 167790abb90fa073d8341ee0e408ccad3d2109cd ] sdw_write_no_pm and sdw_read_no_pm are useful when we want to do IO without touching PM. Fixes: 0231453bc08f ('soundwire: bus: add clock stop helpers') Fixes: 60ee9be25571 ('soundwire: bus: add PM/no-PM versions of read/write functions') Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20210122070634.12825-5-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04regulator: bd718x7, bd71828, Fix dvs voltage levelsMatti Vaittinen1-8/+6
[ Upstream commit c294554111a835598b557db789d9ad2379b512a2 ] The ROHM BD718x7 and BD71828 drivers support setting HW state specific voltages from device-tree. This is used also by various in-tree DTS files. These drivers do incorrectly try to compose bit-map using enum values. By a chance this works for first two valid levels having values 1 and 2 - but setting values for the rest of the levels do indicate capability of setting values for first levels as well. Luckily the regulators which support setting values for SUSPEND/LPSR do usually also support setting values for RUN and IDLE too - thus this has not been such a fatal issue. Fix this by defining the old enum values as bits and fixing the parsing code. This allows keeping existing IC specific drivers intact and only slightly changing the rohm-regulator.c Fixes: 21b72156ede8b ("regulator: bd718x7: Split driver to common and bd718x7 specific parts") Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Acked-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20210212080023.GA880728@localhost.localdomain Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04iommu: Switch gather->end to the inclusive endYong Wu1-2/+2
[ Upstream commit 862c3715de8f3e5350489240c951d697f04bd8c9 ] Currently gather->end is "unsigned long" which may be overflow in arch32 in the corner case: 0xfff00000 + 0x100000(iova + size). Although it doesn't affect the size(end - start), it affects the checking "gather->end < end" This patch changes this "end" to the real end address (end = start + size - 1). Correspondingly, update the length to "end - start + 1". Fixes: a7d20dc19d9e ("iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes") Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210107122909.16317-5-yong.wu@mediatek.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04certs: Fix blacklist flag type confusionDavid Howells1-0/+1
[ Upstream commit 4993e1f9479a4161fd7d93e2b8b30b438f00cb0f ] KEY_FLAG_KEEP is not meant to be passed to keyring_alloc() or key_alloc(), as these only take KEY_ALLOC_* flags. KEY_FLAG_KEEP has the same value as KEY_ALLOC_BYPASS_RESTRICTION, but fortunately only key_create_or_update() uses it. LSMs using the key_alloc hook don't check that flag. KEY_FLAG_KEEP is then ignored but fortunately (again) the root user cannot write to the blacklist keyring, so it is not possible to remove a key/hash from it. Fix this by adding a KEY_ALLOC_SET_KEEP flag that tells key_alloc() to set KEY_FLAG_KEEP on the new key. blacklist_init() can then, correctly, pass this to keyring_alloc(). We can also use this in ima_mok_init() rather than setting the flag manually. Note that this doesn't fix an observable bug with the current implementation but it is required to allow addition of new hashes to the blacklist in the future without making it possible for them to be removed. Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring") Reported-by: Mickaël Salaün <mic@linux.microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Mickaël Salaün <mic@linux.microsoft.com> cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04ima: Free IMA measurement buffer after kexec syscallLakshmi Ramasubramanian1-0/+5
[ Upstream commit f31e3386a4e92ba6eda7328cb508462956c94c64 ] IMA allocates kernel virtual memory to carry forward the measurement list, from the current kernel to the next kernel on kexec system call, in ima_add_kexec_buffer() function. This buffer is not freed before completing the kexec system call resulting in memory leak. Add ima_buffer field in "struct kimage" to store the virtual address of the buffer allocated for the IMA measurement list. Free the memory allocated for the IMA measurement list in kimage_file_post_load_cleanup() function. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list") Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04tty: convert tty_ldisc_ops 'read()' function to take a kernel pointerLinus Torvalds1-1/+2
[ Upstream commit 3b830a9c34d5897be07176ce4e6f2d75e2c8cfd7 ] The tty line discipline .read() function was passed the final user pointer destination as an argument, which doesn't match the 'write()' function, and makes it very inconvenient to do a splice method for ttys. This is a conversion to use a kernel buffer instead. NOTE! It does this by passing the tty line discipline ->read() function an additional "cookie" to fill in, and an offset into the cookie data. The line discipline can fill in the cookie data with its own private information, and then the reader will repeat the read until either the cookie is cleared or it runs out of data. The only real user of this is N_HDLC, which can use this to handle big packets, even if the kernel buffer is smaller than the whole packet. Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04bpf: Declare __bpf_free_used_maps() unconditionallyAndrii Nakryiko1-2/+3
[ Upstream commit 936f8946bdb48239f4292812d4d2e26c6d328c95 ] __bpf_free_used_maps() is always defined in kernel/bpf/core.c, while include/linux/bpf.h is guarding it behind CONFIG_BPF_SYSCALL. Move it out of that guard region and fix compiler warning. Fixes: a2ea07465c8d ("bpf: Fix missing prog untrack in release_maps") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-4-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04bpf: Avoid warning when re-casting __bpf_call_base into __bpf_call_base_argsAndrii Nakryiko1-1/+1
[ Upstream commit 6943c2b05bf09fd5c5729f7d7d803bf3f126cb9a ] BPF interpreter uses extra input argument, so re-casts __bpf_call_base into __bpf_call_base_args. Avoid compiler warning about incompatible function prototypes by casting to void * first. Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-3-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04bpf: Add bpf_patch_call_args prototype to include/linux/bpf.hAndrii Nakryiko1-0/+3
[ Upstream commit a643bff752dcf72a07e1b2ab2f8587e4f51118be ] Add bpf_patch_call_args() prototype. This function is called from BPF verifier and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes compiler warning about missing prototype in some kernel configurations. Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04zsmalloc: account the number of compacted pages correctlyRokudo Yan1-1/+1
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream. There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-26mm: provide a saner PTE walking API for modulesPaolo Bonzini1-2/+4
commit 9fd6dad1261a541b3f5fa7dc5b152222306e6702 upstream. Currently, the follow_pfn function is exported for modules but follow_pte is not. However, follow_pfn is very easy to misuse, because it does not provide protections (so most of its callers assume the page is writable!) and because it returns after having already unlocked the page table lock. Provide instead a simplified version of follow_pte that does not have the pmdpp and range arguments. The older version survives as follow_invalidate_pte() for use by fs/dax.c. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-1/+9
Pull networking fixes from David Miller: "Another pile of networing fixes: 1) ath9k build error fix from Arnd Bergmann 2) dma memory leak fix in mediatec driver from Lorenzo Bianconi. 3) bpf int3 kprobe fix from Alexei Starovoitov. 4) bpf stackmap integer overflow fix from Bui Quang Minh. 5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from Christoph Schemmel. 6) Don't update deleted entry in xt_recent netfilter module, from Jazsef Kadlecsik. 7) Use after free in nftables, fix from Pablo Neira Ayuso. 8) Header checksum fix in flowtable from Sven Auhagen. 9) Validate user controlled length in qrtr code, from Sabyrzhan Tasbolatov. 10) Fix race in xen/netback, from Juergen Gross, 11) New device ID in cxgb4, from Raju Rangoju. 12) Fix ring locking in rxrpc release call, from David Howells. 13) Don't return LAPB error codes from x25_open(), from Xie He. 14) Missing error returns in gsi_channel_setup() from Alex Elder. 15) Get skb_copy_and_csum_datagram working properly with odd segment sizes, from Willem de Bruijn. 16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean. 17) Do teardown on probe failure in DSA, from Vladimir Oltean. 18) Fix compilation failures of txtimestamp selftest, from Vadim Fedorenko. 19) Limit rx per-napi gro queue size to fix latency regression, from Eric Dumazet. 20) dpaa_eth xdp fixes from Camelia Groza. 21) Missing txq mode update when switching CBS off, in stmmac driver, from Mohammad Athari Bin Ismail. 22) Failover pending logic fix in ibmvnic driver, from Sukadev Bhattiprolu. 23) Null deref fix in vmw_vsock, from Norbert Slusarek. 24) Missing verdict update in xdp paths of ena driver, from Shay Agroskin. 25) seq_file iteration fix in sctp from Neil Brown. 26) bpf 32-bit src register truncation fix on div/mod, from Daniel Borkmann. 27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann. 28) Fix locking in vsock_shutdown(), from Stefano Garzarella. 29) Various missing index bound checks in hns3 driver, from Yufeng Mo. 30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from Vladimir Oltean. 31) Don't mix up stp and mrp port states in bridge layer, from Horatiu Vultur. 32) Fix locking during netif_tx_disable(), from Edwin Peer" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) bpf: Fix 32 bit src register truncation on div/mod bpf: Fix verifier jmp32 pruning decision logic bpf: Fix verifier jsgt branch analysis on max bound vsock: fix locking in vsock_shutdown() net: hns3: add a check for index in hclge_get_rss_key() net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() net: hns3: add a check for queue_id in hclge_reset_vf_queue() net: dsa: felix: implement port flushing on .phylink_mac_link_down switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state net: watchdog: hold device global xmit lock during tx disable netfilter: nftables: relax check for stateful expressions in set definition netfilter: conntrack: skip identical origin tuple in same zone only vsock/virtio: update credit only if socket is not closed net: fix iteration for sctp transport seq_files net: ena: Update XDP verdict upon failure net/vmw_vsock: improve locking in vsock_connect_timeout() net/vmw_vsock: fix NULL pointer dereference ibmvnic: Clear failover_pending if unable to schedule net: stmmac: set TxQ mode back to DCB after disabling CBS ...
2021-02-09net: watchdog: hold device global xmit lock during tx disableEdwin Peer1-0/+2
Prevent netif_tx_disable() running concurrently with dev_watchdog() by taking the device global xmit lock. Otherwise, the recommended: netif_carrier_off(dev); netif_tx_disable(dev); driver shutdown sequence can happen after the watchdog has already checked carrier, resulting in possible false alarms. This is because netif_tx_lock() only sets the frozen bit without maintaining the locks on the individual queues. Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-07Merge tag 'irq_urgent_for_v5.11_rc7' of ↵Linus Torvalds2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Prevent device managed IRQ allocation helpers from returning IRQ 0 - A fix for MSI activation of PCI endpoints with multiple MSIs * tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent [devm_]irq_alloc_desc from returning irq 0 genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
2021-02-07Merge tag 'core_urgent_for_v5.11_rc7' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull syscall entry fixes from Borislav Petkov: - For syscall user dispatch, separate prctl operation from syscall redirection range specification before the API has been made official in 5.11. - Ensure tasks using the generic syscall code do trap after returning from a syscall when single-stepping is requested. * tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Use different define for selector variable in SUD entry: Ensure trap after single-step on system call return
2021-02-06entry: Ensure trap after single-step on system call returnGabriel Krisman Bertazi2-0/+3
Commit 299155244770 ("entry: Drop usage of TIF flags in the generic syscall code") introduced a bug on architectures using the generic syscall entry code, in which processes stopped by PTRACE_SYSCALL do not trap on syscall return after receiving a TIF_SINGLESTEP. The reason is that the meaning of TIF_SINGLESTEP flag is overloaded to cause the trap after a system call is executed, but since the above commit, the syscall call handler only checks for the SYSCALL_WORK flags on the exit work. Split the meaning of TIF_SINGLESTEP such that it only means single-step mode, and create a new type of SYSCALL_WORK to request a trap immediately after a syscall in single-step mode. In the current implementation, the SYSCALL_WORK flag shadows the TIF_SINGLESTEP flag for simplicity. Update x86 to flip this bit when a tracer enables single stepping. Fixes: 299155244770 ("entry: Drop usage of TIF flags in the generic syscall code") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/87h7mtc9pr.fsf_-_@collabora.com
2021-02-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-7/+11
Merge misc fixes from Andrew Morton: "18 patches. Subsystems affected by this patch series: mm (hugetlb, compaction, vmalloc, shmem, memblock, pagecache, kasan, and hugetlb), mailmap, gcov, ubsan, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS/.mailmap: use my @kernel.org address mm: hugetlb: fix missing put_page in gather_surplus_pages() ubsan: implement __ubsan_handle_alignment_assumption kasan: make addr_has_metadata() return true for valid addresses kasan: add explicit preconditions to kasan_report() mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() mailmap: add entries for Manivannan Sadhasivam mailmap: fix name/email for Viresh Kumar memblock: do not start bottom-up allocations with kernel_end mm: thp: fix MADV_REMOVE deadlock on shmem THP init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov mm/vmalloc: separate put pages and flush VM flags mm, compaction: move high_pfn to the for loop scope mm: migrate: do not migrate HugeTLB page whose refcount is one mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active mm: hugetlb: fix a race between isolating and freeing page mm: hugetlb: fix a race between freeing and dissolving the page mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
2021-02-05genirq: Prevent [devm_]irq_alloc_desc from returning irq 0Hans de Goede1-2/+2
Since commit a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid"), having a linux-irq with number 0 will trigger a WARN() when calling platform_get_irq*() to retrieve that linux-irq. Since [devm_]irq_alloc_desc allocs a single irq and since irq 0 is not used on some systems, it can return 0, triggering that WARN(). This happens e.g. on Intel Bay Trail and Cherry Trail devices using the LPE audio engine for HDMI audio: 0 is an invalid IRQ number WARNING: CPU: 3 PID: 472 at drivers/base/platform.c:238 platform_get_irq_optional+0x108/0x180 Modules linked in: snd_hdmi_lpe_audio(+) ... Call Trace: platform_get_irq+0x17/0x30 hdmi_lpe_audio_probe+0x4a/0x6c0 [snd_hdmi_lpe_audio] ---[ end trace ceece38854223a0b ]--- Change the 'from' parameter passed to __[devm_]irq_alloc_descs() by the [devm_]irq_alloc_desc macros from 0 to 1, so that these macros will no longer return 0. Fixes: a85a6c86c25b ("driver core: platform: Clarify that IRQ 0 is invalid") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201221185647.226146-1-hdegoede@redhat.com
2021-02-05kasan: add explicit preconditions to kasan_report()Vincenzo Frascino1-0/+7
Patch series "kasan: Fix metadata detection for KASAN_HW_TAGS", v5. With the introduction of KASAN_HW_TAGS, kasan_report() currently assumes that every location in memory has valid metadata associated. This is due to the fact that addr_has_metadata() returns always true. As a consequence of this, an invalid address (e.g. NULL pointer address) passed to kasan_report() when KASAN_HW_TAGS is enabled, leads to a kernel panic. Example below, based on arm64: BUG: KASAN: invalid-access in 0x0 Read at addr 0000000000000000 by task swapper/0/1 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 ... Call trace: mte_get_mem_tag+0x24/0x40 kasan_report+0x1a4/0x410 alsa_sound_last_init+0x8c/0xa4 do_one_initcall+0x50/0x1b0 kernel_init_freeable+0x1d4/0x23c kernel_init+0x14/0x118 ret_from_fork+0x10/0x34 Code: d65f03c0 9000f021 f9428021 b6cfff61 (d9600000) ---[ end trace 377c8bb45bdd3a1a ]--- hrtimer: interrupt took 48694256 ns note: swapper/0[1] exited with preempt_count 1 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b SMP: stopping secondary CPUs Kernel Offset: 0x35abaf140000 from 0xffff800010000000 PHYS_OFFSET: 0x40000000 CPU features: 0x0a7e0152,61c0a030 Memory Limit: none ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- This series fixes the behavior of addr_has_metadata() that now returns true only when the address is valid. This patch (of 2): With the introduction of KASAN_HW_TAGS, kasan_report() accesses the metadata only when addr_has_metadata() succeeds. Add a comment to make sure that the preconditions to the function are explicitly clarified. Link: https://lkml.kernel.org/r/20210126134409.47894-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/20210126134409.47894-2-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-05mm/vmalloc: separate put pages and flush VM flagsRick Edgecombe1-7/+2
When VM_MAP_PUT_PAGES was added, it was defined with the same value as VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big functional problems other than some excess flushing for VM_MAP_PUT_PAGES allocations. Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things so flags are less likely to be missed in the future. Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-05mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB pageMuchun Song1-0/+2
If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-05Merge tag 'iommu-fixes-v5.11-rc6' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fix from Joerg Roedel: "Fix a possible NULL-ptr dereference in dev_iommu_priv_get() which is too easy to accidentially trigger from IOMMU drivers. In the current case the AMD IOMMU driver triggered it on some machines in the IO-page-fault path, so fix it once and for all" * tag 'iommu-fixes-v5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing it
2021-02-05udp: fix skb_copy_and_csum_datagram with odd segment sizesWillem de Bruijn1-1/+7
When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd. The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call. Bring back the pos by passing it along with the csum to the callback. Changes v1->v2 - pass csum value, instead of csump pointer (Alexander Duyck) Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/ Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Reported-by: Oliver Graute <oliver.graute@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-03Merge tag 'trace-v5.11-rc5' of ↵Linus Torvalds2-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Initialize tracing-graph-pause at task creation, not start of function tracing, to avoid corrupting the pause counter. - Set "pause-on-trace" for latency tracers as that option breaks their output (regression). - Fix the wrong error return for setting kretprobes on future modules (before they are loaded). - Fix re-registering the same kretprobe. - Add missing value check for added RCU variable reload. * tag 'trace-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracepoint: Fix race between tracing and removing tracepoint kretprobe: Avoid re-registration of the same kretprobe earlier tracing/kprobe: Fix to support kretprobe events on unloaded modules tracing: Use pause-on-trace with the latency tracers fgraph: Initialize tracing_graph_pause at task creation
2021-02-02iommu: Check dev->iommu in dev_iommu_priv_get() before dereferencing itJoerg Roedel1-1/+4
The dev_iommu_priv_get() needs a similar check to dev_iommu_fwspec_get() to make sure no NULL-ptr is dereferenced. Fixes: 05a0542b456e1 ("iommu/amd: Store dev_data as device iommu private data") Cc: stable@vger.kernel.org # v5.8+ Link: https://lore.kernel.org/r/20210202145419.29143-1-joro@8bytes.org Reference: https://bugzilla.kernel.org/show_bug.cgi?id=211241 Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-02-02tracepoint: Fix race between tracing and removing tracepointAlexey Kardashevskiy1-5/+7
When executing a tracepoint, the tracepoint's func is dereferenced twice - in __DO_TRACE() (where the returned pointer is checked) and later on in __traceiter_##_name where the returned pointer is dereferenced without checking which leads to races against tracepoint_removal_sync() and crashes. This adds a check before referencing the pointer in tracepoint_ptr_deref. Link: https://lkml.kernel.org/r/20210202072326.120557-1-aik@ozlabs.ru Cc: stable@vger.kernel.org Fixes: d25e37d89dd2f ("tracepoint: Optimize using static_call()") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-31Merge tag 'x86_entry_for_v5.11_rc6' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: "A single fix for objtool to generate proper unwind info for newer toolchains which do not generate section symbols anymore. And a cleanup ontop. This was originally going to go during the next merge window but people can already trigger a build error with binutils-2.36 which doesn't emit section symbols - something which objtool relies on - so let's expedite it" * tag 'x86_entry_for_v5.11_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Remove put_ret_addr_in_rdi THUNK macro argument x86/entry: Emit a symbol for register restoring thunk
2021-01-31Merge tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-2/+1
Pull NFS client fixes from Trond Myklebust: - SUNRPC: Handle 0 length opaque XDR object data properly - Fix a layout segment leak in pnfs_layout_process() - pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn - pNFS/NFSv4: Improve rejection of out-of-order layouts - pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() * tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Handle 0 length opaque XDR object data properly SUNRPC: Move simple_get_bytes and simple_get_netobj into private header pNFS/NFSv4: Improve rejection of out-of-order layouts pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
2021-01-30genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is setMarc Zyngier1-0/+6
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org
2021-01-29tracing/kprobe: Fix to support kretprobe events on unloaded modulesMasami Hiramatsu1-1/+1
Fix kprobe_on_func_entry() returns error code instead of false so that register_kretprobe() can return an appropriate error code. append_trace_kprobe() expects the kprobe registration returns -ENOENT when the target symbol is not found, and it checks whether the target module is unloaded or not. If the target module doesn't exist, it defers to probe the target symbol until the module is loaded. However, since register_kretprobe() returns -EINVAL instead of -ENOENT in that case, it always fail on putting the kretprobe event on unloaded modules. e.g. Kprobe event: /sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events [ 16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue. Kretprobe event: (p -> r) /sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events sh: write error: Invalid argument /sys/kernel/debug/tracing # cat error_log [ 41.122514] trace_kprobe: error: Failed to register probe event Command: r xfs:xfs_end_io ^ To fix this bug, change kprobe_on_func_entry() to detect symbol lookup failure and return -ENOENT in that case. Otherwise it returns -EINVAL or 0 (succeeded, given address is on the entry). Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 59158ec4aef7 ("tracing/kprobes: Check the probe on unloaded module correctly") Reported-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-01-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-18/+0
Pull rdma fixes from Jason Gunthorpe: "Several recent regressions and some bug fixes: - Typo corrupting the max_recv_sge for cxgb4 - Regression from re-using kernel enums as a HW AbI in vmw_pvrdma - Sleeping inside a spinlock in hns - Revert the attempt to fix devlink deadlocks as the fix is more buggy - Typo in sysfs_emit_at conversions - Revert the removal of VLAN support in rxe" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: Revert "RDMA/rxe: Remove VLAN code leftovers from RXE" RDMA/usnic: Fix misuse of sysfs_emit_at Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion" RDMA/hns: Use mutex instead of spinlock for ida allocation RDMA/vmw_pvrdma: Fix network_hdr_type reported in WC RDMA/cxgb4: Fix the reported max_recv_sge value
2021-01-26Merge tag 'regulator-fix-v5.11-rc5' of ↵Linus Torvalds1-0/+30
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "The main thing here is a change to make sure that we don't try to double resolve the supply of a regulator if we have two probes going on simultaneously, plus an incremental fix on top of that to resolve a lockdep issue it introduced. There's also a patch from Dmitry Osipenko adding stubs for some functions to avoid build issues in consumers in some configurations" * tag 'regulator-fix-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Fix lockdep warning resolving supplies regulator: consumer: Add missing stubs to regulator/consumer.h regulator: core: avoid regulator_resolve_supply() race condition
2021-01-25SUNRPC: Move simple_get_bytes and simple_get_netobj into private headerDave Wysochanski1-2/+1
Remove duplicated helper functions to parse opaque XDR objects and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-01-25Commit 9bb48c82aced ("tty: implement write_iter") converted the ttySami Tolvanen1-0/+1
layer to use write_iter. Fix the redirected_tty_write declaration also in n_tty and change the comparisons to use write_iter instead of write. [ Also moved the declaration of redirected_tty_write() to the proper location in a header file. The reason for the bug was the bogus extern declaration in n_tty.c silently not matching the changed definition in tty_io.c, and because it wasn't in a shared header file, there was no cross-checking of the declaration. Sami noticed because Clang's Control Flow Integrity checking ended up incidentally noticing the inconsistent declaration. - Linus ] Fixes: 9bb48c82aced ("tty: implement write_iter") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24Merge tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+6
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph: - fix a status code in nvmet (Chaitanya Kulkarni) - avoid double completions in nvme-rdma/nvme-tcp (Chao Leng) - fix the CMB support to cope with NVMe 1.4 controllers (Klaus Jensen) - fix PRINFO handling in the passthrough ioctl (Revanth Rajashekar) - fix a double DMA unmap in nvme-pci - lightnvm error path leak fix (Pan) - MD pull request from Song: - Flush request fix (Xiao) * tag 'block-5.11-2021-01-24' of git://git.kernel.dk/linux-block: lightnvm: fix memory leak when submit fails nvme-pci: fix error unwind in nvme_map_data nvme-pci: refactor nvme_unmap_data md: Set prev_flush_start and flush_bio in an atomic way nvmet: set right status on error in id-ns handler nvme-pci: allow use of cmb on v1.4 controllers nvme-tcp: avoid request double completion for concurrent nvme_tcp_timeout nvme-rdma: avoid request double completion for concurrent nvme_rdma_timeout nvme: check the PRINFO bit before deciding the host buffer length