summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-11-18block: Hold invalidate_lock in BLKDISCARD ioctlShin'ichiro Kawasaki1-3/+9
commit 7607c44c157d343223510c8ffdf7206fdd2a6213 upstream. When BLKDISCARD ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y with slight patch edit. Rework is required for older stable kernels. Fixes: 351499a172c0 ("block: Invalidate cache on discard v2") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18drm/i915/guc: Fix blocked context accountingMatthew Brost1-1/+1
commit fc30a6764a54dea42291aeb7009bef7aa2fc1cd4 upstream. Prior to this patch the blocked context counter was cleared on init_sched_state (used during registering a context & resets) which is incorrect. This state needs to be persistent or the counter can read the incorrect value resulting in scheduling never getting enabled again. Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-2-matthew.brost@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18erofs: fix unsafe pagevec reuse of hooked pclustersGao Xiang2-9/+17
commit 86432a6dca9bed79111990851df5756d3eb5f57c upstream. There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the decompression chain can be extended if the following pcluster chain hooks such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ | tail (partial) page | head (partial) page | |_____PRIMARY_HOOKED___|____________PRIMARY_FOLLOWED____________| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it (general protection fault) by ro_fsstressing a LZMA image for long time. I'm going to actively clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18ceph: fix mdsmap decode when there are MDS's beyond max_mdsXiubo Li1-4/+0
commit 0e24421ac431e7af62d4acef6c638b85aae51728 upstream. If the max_mds is decreased in a cephfs cluster, there is a window of time before the MDSs are removed. If a map goes out during this period, the mdsmap may show the decreased max_mds but still shows those MDSes as in or in the export target list. Ensure that we don't fail the map decode in that case. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52436 Fixes: d517b3983dd3 ("ceph: reconnect to the export targets on new mdsmaps") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18f2fs: fix UAF in f2fs_available_free_memoryDongliang Mu1-0/+2
commit 5429c9dbc9025f9a166f64e22e3a69c94fd5b29b upstream. if2fs_fill_super -> f2fs_build_segment_manager -> create_discard_cmd_control -> f2fs_start_discard_thread It invokes kthread_run to create a thread and run issue_discard_thread. However, if f2fs_build_node_manager fails, the control flow goes to free_nm and calls f2fs_destroy_node_manager. This function will free sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info after the deallocation, but before the f2fs_stop_discard_thread, it will cause UAF(Use-after-free). -> f2fs_destroy_segment_manager -> destroy_discard_cmd_control -> f2fs_stop_discard_thread Fix this by stopping discard thread before f2fs_destroy_node_manager. Note that, the commit d6d2b491a82e1 introduces the call of f2fs_available_free_memory into issue_discard_thread. Cc: stable@vger.kernel.org Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18f2fs: include non-compressed blocks in compr_written_blockDaeho Jeong1-0/+1
commit 09631cf3234d32156e7cae32275f5a4144c683c5 upstream. Need to include non-compressed blocks in compr_written_block to estimate average compression ratio more accurately. Fixes: 5ac443e26a09 ("f2fs: add sysfs nodes to get runtime compression stat") Cc: stable@vger.kernel.org Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18f2fs: should use GFP_NOFS for directory inodesJaegeuk Kim2-2/+2
commit 92d602bc7177325e7453189a22e0c8764ed3453e upstream. We use inline_dentry which requires to allocate dentry page when adding a link. If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem) twice by f2fs_lock_op(). I think this should be okay, but how about stopping the lockdep complaint [1]? f2fs_create() - f2fs_lock_op() - f2fs_do_add_link() - __f2fs_find_entry - f2fs_get_read_data_page() -> kswapd - shrink_node - f2fs_evict_inode - f2fs_lock_op() [1] fs_reclaim ){+.+.}-{0:0} : kswapd0: lock_acquire+0x114/0x394 kswapd0: __fs_reclaim_acquire+0x40/0x50 kswapd0: prepare_alloc_pages+0x94/0x1ec kswapd0: __alloc_pages_nodemask+0x78/0x1b0 kswapd0: pagecache_get_page+0x2e0/0x57c kswapd0: f2fs_get_read_data_page+0xc0/0x394 kswapd0: f2fs_find_data_page+0xa4/0x23c kswapd0: find_in_level+0x1a8/0x36c kswapd0: __f2fs_find_entry+0x70/0x100 kswapd0: f2fs_do_add_link+0x84/0x1ec kswapd0: f2fs_mkdir+0xe4/0x1e4 kswapd0: vfs_mkdir+0x110/0x1c0 kswapd0: do_mkdirat+0xa4/0x160 kswapd0: __arm64_sys_mkdirat+0x24/0x34 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 kswapd0: -> #1 ( &sbi->cp_rwsem ){++++}-{3:3} : kswapd0: lock_acquire+0x114/0x394 kswapd0: down_read+0x7c/0x98 kswapd0: f2fs_do_truncate_blocks+0x78/0x3dc kswapd0: f2fs_truncate+0xc8/0x128 kswapd0: f2fs_evict_inode+0x2b8/0x8b8 kswapd0: evict+0xd4/0x2f8 kswapd0: iput+0x1c0/0x258 kswapd0: do_unlinkat+0x170/0x2a0 kswapd0: __arm64_sys_unlinkat+0x4c/0x68 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 Cc: stable@vger.kernel.org Fixes: bdbc90fa55af ("f2fs: don't put dentry page in pagecache into highmem") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Light Hsieh <light.hsieh@mediatek.com> Tested-by: Light Hsieh <light.hsieh@mediatek.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18irqchip/sifive-plic: Fixup EOI failed when maskedGuo Ren1-1/+7
commit 69ea463021be0d159ab30f96195fb0dd18ee2272 upstream. When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver, only the first interrupt is handled, and following interrupts are never delivered (initially reported in [1]). That's because the RISC-V PLIC cannot EOI masked interrupts, as explained in the description of Interrupt Completion in the PLIC spec [2]: <quote> The PLIC signals it has completed executing an interrupt handler by writing the interrupt ID it received from the claim to the claim/complete register. The PLIC does not check whether the completion ID is the same as the last claim ID for that target. If the completion ID does not match an interrupt source that *is currently enabled* for the target, the completion is silently ignored. </quote> Re-enable the interrupt before completion if it has been masked during the handling, and remask it afterwards. [1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html [2] https://github.com/riscv/riscv-plic-spec/blob/8bc15a35d07c9edf7b5d23fec9728302595ffc4d/riscv-plic.adoc Fixes: bb0fed1c60cc ("irqchip/sifive-plic: Switch to fasteoi flow") Reported-by: Vincent Pelletier <plr.vincent@gmail.com> Tested-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> [maz: amended commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105094748.3894453-1-guoren@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()Michael Pratt3-2/+20
commit ca7752caeaa70bd31d1714af566c9809688544af upstream. copy_process currently copies task_struct.posix_cputimers_work as-is. If a timer interrupt arrives while handling clone and before dup_task_struct completes then the child task will have: 1. posix_cputimers_work.scheduled = true 2. posix_cputimers_work.work queued. copy_process clears task_struct.task_works, so (2) will have no effect and posix_cpu_timers_work will never run (not to mention it doesn't make sense for two tasks to share a common linked list). Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is never cleared. Since scheduled is set, future timer interrupts will skip scheduling work, with the ultimate result that the task will never receive timer expirations. Together, the complete flow is: 1. Task 1 calls clone(), enters kernel. 2. Timer interrupt fires, schedules task work on Task 1. 2a. task_struct.posix_cputimers_work.scheduled = true 2b. task_struct.posix_cputimers_work.work added to task_struct.task_works. 3. dup_task_struct() copies Task 1 to Task 2. 4. copy_process() clears task_struct.task_works for Task 2. 5. Future timer interrupts on Task 2 see task_struct.posix_cputimers_work.scheduled = true and skip scheduling work. Fix this by explicitly clearing contents of task_struct.posix_cputimers_work in copy_process(). This was never meant to be shared or inherited across tasks in the first place. Fixes: 1fb497dd0030 ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work") Reported-by: Rhys Hiltner <rhys@justin.tv> Signed-off-by: Michael Pratt <mpratt@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18KVM: x86: move guest_pv_has out of user_access sectionPaolo Bonzini1-3/+6
commit 3e067fd8503d6205aa0c1c8f48f6b209c592d19c upstream. When UBSAN is enabled, the code emitted for the call to guest_pv_has includes a call to __ubsan_handle_load_invalid_value. objtool complains that this call happens with UACCESS enabled; to avoid the warning, pull the calls to user_access_begin into both arms of the "if" statement, after the check for guest_pv_has. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18PCI/MSI: Destroy sysfs before freeing entriesThomas Gleixner1-5/+5
commit 3735459037114d31e5acd9894fad9aed104231a0 upstream. free_msi_irqs() frees the MSI entries before destroying the sysfs entries which are exposing them. Nothing prevents a concurrent free while a sysfs file is read and accesses the possibly freed entry. Move the sysfs release ahead of freeing the entries. Fixes: 1c51b50c2995 ("PCI/MSI: Export MSI mode using attributes, not kobjects") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18PCI/MSI: Move non-mask check back into low level accessorsThomas Gleixner3-15/+17
commit 9c8e9c9681a0f3f1ae90a90230d059c7a1dece5a upstream. The recent rework of PCI/MSI[X] masking moved the non-mask checks from the low level accessors into the higher level mask/unmask functions. This missed the fact that these accessors can be invoked from other places as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and also violates the virtual MSIX and the msi_attrib.maskbit protections. Instead of sprinkling checks all over the place, lift them back into the low level accessor functions. To avoid checking three different conditions combine them into one property of msi_desc::msi_attrib. [ josef: Fixed the missed conversion in the core code ] Fixes: fcacdfbef5a1 ("PCI/MSI: Provide a new set of mask and unmask functions") Reported-by: Josef Johansson <josef@oderland.se> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Josef Johansson <josef@oderland.se> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18x86/mce: Add errata workaround for Skylake SKX37Dave Jones1-2/+3
commit e629fc1407a63dbb748f828f9814463ffc2a0af0 upstream. Errata SKX37 is word-for-word identical to the other errata listed in this workaround. I happened to notice this after investigating a CMCI storm on a Skylake host. While I can't confirm this was the root cause, spurious corrected errors does sound like a likely suspect. Fixes: 2976908e4198 ("x86/mce: Do not log spurious corrected mce errors") Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVELMaciej W. Rozycki1-1/+4
commit a923a2676e60683aee46aa4b93c30aff240ac20d upstream. Fix assembly errors like: {standard input}: Assembler messages: {standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' {standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' {standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32' {standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1 with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT. This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a block of code containing a DINS MIPS64r2 instruction conditionalized on MIPS_ISA_REV >= 2 within the scope of the downgrade. The assembly architecture override code pattern has been put there for LL/SC instructions, so that code compiles for configurations that select a processor to build for that does not support these instructions while still providing run-time support for processors that do, dynamically switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org commit aac8aa7717a2 ("Enable a suitable ISA for the assembler around ll/sc so that code builds even for processors that don't support the instructions. Plus minor formatting fixes.") back in 2005. Fix the problem by wrapping these instructions along with the adjacent SYNC instructions only, following the practice established with commit cfd54de3b0e4 ("MIPS: Avoid move psuedo-instruction whilst using MIPS_ISA_LEVEL") and commit 378ed6f0e3c5 ("MIPS: Avoid using .set mips0 to restore ISA"). Strictly speaking the SYNC instructions do not have to be wrapped as they are only used as a Loongson3 erratum workaround, so they will be enabled in the assembler by default, but do this so as to keep code consistent with other places. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: c7e2d71dda7a ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18MIPS: fix *-pkg builds for loongson2ef platformMasahiro Yamada1-0/+2
commit 0706f74f719e6e72c3a862ab2990796578fa73cc upstream. Since commit 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed"), package builds for the loongson2f platform fail. $ make ARCH=mips CROSS_COMPILE=mips64-linux- lemote2f_defconfig bindeb-pkg [ snip ] sh ./scripts/package/builddeb arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. cp: cannot stat '': No such file or directory make[5]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1 make[4]: *** [Makefile:1558: intdeb-pkg] Error 2 make[3]: *** [debian/rules:13: binary-arch] Error 2 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 make[2]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2 make[1]: *** [Makefile:1558: bindeb-pkg] Error 2 make: *** [Makefile:350: __build_one_by_one] Error 2 The reason is because "make image_name" fails. $ make ARCH=mips CROSS_COMPILE=mips64-linux- image_name arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. In general, adding $(error ...) in the parse stage is troublesome, and it is pointless to check toolchains even if we are not building anything. Do not include Kbuild.platform in such cases. Fixes: 805b2e1d427a ("kbuild: include Makefile.compiler only when compiler is needed") Reported-by: Jason Self <jason@bluehome.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18MIPS: fix duplicated slashes for Platform file pathMasahiro Yamada1-1/+1
commit cca2aac8acf470b01066f559acd7146fc4c32ae8 upstream. platform-y accumulates platform names with a slash appended. The current $(patsubst ...) ends up with doubling slashes. GNU Make still include Platform files, but in case of an error, a clumsy file path is displayed: arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Jason Self <jason@bluehome.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18parisc: Flush kernel data mapping in set_pte_at() when installing pte for ↵John David Anglin2-4/+10
user page commit 38860b2c8bb1b92f61396eb06a63adff916fc31d upstream. For years, there have been random segmentation faults in userspace on SMP PA-RISC machines. It occurred to me that this might be a problem in set_pte_at(). MIPS and some other architectures do cache flushes when installing PTEs with the present bit set. Here I have adapted the code in update_mmu_cache() to flush the kernel mapping when the kernel flush is deferred, or when the kernel mapping may alias with the user mapping. This simplifies calls to update_mmu_cache(). I also changed the barrier in set_pte() from a compiler barrier to a full memory barrier. I know this change is not sufficient to fix the problem. It might not be needed. I have had a few days of operation with 5.14.16 to 5.15.1 and haven't seen any random segmentation faults on rp3440 or c8000 so far. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18parisc: Fix backtrace to always include init funtion namesHelge Deller1-1/+2
commit 279917e27edc293eb645a25428c6ab3f3bca3f86 upstream. I noticed that sometimes at kernel startup the backtraces did not included the function names of init functions. Their address were not resolved to function names and instead only the address was printed. Debugging shows that the culprit is is_ksym_addr() which is called by the backtrace functions to check if an address belongs to a function in the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y. When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y the function only tries to resolve the address via is_kernel() function, which checks like this: if (addr >= _stext && addr <= _end) return 1; On parisc the init functions are located before _stext, so this check fails. Other platforms seem to have all functions (including init functions) behind _stext. The following patch moves the _stext symbol at the beginning of the kernel and thus includes the init section. This fixes the check and does not seem to have any negative side effects on where the kernel mapping happens in the map_pages() function in arch/parisc/mm/init.c. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.4+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18ARM: 9156/1: drop cc-option fallbacks for architecture selectionArnd Bergmann1-11/+11
commit 418ace9992a7647c446ed3186df40cf165b67298 upstream. Naresh and Antonio ran into a build failure with latest Debian armhf compilers, with lots of output like tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode As it turns out, $(cc-option) fails early here when the FPU is not selected before CPU architecture is selected, as the compiler option check runs before enabling -msoft-float, which causes a problem when testing a target architecture level without an FPU: cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this issue, but the fallback logic is already broken because all supported compilers (gcc-5 and higher) are much more recent than these options, and building with -march=armv5t as a fallback no longer works. The best way forward that I see is to just remove all the checks, which also has the nice side-effect of slightly improving the startup time for 'make'. The -mtune=marvell-f option was apparently never supported by any mainline compiler, and the custom Codesourcery gcc build that did support is now too old to build kernels, so just use -mtune=xscale unconditionally for those. This should be safe to apply on all stable kernels, and will be required in order to keep building them with gcc-11 and higher. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419 Reported-by: Antonio Terceiro <antonio.terceiro@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Cc: Matthias Klose <doko@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18ARM: 9155/1: fix early early_iounmap()Michał Mirosław1-2/+2
commit 0d08e7bf0d0d1a29aff7b16ef516f7415eb1aa05 upstream. Currently __set_fixmap() bails out with a warning when called in early boot from early_iounmap(). Fix it, and while at it, make the comment a bit easier to understand. Cc: <stable@vger.kernel.org> Fixes: b089c31c519c ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap") Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18smb3: do not error on fsync when readonlySteve French1-6/+29
commit 71e6864eacbef0b2645ca043cdfbac272cb6cea3 upstream. Linux allows doing a flush/fsync on a file open for read-only, but the protocol does not allow that. If the file passed in on the flush is read-only try to find a writeable handle for the same inode, if that is not possible skip sending the fsync call to the server to avoid breaking the apps. Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Suggested-by: Jeremy Allison <jra@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18thermal: int340x: fix build on 32-bit targetsLinus Torvalds1-0/+1
[ Upstream commit d9c8e52ff9e84ff1a406330f9ea4de7c5eb40282 ] Commit aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses") started using 'readq()' to read 64-bit status responses from the int340x hardware. That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is ambiguous, since it's no longer an atomic access. Some hardware might require 64-bit accesses, and other hardware might want low word first or high word first. It's quite likely that the driver isn't relevant in a 32-bit environment any more, and there's a patch floating around to just make it depend on X86_64, but let's make it buildable on x86-32 anyway. The driver previously just read the low 32 bits, so the hardware certainly is ok with 32-bit reads, and in a little-endian environment the low word first model is the natural one. So just add the include for the 'io-64-nonatomic-lo-hi.h' version. Fixes: aeb58c860dc5 ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses") Reported-by: Jakub Kicinski <kuba@kernel.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18selftests/net: udpgso_bench_rx: fix port argumentWillem de Bruijn1-4/+7
[ Upstream commit d336509cb9d03970911878bb77f0497f64fda061 ] The below commit added optional support for passing a bind address. It configures the sockaddr bind arguments before parsing options and reconfigures on options -b and -4. This broke support for passing port (-p) on its own. Configure sockaddr after parsing all arguments. Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18cxgb4: fix eeprom len when diagnostics not implementedRahul Lakkireddy2-2/+7
[ Upstream commit 4ca110bf8d9b31a60f8f8ff6706ea147d38ad97c ] Ensure diagnostics monitoring support is implemented for the SFF 8472 compliant port module and set the correct length for ethtool port module eeprom read. Fixes: f56ec6766dcf ("cxgb4: Add support for ethtool i2c dump") Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net/smc: fix sk_refcnt underflow on linkdown and fallbackDust Li1-7/+11
[ Upstream commit e5d5aadcf3cd59949316df49c27cb21788d7efe4 ] We got the following WARNING when running ab/nginx test with RDMA link flapping (up-down-up). The reason is when smc_sock fallback and at linkdown happens simultaneously, we may got the following situation: __smc_lgr_terminate() --> smc_conn_kill() --> smc_close_active_abort() smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) smc_sock was set to SMC_CLOSED and sock_put() been called when terminate the link group. But later application call close() on the socket, then we got: __smc_release(): if (smc_sock->fallback) smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) Again we set the smc_sock to CLOSED through it's already in CLOSED state, and double put the refcnt, so the following warning happens: refcount_t: underflow; use-after-free. WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0 Modules linked in: CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014 RIP: 0010:refcount_warn_saturate+0x8d/0xf0 Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48 RSP: 0018:ffffc90000527e50 EFLAGS: 00010286 RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001 R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340 R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8 FS: 00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: smc_release+0x353/0x3f0 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0x93/0x230 task_work_run+0x65/0xa0 exit_to_user_mode_prepare+0xf9/0x100 syscall_exit_to_user_mode+0x27/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds check in __smc_release() to make sure we won't do an extra sock_put() and set the socket to CLOSED when its already in CLOSED state. Fixes: 51f1de79ad8e (net/smc: replace sock_put worker by socket refcounting) Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18vsock: prevent unnecessary refcnt inc for nonblocking connectEiichi Tsukata1-0/+2
[ Upstream commit c7cd82b90599fa10915f41e3dd9098a77d0aa7b6 ] Currently vosck_connect() increments sock refcount for nonblocking socket each time it's called, which can lead to memory leak if it's called multiple times because connect timeout function decrements sock refcount only once. Fixes it by making vsock_connect() return -EALREADY immediately when sock state is already SS_CONNECTING. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: marvell: mvpp2: Fix wrong SerDes reconfiguration orderMarek Behún1-18/+20
[ Upstream commit bb7bbb6e36474933540c24ae1f1ad651b843981f ] Commit bfe301ebbc94 ("net: mvpp2: convert to use mac_prepare()/mac_finish()") introduced a bug wherein it leaves the MAC RESET register asserted after mac_finish(), due to wrong order of function calls. Before it was: .mac_config() mvpp22_mode_reconfigure() assert reset mvpp2_xlg_config() deassert reset Now it is: .mac_prepare() .mac_config() mvpp2_xlg_config() deassert reset .mac_finish() mvpp2_xlg_config() assert reset Obviously this is wrong. This bug is triggered when phylink tries to change the PHY interface mode from a GMAC mode (sgmii, 1000base-x, 2500base-x) to XLG mode (10gbase-r, xaui). The XLG mode does not work since reset is left asserted. Only after ifconfig down && ifconfig up is called will the XLG mode work. Move the call to mvpp22_mode_reconfigure() to .mac_prepare() implementation. Since some of the subsequent functions need to know whether the interface is being changed, we unfortunately also need to pass around the new interface mode before setting port->phy_interface. Fixes: bfe301ebbc94 ("net: mvpp2: convert to use mac_prepare()/mac_finish()") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: ethernet: ti: cpsw_ale: Fix access to un-initialized memoryChristophe JAILLET1-4/+2
[ Upstream commit 7a166854b4e24c57d56b3eba9fe1594985ee0a2c ] It is spurious to allocate a bitmap without initializing it. So, better safe than sorry, initialize it to 0 at least to have some known values. While at it, switch to the devm_bitmap_ API which is less verbose. Fixes: 4b41d3436796 ("net: ethernet: ti: cpsw: allow untagged traffic on host port") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: stmmac: allow a tc-taprio base-time of zeroVladimir Oltean1-2/+0
[ Upstream commit f64ab8e4f368f48afb08ae91928e103d17b235e9 ] Commit fe28c53ed71d ("net: stmmac: fix taprio configuration when base_time is in the past") allowed some base time values in the past, but apparently not all, the base-time value of 0 (Jan 1st 1970) is still explicitly denied by the driver. Remove the bogus check. Fixes: b60189e0392f ("net: stmmac: Integrate EST with TAPRIO scheduler API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: hns3: allow configure ETS bandwidth of all TCsGuangbin Huang2-9/+2
[ Upstream commit 688db0c7a4a69ddc8b8143a1cac01eb20082a3aa ] Currently, driver only allow configuring ETS bandwidth of TCs according to the max TC number queried from firmware. However, the hardware actually supports 8 TCs and users may need to configure ETS bandwidth of all TCs, so remove the restriction. Fixes: 330baff5423b ("net: hns3: add ETS TC weight setting in SSU module") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: hns3: fix kernel crash when unload VF while it is being resetYufeng Mo2-0/+7
[ Upstream commit e140c7983e3054be0652bf914f4454f16c5520b0 ] When fully configure VLANs for a VF, then unload the VF while triggering a reset to PF, will cause a kernel crash because the irq is already uninit. [ 293.177579] ------------[ cut here ]------------ [ 293.183502] kernel BUG at drivers/pci/msi.c:352! [ 293.189547] Internal error: Oops - BUG: 0 [#1] SMP ...... [ 293.390124] Workqueue: hclgevf hclgevf_service_task [hclgevf] [ 293.402627] pstate: 80c00009 (Nzcv daif +PAN +UAO) [ 293.414324] pc : free_msi_irqs+0x19c/0x1b8 [ 293.425429] lr : free_msi_irqs+0x18c/0x1b8 [ 293.436545] sp : ffff00002716fbb0 [ 293.446950] x29: ffff00002716fbb0 x28: 0000000000000000 [ 293.459519] x27: 0000000000000000 x26: ffff45b91ea16b00 [ 293.472183] x25: 0000000000000000 x24: ffffa587b08f4700 [ 293.484717] x23: ffffc591ac30e000 x22: ffffa587b08f8428 [ 293.497190] x21: ffffc591ac30e300 x20: 0000000000000000 [ 293.509594] x19: ffffa58a062a8300 x18: 0000000000000000 [ 293.521949] x17: 0000000000000000 x16: ffff45b91dcc3f48 [ 293.534013] x15: 0000000000000000 x14: 0000000000000000 [ 293.545883] x13: 0000000000000040 x12: 0000000000000228 [ 293.557508] x11: 0000000000000020 x10: 0000000000000040 [ 293.568889] x9 : ffff45b91ea1e190 x8 : ffffc591802d0000 [ 293.580123] x7 : ffffc591802d0148 x6 : 0000000000000120 [ 293.591190] x5 : ffffc591802d0000 x4 : 0000000000000000 [ 293.602015] x3 : 0000000000000000 x2 : 0000000000000000 [ 293.612624] x1 : 00000000000004a4 x0 : ffffa58a1e0c6b80 [ 293.623028] Call trace: [ 293.630340] free_msi_irqs+0x19c/0x1b8 [ 293.638849] pci_disable_msix+0x118/0x140 [ 293.647452] pci_free_irq_vectors+0x20/0x38 [ 293.656081] hclgevf_uninit_msi+0x44/0x58 [hclgevf] [ 293.665309] hclgevf_reset_rebuild+0x1ac/0x2e0 [hclgevf] [ 293.674866] hclgevf_reset+0x358/0x400 [hclgevf] [ 293.683545] hclgevf_reset_service_task+0xd0/0x1b0 [hclgevf] [ 293.693325] hclgevf_service_task+0x4c/0x2e8 [hclgevf] [ 293.702307] process_one_work+0x1b0/0x448 [ 293.710034] worker_thread+0x54/0x468 [ 293.717331] kthread+0x134/0x138 [ 293.724114] ret_from_fork+0x10/0x18 [ 293.731324] Code: f940b000 b4ffff00 a903e7b8 f90017b6 (d4210000) This patch fixes the problem by waiting for the VF reset done while unloading the VF. Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: hns3: fix pfc packet number incorrect after querying pfc parametersJie Wang5-50/+48
[ Upstream commit 0b653a81a26d66ffe526a54c2177e24fb1400301 ] Currently, driver will send command to firmware to query pfc packet number when user uses dcb tool to get pfc parameters. However, the periodic service task will also periodically query and record MAC statistics, including pfc packet number. As the hardware registers of statistics is cleared after reading, it will cause pfc packet number of MAC statistics are not correct after using dcb tool to get pfc parameters. To fix this problem, when user uses dcb tool to get pfc parameters, driver updates MAC statistics firstly and then get pfc packet number from MAC statistics. Fixes: 64fd2300fcc1 ("net: hns3: add support for querying pfc puase packets statistic") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: hns3: fix ROCE base interrupt vector initialization bugJie Wang4-13/+2
[ Upstream commit beb27ca451a57a1c0e52b5268703f3c3173c1f8c ] Currently, NIC init ROCE interrupt vector with MSIX interrupt. But ROCE use pci_irq_vector() to get interrupt vector, which adds the relative interrupt vector again and gets wrong interrupt vector. So fixes it by assign relative interrupt vector to ROCE instead of MSIX interrupt vector and delete the unused struct member base_msi_vector declaration of hclgevf_dev. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_anyEric Dumazet1-10/+17
[ Upstream commit 6dc25401cba4d428328eade8ceae717633fdd702 ] 1) if q->tk_offset == TK_OFFS_MAX, then get_tcp_tstamp() calls ktime_mono_to_any() with out-of-bound value. 2) if q->tk_offset is changed in taprio_parse_clockid(), taprio_get_time() might also call ktime_mono_to_any() with out-of-bound value as sysbot found: UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:908:27 index 3 is out of range for type 'ktime_t *[3]' CPU: 1 PID: 25668 Comm: kworker/u4:0 Not tainted 5.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151 __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291 ktime_mono_to_any+0x1d4/0x1e0 kernel/time/timekeeping.c:908 get_tcp_tstamp net/sched/sch_taprio.c:322 [inline] get_packet_txtime net/sched/sch_taprio.c:353 [inline] taprio_enqueue_one+0x5b0/0x1460 net/sched/sch_taprio.c:420 taprio_enqueue+0x3b1/0x730 net/sched/sch_taprio.c:485 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3785 __dev_xmit_skb net/core/dev.c:3869 [inline] __dev_queue_xmit+0x1f6e/0x3630 net/core/dev.c:4194 batadv_send_skb_packet+0x4a9/0x5f0 net/batman-adv/send.c:108 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x6d7/0x8e0 net/batman-adv/bat_iv_ogm.c:1701 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Fixes: 7ede7b03484b ("taprio: make clock reference conversions easier") Fixes: 54002066100b ("taprio: Adjust timestamps for TCP packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vedang Patel <vedang.patel@intel.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20211108180815.1822479-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18net: dsa: mv88e6xxx: Don't support >1G speeds on 6191X on ports other than 10Marek Behún1-1/+4
[ Upstream commit dc2fc9f03c5c410d8f01c2206b3d529f80b13733 ] Model 88E6191X only supports >1G speeds on port 10. Port 0 and 9 are only 1G. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Marek Behún <kabel@kernel.org> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20211104171747.10509-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18drm/amdgpu: fix uvd crash on Polaris12 during driver unloadingEvan Quan1-11/+13
[ Upstream commit 4fc30ea780e0a5c1c019bc2e44f8523e1eed9051 ] There was a change(below) target for such issue: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload") But the fix for VI ASICs was missing there. This is a supplement for that. Fixes: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18seq_file: fix passing wrong private dataMuchun Song1-1/+1
[ Upstream commit 10a6de19cad6efb9b49883513afb810dc265fca2 ] DEFINE_PROC_SHOW_ATTRIBUTE() is supposed to be used to define a series of functions and variables to register proc file easily. And the users can use proc_create_data() to pass their own private data and get it via seq->private in the callback. Unfortunately, the proc file system use PDE_DATA() to get private data instead of inode->i_private. So fix it. Fortunately, there only one user of it which does not pass any private data, so this bug does not break any in-tree codes. Link: https://lkml.kernel.org/r/20211029032638.84884-1-songmuchun@bytedance.com Fixes: 97a32539b956 ("proc: convert everything to "struct proc_ops"") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Florent Revest <revest@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18init: make unknown command line param message clearerAndrew Halaney1-1/+3
[ Upstream commit 8bc2b3dca7292347d8e715fb723c587134abe013 ] The prior message is confusing users, which is the exact opposite of the goal. If the message is being seen, one of the following situations is happening: 1. the param is misspelled 2. the param is not valid due to the kernel configuration 3. the param is intended for init but isn't after the '--' delineator on the command line To make that more clear to the user, explicitly mention "kernel command line" and also note that the params are still passed to user space to avoid causing any alarm over params intended for init. Link: https://lkml.kernel.org/r/20211013223502.96756-1-ahalaney@redhat.com Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters") Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18drm/i915/fb: Fix rounding error in subsampled plane size calculationImre Deak1-2/+3
[ Upstream commit 90ab96f3872eae816f4e07deaa77322a91237960 ] For NV12 FBs with odd main surface tile-row height the CCS surface height was incorrectly calculated 1 less than the actual value. Fix this by rounding up the result of divison. For consistency do the same for the CCS surface width calculation. Fixes: b3e57bccd68a ("drm/i915/tgl: Gen-12 render decompression") Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211026225105.2783797-2-imre.deak@intel.com (cherry picked from commit 2ee5ef9c934ad26376c9282171e731e6c0339815) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18gve: Fix off by one in gve_tx_timeout()Dan Carpenter1-1/+1
[ Upstream commit 1c360cc1cc883fbdf0a258b4df376571fbeac5ee ] The priv->ntfy_blocks[] has "priv->num_ntfy_blks" elements so this > needs to be >= to prevent an off by one bug. The priv->ntfy_blocks[] array is allocated in gve_alloc_notify_blocks(). Fixes: 87a7f321bb6a ("gve: Recover from queue stall due to missed IRQ") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18dmaengine: stm32-dma: avoid 64-bit division in stm32_dma_get_max_widthArnd Bergmann1-3/+3
[ Upstream commit 2498363310e9b5e5de0e104709adc35c9f3ff7d9 ] Using the % operator on a 64-bit variable is expensive and can cause a link failure: arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_get_max_width': stm32-dma.c:(.text+0x170): undefined reference to `__aeabi_uldivmod' arm-linux-gnueabi-ld: drivers/dma/stm32-dma.o: in function `stm32_dma_set_xfer_param': stm32-dma.c:(.text+0x1cd4): undefined reference to `__aeabi_uldivmod' As we know that we just want to check the alignment in stm32_dma_get_max_width(), there is no need for a full division, and using a simple mask is a faster replacement. Same in stm32_dma_set_xfer_param(), change this to only allow burst transfers if the address is a multiple of the length. stm32_dma_get_best_burst just after will take buf_len into account to fix burst in case of misalignment. Fixes: b20fd5fa310c ("dmaengine: stm32-dma: fix stm32_dma_get_max_width") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com> Link: https://lore.kernel.org/r/20211103153312.41483-1-amelie.delaunay@foss.st.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18dmaengine: stm32-dma: fix burst in case of unaligned memory addressAmelie Delaunay1-4/+16
[ Upstream commit af229d2c2557b5cf2a3b1eb39847ec1de7446873 ] Theorically, address pointers used by STM32 DMA must be chosen so as to ensure that all transfers within a burst block are aligned on the address boundary equal to the size of the transfer. If this is always the case for peripheral addresses on STM32, it is not for memory addresses if the user doesn't respect this alignment constraint. To avoid a weird behavior of the DMA controller in this case (no error triggered but data are not transferred as expected), force no burst. Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com> Link: https://lore.kernel.org/r/20211011094259.315023-4-amelie.delaunay@foss.st.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_regJussi Maki2-6/+34
[ Upstream commit b2c4618162ec615a15883a804cce7e27afecfa58 ] The current conversion of skb->data_end reads like this: ; data_end = (void*)(long)skb->data_end; 559: (79) r1 = *(u64 *)(r2 +200) ; r1 = skb->data 560: (61) r11 = *(u32 *)(r2 +112) ; r11 = skb->len 561: (0f) r1 += r11 562: (61) r11 = *(u32 *)(r2 +116) 563: (1f) r1 -= r11 But similar to the case in 84f44df664e9 ("bpf: sock_ops sk access may stomp registers when dst_reg = src_reg"), the code will read an incorrect skb->len when src == dst. In this case we end up generating this xlated code: ; data_end = (void*)(long)skb->data_end; 559: (79) r1 = *(u64 *)(r1 +200) ; r1 = skb->data 560: (61) r11 = *(u32 *)(r1 +112) ; r11 = (skb->data)->len 561: (0f) r1 += r11 562: (61) r11 = *(u32 *)(r1 +116) 563: (1f) r1 -= r11 ... where line 560 is the reading 4B of (skb->data + 112) instead of the intended skb->len Here the skb pointer in r1 gets set to skb->data and the later deref for skb->len ends up following skb->data instead of skb. This fixes the issue similarly to the patch mentioned above by creating an additional temporary variable and using to store the register when dst_reg = src_reg. We name the variable bpf_temp_reg and place it in the cb context for sk_skb. Then we restore from the temp to ensure nothing is lost. Fixes: 16137b09a66f2 ("bpf: Compute data_end dynamically with JIT code") Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-6-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and collidingJohn Fastabend3-10/+38
[ Upstream commit e0dc3b93bd7bcff8c3813d1df43e0908499c7cf0 ] Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling progress, e.g. offset and length of the skb. First this is poorly named and inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at this layer. But, more importantly strparser is using the following to access its metadata. (struct _strp_msg *)((void *)skb->cb + offsetof(struct qdisc_skb_cb, data)) Where _strp_msg is defined as: struct _strp_msg { struct strp_msg strp; /* 0 8 */ int accum_len; /* 8 4 */ /* size: 12, cachelines: 1, members: 2 */ /* last cacheline: 12 bytes */ }; So we use 12 bytes of ->data[] in struct. However in BPF code running parser and verdict the user has read capabilities into the data[] array as well. Its not too problematic, but we should not be exposing internal state to BPF program. If its really needed then we can use the probe_read() APIs which allow reading kernel memory. And I don't believe cb[] layer poses any API breakage by moving this around because programs can't depend on cb[] across layers. In order to fix another issue with a ctx rewrite we need to stash a temp variable somewhere. To make this work cleanly this patch builds a cb struct for sk_skb types called sk_skb_cb struct. Then we can use this consistently in the strparser, sockmap space. Additionally we can start allowing ->cb[] write access after this. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18bpf, sockmap: Fix race in ingress receive verdict with redirect to selfJohn Fastabend1-0/+47
[ Upstream commit c5d2177a72a1659554922728fc407f59950aa929 ] A socket in a sockmap may have different combinations of programs attached depending on configuration. There can be no programs in which case the socket acts as a sink only. There can be a TX program in this case a BPF program is attached to sending side, but no RX program is attached. There can be an RX program only where sends have no BPF program attached, but receives are hooked with BPF. And finally, both TX and RX programs may be attached. Giving us the permutations: None, Tx, Rx, and TxRx To date most of our use cases have been TX case being used as a fast datapath to directly copy between local application and a userspace proxy. Or Rx cases and TxRX applications that are operating an in kernel based proxy. The traffic in the first case where we hook applications into a userspace application looks like this: AppA redirect AppB Tx <-----------> Rx | | + + TCP <--> lo <--> TCP In this case all traffic from AppA (after 3whs) is copied into the AppB ingress queue and no traffic is ever on the TCP recieive_queue. In the second case the application never receives, except in some rare error cases, traffic on the actual user space socket. Instead the send happens in the kernel. AppProxy socket pool sk0 ------------->{sk1,sk2, skn} ^ | | | | v ingress lb egress TCP TCP Here because traffic is never read off the socket with userspace recv() APIs there is only ever one reader on the sk receive_queue. Namely the BPF programs. However, we've started to introduce a third configuration where the BPF program on receive should process the data, but then the normal case is to push the data into the receive queue of AppB. AppB recv() (userspace) ----------------------- tcp_bpf_recvmsg() (kernel) | | | | | | ingress_msgQ | | | RX_BPF | | | v v sk->receive_queue This is different from the App{A,B} redirect because traffic is first received on the sk->receive_queue. Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg queue for any data handled by the BPF rx program and returned with PASS code so that it was enqueued on the ingress msg queue. Then if no data exists on that queue it checks the socket receive queue. Unfortunately, this is the same receive_queue the BPF program is reading data off of. So we get a race. Its possible for the recvmsg() hook to pull data off the receive_queue before the BPF hook has a chance to read it. It typically happens when an application is banging on recv() and getting EAGAINs. Until they manage to race with the RX BPF program. To fix this we note that before this patch at attach time when the socket is loaded into the map we check if it needs a TX program or just the base set of proto bpf hooks. Then it uses the above general RX hook regardless of if we have a BPF program attached at rx or not. This patch now extends this check to handle all cases enumerated above, TX, RX, TXRX, and none. And to fix above race when an RX program is attached we use a new hook that is nearly identical to the old one except now we do not let the recv() call skip the RX BPF program. Now only the BPF program pulls data from sk->receive_queue and recv() only pulls data from the ingress msgQ post BPF program handling. With this resolved our AppB from above has been up and running for many hours without detecting any errors. We do this by correlating counters in RX BPF events and the AppB to ensure data is never skipping the BPF program. Selftests, was not able to detect this because we only run them for a short period of time on well ordered send/recvs so we don't get any of the noise we see in real application environments. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18bpf, sockmap: Remove unhash handler for BPF sockmap usageJohn Fastabend1-1/+0
[ Upstream commit b8b8315e39ffaca82e79d86dde26e9144addf66b ] We do not need to handle unhash from BPF side we can simply wait for the close to happen. The original concern was a socket could transition from ESTABLISHED state to a new state while the BPF hook was still attached. But, we convinced ourself this is no longer possible and we also improved BPF sockmap to handle listen sockets so this is no longer a problem. More importantly though there are cases where unhash is called when data is in the receive queue. The BPF unhash logic will flush this data which is wrong. To be correct it should keep the data in the receive queue and allow a receiving application to continue reading the data. This may happen when tcp_abort() is received for example. Instead of complicating the logic in unhash simply moving all this to tcp_close() hook solves this. Fixes: 51199405f9672 ("bpf: skb_verdict, support SK_PASS on RX BPF path") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-3-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functionsArnd Bergmann1-3/+9
[ Upstream commit c7c386fbc20262c1d911c615c65db6a58667d92c ] gcc warns about undefined behavior the vmalloc code when building with CONFIG_ARM64_PA_BITS_52, when the 'idx++' in the argument to __phys_to_pte_val() is evaluated twice: mm/vmalloc.c: In function 'vmap_pfn_apply': mm/vmalloc.c:2800:58: error: operation on 'data->idx' may be undefined [-Werror=sequence-point] 2800 | *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot)); | ~~~~~~~~~^~ arch/arm64/include/asm/pgtable-types.h:25:37: note: in definition of macro '__pte' 25 | #define __pte(x) ((pte_t) { (x) } ) | ^ arch/arm64/include/asm/pgtable.h:80:15: note: in expansion of macro '__phys_to_pte_val' 80 | __pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) | ^~~~~~~~~~~~~~~~~ mm/vmalloc.c:2800:30: note: in expansion of macro 'pfn_pte' 2800 | *pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot)); | ^~~~~~~ I have no idea why this never showed up earlier, but the safest workaround appears to be changing those macros into inline functions so the arguments get evaluated only once. Cc: Matthew Wilcox <willy@infradead.org> Fixes: 75387b92635e ("arm64: handle 52-bit physical addresses in page table entries") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211105075414.2553155-1-arnd@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18arm64: arm64_ftr_reg->name may not be a human-readable stringReiji Watanabe1-3/+7
[ Upstream commit 9dc232a8ab18bb20f1dcb03c8e049e3607f3ed15 ] The id argument of ARM64_FTR_REG_OVERRIDE() is used for two purposes: one as the system register encoding (used for the sys_id field of __ftr_reg_entry), and the other as the register name (stringified and used for the name field of arm64_ftr_reg), which is debug information. The id argument is supposed to be a macro that indicates an encoding of the register (eg. SYS_ID_AA64PFR0_EL1, etc). ARM64_FTR_REG(), which also has the same id argument, uses ARM64_FTR_REG_OVERRIDE() and passes the id to the macro. Since the id argument is completely macro-expanded before it is substituted into a macro body of ARM64_FTR_REG_OVERRIDE(), the stringified id in the body of ARM64_FTR_REG_OVERRIDE is not a human-readable register name, but a string of numeric bitwise operations. Fix this so that human-readable register names are available as debug information. Fixes: 8f266a5d878a ("arm64: cpufeature: Add global feature override facility") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211101045421.2215822-1-reijiw@google.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18litex_liteeth: Fix a double free in the remove functionChristophe JAILLET1-1/+0
[ Upstream commit c45231a7668d6b632534f692b10592ea375b55b0 ] 'netdev' is a managed resource allocated in the probe using 'devm_alloc_etherdev()'. It must not be freed explicitly in the remove function. Fixes: ee7da21ac4c3 ("net: Add driver for LiteX's LiteETH network interface") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18nfc: pn533: Fix double free when pn533_fill_fragment_skbs() failsChengfeng Ye1-3/+3
[ Upstream commit 9fec40f850658e00a14a7dd9e06f7fbc7e59cc4a ] skb is already freed by dev_kfree_skb in pn533_fill_fragment_skbs, but follow error handler branch when pn533_fill_fragment_skbs() fails, skb is freed again, results in double free issue. Fix this by not free skb in error path of pn533_fill_fragment_skbs. Fixes: 963a82e07d4e ("NFC: pn533: Split large Tx frames in chunks") Fixes: 93ad42020c2d ("NFC: pn533: Target mode Tx fragmentation support") Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>