summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-12-14aio: fix use-after-free due to missing POLLFREE handlingEric Biggers2-32/+107
commit 50252e4b5e989ce64555c7aef7516bdefc2fea72 upstream. signalfd_poll() and binder_poll() are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, by sending a POLLFREE notification to all waiters. Unfortunately, only eventpoll handles POLLFREE. A second type of non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't handle POLLFREE. This allows a use-after-free to occur if a signalfd or binder fd is polled with aio poll, and the waitqueue gets freed. Fix this by making aio poll handle POLLFREE. A patch by Ramji Jiyani <ramjiyani@google.com> (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com) tried to do this by making aio_poll_wake() always complete the request inline if POLLFREE is seen. However, that solution had two bugs. First, it introduced a deadlock, as it unconditionally locked the aio context while holding the waitqueue lock, which inverts the normal locking order. Second, it didn't consider that POLLFREE notifications are missed while the request has been temporarily de-queued. The second problem was solved by my previous patch. This patch then properly fixes the use-after-free by handling POLLFREE in a deadlock-free way. It does this by taking advantage of the fact that freeing of the waitqueue is RCU-delayed, similar to what eventpoll does. Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14aio: keep poll requests on waitqueue until completedEric Biggers1-20/+63
commit 363bee27e25804d8981dd1c025b4ad49dc39c530 upstream. Currently, aio_poll_wake() will always remove the poll request from the waitqueue. Then, if aio_poll_complete_work() sees that none of the polled events are ready and the request isn't cancelled, it re-adds the request to the waitqueue. (This can easily happen when polling a file that doesn't pass an event mask when waking up its waitqueue.) This is fundamentally broken for two reasons: 1. If a wakeup occurs between vfs_poll() and the request being re-added to the waitqueue, it will be missed because the request wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL might never complete even if the polled file is ready. 2. When the request isn't on the waitqueue, there is no way to be notified that the waitqueue is being freed (which happens when its lifetime is shorter than the struct file's). This is supposed to happen via the waitqueue entries being woken up with POLLFREE. Therefore, leave the requests on the waitqueue until they are actually completed (or cancelled). To keep track of when aio_poll_complete_work needs to be scheduled, use new fields in struct poll_iocb. Remove the 'done' field which is now redundant. Note that this is consistent with how sys_poll() and eventpoll work; their wakeup functions do *not* remove the waitqueue entries. Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14signalfd: use wake_up_pollfree()Eric Biggers1-11/+1
commit 9537bae0da1f8d1e2361ab6d0479e8af7824e160 upstream. wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll and aio poll are fortunately not affected by this, but it's very fragile. Thus, the new function wake_up_pollfree() has been introduced. Convert signalfd to use wake_up_pollfree(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: d80e731ecab4 ("epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14binder: use wake_up_pollfree()Eric Biggers1-12/+9
commit a880b28a71e39013e357fd3adccd1d8a31bc69a8 upstream. wake_up_poll() uses nr_exclusive=1, so it's not guaranteed to wake up all exclusive waiters. Yet, POLLFREE *must* wake up all waiters. epoll and aio poll are fortunately not affected by this, but it's very fragile. Thus, the new function wake_up_pollfree() has been introduced. Convert binder to use wake_up_pollfree(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: f5cb779ba163 ("ANDROID: binder: remove waitqueue when thread exits.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14wait: add wake_up_pollfree()Eric Biggers2-0/+33
commit 42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 upstream. Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'. However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone. Considering the three non-blocking poll systems: - io_uring poll doesn't handle POLLFREE at all, so it is broken anyway. - aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later. - epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile. Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14io_uring: ensure task_work gets run as part of cancelationsJens Axboe1-2/+4
commit 78a780602075d8b00c98070fa26e389b3b3efa72 upstream. If we successfully cancel a work item but that work item needs to be processed through task_work, then we can be sleeping uninterruptibly in io_uring_cancel_generic() and never process it. Hence we don't make forward progress and we end up with an uninterruptible sleep warning. While in there, correct a comment that should be IFF, not IIF. Reported-and-tested-by: syzbot+21e6887c0be14181206d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14libata: add horkage for ASMedia 1092Hannes Reinecke1-0/+2
commit a66307d473077b7aeba74e9b09c841ab3d399c2d upstream. The ASMedia 1092 has a configuration mode which will present a dummy device; sadly the implementation falsely claims to provide a device with 100M which doesn't actually exist. So disable this device to avoid errors during boot. Cc: stable@vger.kernel.org Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.Bas Nieuwenhuizen1-1/+10
commit b19926d4f3a660a8b76e5d989ffd1168e619a5c4 upstream. dma_fence_chain_find_seqno only ever returns the top fence in the chain or an unsignalled fence. Hence if we request a seqno that is already signalled it returns a NULL fence. Some callers are not prepared to handle this, like the syncobj transfer functions for example. This behavior is "new" with timeline syncobj and it looks like not all callers were updated. To fix this behavior make sure that a successful drm_sync_find_fence always returns a non-NULL fence. v2: Move the fix to drm_syncobj_find_fence from the transfer functions. Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: stable@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211208023935.17018-1-bas@basnieuwenhuizen.nl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14thermal: int340x: Fix VCoRefLow MMIO bit offset for TGLSumeet Pawnikar1-1/+1
commit f872f73601b92c86f3da8bdf3e19abd0f1780eb9 upstream. The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect. Current implementation reads it from MMIO offset 0x5A18 and bit offset [12:14], but the actual correct register definition is from bit offset [11:13]. Update to fix the bit offset. Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Cc: 5.14+ <stable@vger.kernel.org> # 5.14+ [ rjw: New subject, changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14clk: qcom: regmap-mux: fix parent clock lookupDmitry Baryshkov3-1/+15
commit 9a61f813fcc8d56d85fcf9ca6119cf2b5ac91dd5 upstream. The function mux_get_parent() uses qcom_find_src_index() to find the parent clock index, which is incorrect: qcom_find_src_index() uses src enum for the lookup, while mux_get_parent() should use cfg field (which corresponds to the register value). Add qcom_find_cfg_index() function doing this kind of lookup and use it for mux parent lookup. Fixes: df964016490b ("clk: qcom: add parent map for regmap mux") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20211115233407.1046179-1-dmitry.baryshkov@linaro.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14mmc: renesas_sdhi: initialize variable properly when tuningWolfram Sang1-1/+1
commit 7dba402807a85fa3723f4a27504813caf81cc9d7 upstream. 'cmd_error' is not necessarily initialized on some error paths in mmc_send_tuning(). Initialize it. Fixes: 2c9017d0b5d3 ("mmc: renesas_sdhi: abort tuning when timeout detected") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211130132309.18246-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14hwmon: (pwm-fan) Ensure the fan going on in .probe()Billy Tsai1-2/+0
commit a2ca752055edd39be38b887e264d3de7ca2bc1bb upstream. Before commit 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups") pwm_apply_state() was called unconditionally in pwm_fan_probe(). In this commit this direct call was replaced by a call to __set_pwm(ct, MAX_PWM) which however is a noop if ctx->pwm_value already matches the value to set. After probe the fan is supposed to run at full speed, and the internal driver state suggests it does, but this isn't asserted and depending on bootloader and pwm low-level driver, the fan might just be off. So drop setting pwm_value to MAX_PWM to ensure the check in __set_pwm doesn't make it exit early and the fan goes on as intended. Cc: stable@vger.kernel.org Fixes: 86585c61972f ("hwmon: (pwm-fan) stop using legacy PWM functions and some cleanups") Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20211130092212.17783-1-billy_tsai@aspeedtech.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14selftests: KVM: avoid failures due to reserved HyperTransport regionPaolo Bonzini3-1/+78
commit c8cc43c1eae2910ac96daa4216e0fb3391ad0504 upstream. AMD proceessors define an address range that is reserved by HyperTransport and causes a failure if used for guest physical addresses. Avoid selftests failures by reserving those guest physical addresses; the rules are: - On parts with <40 bits, its fully hidden from software. - Before Fam17h, it was always 12G just below 1T, even if there was more RAM above this location. In this case we just not use any RAM above 1T. - On Fam17h and later, it is variable based on SME, and is either just below 2^48 (no encryption) or 2^43 (encryption). Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()") Cc: stable@vger.kernel.org Cc: David Matlack <dmatlack@google.com> Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210805105423.412878-1-pbonzini@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14tracefs: Have new files inherit the ownership of their parentSteven Rostedt (VMware)1-0/+4
commit ee7f3666995d8537dec17b1d35425f28877671a9 upstream. If directories in tracefs have their ownership changed, then any new files and directories that are created under those directories should inherit the ownership of the director they are created in. Link: https://lkml.kernel.org/r/20211208075720.4855d180@gandalf.local.home Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Yabin Cui <yabinc@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: stable@vger.kernel.org Fixes: 4282d60689d4f ("tracefs: Add new tracefs file system") Reported-by: Kalesh Singh <kaleshsingh@google.com> Reported: https://lore.kernel.org/all/CAC_TJve8MMAv+H_NdLSJXZUSoxOEq2zB_pVaJ9p=7H6Bu3X76g@mail.gmail.com/ Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14nfsd: Fix nsfd startup race (again)Alexander Sverdlin2-7/+8
commit b10252c7ae9c9d7c90552f88b544a44ee773af64 upstream. Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14nfsd: fix use-after-free due to delegation raceJ. Bruce Fields1-2/+7
commit 548ec0805c399c65ed66c6641be467f717833ab5 upstream. A delegation break could arrive as soon as we've called vfs_setlease. A delegation break runs a callback which immediately (in nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we then exit nfs4_set_delegation without hashing the delegation, it will be freed as soon as the callback is done with it, without ever being removed from del_recall_lru. Symptoms show up later as use-after-free or list corruption warnings, usually in the laundromat thread. I suspect aba2072f4523 "nfsd: grant read delegations to clients holding writes" made this bug easier to hit, but I looked as far back as v3.0 and it looks to me it already had the same problem. So I'm not sure where the bug was introduced; it may have been there from the beginning. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14md: fix update super 1.0 on rdev size changeMarkus Hochholdinger1-0/+1
commit 55df1ce0d4e086e05a8ab20619c73c729350f965 upstream. The superblock of version 1.0 doesn't get moved to the new position on a device size change. This leads to a rdev without a superblock on a known position, the raid can't be re-assembled. The line was removed by mistake and is re-added by this patch. Fixes: d9c0fa509eaf ("md: fix max sectors calculation for super 1.0") Cc: stable@vger.kernel.org Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix error timestamp setting on the decoder error pathAdrian Hunter1-0/+1
commit 6665b8e4836caa8023cbc7e53733acd234969c8c upstream. An error timestamp shows the last known timestamp for the queue, but this is not updated on the error path. Fix by setting it. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix missing 'instruction' events with 'q' optionAdrian Hunter1-3/+8
commit a882cc94971093e146ffa1163b140ad956236754 upstream. FUP packets contain IP information, which makes them also an 'instruction' event in 'hop' mode i.e. the itrace 'q' option. That wasn't happening, so restructure the logic so that FUP events are added along with appropriate 'instruction' and 'branch' events. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix next 'err' value, walking traceAdrian Hunter1-0/+1
commit a32e6c5da599dbf49e60622a4dfb5b9b40ece029 upstream. Code after label 'next:' in intel_pt_walk_trace() assumes 'err' is zero, but it may not be, if arrived at via a 'goto'. Ensure it is zero. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix state setting when receiving overflow (OVF) packetAdrian Hunter1-4/+28
commit c79ee2b2160909889df67c8801352d3e69d43a1a upstream. An overflow (OVF packet) is treated as an error because it represents a loss of trace data, but there is no loss of synchronization, so the packet state should be INTEL_PT_STATE_IN_SYNC not INTEL_PT_STATE_ERR_RESYNC. To support that, some additional variables must be reset, and the FUP packet that may follow OVF is treated as an FUP event. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state typeAdrian Hunter1-19/+13
commit 4c761d805bb2d2ead1b9baaba75496152b394c80 upstream. intel_pt_fup_event() assumes it can overwrite the state type if there has been an FUP event, but this is an unnecessary and unexpected constraint on callers. Fix by touching only the state type flags that are affected by an FUP event. Fixes: a472e65fc490a ("perf intel-pt: Add decoder support for ptwrite and power event packets") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix sync state when a PSB (synchronization) packet is foundAdrian Hunter1-1/+1
commit ad106a26aef3a95ac7ca88d033b431661ba346ce upstream. When syncing, it may be that branch packet generation is not enabled at that point, in which case there will not immediately be a control-flow packet, so some packets before a control flow packet turns up, get ignored. However, the decoder is in sync as soon as a PSB is found, so the state should be set accordingly. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14perf intel-pt: Fix some PGE (packet generation enable/control flow packets) ↵Adrian Hunter1-3/+4
usage commit 057ae59f5a1d924511beb1b09f395bdb316cfd03 upstream. Packet generation enable (PGE) refers to whether control flow (COFI) packets are being produced. PGE may be false even when branch-tracing is enabled, due to being out-of-context, or outside a filter address range. Fix some missing PGE usage. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Fixes: 839598176b0554 ("perf intel-pt: Allow decoding with branch tracing disabled") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14btrfs: free exchange changeset on failuresJohannes Thumshirn1-3/+9
commit da5e817d9d75422eaaa05490d0b9a5e328fc1a51 upstream. Fstests runs on my VMs have show several kmemleak reports like the following. unreferenced object 0xffff88811ae59080 (size 64): comm "xfs_io", pid 12124, jiffies 4294987392 (age 6.368s) hex dump (first 32 bytes): 00 c0 1c 00 00 00 00 00 ff cf 1c 00 00 00 00 00 ................ 90 97 e5 1a 81 88 ff ff 90 97 e5 1a 81 88 ff ff ................ backtrace: [<00000000ac0176d2>] ulist_add_merge+0x60/0x150 [btrfs] [<0000000076e9f312>] set_state_bits+0x86/0xc0 [btrfs] [<0000000014fe73d6>] set_extent_bit+0x270/0x690 [btrfs] [<000000004f675208>] set_record_extent_bits+0x19/0x20 [btrfs] [<00000000b96137b1>] qgroup_reserve_data+0x274/0x310 [btrfs] [<0000000057e9dcbb>] btrfs_check_data_free_space+0x5c/0xa0 [btrfs] [<0000000019c4511d>] btrfs_delalloc_reserve_space+0x1b/0xa0 [btrfs] [<000000006d37e007>] btrfs_dio_iomap_begin+0x415/0x970 [btrfs] [<00000000fb8a74b8>] iomap_iter+0x161/0x1e0 [<0000000071dff6ff>] __iomap_dio_rw+0x1df/0x700 [<000000002567ba53>] iomap_dio_rw+0x5/0x20 [<0000000072e555f8>] btrfs_file_write_iter+0x290/0x530 [btrfs] [<000000005eb3d845>] new_sync_write+0x106/0x180 [<000000003fb505bf>] vfs_write+0x24d/0x2f0 [<000000009bb57d37>] __x64_sys_pwrite64+0x69/0xa0 [<000000003eba3fdf>] do_syscall_64+0x43/0x90 In case brtfs_qgroup_reserve_data() or btrfs_delalloc_reserve_metadata() fail the allocated extent_changeset will not be freed. So in btrfs_check_data_free_space() and btrfs_delalloc_reserve_space() free the allocated extent_changeset to get rid of the allocated memory. The issue currently only happens in the direct IO write path, but only after 65b3c08606e5 ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range"), and also at defrag_one_locked_target(). Every other place is always calling extent_changeset_free() even if its call to btrfs_delalloc_reserve_space() or btrfs_check_data_free_space() has failed. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handlingQu Wenruo1-1/+2
commit 8289ed9f93bef2762f9184e136d994734b16d997 upstream. I hit the BUG_ON() with generic/475 test case, and to my surprise, all callers of btrfs_del_root_ref() are already aborting transaction, thus there is not need for such BUG_ON(), just go to @out label and caller will properly handle the error. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14btrfs: fix re-dirty process of tree-log nodesNaohiro Aota1-2/+3
commit 84c25448929942edacba905cecc0474e91114e7a upstream. There is a report of a transaction abort of -EAGAIN with the following script. #!/bin/sh for d in sda sdb; do mkfs.btrfs -d single -m single -f /dev/\${d} done mount /dev/sda /mnt/test mount /dev/sdb /mnt/scratch for dir in test scratch; do echo 3 >/proc/sys/vm/drop_caches fio --directory=/mnt/\${dir} --name=fio.\${dir} --rw=read --size=50G --bs=64m \ --numjobs=$(nproc) --time_based --ramp_time=5 --runtime=480 \ --group_reporting |& tee /dev/shm/fio.\${dir} echo 3 >/proc/sys/vm/drop_caches done for d in sda sdb; do umount /dev/\${d} done The stack trace is shown in below. [3310.967991] BTRFS: error (device sda) in btrfs_commit_transaction:2341: errno=-11 unknown (Error while writing out transaction) [3310.968060] BTRFS info (device sda): forced readonly [3310.968064] BTRFS warning (device sda): Skipping commit of aborted transaction. [3310.968065] ------------[ cut here ]------------ [3310.968066] BTRFS: Transaction aborted (error -11) [3310.968074] WARNING: CPU: 14 PID: 1684 at fs/btrfs/transaction.c:1946 btrfs_commit_transaction.cold+0x209/0x2c8 [3310.968131] CPU: 14 PID: 1684 Comm: fio Not tainted 5.14.10-300.fc35.x86_64 #1 [3310.968135] Hardware name: DIAWAY Tartu/Tartu, BIOS V2.01.B10 04/08/2021 [3310.968137] RIP: 0010:btrfs_commit_transaction.cold+0x209/0x2c8 [3310.968144] RSP: 0018:ffffb284ce393e10 EFLAGS: 00010282 [3310.968147] RAX: 0000000000000026 RBX: ffff973f147b0f60 RCX: 0000000000000027 [3310.968149] RDX: ffff974ecf098a08 RSI: 0000000000000001 RDI: ffff974ecf098a00 [3310.968150] RBP: ffff973f147b0f08 R08: 0000000000000000 R09: ffffb284ce393c48 [3310.968151] R10: ffffb284ce393c40 R11: ffffffff84f47468 R12: ffff973f101bfc00 [3310.968153] R13: ffff971f20cf2000 R14: 00000000fffffff5 R15: ffff973f147b0e58 [3310.968154] FS: 00007efe65468740(0000) GS:ffff974ecf080000(0000) knlGS:0000000000000000 [3310.968157] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [3310.968158] CR2: 000055691bcbe260 CR3: 000000105cfa4001 CR4: 0000000000770ee0 [3310.968160] PKRU: 55555554 [3310.968161] Call Trace: [3310.968167] ? dput+0xd4/0x300 [3310.968174] btrfs_sync_file+0x3f1/0x490 [3310.968180] __x64_sys_fsync+0x33/0x60 [3310.968185] do_syscall_64+0x3b/0x90 [3310.968190] entry_SYSCALL_64_after_hwframe+0x44/0xae [3310.968194] RIP: 0033:0x7efe6557329b [3310.968200] RSP: 002b:00007ffe0236ebc0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [3310.968203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efe6557329b [3310.968204] RDX: 0000000000000000 RSI: 00007efe58d77010 RDI: 0000000000000006 [3310.968205] RBP: 0000000004000000 R08: 0000000000000000 R09: 00007efe58d77010 [3310.968207] R10: 0000000016cacc0c R11: 0000000000000293 R12: 00007efe5ce95980 [3310.968208] R13: 0000000000000000 R14: 00007efe6447c790 R15: 0000000c80000000 [3310.968212] ---[ end trace 1a346f4d3c0d96ba ]--- [3310.968214] BTRFS: error (device sda) in cleanup_transaction:1946: errno=-11 unknown The abort occurs because of a write hole while writing out freeing tree nodes of a tree-log tree. For zoned btrfs, we re-dirty a freed tree node to ensure btrfs can write the region and does not leave a hole on write on a zoned device. The current code fails to re-dirty a node when the tree-log tree's depth is greater or equal to 2. That leads to a transaction abort with -EAGAIN. Fix the issue by properly re-dirtying a node on walking up the tree. Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.12+ Link: https://github.com/kdave/btrfs-progs/issues/415 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14btrfs: clear extent buffer uptodate when we fail to write itJosef Bacik1-0/+6
commit c2e39305299f0118298c2201f6d6cc7d3485f29e upstream. I got dmesg errors on generic/281 on our overnight fstests. Looking at the history this happens occasionally, with errors like this WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50 CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G W 5.16.0-rc2+ #469 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffae598230bc60 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000 RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340 RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000 R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0 R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000 FS: 0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0 Call Trace: extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1d6/0x430 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x270/0x5a0 worker_thread+0x55/0x3c0 ? process_one_work+0x5a0/0x5a0 kthread+0x174/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 This happens because we're trying to read from a extent buffer page that is !PageUptodate. This happens because we will clear the page uptodate when we have an IO error, but we don't clear the extent buffer uptodate. If we do a read later and find this extent buffer we'll think its valid and not return an error, and then trip over this warning. Fix this by also clearing uptodate on the extent buffer when this happens, so that we get an error when we do a btrfs_search_slot() and find this block later. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14scsi: qla2xxx: Format log strings only if neededRoman Bolshakov1-0/+3
commit 69002c8ce914ef0ae22a6ea14b43bb30b9a9a6a8 upstream. Commit 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") introduced unconditional log string formatting to ql_dbg() even if ql_dbg_log event is disabled. It harms performance because some strings are formatted in fastpath and/or interrupt context. Link: https://lore.kernel.org/r/20211112145446.51210-1-r.bolshakov@yadro.com Fixes: 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug logs") Cc: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14cifs: Fix crash on unload of cifs_arc4.koVincent Whitchurch1-13/+0
commit 51a08bdeca27988a17c87b87d8e64ffecbd2a172 upstream. The exit function is wrongly placed in the __init section and this leads to a crash when the module is unloaded. Just remove both the init and exit functions since this module does not need them. Fixes: 71c02863246167b3d ("cifs: fork arc4 and create a separate module...") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: pcm: oss: Handle missing errors in snd_pcm_oss_change_params*()Takashi Iwai1-2/+9
commit 6665bb30a6b1a4a853d52557c05482ee50e71391 upstream. A couple of calls in snd_pcm_oss_change_params_locked() ignore the possible errors. Catch those errors and abort the operation for avoiding further problems. Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211201073606.11660-4-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: pcm: oss: Limit the period size to 16MBTakashi Iwai1-1/+1
commit 8839c8c0f77ab8fc0463f4ab8b37fca3f70677c2 upstream. Set the practical limit to the period size (the fragment shift in OSS) instead of a full 31bit; a too large value could lead to the exhaust of memory as we allocate temporary buffers of the period size, too. As of this patch, we set to 16MB limit, which should cover all use cases. Reported-by: syzbot+bb348e9f9a954d42746f@syzkaller.appspotmail.com Reported-by: Bixuan Cui <cuibixuan@linux.alibaba.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1638270978-42412-1-git-send-email-cuibixuan@linux.alibaba.com Link: https://lore.kernel.org/r/20211201073606.11660-3-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: pcm: oss: Fix negative period/buffer sizesTakashi Iwai1-9/+15
commit 9d2479c960875ca1239bcb899f386970c13d9cfe upstream. The period size calculation in OSS layer may receive a negative value as an error, but the code there assumes only the positive values and handle them with size_t. Due to that, a too big value may be passed to the lower layers. This patch changes the code to handle with ssize_t and adds the proper error checks appropriately. Reported-by: syzbot+bb348e9f9a954d42746f@syzkaller.appspotmail.com Reported-by: Bixuan Cui <cuibixuan@linux.alibaba.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/1638270978-42412-1-git-send-email-cuibixuan@linux.alibaba.com Link: https://lore.kernel.org/r/20211201073606.11660-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: hda/realtek: Fix quirk for TongFang PHxTxX1Werner Sembach1-18/+22
commit 619764cc2ec9ce1283a8bbcd89a1376a7c68293b upstream. This fixes the SND_PCI_QUIRK(...) of the TongFang PHxTxX1 barebone. This fixes the issue of sound not working after s3 suspend. When waking up from s3 suspend the Coef 0x10 is set to 0x0220 instead of 0x0020. Setting the value manually makes the sound work again. This patch does this automatically. While being on it, I also fixed the comment formatting of the quirk and shortened variable and function names. Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Fixes: dd6dd6e3c791 ("ALSA: hda/realtek: Add quirk for TongFang PHxTxX1") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211202165010.876431-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: hda/realtek - Add headset Mic support for Lenovo ALC897 platformKailang Yang1-0/+40
commit d7f32791a9fcf0dae8b073cdea9b79e29098c5f4 upstream. Lenovo ALC897 platform had headset Mic. This patch enable supported headset Mic. Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/baab2c2536cb4cc18677a862c6f6d840@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14ALSA: ctl: Fix copy of updated id with element read/writeAlan Young1-0/+3
commit b6409dd6bdc03aa178bbff0d80db2a30d29b63ac upstream. When control_compat.c:copy_ctl_value_to_user() is used, by ctl_elem_read_user() & ctl_elem_write_user(), it must also copy back the snd_ctl_elem_id value that may have been updated (filled in) by the call to snd_ctl_elem_read/snd_ctl_elem_write(). This matches the functionality provided by snd_ctl_elem_read_user() and snd_ctl_elem_write_user(), via snd_ctl_build_ioff(). Without this, and without making additional calls to snd_ctl_info() which are unnecessary when using the non-compat calls, a userspace application will not know the numid value for the element and consequently will not be able to use the poll/read interface on the control file to determine which elements have updates. Signed-off-by: Alan Young <consult.awy@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211202150607.543389-1-consult.awy@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14mm: bdi: initialize bdi_min_ratio when bdi is unregisteredManjong Lee1-0/+7
commit 3c376dfafbf7a8ea0dea212d095ddd83e93280bb upstream. Initialize min_ratio if it is set during bdi unregistration. This can prevent problems that may occur a when bdi is removed without resetting min_ratio. For example. 1) insert external sdcard 2) set external sdcard's min_ratio 70 3) remove external sdcard without setting min_ratio 0 4) insert external sdcard 5) set external sdcard's min_ratio 70 << error occur(can't set) Because when an sdcard is removed, the present bdi_min_ratio value will remain. Currently, the only way to reset bdi_min_ratio is to reboot. [akpm@linux-foundation.org: tweak comment and coding style] Link: https://lkml.kernel.org/r/20211021161942.5983-1-mj0123.lee@samsung.com Signed-off-by: Manjong Lee <mj0123.lee@samsung.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Changheun Lee <nanich.lee@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: <seunghwan.hyun@samsung.com> Cc: <sookwan7.kim@samsung.com> Cc: <yt0928.kim@samsung.com> Cc: <junho89.kim@samsung.com> Cc: <jisoo2146.oh@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14mm/slub: fix endianness bug for alloc/free_traces attributesGerald Schaefer1-6/+9
commit 005a79e5c254c3f60ec269a459cc41b55028c798 upstream. On big-endian s390, the alloc/free_traces attributes produce endless output, because of always 0 idx in slab_debugfs_show(). idx is de-referenced from *v, which points to a loff_t value, with unsigned int idx = *(unsigned int *)v; This will only give the upper 32 bits on big-endian, which remain 0. Instead of only fixing this de-reference, during discussion it seemed more appropriate to change the seq_ops so that they use an explicit iterator in private loc_track struct. This patch adds idx to loc_track, which will also fix the endianness bug. Link: https://lore.kernel.org/r/20211117193932.4049412-1-gerald.schaefer@linux.ibm.com Link: https://lkml.kernel.org/r/20211126171848.17534-1-gerald.schaefer@linux.ibm.com Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reported-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14mm/damon/core: fix fake load reports due to uninterruptible sleepsSeongJae Park1-3/+11
commit 70e9274805fccfd175d0431a947bfd11ee7df40e upstream. Because DAMON sleeps in uninterruptible mode, /proc/loadavg reports fake load while DAMON is turned on, though it is doing nothing. This can confuse users[1]. To avoid the case, this commit makes DAMON sleeps in idle mode. [1] https://lore.kernel.org/all/11868371.O9o76ZdvQC@natalenko.name/ Link: https://lkml.kernel.org/r/20211126145015.15862-3-sj@kernel.org Fixes: 2224d8485492 ("mm: introduce Data Access MONitor (DAMON)") Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14timers: implement usleep_idle_range()SeongJae Park2-8/+22
commit e4779015fd5d2fb8390c258268addff24d6077c7 upstream. Patch series "mm/damon: Fix fake /proc/loadavg reports", v3. This patchset fixes DAMON's fake load report issue. The first patch makes yet another variant of usleep_range() for this fix, and the second patch fixes the issue of DAMON by making it using the newly introduced function. This patch (of 2): Some kernel threads such as DAMON could need to repeatedly sleep in micro seconds level. Because usleep_range() sleeps in uninterruptible state, however, such threads would make /proc/loadavg reports fake load. To help such cases, this commit implements a variant of usleep_range() called usleep_idle_range(). It is same to usleep_range() but sets the state of the current task as TASK_IDLE while sleeping. Link: https://lkml.kernel.org/r/20211126145015.15862-1-sj@kernel.org Link: https://lkml.kernel.org/r/20211126145015.15862-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush ↵Vitaly Kuznetsov1-1/+1
hypercall commit 1ebfaa11ebb5b603a3c3f54b2e84fcf1030f5a14 upstream. Prior to commit 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH | KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs and KVM_REQ_TLB_FLUSH is/was defined as: (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that "This call guarantees that by the time control returns back to the caller, the observable effects of all flushes on the specified virtual processors have occurred." and without KVM_REQUEST_WAIT there's a small chance that the vCPU making the TLB flush will resume running before all IPIs get delivered to other vCPUs and a stale mapping can get read there. Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST: kvm_hv_flush_tlb() is the sole caller which uses it for kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where KVM_REQUEST_WAIT makes a difference. Cc: stable@kernel.org Fixes: 0baedd792713 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211209102937.584397-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI reqSean Christopherson1-2/+5
commit 3244867af8c065e51969f1bffe732d3ebfd9a7d2 upstream. Do not bail early if there are no bits set in the sparse banks for a non-sparse, a.k.a. "all CPUs", IPI request. Per the Hyper-V spec, it is legal to have a variable length of '0', e.g. VP_SET's BankContents in this case, if the request can be serviced without the extra info. It is possible that for a given invocation of a hypercall that does accept variable sized input headers that all the header input fits entirely within the fixed size header. In such cases the variable sized input header is zero-sized and the corresponding bits in the hypercall input should be set to zero. Bailing early results in KVM failing to send IPIs to all CPUs as expected by the guest. Fixes: 214ff83d4473 ("KVM: x86: hyperv: implement PV IPI send hypercalls") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211207220926.718794-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14KVM: x86: Don't WARN if userspace mucks with RCX during string I/O exitSean Christopherson1-2/+7
commit d07898eaf39909806128caccb6ebd922ee3edd69 upstream. Replace a WARN with a comment to call out that userspace can modify RCX during an exit to userspace to handle string I/O. KVM doesn't actually support changing the rep count during an exit, i.e. the scenario can be ignored, but the WARN needs to go as it's trivial to trigger from userspace. Cc: stable@vger.kernel.org Fixes: 3b27de271839 ("KVM: x86: split the two parts of emulator_pio_in") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211025201311.1881846-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net: mvpp2: fix XDP rx queues registeringLouis Amas1-2/+2
commit a50e659b2a1be14784e80f8492aab177e67c53a2 upstream. The registration of XDP queue information is incorrect because the RX queue id we use is invalid. When port->id == 0 it appears to works as expected yet it's no longer the case when port->id != 0. The problem arised while using a recent kernel version on the MACCHIATOBin. This board has several ports: * eth0 and eth1 are 10Gbps interfaces ; both ports has port->id == 0; * eth2 is a 1Gbps interface with port->id != 0. Code from xdp-tutorial (more specifically advanced03-AF_XDP) was used to test packet capture and injection on all these interfaces. The XDP kernel was simplified to: SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx) { int index = ctx->rx_queue_index; /* A set entry here means that the correspnding queue_id * has an active AF_XDP socket bound to it. */ if (bpf_map_lookup_elem(&xsks_map, &index)) return bpf_redirect_map(&xsks_map, index, 0); return XDP_PASS; } Starting the program using: ./af_xdp_user -d DEV Gives the following result: * eth0 : ok * eth1 : ok * eth2 : no capture, no injection Investigating the issue shows that XDP rx queues for eth2 are wrong: XDP expects their id to be in the range [0..3] but we found them to be in the range [32..35]. Trying to force rx queue ids using: ./af_xdp_user -d eth2 -Q 32 fails as expected (we shall not have more than 4 queues). When we register the XDP rx queue information (using xdp_rxq_info_reg() in function mvpp2_rxq_init()) we tell it to use rxq->id as the queue id. This value is computed as: rxq->id = port->id * max_rxq_count + queue_id where max_rxq_count depends on the device version. In the MACCHIATOBin case, this value is 32, meaning that rx queues on eth2 are numbered from 32 to 35 - there are four of them. Clearly, this is not the per-port queue id that XDP is expecting: it wants a value in the range [0..3]. It shall directly use queue_id which is stored in rxq->logic_rxq -- so let's use that value instead. rxq->id is left untouched ; its value is indeed valid but it should not be used in this context. This is consistent with the remaining part of the code in mvpp2_rxq_init(). With this change, packet capture is working as expected on all the MACCHIATOBin ports. Fixes: b27db2274ba8 ("mvpp2: use page_pool allocator") Signed-off-by: Louis Amas <louis.amas@eho.link> Signed-off-by: Emmanuel Deloget <emmanuel.deloget@eho.link> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20211207143423.916334-1-louis.amas@eho.link Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net/sched: fq_pie: prevent dismantle issueEric Dumazet1-0/+1
commit 61c2402665f1e10c5742033fce18392e369931d7 upstream. For some reason, fq_pie_destroy() did not copy working code from pie_destroy() and other qdiscs, thus causing elusive bug. Before calling del_timer_sync(&q->adapt_timer), we need to ensure timer will not rearm itself. rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (4416 ticks this GP) idle=60d/1/0x4000000000000000 softirq=10433/10434 fqs=2579 (t=10501 jiffies g=13085 q=3989) NMI backtrace for cpu 0 CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3878 [inline] rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:write_comp_data kernel/kcov.c:221 [inline] RIP: 0010:__sanitizer_cov_trace_const_cmp1+0x1d/0x80 kernel/kcov.c:273 Code: 54 c8 20 48 89 10 c3 66 0f 1f 44 00 00 53 41 89 fb 41 89 f1 bf 03 00 00 00 65 48 8b 0c 25 40 70 02 00 48 89 ce 4c 8b 54 24 08 <e8> 4e f7 ff ff 84 c0 74 51 48 8b 81 88 15 00 00 44 8b 81 84 15 00 RSP: 0018:ffffc90000d27b28 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff888064bf1bf0 RCX: ffff888011928000 RDX: ffff888011928000 RSI: ffff888011928000 RDI: 0000000000000003 RBP: ffff888064bf1c28 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff875d8295 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8880783dd300 R14: 0000000000000000 R15: 0000000000000000 pie_calculate_probability+0x405/0x7c0 net/sched/sch_pie.c:418 fq_pie_timer+0x170/0x2a0 net/sched/sch_fq_pie.c:383 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734 __run_timers kernel/time/timer.c:1715 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 run_ksoftirqd kernel/softirq.c:921 [inline] run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Cc: Sachin D. Patil <sdp.sachin@gmail.com> Cc: V. Saicharan <vsaicharan1998@gmail.com> Cc: Mohit Bhasi <mohitbhasi1998@gmail.com> Cc: Leslie Monis <lesliemonis@gmail.com> Cc: Gautam Ramakrishnan <gautamramk@gmail.com> Link: https://lore.kernel.org/r/20211209084937.3500020-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net: dsa: felix: Fix memory leak in felix_setup_mmio_filteringJosé Expósito1-1/+4
commit e8b1d7698038e76363859fb47ae0a262080646f5 upstream. Avoid a memory leak if there is not a CPU port defined. Fixes: 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during init and teardown") Addresses-Coverity-ID: 1492897 ("Resource leak") Addresses-Coverity-ID: 1492899 ("Resource leak") Signed-off-by: José Expósito <jose.exposito89@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211209110538.11585-1-jose.exposito89@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net: dsa: mv88e6xxx: error handling for serdes_power functionsAmeer Hamza1-1/+7
commit 0416e7af2369b0d12a28dea8d30b104df9a6953d upstream. Added default case to handle undefined cmode scenario in mv88e6393x_serdes_power() and mv88e6393x_serdes_power() methods. Addresses-Coverity: 1494644 ("Uninitialized scalar variable") Fixes: 21635d9203e1c (net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X) Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com> Link: https://lore.kernel.org/r/20211209041552.9810-1-amhamza.mgc@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14net: bcm4908: Handle dma_set_coherent_mask error codesJiasheng Jiang1-1/+3
commit 128f6ec95a282b2d8bc1041e59bf65810703fa44 upstream. The return value of dma_set_coherent_mask() is not always 0. To catch the exception in case that dma is not support the mask. Fixes: 9d61d138ab30 ("net: broadcom: rename BCM4908 driver & update DT binding") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14devlink: fix netns refcount leak in devlink_nl_cmd_reload()Eric Dumazet1-8/+8
commit 4dbb0dad8e63fcd0b5a117c2861d2abe7ff5f186 upstream. While preparing my patch series adding netns refcount tracking, I spotted bugs in devlink_nl_cmd_reload() Some error paths forgot to release a refcount on a netns. To fix this, we can reduce the scope of get_net()/put_net() section around the call to devlink_reload(). Fixes: ccdf07219da6 ("devlink: Add reload action option to devlink reload command") Fixes: dc64cc7c6310 ("devlink: Add devlink reload limit option") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Moshe Shemesh <moshe@mellanox.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20211205192822.1741045-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-14IB/hfi1: Correct guard on eager buffer deallocationMike Marciniszyn1-1/+1
commit 9292f8f9a2ac42eb320bced7153aa2e63d8cc13a upstream. The code tests the dma address which legitimately can be 0. The code should test the kernel logical address to avoid leaking eager buffer allocations that happen to map to a dma address of 0. Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled") Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>