Age | Commit message | Author | Files | Lines
2021-12-08 | siphash: use _unaligned version by default | Arnd Bergmann | 2 | -16/+10
commit f7e5b9bfa6c8820407b64eabc1f29c9a87e8993d upstream. On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363. Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses. Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware. This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support. 
Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/ Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/ Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
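As a rough userspace illustration of the idea behind the unaligned accessors described above, the sketch below reads a little-endian 64-bit value through memcpy() and byte shifts, so the compiler is never told the pointer is aligned and no cast-based undefined behavior is involved. The helper name load_le64_unaligned() is hypothetical; it is not the kernel's get_unaligned() implementation, only a minimal stand-in under that assumption.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel API): read a little-endian 64-bit
 * value from an address that may have any alignment. Copying into a local
 * byte array avoids dereferencing a misaligned uint64_t pointer.
 */
static uint64_t load_le64_unaligned(const void *p)
{
	uint8_t b[8];

	memcpy(b, p, sizeof(b));
	return (uint64_t)b[0]       | (uint64_t)b[1] << 8  |
	       (uint64_t)b[2] << 16 | (uint64_t)b[3] << 24 |
	       (uint64_t)b[4] << 32 | (uint64_t)b[5] << 40 |
	       (uint64_t)b[6] << 48 | (uint64_t)b[7] << 56;
}
```

Calling it on buf + 1 of a byte buffer is well defined regardless of how buf itself is aligned; which instructions the compiler picks for the copy is up to the target and is not shown here.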
2021-12-08 | net: mpls: Fix notifications when deleting a device | Benjamin Poirier | 1 | -16/+52
commit 7d4741eacdefa5f0475431645b56baf00784df1f upstream. There are various problems related to netlink notifications for mpls route changes in response to interfaces being deleted: * delete interface of only nexthop DELROUTE notification is missing RTA_OIF attribute * delete interface of non-last nexthop NEWROUTE notification is missing entirely * delete interface of last nexthop DELROUTE notification is missing nexthop All of these problems stem from the fact that existing routes are modified in-place before sending a notification. Restructure mpls_ifdown() to avoid changing the route in the DELROUTE cases and to create a copy in the NEWROUTE case. Fixes: f8efb73c97e2 ("mpls: multipath route support") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings() | Zhou Qingyang | 1 | -2/+8
commit e2dabc4f7e7b60299c20a36d6a7b24ed9bf8e572 upstream. In qlcnic_83xx_add_rings(), the indirect function of ahw->hw_ops->alloc_mbx_args will be called to allocate memory for cmd.req.arg, and there is a dereference of it in qlcnic_83xx_add_rings(), which could lead to a NULL pointer dereference on failure of the indirect function like qlcnic_83xx_alloc_mbx_args(). Fix this bug by adding a check of alloc_mbx_args(), this patch imitates the logic of mbx_cmd()'s failure handling. This bug was found by a static analyzer. The analysis employs differential checking to identify inconsistent security operations (e.g., checks or kfrees) between two code paths and confirms that the inconsistent operations are not recovered in the current function or the callers, so they constitute bugs. Note that, as a bug found by static analysis, it can be a false positive or hard to trigger. Multiple researchers have cross-reviewed the bug. Builds with CONFIG_QLCNIC=m show no new warnings, and our static analyzer no longer warns about this code. Fixes: 7f9664525f9c ("qlcnic: 83xx memory map and HW access routine") Signed-off-by: Zhou Qingyang <zhou1615@umn.edu> Link: https://lore.kernel.org/r/20211130110848.109026-1-zhou1615@umn.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | tcp: fix page frag corruption on page fault | Paolo Abeni | 1 | -5/+8
commit dacb5d8875cc6cd3a553363b4d6f06760fcbe70c upstream. Steffen reported a TCP stream corruption for HTTP requests served by the apache web-server using a cifs mount-point and memory mapping the relevant file. The root cause is quite similar to the one addressed by commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from memory reclaim"). Here the nested access to the task page frag is caused by a page fault on the (mmapped) user-space memory buffer coming from the cifs file. The page fault handler performs an smb transaction on a different socket, inside the same process context. Since sk->sk_allocation for such socket does not prevent the usage for the task_frag, the nested allocation modifies "under the hood" the page frag in use by the outer sendmsg call, corrupting the stream. The overall relevant stack trace looks like the following:

httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked:
        ffffffff91461d91 tcp_sendmsg_locked+0x1
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139814e sock_sendmsg+0x3e
        ffffffffc06dfe1d smb_send_kvec+0x28
        [...]
        ffffffffc06cfaf8 cifs_readpages+0x213
        ffffffff90e83c4b read_pages+0x6b
        ffffffff90e83f31 __do_page_cache_readahead+0x1c1
        ffffffff90e79e98 filemap_fault+0x788
        ffffffff90eb0458 __do_fault+0x38
        ffffffff90eb5280 do_fault+0x1a0
        ffffffff90eb7c84 __handle_mm_fault+0x4d4
        ffffffff90eb8093 handle_mm_fault+0xc3
        ffffffff90c74f6d __do_page_fault+0x1ed
        ffffffff90c75277 do_page_fault+0x37
        ffffffff9160111e page_fault+0x1e
        ffffffff9109e7b5 copyin+0x25
        ffffffff9109eb40 _copy_from_iter_full+0xe0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462370 tcp_sendmsg_locked+0x5e0
        ffffffff91462b57 tcp_sendmsg+0x27
        ffffffff9139815c sock_sendmsg+0x4c
        ffffffff913981f7 sock_write_iter+0x97
        ffffffff90f2cc56 do_iter_readv_writev+0x156
        ffffffff90f2dff0 do_iter_write+0x80
        ffffffff90f2e1c3 vfs_writev+0xa3
        ffffffff90f2e27c do_writev+0x5c
        ffffffff90c042bb do_syscall_64+0x5b
        ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65

The cifs filesystem rightfully sets sk_allocation to GFP_NOFS; we can avoid the nesting by using the sk page frag for allocations lacking the __GFP_FS flag. Do not define an additional mm-helper for that, as this is strictly tied to the sk page frag usage. v1 -> v2: - use a stricter sk_page_frag() check instead of reordering the code (Eric) Reported-by: Steffen Froemer <sfroemer@redhat.com> Fixes: 5640f7685831 ("net: use a per task frag allocator") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | natsemi: xtensa: fix section mismatch warnings | Randy Dunlap | 1 | -1/+1
commit b0f38e15979fa8851e88e8aa371367f264e7b6e9 upstream. Fix section mismatch warnings in xtsonic. The first one appears to be bogus and after fixing the second one, the first one is gone. WARNING: modpost: vmlinux.o(.text+0x529adc): Section mismatch in reference from the function sonic_get_stats() to the function .init.text:set_reset_devices() The function sonic_get_stats() references the function __init set_reset_devices(). This is often because sonic_get_stats lacks a __init annotation or the annotation of set_reset_devices is wrong. WARNING: modpost: vmlinux.o(.text+0x529b3b): Section mismatch in reference from the function xtsonic_probe() to the function .init.text:sonic_probe1() The function xtsonic_probe() references the function __init sonic_probe1(). This is often because xtsonic_probe lacks a __init annotation or the annotation of sonic_probe1 is wrong. Fixes: 74f2a5f0ef64 ("xtensa: Add support for the Sonic Ethernet device for the XT2000 board.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Chris Zankel <chris@zankel.net> Cc: linux-xtensa@linux-xtensa.org Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Link: https://lore.kernel.org/r/20211130063947.7529-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | i2c: cbus-gpio: set atomic transfer callback | Aaro Koskinen | 1 | -2/+3
commit b12764695c3fcade145890b67f82f8b139174cc7 upstream. CBUS transfers have always been atomic, but after commit 63b96983a5dd ("i2c: core: introduce callbacks for atomic transfers") we started to see warnings during e.g. poweroff as the atomic callback is not explicitly set. Fix that. Fixes the following WARNING seen during Nokia N810 power down: [ 786.570617] reboot: Power down [ 786.573913] ------------[ cut here ]------------ [ 786.578826] WARNING: CPU: 0 PID: 672 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x110 [ 786.587799] No atomic I2C transfer handler for 'i2c-2' Fixes: 63b96983a5dd ("i2c: core: introduce callbacks for atomic transfers") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | i2c: stm32f7: stop dma transfer in case of NACK | Alain Volmat | 1 | -2/+7
commit 31b90a95ccbbb4b628578ac17e3b3cc8eeacfe31 upstream. When a NACK is received, the DMA transfer should be stopped to avoid feeding data into the FIFO. Also make sure the proper error code is returned, and avoid waiting for DMA completion when an error occurs during the transmission. Fixes: 7ecc8cfde553 ("i2c: i2c-stm32f7: Add DMA support") Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | i2c: stm32f7: recover the bus on access timeout | Alain Volmat | 1 | -0/+2
commit b933d1faf8fa30d16171bcff404e39c41b2a7c84 upstream. When getting an access timeout, ensure that the bus is in a proper state prior to returning the error. Fixes: aeb068c57214 ("i2c: i2c-stm32f7: add driver") Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | i2c: stm32f7: flush TX FIFO upon transfer errors | Alain Volmat | 1 | -1/+19
commit 0c21d02ca469574d2082379db52d1a27b99eed0c upstream. While handling an error during a transfer (e.g. NACK), the driver may already have written data into TXDR before the transfer gets stopped. This commit adds a TXDR flush after the end of the transfer in case of error, to avoid sending wrong data to any other slave upon the next transfer. Fixes: aeb068c57214 ("i2c: i2c-stm32f7: add driver") Signed-off-by: Alain Volmat <alain.volmat@foss.st.com> Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: ratelimiter: use kvcalloc() instead of kvzalloc() | Gustavo A. R. Silva | 1 | -2/+2
commit 4e3fd721710553832460c179c2ee5ce67ef7f1e0 upstream. Use 2-factor argument form kvcalloc() instead of kvzalloc(). Link: https://github.com/KSPP/linux/issues/162 Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> [Jason: Gustavo's link above is for KSPP, but this isn't actually a security fix, as table_size is bounded to 8192 anyway, and gcc realizes this, so the codegen comes out to be about the same.] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
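For readers unfamiliar with the two-factor allocation form, the following userspace sketch shows the general idea: the element count and element size are passed separately so the multiplication can be checked for overflow before any memory is requested. The function zalloc_array() is a hypothetical stand-in written for this illustration; it is not kvcalloc() and does not reproduce the kernel's kmalloc/vmalloc fallback behavior.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a two-factor zeroing allocator.
 * Returns NULL if n * size would overflow size_t or if malloc() fails.
 */
static void *zalloc_array(size_t n, size_t size)
{
	void *p;

	/* Reject products that do not fit in size_t. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

With a single-argument allocator the caller would compute n * size itself, and an overflow there would silently yield a too-small buffer. As the commit note says, that cannot happen in the WireGuard ratelimiter because table_size is bounded, so the change is about form rather than a security fix.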
2021-12-08 | wireguard: receive: drop handshakes if queue lock is contended | Jason A. Donenfeld | 1 | -3/+13
commit fb32f4f606c17b869805d7cede8b03d78339b50a upstream. If we're being delivered packets from multiple CPUs so quickly that the ring lock is contended for CPU tries, then it's safe to assume that the queue is near capacity anyway, so just drop the packet rather than spinning. This helps deal with multicore DoS that can interfere with data path performance. It _still_ does not completely fix the issue, but it again chips away at it. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
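The drop-instead-of-spin policy described above can be sketched in portable C11 roughly as follows. This is an assumption-laden illustration, not the WireGuard code: struct ring, RING_SIZE and ring_try_enqueue() are hypothetical names, an atomic_flag stands in for the kernel lock, and the sketch makes a single attempt where the commit describes a bounded number of tries.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4	/* illustrative capacity, not WireGuard's */

/* Hypothetical fixed-size ring guarded by a try-only lock. */
struct ring {
	atomic_flag busy;	/* set while a producer holds the ring */
	int slots[RING_SIZE];
	size_t head;		/* index of the next slot to write */
	size_t count;		/* number of occupied slots */
};

/*
 * Try to enqueue one item. If the lock is already held, or the ring is
 * full, give up immediately and let the caller drop the item rather than
 * spinning.
 */
static bool ring_try_enqueue(struct ring *r, int item)
{
	bool queued = false;

	if (atomic_flag_test_and_set_explicit(&r->busy, memory_order_acquire))
		return false;	/* contended: drop */

	if (r->count < RING_SIZE) {
		r->slots[r->head] = item;
		r->head = (r->head + 1) % RING_SIZE;
		r->count++;
		queued = true;
	}

	atomic_flag_clear_explicit(&r->busy, memory_order_release);
	return queued;
}
```

The design trade-off is the one the commit message states: under a flood, a contended lock suggests the queue is near capacity anyway, so losing a handshake packet costs less than stalling the data path.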
2021-12-08 | wireguard: receive: use ring buffer for incoming handshakes | Jason A. Donenfeld | 5 | -43/+37
commit 886fcee939adb5e2af92741b90643a59f2b54f97 upstream. Apparently the spinlock on incoming_handshake's skb_queue is highly contended, and a torrent of handshake or cookie packets can bring the data plane to its knees, simply by virtue of enqueueing the handshake packets to be processed asynchronously. So, we try switching this to a ring buffer to hopefully have less lock contention. This alleviates the problem somewhat, though it still isn't perfect, so future patches will have to improve this further. However, it at least doesn't completely diminish the data plane. Reported-by: Streun Fabio <fstreun@student.ethz.ch> Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: device: reset peer src endpoint when netns exits | Jason A. Donenfeld | 5 | -2/+57
commit 20ae1d6aa159eb91a9bf09ff92ccaa94dbea92c2 upstream. Each peer's endpoint contains a dst_cache entry that takes a reference to another netdev. When the containing namespace exits, we take down the socket and prevent future sockets from being created (by setting creating_net to NULL), which removes that potential reference on the netns. However, it doesn't release references to the netns that a netdev cached in dst_cache might be taking, so the netns still might fail to exit. Since the socket is gimped anyway, we can simply clear all the dst_caches (by way of clearing the endpoint src), which will release all references. However, the current dst_cache_reset function only releases those references lazily. But it turns out that all of our usages of wg_socket_clear_peer_endpoint_src are called from contexts that are not exactly high-speed or bottle-necked. For example, when there's connection difficulty, or when userspace is reconfiguring the interface. And in particular for this patch, when the netns is exiting. So for those cases, it makes more sense to call dst_release immediately. For that, we add a small helper function to dst_cache. This patch also adds a test to netns.sh from Hangbin Liu to ensure this doesn't regress. Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reported-by: Xiumei Mu <xmu@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Paolo Abeni <pabeni@redhat.com> Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: selftests: rename DEBUG_PI_LIST to DEBUG_PLIST | Li Zhijian | 1 | -1/+1
commit 7e938beb8321d34f040557b8915b228af125f73c upstream. DEBUG_PI_LIST was renamed to DEBUG_PLIST since 8e18faeac3 ("lib/plist: rename DEBUG_PI_LIST to DEBUG_PLIST"). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Fixes: 8e18faeac3e4 ("lib/plist: rename DEBUG_PI_LIST to DEBUG_PLIST") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: selftests: actually test for routing loops | Jason A. Donenfeld | 1 | -1/+5
commit 782c72af567fc2ef09bd7615d0307f24de72c7e0 upstream. We previously removed the restriction on looping to self, and then added a test to make sure the kernel didn't blow up during a routing loop. The kernel didn't blow up, thankfully, but on certain architectures where skb fragmentation is easier, such as ppc64, the skbs weren't actually being discarded after a few rounds through. But the test wasn't catching this. So actually test explicitly for massive increases in tx to see if we have a routing loop. Note that the actual loop problem will need to be addressed in a different commit. Fixes: b673e24aad36 ("wireguard: socket: remove errant restriction on looping to self") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: allowedips: add missing __rcu annotation to satisfy sparse | Jason A. Donenfeld | 1 | -1/+1
commit ae9287811ba75571cd69505d50ab0e612ace8572 upstream. A __rcu annotation got lost during refactoring, which caused sparse to become enraged. Fixes: bf7b042dc62a ("wireguard: allowedips: free empty intermediate nodes when removing single node") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | wireguard: selftests: increase default dmesg log size | Jason A. Donenfeld | 1 | -0/+1
commit 03ff1b1def73f817e196bf96ab36ac259490bd7c upstream. The selftests currently parse the kernel log at the end to track potential memory leaks. With these tests now reading off the end of the buffer, due to recent optimizations, some creation messages were lost, making the tests think that there was a free without an alloc. Fix this by increasing the kernel log size. Fixes: 24b70eeeb4f4 ("wireguard: use synchronize_net rather than synchronize_rcu") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed | Marek Behún | 1 | -6/+42
commit ede359d8843a2779d232ed30bc36089d4b5962e4 upstream. Function mv88e6xxx_serdes_pcs_get_state() currently does not report link up if AN is enabled, Link bit is set, but Speed and Duplex Resolved bit is not set, which testing shows is the case for when auto-negotiation was bypassed (we have AN enabled but link partner does not). An example of such link partner is Marvell 88X3310 PHY, when put into the mode where host interface changes between 10gbase-r, 5gbase-r, 2500base-x and sgmii according to copper speed. The 88X3310 does not enable AN in 2500base-x, and so SerDes on mv88e6xxx currently does not link with it. Fix this. Fixes: a5a6858b793f ("net: dsa: mv88e6xxx: extend phylink to Serdes PHYs") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family | Marek Behún | 2 | -1/+61
commit 163000dbc772c1eae9bdfe7c8fe30155db1efd74 upstream. Inband AN is broken on Amethyst in 2500base-x mode when set by standard mechanism (via cmode). (There probably is some weird setting done by default in the switch for this mode that makes it cycle in some state or something, because when the peer is the mvneta controller, it receives link change interrupts every ~0.3ms, but the link is always down.) Get around this by configuring the PCS mode to 1000base-x (where inband AN works), and then changing the SerDes frequency while SerDes transmitter and receiver are disabled, before enabling SerDes PHY. After disabling SerDes PHY, change the PCS mode back to 2500base-x, to avoid confusing the device (if we leave it at 1000base-x PCS mode but with different frequency, and then change cmode to sgmii, the device won't change the frequency because it thinks it already has the correct one). The register which changes the frequency is undocumented. I discovered it by going through all registers in the ranges 4.f000-4.f100 and 1e.8000-1e.8200 for all SerDes cmodes (sgmii, 1000base-x, 2500base-x, 5gbase-r, 10gbase-r, usxgmii) and filtering out registers that didn't make sense (the value was the same for modes which have different frequency). The result of this was:

reg      sgmii  1000base-x  2500base-x  5gbase-r  10gbase-r  usxgmii
04.f002  005b   0058        0059        005c      005d       005f
04.f076  3000   0000        1000        4000      5000       7000
04.f07c  0950   0950        1850        0550      0150       0150
1e.8000  0059   0059        0058        0055      0051       0051
1e.8140  0e20   0e20        0e28        0e21      0e42       0e42

Register 04.f002 is the documented Port Operational Configuration register; its last 3 bits select PCS type, so changing this register also changes the frequency to the appropriate value. Registers 04.f076 and 04.f07c are not writable. Undocumented register 1e.8000 was the one: changing bits 3:0 from 9 to 8 changed SerDes frequency to 3.125 GHz, while leaving the value of PCS mode in register 04.f002.2:0 at 1000base-x. Inband autonegotiation started working correctly.
(I didn't try anything with register 1e.8140 since 1e.8000 solved the problem.) Since I don't have documentation for this register 1e.8000.3:0, I am using the constants without names, but my hypothesis is that this register selects PHY frequency. If in the future I have access to an oscilloscope able to handle these frequencies, I will try to test this hypothesis. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
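The change to bits 3:0 of register 1e.8000 amounts to a read-modify-write of the low nibble. A minimal sketch of that bit manipulation is shown below; set_low_nibble() is a hypothetical helper for illustration only, and the actual MDIO read and write calls used by the driver are deliberately left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: replace bits 3:0 of a 16-bit register value with
 * val, leaving bits 15:4 untouched.
 */
static uint16_t set_low_nibble(uint16_t reg, uint16_t val)
{
	return (uint16_t)((reg & ~0x000fu) | (val & 0x000fu));
}
```

Applied to the values reported in the commit message, set_low_nibble(0x0059, 0x8) yields 0x0058, which matches the value observed for 1e.8000 in 2500base-x mode.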
2021-12-08 | net: dsa: mv88e6xxx: Add fix for erratum 5.2 of 88E6393X family | Marek Behún | 1 | -0/+48
commit 93fd8207bed80ce19aaf59932cbe1c03d418a37d upstream. Add fix for erratum 5.2 of the 88E6393X (Amethyst) family: for 10gbase-r mode, some undocumented registers need to be written some special values. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: dsa: mv88e6xxx: Save power by disabling SerDes trasmitter and receiver | Marek Behún | 2 | -4/+45
commit 7527d66260ac0c603c6baca5146748061fcddbd6 upstream. Save power on 88E6393X by disabling SerDes receiver and transmitter after SerDes is disabled. Signed-off-by: Marek Behún <kabel@kernel.org> Cc: stable@vger.kernel.org # de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: dsa: mv88e6xxx: Drop unnecessary check in mv88e6393x_serdes_erratum_4_6() | Marek Behún | 1 | -17/+11
commit 8c3318b4874e2dee867f5ae8f6d38f78e044bf71 upstream. The check for lane is unnecessary, since the function is called only with allowed lane argument. Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | net: dsa: mv88e6xxx: Fix application of erratum 4.8 for 88E6393X | Marek Behún | 1 | -20/+33
commit 21635d9203e1cf2b73b67e9a86059a62f62a3563 upstream. According to SERDES scripts for 88E6393X, erratum 4.8 has to be applied every time before SerDes is powered on. Split the code for erratum 4.8 into separate function and call it in mv88e6393x_serdes_power(). Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | tracing/histograms: String compares should not care about signed values | Steven Rostedt (VMware) | 1 | -1/+1
commit 450fec13d9170127678f991698ac1a5b05c02e2f upstream. When comparing two strings for the "onmatch" histogram trigger, fields that are strings use string comparisons, which do not care about being signed or not. Do not fail to match two string fields if one is unsigned char array and the other is a signed char array. Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/ Cc: stable@vgerk.kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Yafang Shao <laoar.shao@gmail.com> Fixes: b05e89ae7cf3b ("tracing: Accept different type for synthetic event fields") Reviewed-by: Masami Hiramatsu <mhiramatsu@kernel.org> Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | KVM: x86: check PIR even for vCPUs with disabled APICv | Paolo Bonzini | 3 | -11/+10
commit 37c4dbf337c5c2cdb24365ffae6ed70ac1e74d7a upstream. The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt. Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv. This is not a problem for performance, or if anything it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case for AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv. Cc: stable@vger.kernel.org Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg() | Lai Jiangshan | 1 | -1/+1
commit 05b29633c7a956d5675f5fbba70db0d26aa5e73e upstream. INVLPG operates on guest virtual addresses, which are represented by vcpu->arch.walk_mmu. In nested virtualization scenarios, kvm_mmu_invlpg() was using the wrong MMU structure; if L2's invlpg were emulated by L0 (in practice, this hardly ever happens) when nested two-dimensional paging is enabled, the call to ->tlb_flush_gva() would be skipped and the hardware TLB entry would not be invalidated. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-5-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | KVM: arm64: Avoid setting the upper 32 bits of TCR_EL2 and CPTR_EL2 to 1 | Catalin Marinas | 1 | -2/+2
commit 1f80d15020d7f130194821feb1432b67648c632d upstream. Having a signed (1 << 31) constant for TCR_EL2_RES1 and CPTR_EL2_TCPAC causes the upper 32 bits to be set to 1 when assigning them to a 64-bit variable. Bit 32 in TCR_EL2 is no longer RES0 in ARMv8.7: with FEAT_LPA2 it changes the meaning of bits 49:48 and 9:8 in the stage 1 EL2 page table entries. As a result of the sign-extension, a non-VHE kernel can no longer boot on a model with ARMv8.7 enabled. CPTR_EL2 still has the top 32 bits RES0 but we should preempt any future problems. Make these top bit constants unsigned as per commit df655b75c43f ("arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1"). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Chris January <Chris.January@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211125152014.2806582-1-catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
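The sign-extension effect can be reproduced in a few lines of standalone C. In this sketch INT32_MIN is used as a well-defined stand-in for a signed constant with only bit 31 set, since it has the same bit pattern on two's-complement targets; the function names are hypothetical and the kernel's actual macro definitions are not shown.

```c
#include <stdint.h>

/*
 * Signed 32-bit value with only bit 31 set, widened to 64 bits.
 * The conversion preserves the (negative) value, so bits 63:32
 * all become 1.
 */
static uint64_t widen_signed_bit31(void)
{
	return (uint64_t)INT32_MIN;
}

/*
 * Unsigned 32-bit value with only bit 31 set, widened to 64 bits.
 * Zero-extension leaves bits 63:32 clear.
 */
static uint64_t widen_unsigned_bit31(void)
{
	return (uint64_t)(UINT32_C(1) << 31);
}
```

The first function returns 0xffffffff80000000 and the second 0x0000000080000000, which is why making the top-bit constants unsigned keeps the upper half of the system register value at zero.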
2021-12-08 | KVM: MMU: shadow nested paging does not have PKU | Paolo Bonzini | 1 | -2/+2
commit 28f091bc2f8c23b7eac2402956b692621be7f9f4 upstream. Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding incorrect interpretations of the nested hypervisor's page tables. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | KVM: x86: Use a stable condition around all VT-d PI paths | Paolo Bonzini | 1 | -9/+11
commit 53b7ca1a359389276c76fbc9e1009d8626a17e40 upstream. Currently, checks for whether VT-d PI can be used refer to the current status of the feature in the current vCPU; or they more or less pick vCPU 0 in case a specific vCPU is not available. However, these checks do not attempt to synchronize with changes to the IRTE. In particular, there is no path that updates the IRTE when APICv is re-activated on vCPU 0; and there is no path to wakeup a CPU that has APICv disabled, if the wakeup occurs because of an IRTE that points to a posted interrupt. To fix this, always go through the VT-d PI path as long as there are assigned devices and APICv is available on both the host and the VM side. Since the relevant condition was copied over three times, take the hint and factor it into a separate function. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08 | KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled | Paolo Bonzini | 1 | -14/+25
commit 7e1901f6c86c896acff6609e0176f93f756d8b2a upstream. If APICv is disabled for this vCPU, assigned devices may still attempt to post interrupts. In that case, we need to cancel the vmentry and deliver the interrupt with KVM_REQ_EVENT. Extend the existing code that handles injection of L1 interrupts into L2 to cover this case as well. vmx_hwapic_irr_update is only called when APICv is active so it would be confusing to add a check for vcpu->arch.apicv_active in there. Instead, just use vmx_set_rvi directly in vmx_sync_pir_to_irr. Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211123004311.2954158-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: nVMX: Abide to KVM_REQ_TLB_FLUSH_GUEST request on nested vmentry/vmexitSean Christopherson3-15/+28
commit 40e5f9080472b614eeedcc5ba678289cd98d70df upstream. Like KVM_REQ_TLB_FLUSH_CURRENT, the GUEST variant needs to be serviced at nested transitions, as KVM doesn't track requests for L1 vs L2. E.g. if there's a pending flush when a nested VM-Exit occurs, then the flush was requested in the context of L2 and needs to be handled before switching to L1, otherwise the flush for L2 would effectively be lost. Opportunistically add a helper to handle CURRENT and GUEST as a pair, the logic for when they need to be serviced is identical as both requests are tied to L1 vs. L2, the only difference is the scope of the flush. Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 07ffaf343e34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUESTSean Christopherson1-9/+14
commit 2b4a5a5d56881ece3c66b9a9a8943a6f41bd7349 upstream. Flush the current VPID when handling KVM_REQ_TLB_FLUSH_GUEST instead of always flushing vpid01. Any TLB flush that is triggered when L2 is active is scoped to L2's VPID (if it has one), e.g. if L2 toggles CR4.PGE and L1 doesn't intercept PGE writes, then KVM's emulation of the TLB flush needs to be applied to L2's VPID. Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 07ffaf343e34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: nVMX: Emulate guest TLB flush on nested VM-Enter with new vpid12Sean Christopherson1-20/+17
commit 712494de96f35f3e146b36b752c2afe0fdc0f0cc upstream. Fully emulate a guest TLB flush on nested VM-Enter which changes vpid12, i.e. L2's VPID, instead of simply doing INVVPID to flush real hardware's TLB entries for vpid02. From L1's perspective, changing L2's VPID is effectively a TLB flush unless "hardware" has previously cached entries for the new vpid12. Because KVM tracks only a single vpid12, KVM doesn't know if the new vpid12 has been used in the past and so must treat it as a brand new, never been used VPID, i.e. must assume that the new vpid12 represents a TLB flush from L1's perspective. For example, if L1 and L2 share a CR3, the first VM-Enter to L2 (with a VPID) is effectively a TLB flush as hardware/KVM has never seen vpid12 and thus can't have cached entries in the TLB for vpid12. Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 5c614b3583e7 ("KVM: nVMX: nested VPID emulation") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: x86: ignore APICv if LAPIC is not enabledPaolo Bonzini1-1/+1
commit 78311a514099932cd8434d5d2194aa94e56ab67c upstream. Synchronize the two calls to kvm_x86_sync_pir_to_irr. The one in the reenter-guest fast path invoked the callback unconditionally even if LAPIC is present but disabled. In this case, there are no interrupts to deliver, and therefore posted interrupts can be ignored. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: Ensure local memslot copies operate on up-to-date arch-specific dataSean Christopherson1-16/+31
commit bda44d844758c70c8dc1478e6fc9c25efa90c5a7 upstream. When modifying memslots, snapshot the "old" memslot and copy it to the "new" memslot's arch data after (re)acquiring slots_arch_lock. x86 can change a memslot's arch data while memslot updates are in-progress so long as it holds slots_arch_lock, thus snapshotting a memslot without holding the lock can result in the consumption of stale data. Fixes: b10a038e84d1 ("KVM: mmu: Add slots_arch_lock for memslot arch fields") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211104002531.1176691-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: x86/mmu: Fix TLB flush range when handling disconnected ptBen Gardon1-6/+4
commit 574c3c55e969096cea770eda3375ff35ccf91702 upstream. When recursively clearing out disconnected pts, the range based TLB flush in handle_removed_tdp_mmu_page uses the wrong starting GFN, resulting in the flush mostly missing the affected range. Fix this by using base_gfn for the flush. In response to feedback from David Matlack on the RFC version of this patch, also move a few definitions into the for loop in the function to prevent unintended references to them in the future. Fixes: a066e61f13cf ("KVM: x86/mmu: Factor out handling of removed page tables") CC: stable@vger.kernel.org Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115211704.2621644-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08KVM: Disallow user memslot with size that exceeds "unsigned long"Sean Christopherson1-1/+2
commit 6b285a5587506bae084cf9a3ed5aa491d623b91b upstream. Reject userspace memslots whose size exceeds the storage capacity of an "unsigned long". KVM's uAPI takes the size as u64 to support large slots on 64-bit hosts, but does not account for the size being truncated on 32-bit hosts in various flows. The access_ok() check on the userspace virtual address in particular casts the size to "unsigned long" and will check the wrong number of bytes. KVM doesn't actually support slots whose size doesn't fit in an "unsigned long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long", not a "u64", and misc arch specific code follows that behavior. Fixes: fa3d315a4ce2 ("KVM: Validate userspace_addr of memslot when registered") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <20211104002531.1176691-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
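The truncation described above can be illustrated with a small userspace sketch. This is not the kernel's code: the helper names `size_fits_ulong` and `size_fits_u32` are hypothetical, and `uint32_t` merely stands in for the "unsigned long" of a 32-bit host so the effect is visible on any machine.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a 64-bit size survives a round trip
 * through the host's "unsigned long". On a 64-bit host this is
 * always true; on a 32-bit host it fails for sizes of 4 GiB or more.
 */
static bool size_fits_ulong(uint64_t size)
{
    return size == (uint64_t)(unsigned long)size;
}

/*
 * Same check with uint32_t modelling a 32-bit host's "unsigned long",
 * so the truncation can be observed regardless of the build host.
 */
static bool size_fits_u32(uint64_t size)
{
    return size == (uint64_t)(uint32_t)size;
}
```

A size of exactly 4 GiB truncates to zero in the 32-bit model, which is the kind of value a range check on the truncated size would silently accept.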
2021-12-08KVM: fix avic_set_running for preemptable kernelsPaolo Bonzini1-7/+9
commit 7cfc5c653b07782e7059527df8dc1e3143a7591e upstream. avic_set_running() passes the current CPU to avic_vcpu_load(), albeit via vcpu->cpu rather than smp_processor_id(). If the thread is migrated while avic_set_running runs, the call to avic_vcpu_load() can use a stale value for the processor id. Avoid this by blocking preemption over the entire execution of avic_set_running(). Reported-by: Sean Christopherson <seanjc@google.com> Fixes: 8221c1370056 ("svm: Manage vcpu load/unload when enable AVIC") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08drm/i915/dp: Perform 30ms delay after source OUI writeLyude Paul4-0/+21
commit a44f42ba7f1ad7d3c17bc7d91013fe814a53c5dc upstream. While working on supporting the Intel HDR backlight interface, I noticed that there's a couple of laptops that will very rarely manage to boot up without detecting Intel HDR backlight support - even though it's supported on the system. One example of such a laptop is the Lenovo P17 1st generation. Following some investigation Ville Syrjälä did through the docs they have available to them, they discovered that there's actually supposed to be a 30ms wait after writing the source OUI before we begin setting up the rest of the backlight interface. This seems to be correct, as adding this 30ms delay seems to have completely fixed the probing issues I was previously seeing. So - let's start performing a 30ms wait after writing the OUI, which we do in a manner similar to how we keep track of PPS delays (e.g. record the timestamp of the OUI write, and then wait for however many ms are left since that timestamp right before we interact with the backlight) in order to avoid waiting any longer than we need to. As well, this also avoids us performing this delay on systems where we don't end up using the HDR backlight interface. V3: * Move last_oui_write into intel_dp V2: * Move panel delays into intel_pps Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Fixes: 4a8d79901d5b ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211130212912.212044-1-lyude@redhat.com (cherry picked from commit c7c90b0b8418a97d3aa8b39aae1992908948efad) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
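The "record the timestamp, then wait only for what is left" idea can be sketched in plain C. This is an illustration under assumptions, not the i915 implementation: the name `oui_wait_remaining_ms`, the macro `OUI_WRITE_DELAY_MS`, and the millisecond timestamps are all hypothetical, and the clock is assumed to be monotonic so that `now_ms` is never smaller than `last_write_ms`.

```c
#include <stdint.h>

/* The 30 ms figure comes from the commit message; the name is made up. */
#define OUI_WRITE_DELAY_MS 30u

/*
 * Return how many milliseconds are still left to wait at time now_ms,
 * given that the OUI write happened at last_write_ms.
 * Assumes a monotonic clock (now_ms >= last_write_ms).
 */
static uint32_t oui_wait_remaining_ms(uint64_t now_ms, uint64_t last_write_ms)
{
    uint64_t elapsed = now_ms - last_write_ms;

    if (elapsed >= OUI_WRITE_DELAY_MS)
        return 0; /* the full delay has already passed, no sleep needed */

    return (uint32_t)(OUI_WRITE_DELAY_MS - elapsed);
}
```

A caller would sleep for the returned value right before touching the backlight, so a path that never reaches the backlight code never pays for the delay.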
2021-12-08drm/amd/display: Allow DSC on supported MST branch devicesNicholas Kazlauskas1-4/+16
commit 94ebc035456a4ccacfbbef60c444079a256623ad upstream. [Why] When trying to light up two 4k60 non-DSC displays behind a branch device that supports DSC we can't light up both at once due to bandwidth limitations - each requires 48 VCPI slots but we only have 63. [How] The workaround already exists in the code but is guarded by a CONFIG that cannot be set by the user and shouldn't need to be. Check for specific branch device IDs to decide whether to enable the workaround for multiple display scenarios. Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08ipv6: fix memory leak in fib6_rule_suppressmsizanoen14-4/+7
commit cdef485217d30382f3bf6448c54b4401648fe3f1 upstream. The kernel leaks memory when a `fib` rule is present in IPv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in `ip6_dst_cache` slab cache. After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule"). The problem with that change is that the generic `args->flags` always have `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not decreasing the refcount when needed. How to reproduce: - Add the following nftables rule to a prerouting chain: meta nfproto ipv6 fib saddr . mark . iif oif missing drop This can be done with: sudo nft create table inet test sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }' sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop - Run: sudo ip -6 rule add table main suppress_prefixlength 0 - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase with every incoming ipv6 packet. This patch exposes the protocol-specific flags to the protocol specific `suppress` function, and checks the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount. [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71 [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105 Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08scsi: ufs: ufs-pci: Add support for Intel ADLAdrian Hunter1-0/+18
commit 7dc9fb47bc9a95f1cc6c5655341860c5e50f91d4 upstream. Add PCI ID and callbacks to support Intel Alder Lake. Link: https://lore.kernel.org/r/20211124204218.1784559-1-adrian.hunter@intel.com Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGOJames Smart1-7/+2
commit 0956ba63bd94355bf38cd40f7eb9104577739ab8 upstream. A commit introduced formal registration of all Fabric nodes to the SCSI transport as well as REG/UNREG RPI mailbox requests. The commit introduced the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new code caused the driver to release the RPI value used for the remote port and marked the RPI invalid. When the driver later attempted to re-login, it would use the invalid RPI and the adapter rejected the PLOGI request. As no login occurred, the devloss timer on the rport expired and connectivity was lost. This patch corrects the code by removing the snippet that requests the rpi to be unregistered. This change only occurs on a node that is already marked to be rediscovered. This puts the code back to its original behavior, preserving the already-assigned rpi value (registered or not) which can be used on the re-login attempts. Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com Fixes: fe83e3b9b422 ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08sata_fsl: fix warning in remove_proc_entry when rmmod sata_fslBaokun Li1-5/+3
commit 6f48394cf1f3e8486591ad98c11cdadb8f1ef2ad upstream. Trying to remove the fsl-sata module on PPC64 GNU/Linux leads to the following warning: ------------[ cut here ]------------ remove_proc_entry: removing non-empty directory 'irq/69', leaking at least 'fsl-sata[ff0221000.sata]' WARNING: CPU: 3 PID: 1048 at fs/proc/generic.c:722 .remove_proc_entry+0x20c/0x220 IRQMASK: 0 NIP [c00000000033826c] .remove_proc_entry+0x20c/0x220 LR [c000000000338268] .remove_proc_entry+0x208/0x220 Call Trace: .remove_proc_entry+0x208/0x220 (unreliable) .unregister_irq_proc+0x104/0x140 .free_desc+0x44/0xb0 .irq_free_descs+0x9c/0xf0 .irq_dispose_mapping+0x64/0xa0 .sata_fsl_remove+0x58/0xa0 [sata_fsl] .platform_drv_remove+0x40/0x90 .device_release_driver_internal+0x160/0x2c0 .driver_detach+0x64/0xd0 .bus_remove_driver+0x70/0xf0 .driver_unregister+0x38/0x80 .platform_driver_unregister+0x14/0x30 .fsl_sata_driver_exit+0x18/0xa20 [sata_fsl] ---[ end trace 0ea876d4076908f5 ]--- The driver creates the mapping by calling irq_of_parse_and_map(), so it also has to dispose of the mapping. But the easy way out is to simply use platform_get_irq() instead of irq_of_parse_and_map(). Also we should adapt return value checking and propagate error values. In this case the mapping is not managed by the device but by the of core, so the device does not have to dispose of the mapping. Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08sata_fsl: fix UAF in sata_fsl_port_stop when rmmod sata_fslBaokun Li1-2/+10
commit 6c8ad7e8cf29eb55836e7a0215f967746ab2b504 upstream. When the `rmmod sata_fsl.ko` command is executed in the PPC64 GNU/Linux, a bug is reported: ================================================================== BUG: Unable to handle kernel data access on read at 0x80000800805b502c Oops: Kernel access of bad area, sig: 11 [#1] NIP [c0000000000388a4] .ioread32+0x4/0x20 LR [80000000000c6034] .sata_fsl_port_stop+0x44/0xe0 [sata_fsl] Call Trace: .free_irq+0x1c/0x4e0 (unreliable) .ata_host_stop+0x74/0xd0 [libata] .release_nodes+0x330/0x3f0 .device_release_driver_internal+0x178/0x2c0 .driver_detach+0x64/0xd0 .bus_remove_driver+0x70/0xf0 .driver_unregister+0x38/0x80 .platform_driver_unregister+0x14/0x30 .fsl_sata_driver_exit+0x18/0xa20 [sata_fsl] .__se_sys_delete_module+0x1ec/0x2d0 .system_call_exception+0xfc/0x1f0 system_call_common+0xf8/0x200 ================================================================== The triggering of the BUG is shown in the following stack: driver_detach device_release_driver_internal __device_release_driver drv->remove(dev) --> platform_drv_remove/platform_remove drv->remove(dev) --> sata_fsl_remove iounmap(host_priv->hcr_base); <---- unmap kfree(host_priv); <---- free devres_release_all release_nodes dr->node.release(dev, dr->data) --> ata_host_stop ap->ops->port_stop(ap) --> sata_fsl_port_stop ioread32(hcr_base + HCONTROL) <---- UAF host->ops->host_stop(host) The iounmap(host_priv->hcr_base) and kfree(host_priv) functions should not be executed in drv->remove. These functions should be executed in host_stop after port_stop. Therefore, we move these functions to the new function sata_fsl_host_stop and bind the new function to host_stop. 
Fixes: faf0b2e5afe7 ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08fget: check that the fd still exists after getting a ref to itLinus Torvalds1-0/+4
commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream. Jann Horn points out that there is another possible race wrt Unix domain socket garbage collection, somewhat reminiscent of the one fixed in commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK"). See the extended comment about the garbage collection requirements added to unix_peek_fds() by that commit for details. The race comes from how we can locklessly look up a file descriptor just as it is in the process of being closed, and with the right artificial timing (Jann added a few strategic 'mdelay(500)' calls to do that), the Unix domain socket garbage collector could see the reference count decrement of the close() happen before fget() took its reference to the file and the file was attached onto a new file descriptor. This is all (intentionally) correct on the 'struct file *' side, with RCU lookups and lockless reference counting very much part of the design. Getting that reference count out of order isn't a problem per se. But the garbage collector can get confused by seeing this situation of having seen a file not having any remaining external references and then seeing it being attached to an fd. In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the fix was to serialize the file descriptor install with the garbage collector by taking and releasing the unix_gc_lock. That's not really an option here, but since this all happens when we are in the process of looking up a file descriptor, we can instead simply just re-check that the file hasn't been closed in the meantime, and just re-do the lookup if we raced with a concurrent close() of the same file descriptor. Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08s390/pci: move pseudo-MMIO to prevent MIO overlapNiklas Schnelle1-3/+4
commit 52d04d408185b7aa47628d2339c28ec70074e0ae upstream. When running without MIO support, with pci=nomio or for devices which are not MIO-capable, the zPCI subsystem generates pseudo-MMIO addresses to allow access to PCI BARs via MMIO based Linux APIs even though the platform uses function handles and BAR numbers. This is done by stashing an index into our global IOMAP array, which contains the function handle, in the 16 most significant bits of the addresses returned by ioremap(), always setting the most significant bit. On the other hand the MIO addresses assigned by the platform for use, while requiring special instructions, allow PCI access with virtually mapped physical addresses. Now the problem is that these MIO addresses and our own pseudo-MMIO addresses may overlap. While functionally this would not be a problem by itself, this overlap is detected by common code as both address types are added as resources in the iomem_resource tree. This leads to the overlapping resource claim of either the MIO capable or non-MIO capable devices being rejected. Since PCI is tightly coupled to the use of the iomem_resource tree, see for example the code for request_mem_region(), we can't reasonably get rid of the overlap being detected by keeping our pseudo-MMIO addresses out of the iomem_resource tree. Instead let's move the range used by our own pseudo-MMIO addresses by starting at (1UL << 62) and only using addresses below (1UL << 63) thus avoiding the range currently used for MIO addresses. Fixes: c7ff0e918a7c ("s390/pci: deal with devices that have no support for MIO instructions") Cc: stable@vger.kernel.org # 5.3+ Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
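The new address window can be expressed as a simple range check. The sketch below only encodes the two bounds quoted in the commit message, (1UL << 62) and (1UL << 63); the macro names and the helper `is_pseudo_mmio` are hypothetical and do not claim to match the s390 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds taken from the commit message; the names are illustrative. */
#define PSEUDO_MMIO_START (UINT64_C(1) << 62) /* inclusive */
#define PSEUDO_MMIO_END   (UINT64_C(1) << 63) /* exclusive */

/* True if addr lies in the window reserved for pseudo-MMIO addresses. */
static bool is_pseudo_mmio(uint64_t addr)
{
    return addr >= PSEUDO_MMIO_START && addr < PSEUDO_MMIO_END;
}
```

Because the window ends below bit 63, an address with the most significant bit set can no longer be mistaken for a pseudo-MMIO address.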
2021-12-08dma-buf: system_heap: Use 'for_each_sgtable_sg' in pages free flowGuangming1-1/+1
commit 679d94cd7d900871e5bc9cf780bd5b73af35ab42 upstream. The previous version uses 'sg_table.nents' to traverse the sg_table in the pages free flow. However, 'sg_table.nents' is reassigned in 'dma_map_sg' and then means the number of created entries in the DMA address space. So, using 'sg_table.nents' in the pages free flow will cause some pages not to be freed. Here we should use 'sg_table.orig_nents' to free pages memory, but using the sgtable helper 'for_each_sgtable_sg' (instead of the previous, rather common helper 'for_each_sg', which may cause a memory leak) is much better. Fixes: d963ab0f15fb0 ("dma-buf: system_heap: Allocate higher order pages if available") Signed-off-by: Guangming <Guangming.Cao@mediatek.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> # 5.11.* Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211126074904.88388-1-guangming.cao@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
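Why the entry count matters can be shown with a toy model in userspace C. Everything here is hypothetical (`struct toy_table`, `free_entries`, `leaked`); it only mimics the two counters of an sg_table, where the mapped count may be smaller than the originally allocated count.

```c
/*
 * Toy stand-in for an sg_table: orig_nents is the number of entries
 * that were allocated, nents the (possibly smaller) number left after
 * a DMA mapping step merged some of them.
 */
struct toy_table {
    int *freed;              /* freed[i] != 0 once entry i was released */
    unsigned int orig_nents;
    unsigned int nents;
};

/* Mark the first count entries as freed and return how many were visited. */
static unsigned int free_entries(struct toy_table *t, unsigned int count)
{
    unsigned int i;

    for (i = 0; i < count; i++)
        t->freed[i] = 1;
    return i;
}

/* Count allocated entries that were never freed. */
static unsigned int leaked(const struct toy_table *t)
{
    unsigned int i, n = 0;

    for (i = 0; i < t->orig_nents; i++)
        if (!t->freed[i])
            n++;
    return n;
}
```

Walking the table with `nents` leaves the tail entries untouched, while walking it with `orig_nents` releases every allocated entry.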
2021-12-08iwlwifi: mvm: retry init flow if failedMordechay Goodstein5-8/+47
commit 5283dd677e52af9db6fe6ad11b2f12220d519d0c upstream. In some very rare cases the init flow may fail. In many cases, this is recoverable, so we can retry. Implement a loop to retry two more times after the first attempt failed. This can happen in two different situations, namely during probe and during mac80211 start. For the first case, a simple loop is enough. For the second case, we need to add a flag to prevent mac80211 from trying to restart it as well, leaving full control with the driver. Cc: <stable@vger.kernel.org> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/iwlwifi.20211110150132.57514296ecab.I52a0411774b700bdc7dedb124d8b59bf99456eb2@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
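The "first attempt plus two retries" loop for the probe case can be sketched generically. This is not the iwlwifi code: `init_with_retry`, `MAX_INIT_ATTEMPTS`, and the demo callback `fail_n_times` are hypothetical names, and a return value of 0 is assumed to mean success.

```c
/* One initial attempt plus two retries, as described in the commit message. */
#define MAX_INIT_ATTEMPTS 3

/* Hypothetical init callback type: returns 0 on success, non-zero on failure. */
typedef int (*init_fn)(void *ctx);

/* Call init up to MAX_INIT_ATTEMPTS times and return the last result. */
static int init_with_retry(init_fn init, void *ctx)
{
    int ret = -1;
    int attempt;

    for (attempt = 0; attempt < MAX_INIT_ATTEMPTS; attempt++) {
        ret = init(ctx);
        if (ret == 0)
            break; /* success, stop retrying */
    }
    return ret;
}

/* Demo callback: fails as many times as the counter behind ctx says. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With a counter of 2 the third attempt succeeds; with a counter of 3 all attempts fail and the last error is returned to the caller.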
2021-12-08cpufreq: Fix get_cpu_device() failure in add_cpu_dev_symlink()Xiongfeng Wang1-5/+4
commit 2c1b5a84669d2477d8fffe9136e86a2cff591729 upstream. When I hot added a CPU, I found the 'cpufreq' directory was not created below /sys/devices/system/cpu/cpuX/. This is because get_cpu_device() failed in add_cpu_dev_symlink(). cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface. It will be called when the CPU device is registered into the system. The call chain is as follows: register_cpu() ->device_register() ->device_add() ->bus_probe_device() ->cpufreq_add_dev() But get_cpu_device() can only return the CPU device after the CPU device has been registered; before that it returns NULL. Since we already have the CPU device in cpufreq_add_dev(), pass it to add_cpu_dev_symlink(). I noticed that the 'kobj' of the CPU device has been added into the system before cpufreq_add_dev(). Fixes: 2f0ba790df51 ("cpufreq: Fix creation of symbolic links to policy directories") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>