kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2024-10-10	net: liquidio: Remove unused cn23xx_dump_pf_initialized_regs	Dr. David Alan Gilbert	2	-171/+0
	cn23xx_dump_pf_initialized_regs() was added in 2016's commit 72c0091293c0 ("liquidio: CN23XX device init and sriov config") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241009003841.254853-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	Merge branch 'qca_spi-improvements-to-qca7000-sync'	Jakub Kicinski	3	-15/+21
	Stefan Wahren says: ==================== qca_spi: Improvements to QCA7000 sync This series contains patches which improve the QCA7000 sync behavior. ==================== Link: https://patch.msgid.link/20241007113312.38728-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	qca_spi: Improve reset mechanism	Stefan Wahren	3	-15/+20
	The commit 92717c2356cb ("net: qca_spi: Avoid high load if QCA7000 is not available") fixed the high load in case the QCA7000 is not available but introduced sync delays for some corner cases like buffer errors. So add the reset requests to the atomics flags, which are polled by the SPI thread. As a result reset requests and sync state are now separated. This has the nice benefit to make the code easier to understand. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20241007113312.38728-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	qca_spi: Count unexpected WRBUF_SPC_AVA after reset	Stefan Wahren	1	-0/+1
	After a reset of the QCA7000, the amount of available write buffer space should match QCASPI_HW_BUF_LEN. If this is not the case this error should be counted as such. Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20241007113312.38728-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	tcp: remove unnecessary update for tp->write_seq in tcp_connect()	xin.guo	1	-1/+4
	Commit 783237e8daf13 ("net-tcp: Fast Open client - sending SYN-data") introduces tcp_connect_queue_skb() and it would overwrite tcp->write_seq, so it is no need to update tp->write_seq before invoking tcp_connect_queue_skb(). Signed-off-by: xin.guo <guoxin0309@gmail.com> Link: https://patch.msgid.link/1728289544-4611-1-git-send-email-guoxin0309@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	net: ftgmac100: fixed not check status from fixed phy	Jacky Chou	1	-1/+6
	Add error handling from calling fixed_phy_register. It may return some error, therefore, need to check the status. And fixed_phy_register needs to bind a device node for mdio. Add the mac device node for fixed_phy_register function. This is a reference to this function, of_phy_register_fixed_link(). Fixes: e24a6c874601 ("net: ftgmac100: Get link speed and duplex for NC-SI") Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20241007032435.787892-1-jacky_chou@aspeedtech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	doc: net: Fix .rst rendering of net_cachelines pages	Donald Hunter	6	-654/+666
	The doc pages under /networking/net_cachelines are unreadable because they lack .rst formatting for the tabular text. Add simple table markup and tidy up the table contents: - remove dashes that represent empty cells because they render as bullets and are not needed - replace 'struct_' with 'struct ' in the first column so that sphinx can render links for any structs that appear in the docs Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241008165329.45647-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	Merge branch 'ipv4-convert-__fib_validate_source-and-its-callers-to-dscp_t'	Jakub Kicinski	6	-41/+38
	Guillaume Nault says: ==================== ipv4: Convert __fib_validate_source() and its callers to dscp_t. This patch series continues to prepare users of ->flowi4_tos to a future conversion of this field (__u8 to dscp_t). This time, we convert __fib_validate_source() and its call chain. The objective is to eventually make all users of ->flowi4_tos use a dscp_t value. Making ->flowi4_tos a dscp_t field will help avoiding regressions where ECN bits are erroneously interpreted as DSCP bits. ==================== Link: https://patch.msgid.link/cover.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert __fib_validate_source() to dscp_t.	Guillaume Nault	1	-4/+4
	Pass a dscp_t variable to __fib_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only fib_validate_source() actually calls __fib_validate_source(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/8206b0a64a21a208ed94774e261a251c8d7bc251.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert fib_validate_source() to dscp_t.	Guillaume Nault	3	-15/+14
	Pass a dscp_t variable to fib_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. All callers of fib_validate_source() already have a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversions. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/08612a4519bc5a3578bb493fbaad82437ebb73dc.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert ip_mc_validate_source() to dscp_t.	Guillaume Nault	3	-7/+8
	Pass a dscp_t variable to ip_mc_validate_source(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Callers of ip_mc_validate_source() to consider are: * ip_route_input_mc() which already has a dscp_t variable to pass as parameter. We just need to remove the inet_dscp_to_dsfield() conversion. * udp_v4_early_demux() which gets the DSCP directly from the IPv4 header and can simply use the ip4h_dscp() helper. Also, stop including net/inet_dscp.h in udp.c as we don't use any of its declarations anymore. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/c91b2cca04718b7ee6cf5b9c1d5b40507d65a8d4.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert ip_route_input_mc() to dscp_t.	Guillaume Nault	1	-5/+6
	Pass a dscp_t variable to ip_route_input_mc(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only ip_route_input_rcu() actually calls ip_route_input_mc(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/0cc653ef59bbc0a28881f706d34896c61eba9e01.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert __mkroute_input() to dscp_t.	Guillaume Nault	1	-8/+6
	Pass a dscp_t variable to __mkroute_input(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only ip_mkroute_input() actually calls __mkroute_input(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. While there, reorganise the function parameters to fill up horizontal space. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/40853c720aee4d608e6b1b204982164c3b76697d.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert ip_mkroute_input() to dscp_t.	Guillaume Nault	1	-8/+6
	Pass a dscp_t variable to ip_mkroute_input(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only ip_route_input_slow() actually calls ip_mkroute_input(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. While there, reorganise the function parameters to fill up horizontal space. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/6aa71e28f9ff681cbd70847080e1ab6b526f94f1.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	ipv4: Convert ip_route_use_hint() to dscp_t.	Guillaume Nault	3	-7/+7
	Pass a dscp_t variable to ip_route_use_hint(), instead of a plain u8, to prevent accidental setting of ECN bits in ->flowi4_tos. Only ip_rcv_finish_core() actually calls ip_route_use_hint(). Use the ip4h_dscp() helper to get the DSCP from the IPv4 header. While there, modify the declaration of ip_route_use_hint() in include/net/route.h so that it matches the prototype of its implementation in net/ipv4/route.c. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/c40994fdf804db7a363d04fdee01bf48dddda676.1728302212.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10	Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of ↵	Linus Torvalds	18	-73/+117
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "12 hotfixes, 5 of which are c:stable. All singletons, about half of which are MM" * tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: zswap: delete comments for "value" member of 'struct zswap_entry'. CREDITS: sort alphabetically by name secretmem: disable memfd_secret() if arch cannot set direct map .mailmap: update Fangrui's email mm/huge_memory: check pmd_special() only after pmd_present() resource, kunit: fix user-after-free in resource_test_region_intersects() fs/proc/kcore.c: allow translation of physical memory addresses selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test device-dax: correct pgoff align in dax_set_mapping() kthread: unpark only parked kthread Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN" bcachefs: do not use PF_MEMALLOC_NORECLAIM
2024-10-10	selftests: netfilter: conntrack_vrf.sh: add fib test case	Florian Westphal	1	-0/+33
	meta iifname veth0 ip daddr ... fib daddr oif ... is expected to return "dummy0" interface which is part of same vrf as veth0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-10	netfilter: fib: check correct rtable in vrf setups	Florian Westphal	2	-5/+4
	We need to init l3mdev unconditionally, else main routing table is searched and incorrect result is returned unless strict (iif keyword) matching is requested. Next patch adds a selftest for this. Fixes: 2a8a7c0eaa87 ("netfilter: nft_fib: Fix for rpath check with VRF devices") Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1761 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-10	netfilter: xtables: avoid NFPROTO_UNSPEC where needed	Florian Westphal	16	-165/+422
	syzbot managed to call xt_cluster match via ebtables: WARNING: CPU: 0 PID: 11 at net/netfilter/xt_cluster.c:72 xt_cluster_mt+0x196/0x780 [..] ebt_do_table+0x174b/0x2a40 Module registers to NFPROTO_UNSPEC, but it assumes ipv4/ipv6 packet processing. As this is only useful to restrict locally terminating TCP/UDP traffic, register this for ipv4 and ipv6 family only. Pablo points out that this is a general issue, direct users of the set/getsockopt interface can call into targets/matches that were only intended for use with ip(6)tables. Check all UNSPEC matches and targets for similar issues: - matches and targets are fine except if they assume skb_network_header() is valid -- this is only true when called from inet layer: ip(6) stack pulls the ip/ipv6 header into linear data area. - targets that return XT_CONTINUE or other xtables verdicts must be restricted too, they are incompatbile with the ebtables traverser, e.g. EBT_CONTINUE is a completely different value than XT_CONTINUE. Most matches/targets are changed to register for NFPROTO_IPV4/IPV6, as they are provided for use by ip(6)tables. The MARK target is also used by arptables, so register for NFPROTO_ARP too. While at it, bail out if connbytes fails to enable the corresponding conntrack family. This change passes the selftests in iptables.git. Reported-by: syzbot+256c348558aa5cf611a9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netfilter-devel/66fec2e2.050a0220.9ec68.0047.GAE@google.com/ Fixes: 0269ea493734 ("netfilter: xtables: add cluster match") Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-10-09	mm: zswap: delete comments for "value" member of 'struct zswap_entry'.	Kanchana P Sridhar	1	-1/+0
	Made a minor edit in the comments for 'struct zswap_entry' to delete the description of the 'value' member that was deleted in commit 20a5532ffa53 ("mm: remove code to handle same filled pages"). Link: https://lkml.kernel.org/r/20241002194213.30041-1-kanchana.p.sridhar@intel.com Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Fixes: 20a5532ffa53 ("mm: remove code to handle same filled pages") Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Usama Arif <usamaarif642@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Huang Ying <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wajdi Feghali <wajdi.k.feghali@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	CREDITS: sort alphabetically by name	Krzysztof Kozlowski	1	-27/+27
	Re-sort few misplaced entries in the CREDITS file. Link: https://lkml.kernel.org/r/20241002111932.46012-1-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	secretmem: disable memfd_secret() if arch cannot set direct map	Patrick Roy	1	-2/+2
	Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map(). This is the case for example on some arm64 configurations, where marking 4k PTEs in the direct map not present can only be done if the direct map is set up at 4k granularity in the first place (as ARM's break-before-make semantics do not easily allow breaking apart large/gigantic pages). More precisely, on arm64 systems with !can_set_direct_map(), set_direct_map_invalid_noflush() is a no-op, however it returns success (0) instead of an error. This means that memfd_secret will seemingly "work" (e.g. syscall succeeds, you can mmap the fd and fault in pages), but it does not actually achieve its goal of removing its memory from the direct map. Note that with this patch, memfd_secret() will start erroring on systems where can_set_direct_map() returns false (arm64 with CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and CONFIG_KFENCE=n), but that still seems better than the current silent failure. Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most arm64 systems actually have a working memfd_secret() and aren't be affected. From going through the iterations of the original memfd_secret patch series, it seems that disabling the syscall in these scenarios was the intended behavior [1] (preferred over having set_direct_map_invalid_noflush return an error as that would result in SIGBUSes at page-fault time), however the check for it got dropped between v16 [2] and v17 [3], when secretmem moved away from CMA allocations. [1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/ [2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t [3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/ Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Patrick Roy <roypat@amazon.co.uk> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Gowans <jgowans@amazon.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	.mailmap: update Fangrui's email	Fangrui Song	1	-0/+1
	I'm leaving Google. Link: https://lkml.kernel.org/r/20240927192912.31532-1-i@maskray.me Signed-off-by: Fangrui Song <i@maskray.me> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	mm/huge_memory: check pmd_special() only after pmd_present()	David Hildenbrand	1	-1/+1
	We should only check for pmd_special() after we made sure that we have a present PMD. For example, if we have a migration PMD, pmd_special() might indicate that we have a special PMD although we really don't. This fixes confusing migration entries as PFN mappings, and not doing what we are supposed to do in the "is_swap_pmd()" case further down in the function -- including messing up COW, page table handling and accounting. Link: https://lkml.kernel.org/r/20240926154234.2247217-1-david@redhat.com Fixes: bc02afbd4d73 ("mm/fork: accept huge pfnmap entries") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+bf2c35fa302ebe3c7471@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/66f15c8d.050a0220.c23dd.000f.GAE@google.com/ Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	resource, kunit: fix user-after-free in resource_test_region_intersects()	Huang Ying	1	-4/+14
	In resource_test_insert_resource(), the pointer is used in error message after kfree(). This is user-after-free. To fix this, we need to call kunit_add_action_or_reset() to schedule memory freeing after usage. But kunit_add_action_or_reset() itself may fail and free the memory. So, its return value should be checked and abort the test for failure. Then, we found that other usage of kunit_add_action_or_reset() in resource_test_region_intersects() needs to be fixed too. We fix all these user-after-free bugs in this patch. Link: https://lkml.kernel.org/r/20240930070611.353338-1-ying.huang@intel.com Fixes: 99185c10d5d9 ("resource, kunit: add test case for region_intersects()") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reported-by: Kees Bakker <kees@ijzerbout.nl> Closes: https://lore.kernel.org/lkml/87ldzaotcg.fsf@yhuang6-desk2.ccr.corp.intel.com/ Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	fs/proc/kcore.c: allow translation of physical memory addresses	Alexander Gordeev	2	-2/+36
	When /proc/kcore is read an attempt to read the first two pages results in HW-specific page swap on s390 and another (so called prefix) pages are accessed instead. That leads to a wrong read. Allow architecture-specific translation of memory addresses using kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That way an architecture can deal with specific physical memory ranges. Re-use the existing /dev/mem callback implementation on s390, which handles the described prefix pages swapping correctly. For other architectures the default callback is basically NOP. It is expected the condition (vaddr == __va(__pa(vaddr))) always holds true for KCORE_RAM memory type. Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test	Donet Tom	1	-1/+1
	The hmm2 double_map test was failing due to an incorrect buffer->mirror size. The buffer->mirror size was 6, while buffer->ptr size was 6 * PAGE_SIZE. The test failed because the kernel's copy_to_user function was attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the size of buffer->mirror was incorrect, copy_to_user failed. This patch corrects the buffer->mirror size to 6 * PAGE_SIZE. Test Result without this patch ============================== # RUN hmm2.hmm2_device_private.double_map ... # hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0) # double_map: Test terminated by assertion # FAIL hmm2.hmm2_device_private.double_map not ok 53 hmm2.hmm2_device_private.double_map Test Result with this patch =========================== # RUN hmm2.hmm2_device_private.double_map ... # OK hmm2.hmm2_device_private.double_map ok 53 hmm2.hmm2_device_private.double_map Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	device-dax: correct pgoff align in dax_set_mapping()	Kun(llfl)	1	-1/+1
	pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise, vmf->address not aligned to fault_size will be aligned to the next alignment, that can result in memory failure getting the wrong address. It's a subtle situation that only can be observed in page_mapped_in_vma() after the page is page fault handled by dev_dax_huge_fault. Generally, there is little chance to perform page_mapped_in_vma in dev-dax's page unless in specific error injection to the dax device to trigger an MCE - memory-failure. In that case, page_mapped_in_vma() will be triggered to determine which task is accessing the failure address and kill that task in the end. We used self-developed dax device (which is 2M aligned mapping) , to perform error injection to random address. It turned out that error injected to non-2M-aligned address was causing endless MCE until panic. Because page_mapped_in_vma() kept resulting wrong address and the task accessing the failure address was never killed properly: [ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.049006] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.448042] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3784.792026] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.162502] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.461116] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3785.764730] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.042128] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.464293] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3786.818090] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page: Recovered [ 3787.085297] mce: Uncorrected hardware memory error in user-access at 200c9742380 [ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page: Recovered It took us several weeks to pinpoint this problem, but we eventually used bpftrace to trace the page fault and mce address and successfully identified the issue. Joao added: ; Likely we never reproduce in production because we always pin : device-dax regions in the region align they provide (Qemu does : similarly with prealloc in hugetlb/file backed memory). I think this : bug requires that we touch unpinned device-dax regions unaligned to : the device-dax selected alignment (page size i.e. 4K/2M/1G) Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff") Signed-off-by: Kun(llfl) <llfl@linux.alibaba.com> Tested-by: JianXiong Zhao <zhaojianxiong.zjx@alibaba-inc.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	kthread: unpark only parked kthread	Frederic Weisbecker	1	-0/+2
	Calling into kthread unparking unconditionally is mostly harmless when the kthread is already unparked. The wake up is then simply ignored because the target is not in TASK_PARKED state. However if the kthread is per CPU, the wake up is preceded by a call to kthread_bind() which expects the task to be inactive and in TASK_PARKED state, which obviously isn't the case if it is unparked. As a result, calling kthread_stop() on an unparked per-cpu kthread triggers such a warning: WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525 <TASK> kthread_stop+0x17a/0x630 kernel/kthread.c:707 destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810 wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257 netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693 default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769 ops_exit_list net/core/net_namespace.c:178 [inline] cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd70 kernel/workqueue.c:3393 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Fix this with skipping unecessary unparking while stopping a kthread. Link: https://lkml.kernel.org/r/20240913214634.12557-1-frederic@kernel.org Fixes: 5c25b5ff89f0 ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reported-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com Tested-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com Suggested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hillf Danton <hdanton@sina.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"	Michal Hocko	2	-15/+6
	This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4. There is no existing user of those flags. PF_MEMALLOC_NOWARN is dangerous because a nested allocation context can use GFP_NOFAIL which could cause unexpected failure. Such a code would be hard to maintain because it could be deeper in the call chain. PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1] that such a allocation contex is inherently unsafe if the context doesn't fully control all allocations called from this context. While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM is it doesn't have any user and as Matthew has pointed out we are running out of those flags so better reclaim it without any real users. [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/ Link: https://lkml.kernel.org/r/20240926172940.167084-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Jan Kara <jack@suse.cz> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	bcachefs: do not use PF_MEMALLOC_NORECLAIM	Michal Hocko	5	-19/+26
	Patch series "remove PF_MEMALLOC_NORECLAIM" v3. This patch (of 2): bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new inode to achieve GFP_NOWAIT semantic while holding locks. If this allocation fails it will drop locks and use GFP_NOFS allocation context. We would like to drop PF_MEMALLOC_NORECLAIM because it is really dangerous to use if the caller doesn't control the full call chain with this flag set. E.g. if any of the function down the chain needed GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and cause unexpected failure. While this is not the case in this particular case using the scoped gfp semantic is not really needed bacause we can easily pus the allocation context down the chain without too much clutter. [akpm@linux-foundation.org: fix kerneldoc warnings] Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09	misc: sgi-gru: Don't disable preemption in GRU driver	Dimitri Sivanich	3	-8/+0
	Disabling preemption in the GRU driver is unnecessary, and clashes with sleeping locks in several code paths. Remove preempt_disable and preempt_enable from the GRU driver. Signed-off-by: Dimitri Sivanich <sivanich@hpe.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-09	Merge tag 'unicode-fixes-6.12-rc3' of ↵	Linus Torvalds	2	-3427/+3346
	git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode fix from Gabriel Krisman Bertazi: - Handle code-points with the Ignorable property as regular character instead of treating them as an empty string (me) * tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: Don't special case ignorable code points
2024-10-09	unicode: Don't special case ignorable code points	Gabriel Krisman Bertazi	2	-3427/+3346
	We don't need to handle them separately. Instead, just let them decompose/casefold to themselves. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-10-09	ring-buffer: Do not have boot mapped buffers hook to CPU hotplug	Steven Rostedt	1	-3/+6
	The boot mapped ring buffer has its buffer mapped at a fixed location found at boot up. It is not dynamic. It cannot grow or be expanded when new CPUs come online. Do not hook fixed memory mapped ring buffers to the CPU hotplug callback, otherwise it can cause a crash when it tries to add the buffer to the memory that is already fully occupied. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241008143242.25e20801@gandalf.local.home Fixes: be68d63a139bd ("ring-buffer: Add ring_buffer_alloc_range()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-09	net: mana: Enable debugfs files for MANA device	Shradha Gupta	4	-3/+159
	Implement debugfs in MANA driver to be able to view RX,TX,EQ queue specific attributes and dump their gdma queues. These dumps can be used by other userspace utilities to improve debuggability and troubleshooting Following files are added in debugfs: /sys/kernel/debug/mana/ \|-------------- 1 \|--------------- EQs \| \|------- eq0 \| \| \|---head \| \| \|---tail \| \| \|---eq_dump \| \|------- eq1 \| . \| . \| \|--------------- adapter-MTU \|--------------- vport0 \|------- RX-0 \| \|---cq_budget \| \|---cq_dump \| \|---cq_head \| \|---cq_tail \| \|---rq_head \| \|---rq_nbuf \| \|---rq_tail \| \|---rxq_dump \|------- RX-1 . . \|------- TX-0 \| \|---cq_budget \| \|---cq_dump \| \|---cq_head \| \|---cq_tail \| \|---sq_head \| \|---sq_pend_skb_qlen \| \|---sq_tail \| \|---txq_dump \|------- TX-1 . . Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: hns3/hns: Update the maintainer for the HNS3/HNS ethernet driver	Jijie Shao	1	-2/+2
	Yisen Zhuang has left the company in September. Jian Shen will be responsible for maintaining the hns3/hns driver's code in the future, so add Jian Shen to the hns3/hns driver's matainer list. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	r8169: add support for the temperature sensor being available from RTL8125B	Heiner Kallweit	1	-0/+44
	This adds support for the temperature sensor being available from RTL8125B. Register information was taken from r8125 vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	sctp: ensure sk_state is set to CLOSED if hashing fails in sctp_listen_start	Xin Long	1	-5/+13
	If hashing fails in sctp_listen_start(), the socket remains in the LISTENING state, even though it was not added to the hash table. This can lead to a scenario where a socket appears to be listening without actually being accessible. This patch ensures that if the hashing operation fails, the sk_state is set back to CLOSED before returning an error. Note that there is no need to undo the autobind operation if hashing fails, as the bind port can still be used for next listen() call on the same socket. Fixes: 76c6d988aeb3 ("sctp: add sock_reuseport for the sock in __sctp_hash_endpoint") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	Merge branch 'net-improve-multicast-group-join-performance'	David S. Miller	7	-54/+39
	Jonas Rebmann says: ==================== improve multicast join group performance This series seeks to improve performance on updating igmp group memberships such as with IP_ADD_MEMBERSHIP or MCAST_JOIN_SOURCE_GROUP. Our use case was to add 2000 multicast memberships on a TQMLS1046A which took about 3.6 seconds for the membership additions alone. Our userspace reproducer tool was instrumented to log runtimes of the individual setsockopt invocations which clearly indicated quadratic complexity of setting up the membership with regard to the total number of multicast groups to be joined. We used perf to locate the hotspots and subsequently optimized the most costly sections of code. This series includes a patch to Linux igmp handling as well as a patch to the DPAA/Freescale driver. With both patches applied, our memberships can be set up in only about 87 miliseconds, which corresponds to a speedup of around 40. While we have acheived practically linear run-time complexity on the kernel side, a small quadratic factor remains in parts of the freescale driver code which we haven't yet optimized. We have by now payed little attention to the optimization potential in dropping group memberships, yet the dpaa patch applies to joining and leaving groups alike. Overall, this patch series brings great improvements in use cases involving large numbers of multicast groups, particularly when using the fsl_dpa driver, without noteworthy drawbacks in other scenarios. ==================== Signed-off-by: Jonas Rebmann <jre@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: dpaa: use __dev_mc_sync in dpaa_set_rx_mode()	Jonas Rebmann	6	-49/+18
	The original driver first unregisters then re-registers all multicast addresses in the struct net_device_ops::ndo_set_rx_mode() callback. As the networking stack calls ndo_set_rx_mode() if a single multicast address change occurs, a significant amount of time may be used to first unregister and then re-register unchanged multicast addresses. This leads to performance issues when tracking large numbers of multicast addresses. Replace the unregister and register loop and the hand crafted mc_addr_list list handling with __dev_mc_sync(), to only update entries which have changed. On profiling with an fsl_dpa NIC, this patch presented a speedup of around 40 when successively setting up 2000 multicast groups using setsockopt(), without drawbacks on smaller numbers of multicast groups. Signed-off-by: Jonas Rebmann <jre@pengutronix.de> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: ipv4: igmp: optimize ____ip_mc_inc_group() using mc_hash	Jonas Rebmann	1	-5/+21
	The runtime cost of joining a single multicast group in the current implementation of ____ip_mc_inc_group grows linearly with the number of existing memberships. This is caused by the linear search for an existing group record in the multicast address list. This linear complexity results in quadratic complexity when successively adding memberships, which becomes a performance bottleneck when setting up large numbers of multicast memberships. If available, use the existing multicast hash map mc_hash to quickly search for an existing group membership record. This leads to near-constant complexity on the addition of a new multicast record, significantly improving performance for workloads involving many multicast memberships. On profiling with a loopback device, this patch presented a speedup of around 6 when successively setting up 2000 multicast groups using setsockopt without measurable drawbacks on smaller numbers of multicast groups. Signed-off-by: Jonas Rebmann <jre@pengutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: amd: mvme147: Fix probe banner message	Daniel Palmer	1	-4/+3
	Currently this driver prints this line with what looks like a rogue format specifier when the device is probed: [ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx Change the printk() for netdev_info() and move it after the registration has completed so it prints out the name of the interface properly. Signed-off-by: Daniel Palmer <daniel@0x0f.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: phy: realtek: Fix MMD access on RTL8126A-integrated PHY	Heiner Kallweit	1	-1/+23
	All MMD reads return 0 for the RTL8126A-integrated PHY. Therefore phylib assumes it doesn't support EEE, what results in higher power consumption, and a significantly higher chip temperature in my case. To fix this split out the PHY driver for the RTL8126A-integrated PHY and set the read_mmd/write_mmd callbacks to read from vendor-specific registers. Fixes: 5befa3728b85 ("net: phy: realtek: add support for RTL8126A-integrated 5Gbps PHY") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	btrfs: fix clear_dirty and writeback ordering in submit_one_sector()	Naohiro Aota	1	-7/+7
	This commit is a replay of commit 6252690f7e1b ("btrfs: fix invalid mapping of extent xarray state"). We need to call btrfs_folio_clear_dirty() before btrfs_set_range_writeback(), so that xarray DIRTY tag is cleared. With a refactoring commit 8189197425e7 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission"), it screwed up and the order is reversed and causing the same hang. Fix the ordering now in submit_one_sector(). Fixes: 8189197425e7 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-09	btrfs: zoned: fix missing RCU locking in error message when loading zone info	Filipe Manana	1	-1/+1
	At btrfs_load_zone_info() we have an error path that is dereferencing the name of a device which is a RCU string but we are not holding a RCU read lock, which is incorrect. Fix this by using btrfs_err_in_rcu() instead of btrfs_err(). The problem is there since commit 08e11a3db098 ("btrfs: zoned: load zone's allocation offset"), back then at btrfs_load_block_group_zone_info() but then later on that code was factored out into the helper btrfs_load_zone_info() by commit 09a46725cc84 ("btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info"). Fixes: 08e11a3db098 ("btrfs: zoned: load zone's allocation offset") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-09	net: ti: icssg-prueth: Fix race condition for VLAN table access	MD Danish Anwar	3	-0/+5
	The VLAN table is a shared memory between the two ports/slices in a ICSSG cluster and this may lead to race condition when the common code paths for both ports are executed in different CPUs. Fix the race condition access by locking the shared memory access Fixes: 487f7323f39a ("net: ti: icssg-prueth: Add helper functions to configure FDB") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	ptp: Add support for the AMZNC10C 'vmclock' device	David Woodhouse	5	-0/+818
	The vmclock device addresses the problem of live migration with precision clocks. The tolerances of a hardware counter (e.g. TSC) are typically around ±50PPM. A guest will use NTP/PTP/PPS to discipline that counter against an external source of 'real' time, and track the precise frequency of the counter as it changes with environmental conditions. When a guest is live migrated, anything it knows about the frequency of the underlying counter becomes invalid. It may move from a host where the counter running at -50PPM of its nominal frequency, to a host where it runs at +50PPM. There will also be a step change in the value of the counter, as the correctness of its absolute value at migration is limited by the accuracy of the source and destination host's time synchronization. In its simplest form, the device merely advertises a 'disruption_marker' which indicates that the guest should throw away any NTP synchronization it thinks it has, and start again. Because the shared memory region can be exposed all the way to userspace through the /dev/vmclock0 node, applications can still use time from a fast vDSO 'system call', and check the disruption marker to be sure that their timestamp is indeed truthful. The structure also allows for the precise time, as known by the host, to be exposed directly to guests so that they don't have to wait for NTP to resync from scratch. The PTP driver consumes this information if present. Like the KVM PTP clock, this PTP driver can convert TSC-based cross timestamps into KVM clock values. Unlike the KVM PTP clock, it does so only when such is actually helpful. The values and fields are based on the nascent virtio-rtc specification, and the intent is that a version (hopefully precisely this version) of this structure will be included as an optional part of that spec. In the meantime, this driver supports the simple ACPI form of the device which is being shipped in certain commercial hypervisors (and submitted for inclusion in QEMU). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	Merge branch 'pcs-xpcs-cleanups-batch-2'	David S. Miller	6	-335/+237
	Russell King says: ==================== net: pcs: xpcs: cleanups batch 2 This is the second cleanup series for XPCS. Patch 1 removes the enum indexing the dw_xpcs_compat array. The index is never used except to place entries in the array and to size the array. Patch 2 removes the interface arrays - each of which only contain one interface. Patch 3 makes xpcs_find_compat() take the xpcs structure rather than the ID - the previous series removed the reason for xpcs_find_compat needing to take the ID. Patch 4 provides a helper to convert xpcs structure to a regular phylink_pcs structure, which leads to patch 5. Patch 5 moves the definition of struct dw_xpcs to the private xpcs header - with patch 4 in place, nothing outside of the xpcs driver accesses the contents of the dw_xpcs structure. Patch 6 renames xpcs_get_id() to xpcs_read_id() since it's reading the ID, rather than doing anything further with it. (Prior versions of this series renamed it to xpcs_read_phys_id() since that more accurately described that it was reading the physical ID registers.) Patch 7 moves the searching of the ID list out of line as this is a separate functional block. Patch 8 converts xpcs to use the bitmap macros, which eliminates the need for _SHIFT definitions. Patch 9 adds and uses _modify() accessors as there are a large amount of read-modify-write operations in this driver. This conversion found a bug in xpcs-wx code that has been reported and already fixed. Patch 10 converts xpcs to use read_poll_timeout() rather than open coding that. Patch 11 converts all printed messages to use the dev_*() functions so the driver and devie name are always printed. Patch 12 moves DW_VR_MII_DIG_CTRL1_2G5_EN to the correct place in the header file, rather than amongst another register's definitions. Patch 13 moves the Wangxun workaround to a common location rather than duplicating it in two places. We also reformat this to fit within 80 columns. ==================== Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09	net: pcs: xpcs: move Wangxun VR_XS_PCS_DIG_CTRL1 configuration	Russell King (Oracle)	1	-6/+8
	According to commits 2a22b7ae2fa3 ("net: pcs: xpcs: adapt Wangxun NICs for SGMII mode") and 2deea43f386d ("net: pcs: xpcs: add 1000BASE-X AN interrupt support"), Wangxun devices need special VR_XS_PCS_DIG_CTRL1 settings for SGMII and 1000BASE-X. Both SGMII and 1000BASE-X use the same settings. Rather than placing these in the individual xpcs_config_*() functions, move it to where we already test for the Wangxun devices in xpcs_do_config(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>