summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-04-27platform/x86: intel-uncore-freq: Prevent driver loading in guestsSrinivas Pandruvada1-0/+3
Loading this driver in guests results in unchecked MSR access error for MSR 0x620. There is no use of reading and modifying package/die scope uncore MSRs in guests. So check for CPU feature X86_FEATURE_HYPERVISOR to prevent loading of this driver in guests. Fixes: dbce412a7733 ("platform/x86/intel-uncore-freq: Split common and enumeration part") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215870 Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20220427100304.2562990-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-04-27platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboardDarryn Anton Jordan1-0/+1
This works on my system. Signed-off-by: Darryn Anton Jordan <darrynjordan@icloud.com> Acked-by: Thomas Weißschuh <thomas@weissschuh.net> Link: https://lore.kernel.org/r/Ylguq87YG+9L3foV@hark Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-04-27platform/x86: dell-laptop: Add quirk entry for Latitude 7520Gabriele Mazzotta1-0/+13
The Latitude 7520 supports AC timeouts, but it has no KBD_LED_AC_TOKEN and so changes to stop_timeout appear to have no effect if the laptop is plugged in. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Acked-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20220426120827.12363-1-gabriele.mzt@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-04-27platform/x86: asus-wmi: Fix driver not binding when fan curve control probe ↵Hans de Goede1-3/+4
fails Before this commit fan_curve_check_present() was trying to not cause the probe to fail on devices without fan curve control by testing for known error codes returned by asus_wmi_evaluate_method_buf(). Checking for ENODATA or ENODEV, with the latter being returned by this function when an ACPI integer with a value of ASUS_WMI_UNSUPPORTED_METHOD is returned. But for other ACPI integer returns this function just returns them as is, including the ASUS_WMI_DSTS_UNKNOWN_BIT value of 2. On the Asus U36SD ASUS_WMI_DSTS_UNKNOWN_BIT gets returned, leading to: asus-nb-wmi: probe of asus-nb-wmi failed with error 2 Instead of playing whack a mole with error codes here, simply treat all errors as there not being any fan curves, fixing the driver no longer loading on the Asus U36SD laptop. Fixes: e3d13da7f77d ("platform/x86: asus-wmi: Fix regression when probing for fan curve control") BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2079125 Cc: Luke D. Jones <luke@ljones.dev> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220427114956.332919-1-hdegoede@redhat.com
2022-04-27platform/x86: asus-wmi: Potential buffer overflow in ↵Dan Carpenter1-2/+6
asus_wmi_evaluate_method_buf() This code tests for if the obj->buffer.length is larger than the buffer but then it just does the memcpy() anyway. Fixes: 0f0ac158d28f ("platform/x86: asus-wmi: Add support for custom fan curves") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20220413073744.GB8812@kili Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-04-27netfilter: conntrack: fix udp offload timeout sysctlVolodymyr Mytnyk1-1/+1
`nf_flowtable_udp_timeout` sysctl option is available only if CONFIG_NFT_FLOW_OFFLOAD enabled. But infra for this flow offload UDP timeout was added under CONFIG_NF_FLOW_TABLE config option. So, if you have CONFIG_NFT_FLOW_OFFLOAD disabled and CONFIG_NF_FLOW_TABLE enabled, the `nf_flowtable_udp_timeout` is not present in sysfs. Please note, that TCP flow offload timeout sysctl option is present even CONFIG_NFT_FLOW_OFFLOAD is disabled. I suppose it was a typo in commit that adds UDP flow offload timeout and CONFIG_NF_FLOW_TABLE should be used instead. Fixes: 975c57504da1 ("netfilter: conntrack: Introduce udp offload timeout configuration") Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-27netfilter: nf_conntrack_tcp: re-init for syn packets onlyFlorian Westphal1-15/+6
Jaco Kroon reported tcp problems that Eric Dumazet and Neal Cardwell pinpointed to nf_conntrack tcp_in_window() bug. tcp trace shows following sequence: I > R Flags [S], seq 3451342529, win 62580, options [.. tfo [|tcp]> R > I Flags [S.], seq 2699962254, ack 3451342530, win 65535, options [..] R > I Flags [P.], seq 1:89, ack 1, [..] Note 3rd ACK is from responder to initiator so following branch is taken: } else if (((state->state == TCP_CONNTRACK_SYN_SENT && dir == IP_CT_DIR_ORIGINAL) || (state->state == TCP_CONNTRACK_SYN_RECV && dir == IP_CT_DIR_REPLY)) && after(end, sender->td_end)) { ... because state == TCP_CONNTRACK_SYN_RECV and dir is REPLY. This causes the scaling factor to be reset to 0: window scale option is only present in syn(ack) packets. This in turn makes nf_conntrack mark valid packets as out-of-window. This was always broken, it exists even in original commit where window tracking was added to ip_conntrack (nf_conntrack predecessor) in 2.6.9-rc1 kernel. Restrict to 'tcph->syn', just like the 3rd condtional added in commit 82b72cb94666 ("netfilter: conntrack: re-init state for retransmitted syn-ack"). Upon closer look, those conditionals/branches can be merged: Because earlier checks prevent syn-ack from showing up in original direction, the 'dir' checks in the conditional quoted above are redundant, remove them. Return early for pure syn retransmitted in reply direction (simultaneous open). Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-27Merge branch 'remove-virt_to_bus-drivers'David S. Miller30-15143/+1
Jakub Kicinski says: ==================== net: remove non-Ethernet drivers using virt_to_bus() Networking is currently the main offender in using virt_to_bus(). Frankly all the drivers which use it are super old and unlikely to be used today. They are just an ongoing maintenance burden. In other words this series is using virt_to_bus() as an excuse to shed some old stuff. Having done the tree-wide dev_addr_set() conversion recently I have limited sympathy for carrying dead code. Obviously please scream if any of these drivers _is_ in fact still being used. Otherwise let's take the chance, we can always apologize and revert if users show up later. Also I should say thanks to everyone who contributed to this code! The work continues to be appreciated although realistically in more of a "history book" fashion... ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: hamradio: remove support for DMA SCC devicesJakub Kicinski4-1486/+0
Another incarnation of Z8530, looks like? Again, no real changes in the git history, and it needs VIRT_TO_BUS. Unlikely to have users, let's spend less time refactoring dead code... Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: wan: remove support for Z85230-based devicesJakub Kicinski9-3035/+0
Looks like all the changes to this driver had been automated churn since git era begun. The driver is using virt_to_bus(), it's just a maintenance burden unlikely to have any users. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: wan: remove support for COSA and SRP synchronous serial boardsJakub Kicinski6-2186/+1
Looks like all the changes to this driver had been automated churn since git era begun. The driver is using virt_to_bus() so it should be updated to a proper DMA API or removed. Given the latest "news" entry on the website is from 1999 I'm opting for the latter. I'm marking the allocated char device major number as [REMOVED], I reckon we can't reuse it in case some SW out there assumes its COSA? Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: atm: remove support for ZeitNet ZN122x ATM devicesJakub Kicinski10-2492/+0
This driver received nothing but automated fixes in the last 15 years. Since it's using virt_to_bus it's unlikely to be used on any modern platform. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: atm: remove support for Madge Horizon ATM devicesJakub Kicinski6-3372/+0
This driver received nothing but automated fixes since git era begun. Since it's using virt_to_bus it's unlikely to be used on any modern platform. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: atm: remove support for Fujitsu FireStream ATM devicesJakub Kicinski6-2572/+0
This driver received nothing but automated fixes (mostly spelling and compiler warnings) since git era begun. Since it's using virt_to_bus it's unlikely to be used on any modern platform. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27Merge branch 'lan966x-ptp-programmable-pins'David S. Miller5-1/+338
Horatiu Vultur says: ==================== net: lan966x: Add support for PTP programmable pins Lan966x has 8 PTP programmable pins. The last pin is hardcoded to be used by PHC0 and all the rest are shareable between the PHCs. The PTP pins can implement both extts and perout functions. v1->v2: - use ptp_find_pin_unlocked instead of ptp_find_pin inside the irq handler. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: lan966x: Add support for PTP_PF_EXTTSHoratiu Vultur3-1/+127
Extend the PTP programmable pins to implement also PTP_PF_EXTTS function. The PTP pin can be configured to capture only on the rising edge of the PPS signal. And once an event is seen then an interrupt is generated and the local time counter is saved. The interrupt is shared between all the pins. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: lan966x: Add support for PTP_PF_PEROUTHoratiu Vultur2-0/+169
Lan966x has 8 PTP programmable pins, where the last pins is hardcoded to be used by PHC0, which does the frame timestamping. All the rest of the PTP pins can be shared between the PHCs and can have different functions like perout or extts. For now add support for PTP_FS_PEROUT. The HW is not able to support absolute start time but can use the nsec for phase adjustment when generating PPS. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: lan966x: Add registers used to configure the PTP pinHoratiu Vultur1-0/+40
Add registers that are used to configure the PTP pins. These registers are used to enable the interrupts per PTP pin and to set the waveform generated by the pin. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: lan966x: Change the PTP pin used to read/write the PHC.Horatiu Vultur1-1/+1
To read/write a value to a PHC, it is required to use a PTP pin. Currently it is used pin 5, but change to pin 7 as is the last pin. All the other pins will have different functions. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27dt-bindings: net: lan966x: Extend with the ptp external interrupt.Horatiu Vultur1-0/+2
Extend dt-bindings for lan966x with ptp external interrupt. This is generated when an external 1pps signal is received on the ptp pin. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/netDavid S. Miller3-19/+13
-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-04-26 This series contains updates to ice driver only. Ivan Vecera removes races related to VF message processing by changing mutex_trylock() call to mutex_lock() and moving additional operations to occur under mutex. Petr Oros increases wait time after firmware flash as current time is not sufficient. Jake resolves a use-after-free issue for mailbox snapshot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27Merge branch 'mptcp-MP_FAIL-timeout'David S. Miller6-8/+216
Mat Martineau says: ==================== mptcp: Timeout for MP_FAIL response When one peer sends an infinite mapping to coordinate fallback from MPTCP to regular TCP, the other peer is expected to send a packet with the MPTCP MP_FAIL option to acknowledge the infinite mapping. Rather than leave the connection in some half-fallback state, this series adds a timeout after which the infinite mapping sender will reset the connection. Patch 1 adds a fallback self test. Patches 2-5 make use of the MPTCP socket's retransmit timer to reset the MPTCP connection if no MP_FAIL was received. Patches 6 and 7 extends the self test to check MP_FAIL-related MIBs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27selftests: mptcp: print extra msg in chk_csum_nrGeliang Tang1-1/+10
When the multiple checksum errors occur in chk_csum_nr(), print the numbers of the errors as an extra message. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27selftests: mptcp: check MP_FAIL response mibsGeliang Tang1-3/+35
This patch extends chk_fail_nr to check the MP_FAIL response mibs. Add a new argument invert for chk_fail_nr to allow it can check the MP_FAIL TX and RX mibs from the opposite direction. When the infinite map is received before the MP_FAIL response, the response will be lost. A '-' can be added into fail_tx or fail_rx to represent that MP_FAIL response TX or RX can be lost when doing the checks. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27mptcp: reset subflow when MP_FAIL doesn't respondGeliang Tang4-0/+68
This patch adds a new msk->flags bit MPTCP_FAIL_NO_RESPONSE, then reuses sk_timer to trigger a check if we have not received a response from the peer after sending MP_FAIL. If the peer doesn't respond properly, reset the subflow. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27mptcp: add MP_FAIL response supportGeliang Tang3-1/+12
This patch adds a new struct member mp_fail_response_expect in struct mptcp_subflow_context to support MP_FAIL response. In the single subflow with checksum error and contiguous data special case, a MP_FAIL is sent in response to another MP_FAIL. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27mptcp: add data lock for sk timersGeliang Tang1-0/+12
mptcp_data_lock() needs to be held when manipulating the msk retransmit_timer or the sk sk_timer. This patch adds the data lock for the both timers. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27mptcp: use mptcp_stop_timerGeliang Tang1-2/+2
Use the helper mptcp_stop_timer() instead of using sk_stop_timer() to stop icsk_retransmit_timer directly. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27selftests: mptcp: add infinite map testcaseGeliang Tang2-1/+77
Add the single subflow test case for MP_FAIL, to test the infinite mapping case. Use the test_linkfail value to make 128KB test files. Add a new function reset_with_fail(), in it use 'iptables' and 'tc action pedit' rules to produce the bit flips to trigger the checksum failures. Set validate_checksum to enable checksums for the MP_FAIL tests without passing the '-C' argument. Set check_invert flag to enable the invert bytes check for the output data in check_transfer(). Instead of the file mismatch error, this test prints out the inverted bytes. Add a new function pedit_action_pkts() to get the numbers of the packets edited by the tc pedit actions. Print this numbers to the output. Also add the needed kernel configures in the selftests config file. Suggested-by: Davide Caratti <dcaratti@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-27net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLKMartin Blumenstingl1-3/+0
Commit 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits") added all known bits in the GSWIP_MII_CFGp register. It helped bring this register into a well-defined state so the driver has to rely less on the bootloader to do things right. Unfortunately it also sets the GSWIP_MII_CFG_RMII_CLK bit without any possibility to configure it. Upon further testing it turns out that all boards which are supported by the GSWIP driver in OpenWrt which use an RMII PHY have a dedicated oscillator on the board which provides the 50MHz RMII reference clock. Don't set the GSWIP_MII_CFG_RMII_CLK bit (but keep the code which always clears it) to fix support for the Fritz!Box 7362 SL in OpenWrt. This is a board with two Atheros AR8030 RMII PHYs. With the "RMII clock" bit set the MAC also generates the RMII reference clock whose signal then conflicts with the signal from the oscillator on the board. This results in a constant cycle of the PHY detecting link up/down (and as a result of that: the two ports using the AR8030 PHYs are not working). At the time of writing this patch there's no known board where the MAC (GSWIP) has to generate the RMII reference clock. If needed this can be implemented in future by providing a device-tree flag so the GSWIP_MII_CFG_RMII_CLK bit can be toggled per port. Fixes: 4b5923249b8fa4 ("net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits") Tested-by: Jan Hoffmann <jan@3e8.eu> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Link: https://lore.kernel.org/r/20220425152027.2220750-1-martin.blumenstingl@googlemail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: Use this_cpu_inc() to increment net->core_statsSebastian Andrzej Siewior2-21/+14
The macro dev_core_stats_##FIELD##_inc() disables preemption and invokes netdev_core_stats_alloc() to return a per-CPU pointer. netdev_core_stats_alloc() will allocate memory on its first invocation which breaks on PREEMPT_RT because it requires non-atomic context for memory allocation. This can be avoided by enabling preemption in netdev_core_stats_alloc() assuming the caller always disables preemption. It might be better to replace local_inc() with this_cpu_inc() now that dev_core_stats_##FIELD##_inc() gained a preempt-disable section and does not rely on already disabled preemption. This results in less instructions on x86-64: local_inc: | incl %gs:__preempt_count(%rip) # __preempt_count | movq 488(%rdi), %rax # _1->core_stats, _22 | testq %rax, %rax # _22 | je .L585 #, | add %gs:this_cpu_off(%rip), %rax # this_cpu_off, tcp_ptr__ | .L586: | testq %rax, %rax # _27 | je .L587 #, | incq (%rax) # _6->a.counter | .L587: | decl %gs:__preempt_count(%rip) # __preempt_count this_cpu_inc(), this patch: | movq 488(%rdi), %rax # _1->core_stats, _5 | testq %rax, %rax # _5 | je .L591 #, | .L585: | incq %gs:(%rax) # _18->rx_dropped Use unsigned long as type for the counter. Use this_cpu_inc() to increment the counter. Use a plain read of the counter. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/YmbO0pxgtKpCw4SY@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: stmmac: dwmac-imx: comment spelling fixMarcel Ziswiler1-2/+2
Fix spelling in comment. Fixes: 94abdad6974a ("net: ethernet: dwmac: add ethernet glue logic for NXP imx8 chip") Signed-off-by: Marcel Ziswiler <marcel.ziswiler@toradex.com> Link: https://lore.kernel.org/r/20220425154856.169499-1-marcel@ziswiler.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: remove comments that mention obsolete __SLOW_DOWN_IOBjorn Helgaas3-9/+0
The only remaining definitions of __SLOW_DOWN_IO (for alpha and ia64) do nothing, and the only mentions in networking are in comments. Remove these mentions. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: wan: atp: remove unused eeprom_delay()Bjorn Helgaas1-4/+0
atp.h is included only by atp.c, which does not use eeprom_delay(). Remove the unused definition. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: tls: fix async vs NIC crypto offloadJakub Kicinski1-0/+2
When NIC takes care of crypto (or the record has already been decrypted) we forget to update darg->async. ->async is supposed to mean whether record is async capable on input and whether record has been queued for async crypto on output. Reported-by: Gal Pressman <gal@nvidia.com> Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument") Tested-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20220425233309.344858-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: dsa: mt753x: fix pcs conversion regressionRussell King (Oracle)1-9/+9
Daniel Golle reports that the conversion of mt753x to phylink PCS caused an oops as below. The problem is with the placement of the PCS initialisation, which occurs after mt7531_setup() has been called. However, burited in this function is a call to setup the CPU port, which requires the PCS structure to be already setup. Fix this by changing the initialisation order. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000046057000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP Modules linked in: CPU: 0 PID: 32 Comm: kworker/u4:1 Tainted: G S 5.18.0-rc3-next-20220422+ #0 Hardware name: Bananapi BPI-R64 (DT) Workqueue: events_unbound deferred_probe_work_func pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mt7531_cpu_port_config+0xcc/0x1b0 lr : mt7531_cpu_port_config+0xc0/0x1b0 sp : ffffffc008d5b980 x29: ffffffc008d5b990 x28: ffffff80060562c8 x27: 00000000f805633b x26: ffffff80001a8880 x25: 00000000000009c4 x24: 0000000000000016 x23: ffffff8005eb6470 x22: 0000000000003600 x21: ffffff8006948080 x20: 0000000000000000 x19: 0000000000000006 x18: 0000000000000000 x17: 0000000000000001 x16: 0000000000000001 x15: 02963607fcee069e x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101 x11: ffffffc037302000 x10: 0000000000000870 x9 : ffffffc008d5b800 x8 : ffffff800028f950 x7 : 0000000000000001 x6 : 00000000662b3000 x5 : 00000000000002f0 x4 : 0000000000000000 x3 : ffffff800028f080 x2 : 0000000000000000 x1 : ffffff800028f080 x0 : 0000000000000000 Call trace: mt7531_cpu_port_config+0xcc/0x1b0 mt753x_cpu_port_enable+0x24/0x1f0 mt7531_setup+0x49c/0x5c0 mt753x_setup+0x20/0x31c dsa_register_switch+0x8bc/0x1020 mt7530_probe+0x118/0x200 mdio_probe+0x30/0x64 really_probe.part.0+0x98/0x280 __driver_probe_device+0x94/0x140 driver_probe_device+0x40/0x114 __device_attach_driver+0xb0/0x10c bus_for_each_drv+0x64/0xa0 __device_attach+0xa8/0x16c device_initial_probe+0x10/0x20 bus_probe_device+0x94/0x9c deferred_probe_work_func+0x80/0xb4 process_one_work+0x200/0x3a0 worker_thread+0x260/0x4c0 kthread+0xd4/0xe0 ret_from_fork+0x10/0x20 Code: 9409e911 937b7e60 8b0002a0 f9405800 (f9401005) ---[ end trace 0000000000000000 ]--- Reported-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Fixes: cbd1f243bc41 ("net: dsa: mt7530: partially convert to phylink_pcs") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/E1nj6FW-007WZB-5Y@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27net: generalize skb freeing deferral to per-cpu listsEric Dumazet11-46/+90
Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket lock is released") helped bulk TCP flows to move the cost of skbs frees outside of critical section where socket lock was held. But for RPC traffic, or hosts with RFS enabled, the solution is far from being ideal. For RPC traffic, recvmsg() has to return to user space right after skb payload has been consumed, meaning that BH handler has no chance to pick the skb before recvmsg() thread. This issue is more visible with BIG TCP, as more RPC fit one skb. For RFS, even if BH handler picks the skbs, they are still picked from the cpu on which user thread is running. Ideally, it is better to free the skbs (and associated page frags) on the cpu that originally allocated them. This patch removes the per socket anchor (sk->defer_list) and instead uses a per-cpu list, which will hold more skbs per round. This new per-cpu list is drained at the end of net_action_rx(), after incoming packets have been processed, to lower latencies. In normal conditions, skbs are added to the per-cpu list with no further action. In the (unlikely) cases where the cpu does not run net_action_rx() handler fast enough, we use an IPI to raise NET_RX_SOFTIRQ on the remote cpu. Also, we do not bother draining the per-cpu list from dev_cpu_dead() This is because skbs in this list have no requirement on how fast they should be freed. Note that we can add in the future a small per-cpu cache if we see any contention on sd->defer_lock. Tested on a pair of hosts with 100Gbit NIC, RFS enabled, and /proc/sys/net/ipv4/tcp_rmem[2] tuned to 16MB to work around page recycling strategy used by NIC driver (its page pool capacity being too small compared to number of skbs/pages held in sockets receive queues) Note that this tuning was only done to demonstrate worse conditions for skb freeing for this particular test. These conditions can happen in more general production workload. 10 runs of one TCP_STREAM flow Before: Average throughput: 49685 Mbit. Kernel profiles on cpu running user thread recvmsg() show high cost for skb freeing related functions (*) 57.81% [kernel] [k] copy_user_enhanced_fast_string (*) 12.87% [kernel] [k] skb_release_data (*) 4.25% [kernel] [k] __free_one_page (*) 3.57% [kernel] [k] __list_del_entry_valid 1.85% [kernel] [k] __netif_receive_skb_core 1.60% [kernel] [k] __skb_datagram_iter (*) 1.59% [kernel] [k] free_unref_page_commit (*) 1.16% [kernel] [k] __slab_free 1.16% [kernel] [k] _copy_to_iter (*) 1.01% [kernel] [k] kfree (*) 0.88% [kernel] [k] free_unref_page 0.57% [kernel] [k] ip6_rcv_core 0.55% [kernel] [k] ip6t_do_table 0.54% [kernel] [k] flush_smp_call_function_queue (*) 0.54% [kernel] [k] free_pcppages_bulk 0.51% [kernel] [k] llist_reverse_order 0.38% [kernel] [k] process_backlog (*) 0.38% [kernel] [k] free_pcp_prepare 0.37% [kernel] [k] tcp_recvmsg_locked (*) 0.37% [kernel] [k] __list_add_valid 0.34% [kernel] [k] sock_rfree 0.34% [kernel] [k] _raw_spin_lock_irq (*) 0.33% [kernel] [k] __page_cache_release 0.33% [kernel] [k] tcp_v6_rcv (*) 0.33% [kernel] [k] __put_page (*) 0.29% [kernel] [k] __mod_zone_page_state 0.27% [kernel] [k] _raw_spin_lock After patch: Average throughput: 73076 Mbit. Kernel profiles on cpu running user thread recvmsg() looks better: 81.35% [kernel] [k] copy_user_enhanced_fast_string 1.95% [kernel] [k] _copy_to_iter 1.95% [kernel] [k] __skb_datagram_iter 1.27% [kernel] [k] __netif_receive_skb_core 1.03% [kernel] [k] ip6t_do_table 0.60% [kernel] [k] sock_rfree 0.50% [kernel] [k] tcp_v6_rcv 0.47% [kernel] [k] ip6_rcv_core 0.45% [kernel] [k] read_tsc 0.44% [kernel] [k] _raw_spin_lock_irqsave 0.37% [kernel] [k] _raw_spin_lock 0.37% [kernel] [k] native_irq_return_iret 0.33% [kernel] [k] __inet6_lookup_established 0.31% [kernel] [k] ip6_protocol_deliver_rcu 0.29% [kernel] [k] tcp_rcv_established 0.29% [kernel] [k] llist_reverse_order v2: kdoc issue (kernel bots) do not defer if (alloc_cpu == smp_processor_id()) (Paolo) replace the sk_buff_head with a single-linked list (Jakub) add a READ_ONCE()/WRITE_ONCE() for the lockless read of sd->defer_list Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220422201237.416238-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-27Merge tag 'pinctrl-v5.18-2' of ↵Linus Torvalds10-69/+128
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Fix some register offsets on Intel Alderlake - Fix the order the UFS and SDC pins on Qualcomm SM6350 - Fix a build error in Mediatek Moore. - Fix a pin function table in the Sunplus SP7021. - Fix some Kconfig and static keywords on the Samsung Tesla FSD SoC. - Fix up the EOI function for edge triggered IRQs and keep the block clock enabled for level IRQs in the STM32 driver. - Fix some bits and order in the Rockchip RK3308 driver. - Handle the errorpath in the Pistachio driver probe() properly. * tag 'pinctrl-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: pistachio: fix use of irq_of_parse_and_map() pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested pinctrl: rockchip: sort the rk3308_mux_recalced_data entries pinctrl: rockchip: fix RK3308 pinmux bits pinctrl: stm32: Do not call stm32_gpio_get() for edge triggered IRQs in EOI pinctrl: Fix an error in pin-function table of SP7021 pinctrl: samsung: fix missing GPIOLIB on ARM64 Exynos config pinctrl: mediatek: moore: Fix build error pinctrl: qcom: sm6350: fix order of UFS & SDC pins pinctrl: alderlake: Fix register offsets for ADL-N variant pinctrl: samsung: staticize fsd_pin_ctrl
2022-04-27Merge branch 'Teach libbpf to "fix up" BPF verifier log'Alexei Starovoitov11-97/+464
Andrii Nakryiko says: ==================== This patch set teaches libbpf to enhance BPF verifier log with human-readable and relevant information about failed CO-RE relocation. Patch #9 is the main one with the new logic. See relevant commit messages for some more details. All the other patches are either fixing various bugs detected while working on this feature, most prominently a bug with libbpf not handling CO-RE relocations for SEC("?...") programs, or are refactoring libbpf internals to allow for easier reuse of CO-RE relo lookup and formatting logic. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-04-27selftests/bpf: Add libbpf's log fixup logic selftestsAndrii Nakryiko3-0/+163
Add tests validating that libbpf is indeed patching up BPF verifier log with CO-RE relocation details. Also test partial and full truncation scenarios. This test might be a bit fragile due to changing BPF verifier log format. If that proves to be frequently breaking, we can simplify tests or remove the truncation subtests. But for now it seems useful to test it in those conditions that are otherwise rarely occuring in practice. Also test CO-RE relo failure in a subprog as that excercises subprogram CO-RE relocation mapping logic which doesn't work out of the box without extra relo storage previously done only for gen_loader case. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-11-andrii@kernel.org
2022-04-27libbpf: Fix up verifier log for unguarded failed CO-RE relosAndrii Nakryiko3-4/+154
Teach libbpf to post-process BPF verifier log on BPF program load failure and detect known error patterns to provide user with more context. Currently there is one such common situation: an "unguarded" failed BPF CO-RE relocation. While failing CO-RE relocation is expected, it is expected to be property guarded in BPF code such that BPF verifier always eliminates BPF instructions corresponding to such failed CO-RE relos as dead code. In cases when user failed to take such precautions, BPF verifier provides the best log it can: 123: (85) call unknown#195896080 invalid func unknown#195896080 Such incomprehensible log error is due to libbpf "poisoning" BPF instruction that corresponds to failed CO-RE relocation by replacing it with invalid `call 0xbad2310` instruction (195896080 == 0xbad2310 reads "bad relo" if you squint hard enough). Luckily, libbpf has all the necessary information to look up CO-RE relocation that failed and provide more human-readable description of what's going on: 5: <invalid CO-RE relocation> failed to resolve CO-RE relocation <byte_off> [6] struct task_struct___bad.fake_field_subprog (0:2 @ offset 8) This hopefully makes it much easier to understand what's wrong with user's BPF program without googling magic constants. This BPF verifier log fixup is setup to be extensible and is going to be used for at least one other upcoming feature of libbpf in follow up patches. Libbpf is parsing lines of BPF verifier log starting from the very end. Currently it processes up to 10 lines of code looking for familiar patterns. This avoids wasting lots of CPU processing huge verifier logs (especially for log_level=2 verbosity level). Actual verification error should normally be found in last few lines, so this should work reliably. If libbpf needs to expand log beyond available log_buf_size, it truncates the end of the verifier log. Given verifier log normally ends with something like: processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 ... truncating this on program load error isn't too bad (end user can always increase log size, if it needs to get complete log). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-10-andrii@kernel.org
2022-04-27libbpf: Simplify bpf_core_parse_spec() signatureAndrii Nakryiko1-19/+15
Simplify bpf_core_parse_spec() signature to take struct bpf_core_relo as an input instead of requiring callers to decompose them into type_id, relo, spec_str, etc. This makes using and reusing this helper easier. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-9-andrii@kernel.org
2022-04-27libbpf: Refactor CO-RE relo human description formatting routineAndrii Nakryiko1-26/+38
Refactor how CO-RE relocation is formatted. Now it dumps human-readable representation, currently used by libbpf in either debug or error message output during CO-RE relocation resolution process, into provided buffer. This approach allows for better reuse of this functionality outside of CO-RE relocation resolution, which we'll use in next patch for providing better error message for BPF verifier rejecting BPF program due to unguarded failed CO-RE relocation. It also gets rid of annoying "stitching" of libbpf_print() calls, which was the only place where we did this. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-8-andrii@kernel.org
2022-04-27libbpf: Record subprog-resolved CO-RE relocations unconditionallyAndrii Nakryiko1-15/+12
Previously, libbpf recorded CO-RE relocations with insns_idx resolved according to finalized subprog locations (which are appended at the end of entry BPF program) to simplify the job of light skeleton generator. This is necessary because once subprogs' instructions are appended to main entry BPF program all the subprog instruction indices are shifted and that shift is different for each entry (main) BPF program, so it's generally impossible to map final absolute insn_idx of the finalized BPF program to their original locations inside subprograms. This information is now going to be used not only during light skeleton generation, but also to map absolute instruction index to subprog's instruction and its corresponding CO-RE relocation. So start recording these relocations always, not just when obj->gen_loader is set. This information is going to be freed at the end of bpf_object__load() step, as before (but this can change in the future if there will be a need for this information post load step). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-7-andrii@kernel.org
2022-04-27selftests/bpf: Add CO-RE relos and SEC("?...") to linked_funcs selftestsAndrii Nakryiko3-2/+18
Enhance linked_funcs selftest with two tricky features that might not obviously work correctly together. We add CO-RE relocations to entry BPF programs and mark those programs as non-autoloadable with SEC("?...") annotation. This makes sure that libbpf itself handles .BTF.ext CO-RE relocation data matching correctly for SEC("?...") programs, as well as ensures that BPF static linker handles this correctly (this was the case before, no changes are necessary, but it wasn't explicitly tested). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-6-andrii@kernel.org
2022-04-27libbpf: Avoid joining .BTF.ext data with BPF programs by section nameAndrii Nakryiko3-29/+65
Instead of using ELF section names as a joining key between .BTF.ext and corresponding BPF programs, pre-build .BTF.ext section number to ELF section index mapping during bpf_object__open() and use it later for matching .BTF.ext information (func/line info or CO-RE relocations) to their respective BPF programs and subprograms. This simplifies corresponding joining logic and let's libbpf do manipulations with BPF program's ELF sections like dropping leading '?' character for non-autoloaded programs. Original joining logic in bpf_object__relocate_core() (see relevant comment that's now removed) was never elegant, so it's a good improvement regardless. But it also avoids unnecessary internal assumptions about preserving original ELF section name as BPF program's section name (which was broken when SEC("?abc") support was added). Fixes: a3820c481112 ("libbpf: Support opting out from autoloading BPF programs declaratively") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-5-andrii@kernel.org
2022-04-27libbpf: Fix logic for finding matching program for CO-RE relocationAndrii Nakryiko1-2/+3
Fix the bug in bpf_object__relocate_core() which can lead to finding invalid matching BPF program when processing CO-RE relocation. IF matching program is not found, last encountered program will be assumed to be correct program and thus error detection won't detect the problem. Fixes: 9c82a63cf370 ("libbpf: Fix CO-RE relocs against .text section") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-4-andrii@kernel.org
2022-04-27libbpf: Drop unhelpful "program too large" guessAndrii Nakryiko1-4/+0
libbpf pretends it knows actual limit of BPF program instructions based on UAPI headers it compiled with. There is neither any guarantee that UAPI headers match host kernel, nor BPF verifier actually uses BPF_MAXINSNS constant anymore. Just drop unhelpful "guess", BPF verifier will emit actual reason for failure in its logs anyways. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-3-andrii@kernel.org
2022-04-27libbpf: Fix anonymous type check in CO-RE logicAndrii Nakryiko1-1/+1
Use type name for checking whether CO-RE relocation is referring to anonymous type. Using spec string makes no sense. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220426004511.2691730-2-andrii@kernel.org
2022-04-26bpf: Compute map_btf_id during build timeMenglong Dong19-129/+62
For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map types are computed during vmlinux-btf init: btf_parse_vmlinux() -> btf_vmlinux_map_ids_init() It will lookup the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This process can be done during build time, thanks to Jiri's resolve_btfids. selftest of map_ptr has passed: $96 map_ptr:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>