Age | Commit message (Collapse) | Author | Files | Lines |
|
There is no need in one-liners wrappers to call internal functions.
Let's remove them.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The XFRM code performs fallback to software IPsec if .xdo_dev_state_add()
returns -EOPNOTSUPP. This is what mlx5 did very deep in its stack trace,
despite have all the knowledge that IPsec is not going to work in very
early stage.
This is achieved by making sure that priv->ipsec pointer is valid for
fully working and supported hardware crypto IPsec engine.
In case, the hardware IPsec is not supported, the XFRM code will set NULL
to xso->dev and it will prevent from calls to various .xdo_dev_state_*()
callbacks.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Ensure that flow steering is usable as early as possible, to understand
if crypto IPsec is supported or not.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Remove multiple function wrappers to make sure that IPsec FS initialization
and cleanup functions present in one place to help with code readability.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Michael Chan says:
====================
bnxt_en: Bug fixes
This patch series includes 3 fixes:
- Fix an occasional VF open failure.
- Fix a PTP spinlock usage before initialization
- Fix unnecesary RX packet drops under high TX traffic load.
====================
Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In bnxt_poll_p5(), we first check cpr->has_more_work. If it is true,
we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
continue polling. It is possible to exhanust the budget again when
__bnxt_poll_cqs() returns.
We then enter the main while loop to check for new entries in the NQ.
If we had previously exhausted the NAPI budget, we may call
__bnxt_poll_work() to process an RX entry with zero budget. This will
cause packets to be dropped unnecessarily, thinking that we are in the
netpoll path. Fix it by breaking out of the while loop if we need
to process an RX NQ entry with no budget left. We will then exit
NAPI and stay in polling mode.
Fixes: 389a877a3b20 ("bnxt_en: Process the NQ under NAPI continuous polling.")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
spinlock. The spinlock is not initialized until later. Move the
bnxt_ptp_init_rtc() call after the spinlock is initialized.
Fixes: 24ac1ecd5240 ("bnxt_en: Add driver support to use Real Time Counter for PTP")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_open() can fail in this code path, especially on a VF when
it fails to reserve default rings:
bnxt_open()
__bnxt_open_nic()
bnxt_clear_int_mode()
bnxt_init_dflt_ring_mode()
RX rings would be set to 0 when we hit this error path.
It is possible for a subsequent bnxt_open() call to potentially succeed
with a code path like this:
bnxt_open()
bnxt_hwrm_if_change()
bnxt_fw_init_one()
bnxt_fw_init_one_p3()
bnxt_set_dflt_rfs()
bnxt_rfs_capable()
bnxt_hwrm_reserve_rings()
On older chips, RFS is capable if we can reserve the number of vnics that
is equal to RX rings + 1. But since RX rings is still set to 0 in this
code path, we may mistakenly think that RFS is supported for 0 RX rings.
Later, when the default RX rings are reserved and we try to enable
RFS, it would fail and cause bnxt_open() to fail unnecessarily.
We fix this in 2 places. bnxt_rfs_capable() will always return false if
RX rings is not yet set. bnxt_init_dflt_ring_mode() will call
bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
supported.
Fixes: 20d7d1c5c9b1 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v5.19
First set of patches for v5.19 and this is a big one. We have two new
drivers, a change in mac80211 STA API affecting most drivers and
ath11k getting support for WCN6750. And as usual lots of fixes and
cleanups all over.
Major changes:
new drivers
- wfx: silicon labs devices
- plfxlc: pureLiFi X, XL, XC devices
mac80211
- host based BSS color collision detection
- prepare sta handling for IEEE 802.11be Multi-Link Operation (MLO) support
rtw88
- support TP-Link T2E devices
rtw89
- support firmware crash simulation
- preparation for 8852ce hardware support
ath11k
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support for QCA6390 and WCN6855
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- support for WCN6750
wcn36xx
- support for transmit rate reporting to user space
* tag 'wireless-next-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (228 commits)
rtw89: 8852c: rfk: add DPK
rtw89: 8852c: rfk: add IQK
rtw89: 8852c: rfk: add RX DCK
rtw89: 8852c: rfk: add RCK
rtw89: 8852c: rfk: add TSSI
rtw89: 8852c: rfk: add LCK
rtw89: 8852c: rfk: add DACK
rtw89: 8852c: rfk: add RFK tables
plfxlc: fix le16_to_cpu warning for beacon_interval
rtw88: remove a copy of the NAPI_POLL_WEIGHT define
carl9170: tx: fix an incorrect use of list iterator
wil6210: use NAPI_POLL_WEIGHT for napi budget
ath10k: remove a copy of the NAPI_POLL_WEIGHT define
ath11k: Add support for WCN6750 device
ath11k: Datapath changes to support WCN6750
ath11k: HAL changes to support WCN6750
ath11k: Add QMI changes for WCN6750
ath11k: Fetch device information via QMI for WCN6750
ath11k: Add register access logic for WCN6750
ath11k: Add HW params for WCN6750
...
====================
Link: https://lore.kernel.org/r/20220503153622.C1671C385A4@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Most drivers should not have to worry about selecting the right
weight for their NAPI instances and pass NAPI_POLL_WEIGHT.
It'd be best if we didn't require the argument at all and selected
the default internally.
This change prepares the ground for such reshuffling, allowing
for a smooth transition. The following API should remain after
the next release cycle:
netif_napi_add()
netif_napi_add_weight()
netif_napi_add_tx()
netif_napi_add_tx_weight()
Where the _weight() variants take an explicit weight argument.
I opted for a _weight() suffix rather than a __ prefix, because
we use __ in places to mean that caller needs to also issue a
synchronize_net() call.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220502232703.396351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Test basic (port-default, VLAN PCP and IP DSCP) QoS classification for
Ocelot switches. Advanced QoS classification using tc filters is covered
by tc_flower_chains.sh in the same directory.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220502155424.4098917-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC
LAN911x Ethernet chip, so the networking on them must have been broken by
commit 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
which filtered out 0 as well as the negative error codes -- it was kinda
correct at the time, as platform_get_irq() could return 0 on of_irq_get()
failure and on the actual 0 in an IRQ resource. This issue was fixed by
me (back in 2016!), so we should be able to fix this driver to allow IRQ0
usage again...
When merging this to the stable kernels, make sure you also merge commit
e330b9a6bb35 ("platform: don't return 0 from platform_get_irq[_byname]()
on error") -- that's my fix to platform_get_irq() for the DT platforms...
Fixes: 965b2aa78fbc ("net/smsc911x: fix irq resource allocation failure")
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mat Martineau says:
====================
mptcp: Userspace path manager prerequisites
This series builds upon the path manager mode selection changes merged
in 4994d4fa99ba ("Merge branch 'mptcp-path-manager-mode-selection'") to
further modify the path manager code in preparation for adding the new
netlink commands to announce/remove advertised addresses and
create/destroy subflows of an MPTCP connection. The third and final
patch series for the userspace path manager will implement those
commands as discussed in
https://lore.kernel.org/netdev/23ff3b49-2563-1874-fa35-3af55d3088e7@linux.intel.com/#r
Patches 1, 5, and 7 remove some internal constraints on path managers
(in general) without changing in-kernel PM behavior.
Patch 2 adds a self test to validate MPTCP address advertisement ack
behavior.
Patches 3, 4, and 6 add new attributes to existing MPTCP netlink events
and track internal state for populating those attributes.
====================
Link: https://lore.kernel.org/r/20220502205237.129297-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change allows userspace PM implementations to reissue ADD_ADDR
announcements (if necessary) based on their chosen policy.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change records the 'server_side' attribute of MPTCP_EVENT_CREATED
and MPTCP_EVENT_ESTABLISHED events to inform their recipient about the
Client/Server role of the running MPTCP application.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/246
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change updates internal logic to permit subflows to be
established from either the client or server ends of MPTCP
connections. This symmetry and added flexibility may be
harnessed by PM implementations running on either end in
creating new subflows.
The essence of this change lies in not relying on the
"server_side" flag (which continues to be available if needed).
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Per RFC 8684, if no port is specified in an ADD_ADDR message, MPTCP
SHOULD attempt to connect to the specified address on the same port
as the port that is already in use by the subflow on which the
ADD_ADDR signal was sent.
To facilitate that, this change reflects the specific remote port in
use by that subflow in MPTCP_EVENT_ANNOUNCED events.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change reads the addr id assigned to the remote endpoint
of a subflow from the MP_JOIN SYN/ACK message and stores it
in the related subflow context. The remote id was not being
captured prior to this change, and will now provide a consistent
view of remote endpoints and their ids as seen through netlink
events.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Check userspace PM behavior to ensure ADD_ADDR echoes are only sent when
there is an active userspace daemon. If the daemon is restarting or
hasn't loaded yet, the missing echo will cause the peer to retransmit
the ADD_ADDR - and hopefully the daemon will be ready to receive it at
that later time.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current limits on the # of addresses/subflows must apply only to
in-kernel PM managed sockets. Thus this change removes such
restrictions on connections overseen by non-kernel (e.g. userspace)
PMs. This change also ensures that the kernel does not record stats
inside struct mptcp_pm_data updated along kernel code paths when exercised
via non-kernel PMs.
Additionally, address announcements are acknolwedged and subflow
requests are honored only when it's deemed that a userspace path
manager is active at the time.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kishen Maloor <kishen.maloor@intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As noted elsewhere, various GPON SFP modules exhibit non-standard
TX-fault behaviour. In the tested case, the Huawei MA5671A, when used
in combination with a Marvell mv88e6085 switch, was found to
persistently assert TX-fault, resulting in the module being disabled.
This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the
module to function.
Change from v1: removal of erroneous return statment (Andrew Lunn)
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp selftest fix from Kees Cook:
- Avoid using stdin for read syscall testing (Jann Horn)
* tag 'seccomp-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: Don't call read() on TTY from background pgrp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Work around a hardware problem in the delta-ahe50dc-fan driver
- Explicitly disable PEC in PMBus core if not enabled
- Fix negative temperature values in f71882fg driver
- Fix warning on removal of adt7470 driver
- Fix CROSSHAIR VI HERO name in asus_wmi_sensors driver
- Fix build warning seen in xdpe12284 driver if
CONFIG_SENSORS_XDPE122_REGULATOR is disabled
- Fix type of 'ti,n-factor' in ti,tmp421 driver bindings
* tag 'hwmon-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus) delta-ahe50dc-fan: work around hardware quirk
hwmon: (pmbus) disable PEC if not enabled
hwmon: (f71882fg) Fix negative temperature
dt-bindings: hwmon: ti,tmp421: Fix type for 'ti,n-factor'
hwmon: (adt7470) Fix warning on module removal
hwmon: (asus_wmi_sensors) Fix CROSSHAIR VI HERO name
hwmon: (xdpe12284) Fix build warning seen if CONFIG_SENSORS_XDPE122_REGULATOR is disabled
|
|
A reference to the framebuffer device struct fb_info is stored in the file
private data, but this reference could no longer be valid and must not be
accessed directly. Instead, the file_fb_info() accessor function must be
used since it does sanity checking to make sure that the fb_info is valid.
This can happen for example if the registered framebuffer device is for a
driver that just uses a framebuffer provided by the system firmware. In
that case, the fbdev core would unregister the framebuffer device when a
real video driver is probed and ask to remove conflicting framebuffers.
The bug has been present for a long time but commit 27599aacbaef ("fbdev:
Hot-unplug firmware fb devices on forced removal") unmasked it since the
fbdev core started unregistering the framebuffers' devices associated.
Fixes: 27599aacbaef ("fbdev: Hot-unplug firmware fb devices on forced removal")
Reported-by: Maxime Ripard <maxime@cerno.tech>
Reported-by: Junxiao Chang <junxiao.chang@intel.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220502135014.377945-1-javierm@redhat.com
|
|
there are cases that trigger a 2nd shadow event for the same
vmaddr/raddr combination. (prefix changes, reboots, some known races)
This will increase memory usages and it will result in long latencies
when cleaning up, e.g. on shutdown. To avoid cases with a list that has
hundreds of identical raddrs we check existing entries at insert time.
As this measurably reduces the list length this will be faster than
traversing the list at shutdown time.
In the long run several places will be optimized to create less entries
and a shrinker might be necessary.
Fixes: 4be130a08420 ("s390/mm: add shadow gmap support")
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20220429151526.1560-1-borntraeger@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
On some x86 processors, CPUID leaf 0xA provides information
on Architectural Performance Monitoring features. It
advertises a PMU version which Qemu uses to determine the
availability of additional MSRs to manage the PMCs.
Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for
the same, the kernel constructs return values based on the
x86_pmu_capability irrespective of the vendor.
This leaf and the additional MSRs are not supported on AMD
and Hygon processors. If AMD PerfMonV2 is detected, the PMU
version is set to 2 and guest startup breaks because of an
attempt to access a non-existent MSR. Return zeros to avoid
this.
Fixes: a6c06ed1a60a ("KVM: Expose the architectural performance monitoring CPUID leaf")
Reported-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Zen renumbered some of the performance counters that correspond to the
well known events in perf_hw_id. This code in KVM was never updated for
that, so guest that attempt to use counters on Zen that correspond to the
pre-Zen perf_hw_id values will silently receive the wrong values.
This has been observed in the wild with rr[0] when running in Zen 3
guests. rr uses the retired conditional branch counter 00d1 which is
incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND.
[0] https://rr-project.org/
Signed-off-by: Kyle Huey <me@kylehuey.com>
Message-Id: <20220503050136.86298-1-khuey@kylehuey.com>
Cc: stable@vger.kernel.org
[Check guest family, not host. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We are dropping A/D bits (and W bits) in the TDP MMU. Even if mmu_lock
is held for write, as volatile SPTEs can be written by other tasks/vCPUs
outside of mmu_lock.
Attempting to prove that bug exposed another notable goof, which has been
lurking for a decade, give or take: KVM treats _all_ MMU-writable SPTEs
as volatile, even though KVM never clears WRITABLE outside of MMU lock.
As a result, the legacy MMU (and the TDP MMU if not fixed) uses XCHG to
update writable SPTEs.
The fix does not seem to have an easily-measurable affect on performance;
page faults are so slow that wasting even a few hundred cycles is dwarfed
by the base cost.
|
|
syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
for TCP socket used by RDS is accessing sock_net() without acquiring a
refcount on net namespace. Since TCP's retransmission can happen after
a process which created net namespace terminated, we need to explicitly
acquire a refcount.
Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/a5fb1fc4-2284-3359-f6a0-e4e390239d7b@I-love.SAKURA.ne.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use an atomic XCHG to write TDP MMU SPTEs that have volatile bits, even
if mmu_lock is held for write, as volatile SPTEs can be written by other
tasks/vCPUs outside of mmu_lock. If a vCPU uses the to-be-modified SPTE
to write a page, the CPU can cache the translation as WRITABLE in the TLB
despite it being seen by KVM as !WRITABLE, and/or KVM can clobber the
Accessed/Dirty bits and not properly tag the backing page.
Exempt non-leaf SPTEs from atomic updates as KVM itself doesn't modify
non-leaf SPTEs without holding mmu_lock, they do not have Dirty bits, and
KVM doesn't consume the Accessed bit of non-leaf SPTEs.
Dropping the Dirty and/or Writable bits is most problematic for dirty
logging, as doing so can result in a missed TLB flush and eventually a
missed dirty page. In the unlikely event that the only dirty page(s) is
a clobbered SPTE, clear_dirty_gfn_range() will see the SPTE as not dirty
(based on the Dirty or Writable bit depending on the method) and so not
update the SPTE and ultimately not flush. If the SPTE is cached in the
TLB as writable before it is clobbered, the guest can continue writing
the associated page without ever taking a write-protect fault.
For most (all?) file back memory, dropping the Dirty bit is a non-issue.
The primary MMU write-protects its PTEs on writeback, i.e. KVM's dirty
bit is effectively ignored because the primary MMU will mark that page
dirty when the write-protection is lifted, e.g. when KVM faults the page
back in for write.
The Accessed bit is a complete non-issue. Aside from being unused for
non-leaf SPTEs, KVM doesn't do a TLB flush when aging SPTEs, i.e. the
Accessed bit may be dropped anyways.
Lastly, the Writable bit is also problematic as an extension of the Dirty
bit, as KVM (correctly) treats the Dirty bit as volatile iff the SPTE is
!DIRTY && WRITABLE. If KVM fixes an MMU-writable, but !WRITABLE, SPTE
out of mmu_lock, then it can allow the CPU to set the Dirty bit despite
the SPTE being !WRITABLE when it is checked by KVM. But that all depends
on the Dirty bit being problematic in the first place.
Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the is_shadow_present_pte() check out of spte_has_volatile_bits()
and into its callers. Well, caller, since only one of its two callers
doesn't already do the shadow-present check.
Opportunistically move the helper to spte.c/h so that it can be used by
the TDP MMU, which is also the primary motivation for the shadow-present
change. Unlike the legacy MMU, the TDP MMU uses a single path for clear
leaf and non-leaf SPTEs, and to avoid unnecessary atomic updates, the TDP
MMU will need to check is_last_spte() prior to calling
spte_has_volatile_bits(), and calling is_last_spte() without first
calling is_shadow_present_spte() is at best odd, and at worst a violation
of KVM's loosely defines SPTE rules.
Note, mmu_spte_clear_track_bits() could likely skip the write entirely
for SPTEs that are not shadow-present. Leave that cleanup for a future
patch to avoid introducing a functional change, and because the
shadow-present check can likely be moved further up the stack, e.g.
drop_large_spte() appears to be the only path that doesn't already
explicitly check for a shadow-present SPTE.
No functional change intended.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't treat SPTEs that are truly writable, i.e. writable in hardware, as
being volatile (unless they're volatile for other reasons, e.g. A/D bits).
KVM _sets_ the WRITABLE bit out of mmu_lock, but never _clears_ the bit
out of mmu_lock, so if the WRITABLE bit is set, it cannot magically get
cleared just because the SPTE is MMU-writable.
Rename the wrapper of MMU-writable to be more literal, the previous name
of spte_can_locklessly_be_made_writable() is wrong and misleading.
Fixes: c7ba5b48cc8d ("KVM: MMU: fast path of handling guest page fault")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220423034752.1161007-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The program uses CLOCK_TAI as default clock since it was added to the
Linux repo. In commit:
| 040806343bb4 ("selftests/net: so_txtime multi-host support")
a help text stating the wrong default clock was added.
This patch fixes the help text.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-3-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch fixes the parsing of the cmd line supplied start time on 32
bit systems. A "long" on 32 bit systems is only 32 bit wide and cannot
hold a timestamp in nano second resolution.
Fixes: 040806343bb4 ("selftests/net: so_txtime multi-host support")
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20220502094638.1921702-2-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-05-02
1) Trivial Misc updates to mlx5 driver
2) From Mark Bloch: Flow steering, general steering refactoring/cleaning
An issue with flow steering deletion flow (when creating a rule without
dests) turned out to be easy to fix but during the fix some issue
with the flow steering creation/deletion flows have been found.
The following patch series tries to fix long standing issues with flow
steering code and hopefully preventing silly future bugs.
A) Fix an issue where a proper dest type wasn't assigned.
B) Refactor and fix dests enums values, refactor deletion
function and do proper bookkeeping of dests.
C) Change mlx5_del_flow_rules() to delete rules when there are no
no more rules attached associated with an FTE.
D) Don't call hard coded deletion function but use the node's
defined one.
E) Add a WARN_ON() to catch future bugs when an FTE with dests
is deleted.
* tag 'mlx5-updates-2022-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: fs, an FTE should have no dests when deleted
net/mlx5: fs, call the deletion function of the node
net/mlx5: fs, delete the FTE when there are no rules attached to it
net/mlx5: fs, do proper bookkeeping for forward destinations
net/mlx5: fs, add unused destination type
net/mlx5: fs, jump to exit point and don't fall through
net/mlx5: fs, refactor software deletion rule
net/mlx5: fs, split software and IFC flow destination definitions
net/mlx5e: TC, set proper dest type
net/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function
net/mlx5e: Drop error CQE handling from the XSK RX handler
net/mlx5: Print initializing field in case of timeout
net/mlx5: Delete redundant default assignment of runtime devlink params
net/mlx5: Remove useless kfree
net/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ido Schimmel says:
====================
mlxsw: Remove size limitations on egress descriptor buffer
Petr says:
Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping headers. Currently, mlxsw only configures the
bytes part of the resource management.
Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.
However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.
Going forward, mlxsw will have to fix this issue properly by maintaining
descriptor buffer sizes, TC bindings, and quotas that match the
architecture recommendation. Short term, fix the issue by configuring the
egress descriptor pool to be infinite in size as well. This will maintain
the same configuration philosophy, but will unlock all chip resources to be
usable.
In this patchset, patch #1 first adds the "desc" field into the pool
configuration register. Then in patch #2, the new field is used to
configure both ingress and egress pool 14 as infinite.
In patches #3 and #4, add a selftest that verifies that a large burst
can be absorbed by the shared buffer. This test specifically exercises a
scenario where descriptor buffer is the limiting factor and the test
fails without the above patches.
====================
Link: https://lore.kernel.org/r/20220502084926.365268-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a test that sends 1Gbps of traffic through the switch, into which it
then injects a burst of traffic and tests that there are no drops.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add two helpers, start_traffic_pktsize() and start_tcp_traffic_pktsize(),
that allow explicit overriding of packet size. Change start_traffic() and
start_tcp_traffic() to dispatch through these helpers with the default
packet size.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Spectrum machines have two resources related to keeping packets in an
internal buffer: bytes (allocated in cell-sized units) for packet payload,
and descriptors, for keeping metadata. Currently, mlxsw only configures the
bytes part of the resource management.
Spectrum switches permit a full parallel configuration for the descriptor
resources, including port-pool and port-TC-pool quotas. By default, these
are all configured to use pool 14, with an infinite quota. The ingress pool
14 is then infinite in size.
However, egress pool 14 has finite size by default. The size is chip
dependent, but always much lower than what the chip actually permits. As a
result, we can easily construct workloads that exhaust the configured
descriptor limit.
Fix the issue by configuring the egress descriptor pool to be infinite in
size as well. This will maintain the configuration philosophy of the
default configuration, but will unlock all chip resources to be usable.
In the code, include both the configuration of ingress and ingress, mostly
for clarity.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
SBPR, or Shared Buffer Pools Register, configures and retrieves the shared
buffer pools and configuration. The desc field determines whether the
configuration relates to the byte pool or the descriptor pool.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
operational
In emulated environments, the bridge ports enslaved to br1 get a carrier
before changing br1's PVID. This means that by the time the PVID is
changed, br1 is already operational and configured with an IPv6
link-local address.
When the test is run with netdevs registered by mlxsw, changing the PVID
is vetoed, as changing the VID associated with an existing L3 interface
is forbidden. This restriction is similar to the 8021q driver's
restriction of changing the VID of an existing interface.
Fix this by taking br1 down and bringing it back up when it is fully
configured.
With this fix, the test reliably passes on top of both the SW and HW
data paths (emulated or not).
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20220502084507.364774-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Radhey Shyam Pandey says:
====================
emaclite: improve error handling and minor cleanup
This patchset does error handling for of_address_to_resource() and also
removes "Don't advertise 1000BASE-T" and auto negotiation.
Changes for v3:
- Resolve git apply conflicts for 2/2 patch.
Changes for v2:
- Added Andrew's reviewed by tag in 1/2 patch.
- Move ret to down to align with reverse xmas tree style in 2/2 patch.
- Also add fixes tag in 2/2 patch.
- Specify tree name in subject prefix.
====================
Link: https://lore.kernel.org/r/1651476470-23904-1-git-send-email-radhey.shyam.pandey@xilinx.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
check the return value of of_address_to_resource() and also add
missing of_node_put() for np and npp nodes.
Fixes: e0a3bc65448c ("net: emaclite: Support multiple phys connected to one MDIO bus")
Addresses-Coverity: Event check_return value.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In xemaclite_open() function we are setting the max speed of
emaclite to 100Mb using phy_set_max_speed() function so,
there is no need to write the advertising registers to stop
giga-bit speed and the phy_start() function starts the
auto-negotiation so, there is no need to handle it separately
using advertising registers. Remove the phy_read and phy_write
of advertising registers in xemaclite_open() function.
Signed-off-by: Shravya Kumbham <shravya.kumbham@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Tonghao Zhang says:
====================
use standard sysctl macro
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patchset introduce sysctl macro or replace var
with macro.
====================
Link: https://lore.kernel.org/r/20220501035524.91205-1-xiangxia.m.yue@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch introdues the SYSCTL_THREE.
KUnit:
[00:10:14] ================ sysctl_test (10 subtests) =================
[00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
[00:10:14] =================== [PASSED] sysctl_test ===================
./run_kselftest.sh -c sysctl
...
ok 1 selftests: sysctl: sysctl.sh
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch replace two, four and long_one to SYSCTL_XXX.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ath.git patches for v5.19. Major changes:
ath11k
* support setting Specific Absorption Rate (SAR) for WCN6855
* read country code from SMBIOS for WCN6855/QCA6390
* support for WCN6750
|