Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.open-mesh.org/linux-merge
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- remove unnecessary type castings, by Yu Zhe
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: A dedicated notifier block for router code
Petr says:
Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.
In this patchset, move bulk of the router-related event handling to the
router code. Some of the knowledge has to stay: spectrum.c cannot veto
events that the router supports, and vice versa. But beyond that, the two
can ignore each other's details, which leads to more focused and simpler
code.
As a side effect, this fixes L3 HW stats support on tunnel netdevices.
The patch set progresses as follows:
- In patch #1, change spectrum code to not bounce L3 enslavement, which the
router code supports.
- In patch #2, add a new do-nothing notifier block to the router code.
- In patches #3-#6, move router-specific event handling to the router
module. In patch #7, clean up a comment.
- In patch #8, use the advantage that all router event handling is in the
router code and clean up taking router lock.
- mlxsw supports L3 HW stats on tunnels as of this patchset. Patches #9 and
#10 therefore add a selftest for L3 HW stats support on tunnels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a selftest that uses an IPIP topology and tests that L3 HW stats
reflect the traffic in the tunnel.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function get_l3_stats() from the test hw_stats_l3.sh will be useful for
any test that wishes to work with L3 stats. Furthermore, it is easy to
generalize to other HW stats suites (for when such are added). Therefore,
move the code to lib.sh, rewrite it to have the same interface as the other
stats-collecting functions, and generalize to take the name of the HW stats
suite to collect as an argument.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For notifications that the router needs to handle, router lock is taken.
Further, at least to determine whether an event is related to a tunnel
underlay, router lock also needs to be taken. Due to this, the router lock
is always taken for each unhandled event, and also for some handled events,
even if they are not related to underlay. Thus each event implies at least
one router lock, sometimes two.
Instead of deferring the locking to the leaf handlers, take the lock in the
router notifier handler always. This simplifies thinking about the locking
state, and in some cases saves one lock cycle.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The position of netdevice notifier registration no longer depends on the
router initialization, because the event handler no longer dispatches to
the router code. Update the comment at the registration to that effect.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The events related to IPIP tunnels are handled by the router code. Move the
handling from the central dispatcher in spectrum.c to the new notifier
handler in the router module.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU
have implications for in-ASIC router interface objects, and as such are
handled in the router module. Move the handling from the central dispatcher
in spectrum.c to the new notifier handler in the router module.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
L3 HW stats are implemented in mlxsw as RIF counters, and therefore the
code resides in spectrum_router. Exclude the offload xstats events from the
mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the
glue code in the router module.
Previously, the order of dispatch was that for events on tunnels, a
dedicated handler was called, which however did not handle HW stats events.
But there is nothing special about tunnel devices as far as HW stats: there
is a RIF associated with the tunnel netdevice, and that RIF is where the
counter should be installed. Therefore now, HW stats events are tested
first, independent of netdevice type. The upshot is that as of this commit,
mlxsw supports L3 HW stats work on GRE tunnels.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Events involving VRF, as L3 concern, are handled in the router code, by the
helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked
from the centralized dispatcher in spectrum.c. Instead, move the call to
the newly-introduced router-specific notifier handler.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.
To simplify the notifier handlers, introduce a new notifier into the router
module.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enslaving netdevices to VRF is currently handled through an
mlxsw_sp_is_vrf_event() conditional in mlxsw_sp_netdevice_event(). In the
following patch sets, VRF enslavement will be handled purely in the router
code. Therefore make handlers of NETDEV_PRECHANGEUPPER tolerant of
enslaving to VRF, so that they do not bounce the change.
For NETDEV_CHANGEUPPER, drop the WARN_ON(1) and bounce from
mlxsw_sp_netdevice_port_vlan_event(). This is the only handler that warned
and bounces even in the CHANGEUPPER code, other handler quietly do nothing
when they encounter an unfamiliar upper.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jakub Kicinski says:
====================
net: switch drivers to netif_napi_add_weight()
The minority of drivers pass a custom weight to netif_napi_add().
Switch those away to the new netif_napi_add_weight(). All drivers
(which can go thru net-next) calling netif_napi_add() will now
be calling it with NAPI_POLL_WEIGHT or 64.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A handful of WAN drivers use custom napi weights,
switch them to the new API.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
virtio netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
r8152 uses a custom napi weight, switch to the new
API for setting custom weight.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Switch all Ethernet drivers which use custom napi weights
to the new API.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
caif_virtio uses a custom napi weight, switch to the new
API for setting custom weights.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
UM's netdev driver uses a custom napi weight, switch to the new
API for setting custom weight.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.18
A larger collection of fixes than I'd like, mainly because mixer-test
is making it's way into the CI systems and turning up issues on a wider
range of systems. The most substantial thing though is a revert and an
alternative fix for a dmaengine issue where the fix caused disruption
for some other configurations, the core fix is backed out an a driver
specific thing done instead.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix the bounds check for the 'gpio-reserved-ranges' device property
in gpiolib-of
- drop the assignment of the pwm base number in gpio-mvebu (this was
missed by the patch doing it globally for all pwm drivers)
- fix the fwnode assignment (use own fwnode, not the parent's one) for
the GPIO irqchip in gpio-visconti
- update the irq_stat field before checking the trigger field in
gpio-pca953x
- update GPIO entry in MAINTAINERS
* tag 'gpio-fixes-for-v5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set)
gpio: visconti: Fix fwnode of GPIO IRQ
MAINTAINERS: update the GPIO git tree entry
gpio: mvebu: drop pwm base assignment
gpiolib: of: fix bounds check for 'gpio-reserved-ranges'
|
|
Pull block fixes from Jens Axboe:
"A single revert for a change that isn't needed in 5.18, and a small
series for s390/dasd"
* tag 'block-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
s390/dasd: Use kzalloc instead of kmalloc/memset
s390/dasd: Fix read inconsistency for ESE DASD devices
s390/dasd: Fix read for ESE with blksize < 4k
s390/dasd: prevent double format of tracks for ESE devices
s390/dasd: fix data corruption for ESE devices
Revert "block: release rq qos structures for queue without disk"
|
|
Pull io_uring fix from Jens Axboe:
"Just a single file assignment fix this week"
* tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
io_uring: assign non-fixed early for async work
|
|
Vladimir Oltean says:
====================
Simplify migration of host filtered addresses in Felix driver
The purpose of this patch set is to remove the functions
dsa_port_walk_fdbs() and dsa_port_walk_mdbs() from the DSA core, which
were introduced when the Felix driver gained support for unicast
filtering on standalone ports. They get called when changing the tagging
protocol back and forth between "ocelot" and "ocelot-8021q".
I did not realize we could get away without having them.
The patch set was regression-tested using the local_termination.sh
selftest using both tagging protocols.
====================
Link: https://lore.kernel.org/r/20220505162213.307684-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All the users of these functions are gone, delete them before they gain
new ones.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The felix driver is the only user of dsa_port_walk_mdbs(), and there
isn't even a good reason for it, considering that the host MDB entries
are already saved by the ocelot switch lib in the ocelot->multicast list.
Rewrite the multicast entry migration procedure around the
ocelot->multicast list so we can delete dsa_port_walk_mdbs().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I just realized we don't need to migrate the host-filtered FDB entries
when the tagging protocol changes from "ocelot" to "ocelot-8021q".
Host-filtered addresses are learned towards the PGID_CPU "multicast"
port group, reserved by software, which contains BIT(ocelot->num_phys_ports).
That is the "special" port entry in the analyzer block for the CPU port
module.
In "ocelot" mode, the CPU port module's packets are redirected to the
NPI port.
In "ocelot-8021q" mode, felix_8021q_cpu_port_init() does something funny
anyway, and changes PGID_CPU to stop pointing at the CPU port module and
start pointing at the physical port where the DSA master is attached.
The fact that we can alter the destination of packets learned towards
PGID_CPU without altering the MAC table entries themselves means that it
is pointless to walk through the FDB entries, forget that they were
learned towards PGID_CPU, and re-learn them towards the "unicast" PGID
associated with the physical port connected to the DSA master. We can
let the PGID_CPU value change simply alter the destination of the
host-filtered unicast packets in one fell swoop.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ocelot_fdb_add() redirects FDB entries installed on the NPI port towards
the special reserved PGID_CPU used for host-filtered addresses. PGID_CPU
contains BIT(ocelot->num_phys_ports) in the destination port mask, which
is code name for the CPU port module.
Whereas felix_migrate_fdbs_to_*_port() uses the ocelot->num_phys_ports
PGID directly, and it appears that this works too. Even if this PGID is
set to zero, apparently its number is special and packets still reach
the CPU port module.
Nonetheless, in the end, these addresses end up in the same place
regardless of whether they go through an extra indirection layer or not.
Use PGID_CPU across to have more uniformity.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch increases the polling rate used by the
mlxbf_gige driver on the MDIO bus. The previous
polling rate was every 100us, and the new rate is
every 5us. With this change the amount of time
spent waiting for the MDIO BUSY signal to de-assert
drops from ~100us to ~27us for each operation.
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20220505162309.20050-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Using min_t(int, ...) as a potential array index implies to the compiler
that negative offsets should be allowed. This is not the case, though.
Replace "int" with "unsigned int". Fixes the following warning exposed
under future CONFIG_FORTIFY_SOURCE improvements:
In file included from include/linux/string.h:253,
from include/linux/bitmap.h:11,
from include/linux/cpumask.h:12,
from include/linux/smp.h:13,
from include/linux/lockdep.h:14,
from include/linux/rcupdate.h:29,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/delay.h:23,
from drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:35:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function 't4_get_raw_vpd_params':
include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 29 and size [2147483648, 4294967295] [-Warray-bounds]
46 | #define __underlying_memcpy __builtin_memcpy
| ^
include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy'
388 | __underlying_##op(p, q, __fortify_size); \
| ^~~~~~~~~~~~~
include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk'
433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2796:9: note: in expansion of macro 'memcpy'
2796 | memcpy(p->id, vpd + id, min_t(int, id_len, ID_LEN));
| ^~~~~~
include/linux/fortify-string.h:46:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [2147483648, 4294967295] [-Warray-bounds]
46 | #define __underlying_memcpy __builtin_memcpy
| ^
include/linux/fortify-string.h:388:9: note: in expansion of macro '__underlying_memcpy'
388 | __underlying_##op(p, q, __fortify_size); \
| ^~~~~~~~~~~~~
include/linux/fortify-string.h:433:26: note: in expansion of macro '__fortify_memcpy_chk'
433 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:2798:9: note: in expansion of macro 'memcpy'
2798 | memcpy(p->sn, vpd + sn, min_t(int, sn_len, SERNUM_LEN));
| ^~~~~~
Additionally remove needless cast from u8[] to char * in last strim()
call.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205031926.FVP7epJM-lkp@intel.com
Fixes: fc9279298e3a ("cxgb4: Search VPD with pci_vpd_find_ro_info_keyword()")
Fixes: 24c521f81c30 ("cxgb4: Use pci_vpd_find_id_string() to find VPD ID string")
Cc: Raju Rangoju <rajur@chelsio.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220505233101.1224230-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
10GbE Intel Wired LAN Driver Updates 2022-05-05
This series contains updates to ixgbe and igb drivers.
Jeff Daly adjusts type for 'allow_unsupported_sfp' to match the
associated struct value for ixgbe.
Alaa Mohamed converts, deprecated, kmap() call to kmap_local_page() for
igb.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
igb: Convert kmap() to kmap_local_page()
ixgbe: Fix module_param allow_unsupported_sfp type
====================
Link: https://lore.kernel.org/r/20220505155651.2606195-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netlink_recvmsg() does not need to change transport header.
If transport header was needed, it should have been reset
by the producer (netlink_dump()), not the consumer(s).
The following trace probably happened when multiple threads
were using MSG_PEEK.
BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
write to 0xffff88811e9f15b2 of 2 bytes by task 32012 on cpu 1:
skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
__sys_recvfrom+0x204/0x2c0 net/socket.c:2097
__do_sys_recvfrom net/socket.c:2115 [inline]
__se_sys_recvfrom net/socket.c:2111 [inline]
__x64_sys_recvfrom+0x74/0x90 net/socket.c:2111
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
write to 0xffff88811e9f15b2 of 2 bytes by task 32005 on cpu 0:
skb_reset_transport_header include/linux/skbuff.h:2760 [inline]
netlink_recvmsg+0x1de/0x790 net/netlink/af_netlink.c:1978
____sys_recvmsg+0x162/0x2f0
___sys_recvmsg net/socket.c:2674 [inline]
__sys_recvmsg+0x209/0x3f0 net/socket.c:2704
__do_sys_recvmsg net/socket.c:2714 [inline]
__se_sys_recvmsg net/socket.c:2711 [inline]
__x64_sys_recvmsg+0x42/0x50 net/socket.c:2711
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff -> 0x0000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 32005 Comm: syz-executor.4 Not tainted 5.18.0-rc1-syzkaller-00328-ge1f700ebd6be-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220505161946.2867638-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Regression fixes in zone activation:
- move a loop invariant out of the loop to avoid checking space
status
- properly handle unlimited activation
Other fixes:
- for subpage, force the free space v2 mount to avoid a warning and
make it easy to switch a filesystem on different page size systems
- export sysfs status of exclusive operation 'balance paused', so the
user space tools can recognize it and allow adding a device with
paused balance
- fix assertion failure when logging directory key range item"
* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: sysfs: export the balance paused state of exclusive operation
btrfs: fix assertion failure when logging directory key range item
btrfs: zoned: activate block group properly on unlimited active zone device
btrfs: zoned: move non-changing condition check out of the loop
btrfs: force v2 space cache usage for subpage mount
|
|
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a socket leak when setting up an AF_LOCAL RPC client
- Ensure that knfsd connects to the gss-proxy daemon on setup
Bugfixes:
- Fix a refcount leak when migrating a task off an offlined transport
- Don't gratuitously invalidate inode attributes on delegation return
- Don't leak sockets in xs_local_connect()
- Ensure timely close of disconnected AF_LOCAL sockets"
* tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: attempt AF_LOCAL connect on setup"
SUNRPC: Ensure gss-proxy connects on setup
SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets
SUNRPC: Don't leak sockets in xs_local_connect()
NFSv4: Don't invalidate inode attributes on delegation return
SUNRPC release the transport of a relocated task with an assigned transport
|
|
kmemleak reports the following when routing multicast traffic over an
ipsec tunnel.
Kmemleak output:
unreferenced object 0x8000000044bebb00 (size 256):
comm "softirq", pid 0, jiffies 4294985356 (age 126.810s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 80 00 00 00 05 13 74 80 ..............t.
80 00 00 00 04 9b bf f9 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f83947e0>] __kmalloc+0x1e8/0x300
[<00000000b7ed8dca>] metadata_dst_alloc+0x24/0x58
[<0000000081d32c20>] __ipgre_rcv+0x100/0x2b8
[<00000000824f6cf1>] gre_rcv+0x178/0x540
[<00000000ccd4e162>] gre_rcv+0x7c/0xd8
[<00000000c024b148>] ip_protocol_deliver_rcu+0x124/0x350
[<000000006a483377>] ip_local_deliver_finish+0x54/0x68
[<00000000d9271b3a>] ip_local_deliver+0x128/0x168
[<00000000bd4968ae>] xfrm_trans_reinject+0xb8/0xf8
[<0000000071672a19>] tasklet_action_common.isra.16+0xc4/0x1b0
[<0000000062e9c336>] __do_softirq+0x1fc/0x3e0
[<00000000013d7914>] irq_exit+0xc4/0xe0
[<00000000a4d73e90>] plat_irq_dispatch+0x7c/0x108
[<000000000751eb8e>] handle_int+0x16c/0x178
[<000000001668023b>] _raw_spin_unlock_irqrestore+0x1c/0x28
The metadata dst is leaked when ip_route_input_mc() updates the dst for
the skb. Commit f38a9eb1f77b ("dst: Metadata destinations") correctly
handled dropping the dst in ip_route_input_slow() but missed the
multicast case which is handled by ip_route_input_mc(). Drop the dst in
ip_route_input_mc() avoiding the leak.
Fixes: f38a9eb1f77b ("dst: Metadata destinations")
Signed-off-by: Lokesh Dhoundiyal <lokesh.dhoundiyal@alliedtelesis.co.nz>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220505020017.3111846-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"x86:
- Account for family 17h event renumberings in AMD PMU emulation
- Remove CPUID leaf 0xA on AMD processors
- Fix lockdep issue with locking all vCPUs
- Fix loss of A/D bits in SPTEs
- Fix syzkaller issue with invalid guest state"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Exit to userspace if vCPU has injected exception and invalid state
KVM: SEV: Mark nested locking of vcpu->lock
kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU
KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id
KVM: x86/mmu: Use atomic XCHG to write TDP MMU SPTEs with volatile bits
KVM: x86/mmu: Move shadow-present check out of spte_has_volatile_bits()
KVM: x86/mmu: Don't treat fully writable SPTEs as volatile (modulo A/D)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
- A fix to relocate the DTB early in boot, in cases where the
bootloader doesn't put the DTB in a region that will end up
mapped by the kernel.
This manifests as a crash early in boot on a handful of
configurations.
* tag 'riscv-for-linus-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: relocate DTB if it's outside memory region
|
|
Read stale PTP Tx timestamps from PHY on cleanup.
After running out of Tx timestamps request handlers, hardware (HW) stops
reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used
to only clean up stale handlers in driver and was leaving the hardware
registers not read. Not reading stale PTP Tx timestamps prevents next
interrupts from arriving and makes timestamping unusable.
Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices")
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The iAVF driver uses 3 virtchnl op codes to communicate with the PF
regarding the VF Tx queues:
* VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware
logic for the Tx queues
* VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts
* VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings.
There is a bug in the iAVF driver due to the race condition between VF
reset request and shutdown being executed in parallel. This leads to a
break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent.
If this occurs, the PF driver never cleans up the Tx queues. This results
in leaving behind stale Tx queue settings in the hardware and firmware.
The most obvious outcome is that upon the next
VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx
scheduler node due to a lack of space.
We need to protect ICE driver against such situation.
To fix this, make sure we clear existing stale settings out when
handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the
previous settings.
Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if
the queue is not configured. The function already handles the case when the
Tx queue is not currently configured and exits with a 0 return in that
case.
Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Function ice_plug_aux_dev() assigns pf->adev field too early prior
aux device initialization and on other side ice_unplug_aux_dev()
starts aux device deinit and at the end assigns NULL to pf->adev.
This is wrong because pf->adev should always be non-NULL only when
aux device is fully initialized and ready. This wrong order causes
a crash when ice_send_event_to_aux() call occurs because that function
depends on non-NULL value of pf->adev and does not assume that
aux device is half-initialized or half-destroyed.
After order correction the race window is tiny but it is still there,
as Leon mentioned and manipulation with pf->adev needs to be protected
by mutex.
Fix (un-)plugging functions so pf->adev field is set after aux device
init and prior aux device destroy and protect pf->adev assignment by
new mutex. This mutex is also held during ice_send_event_to_aux()
call to ensure that aux device is valid during that call.
Note that device lock used ice_send_event_to_aux() needs to be kept
to avoid race with aux drv unload.
Reproducer:
cycle=1
while :;do
echo "#### Cycle: $cycle"
ip link set ens7f0 mtu 9000
ip link add bond0 type bond mode 1 miimon 100
ip link set bond0 up
ifenslave bond0 ens7f0
ip link set bond0 mtu 9000
ethtool -L ens7f0 combined 1
ip link del bond0
ip link set ens7f0 mtu 1500
sleep 1
let cycle++
done
In short when the device is added/removed to/from bond the aux device
is unplugged/plugged. When MTU of the device is changed an event is
sent to aux device asynchronously. This can race with (un)plugging
operation and because pf->adev is set too early (plug) or too late
(unplug) the function ice_send_event_to_aux() can touch uninitialized
or destroyed fields. In the case of crash below pf->adev->dev.mutex.
Crash:
[ 53.372066] bond0: (slave ens7f0): making interface the new active one
[ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.248204] bond0: (slave ens7f0): Releasing backup interface
[ 54.253955] bond0: (slave ens7f1): making interface the new active one
[ 54.274875] bond0: (slave ens7f1): Releasing backup interface
[ 54.289153] bond0 (unregistering): Released all slaves
[ 55.383179] MII link monitoring set to 100 ms
[ 55.398696] bond0: (slave ens7f0): making interface the new active one
[ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
[ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 55.412198] #PF: supervisor write access in kernel mode
[ 55.412200] #PF: error_code(0x0002) - not-present page
[ 55.412201] PGD 25d2ad067 P4D 0
[ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
5.17.0-13579-g57f2d6540f03 #1
[ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
2021
[ 55.430226] Workqueue: ice ice_service_task [ice]
[ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20
[ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
[ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
[ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
[ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
[ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
[ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
[ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
[ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
[ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
[ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 55.567305] PKRU: 55555554
[ 55.570018] Call Trace:
[ 55.572474] <TASK>
[ 55.574579] ice_service_task+0xaab/0xef0 [ice]
[ 55.579130] process_one_work+0x1c5/0x390
[ 55.583141] ? process_one_work+0x390/0x390
[ 55.587326] worker_thread+0x30/0x360
[ 55.590994] ? process_one_work+0x390/0x390
[ 55.595180] kthread+0xe6/0x110
[ 55.598325] ? kthread_complete_and_exit+0x20/0x20
[ 55.603116] ret_from_fork+0x1f/0x30
[ 55.606698] </TASK>
Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Exit to userspace with an emulation error if KVM encounters an injected
exception with invalid guest state, in addition to the existing check of
bailing if there's a pending exception (KVM doesn't support emulating
exceptions except when emulating real mode via vm86).
In theory, KVM should never get to such a situation as KVM is supposed to
exit to userspace before injecting an exception with invalid guest state.
But in practice, userspace can intervene and manually inject an exception
and/or stuff registers to force invalid guest state while a previously
injected exception is awaiting reinjection.
Fixes: fc4fad79fc3d ("KVM: VMX: Reject KVM_RUN if emulation is required with pending exception")
Reported-by: syzbot+cfafed3bb76d3e37581b@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220502221850.131873-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
svm_vm_migrate_from() uses sev_lock_vcpus_for_migration() to lock all
source and target vcpu->locks. Unfortunately there is an 8 subclass
limit, so a new subclass cannot be used for each vCPU. Instead maintain
ownership of the first vcpu's mutex.dep_map using a role specific
subclass: source vs target. Release the other vcpu's mutex.dep_maps.
Fixes: b56639318bb2b ("KVM: SEV: Add support for SEV intra host migration")
Reported-by: John Sperbeck<jsperbeck@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220502165807.529624-1-pgonda@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766.
Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev
mode") was a workaround for lshw tool displaying incorrect
descriptions for port representors and PF in switchdev mode. Now the issue
has been fixed in the lshw tool itself [1].
Removing the workaround can be considered a regression, as the user might
be running older, unpatched lshw version. However, another important change
(ice: link representors to PCI device, which improves port representor
netdev naming with SET_NETDEV_DEV) also causes the same "regression" as
removing the workaround, i.e. unpatched lshw is able to access bus-info
information (this time not via ethtool) and the bug can occur. Therefore,
the workaround no longer prevents the bug and can be removed.
[1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Pull rdma fixes from Jason Gunthorpe:
"A few recent regressions in rxe's multicast code, and some old driver
bugs:
- Error case unwind bug in rxe for rkeys
- Dot not call netdev functions under a spinlock in rxe multicast
code
- Use the proper BH lock type in rxe multicast code
- Fix idrma deadlock and crash
- Add a missing flush to drain irdma QPs when in error
- Fix high userspace latency in irdma during destroy due to
synchronize_rcu()
- Rare race in siw MPA processing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/rxe: Change mcg_lock to a _bh lock
RDMA/rxe: Do not call dev_mc_add/del() under a spinlock
RDMA/siw: Fix a condition race issue in MPA request processing
RDMA/irdma: Fix possible crash due to NULL netdev in notifier
RDMA/irdma: Reduce iWARP QP destroy time
RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state
RDMA/rxe: Recheck the MR in when generating a READ reply
RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()
RDMA/rxe: Fix "Replace mr by rkey in responder resources"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull mmc fixes from Ulf Hansson:
"MMC core:
- Fix initialization for eMMC's HS200/HS400 mode
MMC host:
- sdhci-msm: Reset GCC_SDCC_BCR register to prevent timeout issues
- sunxi-mmc: Fix DMA descriptors allocated above 32 bits"
* tag 'mmc-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC
mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits
mmc: core: Set HS clock speed before sending HS CMD13
|
|
Link port representors to parent PCI device to benefit from
systemd defined naming scheme.
Example from ip tool:
- without linking:
eth0 ...
- with linking:
eth0 ...
altname enp24s0f0npf0vf0
The port representor name is being shown in altname, because the name is
longer than IFNAMSIZ (16) limit. Altname can be used in ip tool.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Pull drm fixes from Dave Airlie:
"A pretty quiet week, one fbdev, msm, kconfig, and two amdgpu fixes,
about what I'd expect for rc6.
fbdev:
- hotunplugging fix
amdgpu:
- Fix a xen dom0 regression on APUs
- Fix a potential array overflow if a receiver were to send an
erroneous audio channel count
msm:
- lockdep fix.
it6505:
- kconfig fix"
* tag 'drm-fixes-2022-05-06' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
drm/amdgpu: do not use passthrough mode in Xen dom0
drm/bridge: ite-it6505: add missing Kconfig option select
fbdev: Make fb_release() return -ENODEV if fbdev was unregistered
drm/msm/dp: remove fail safe mode related code
|
|
When one port's input state get inverted (eg. from low to hight) after
pca953x_irq_setup but before setting irq_mask (by some other driver such as
"gpio-keys"), the next inversion of this port (eg. from hight to low) will not
be triggered any more (because irq_stat is not updated at the first time). Issue
should be fixed after this commit.
Fixes: 89ea8bbe9c3e ("gpio: pca953x.c: add interrupt handling capability")
Signed-off-by: Puyou Lu <puyou.lu@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
Jakub Kicinski says:
====================
net: disambiguate the TSO and GSO limits
This series separates the device-reported TSO limitations
from the user space-controlled GSO limits. It used to be that
we only had the former (HW limits) but they were named GSO.
This probably lead to confusion and letting user override them.
The problem came up in the BIG TCP discussion between Eric and
Alex, and seems like something we should address.
Targeting net-next because (a) nobody is reporting problems;
and (b) there is a tiny but non-zero chance that some actually
wants to lift the HW limitations.
====================
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These are now internal to the core, no need to expose them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|