Age | Commit message (Collapse) | Author | Files | Lines |
|
This test is checking if we exited the list via break or not. However
if it did not exit via a break then "node" does not point to a valid
udp_tunnel_nic_shared_node struct. It will work because of the way
the structs are laid out it's the equivalent of
"if (info->shared->udp_tunnel_nic_info != dev)" which will always be
true, but it's not the right way to test.
Fixes: 74cc6d182d03 ("udp_tunnel: add the ability to share port tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.
vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.
Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.
When invoked from release we can not fail so we don't check return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/mctp/device.c:140:11: warning: Assigned value is garbage or undefined
[clang-analyzer-core.uninitialized.Assign]
mcb->idx = idx;
- Not a real problem due to how the callback runs, fix the warning.
net/mctp/route.c:458:4: warning: Value stored to 'msk' is never read
[clang-analyzer-deadcode.DeadStores]
msk = container_of(key->sk, struct mctp_sock, sk);
- 'msk' dead assignment can be removed here.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Matt Johnston says:
====================
mctp: Fix incorrect refs for extended addr
This fixes an incorrect netdev unref and also addresses the race
condition identified by Jakub in v2. Thanks for the review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the extended addressing local route output codepath
dev_get_by_index_rcu() doesn't take a dev_hold() so we shouldn't
dev_put().
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously there was a race that could allow the mctp_dev refcount
to hit zero:
rcu_read_lock();
mdev = __mctp_dev_get(dev);
// mctp_unregister() happens here, mdev->refs hits zero
mctp_dev_hold(dev);
rcu_read_unlock();
Now we make __mctp_dev_get() take the hold itself. It is safe to test
against the zero refcount because __mctp_dev_get() is called holding
rcu_read_lock and mctp_dev uses kfree_rcu().
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alvin Šipraga says:
====================
net: dsa: realtek: fix PHY register read corruption
These two patches fix the issue reported by Arınç where PHY register
reads sometimes return garbage data.
v1 -> v2:
- no code changes
- just update the commit message of patch 2 to reflect the conclusion
of further investigation requested by Vladimir
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Realtek switches in the rtl8365mb family can access the PHY registers of
the internal PHYs via the switch registers. This method is called
indirect access. At a high level, the indirect PHY register access
method involves reading and writing some special switch registers in a
particular sequence. This works for both SMI and MDIO connected
switches.
Currently the rtl8365mb driver does not take any care to serialize the
aforementioned access to the switch registers. In particular, it is
permitted for other driver code to access other switch registers while
the indirect PHY register access is ongoing. Locking is only done at the
regmap level. This, however, is a bug: concurrent register access, even
to unrelated switch registers, risks corrupting the PHY register value
read back via the indirect access method described above.
Arınç reported that the switch sometimes returns nonsense data when
reading the PHY registers. In particular, a value of 0 causes the
kernel's PHY subsystem to think that the link is down, but since most
reads return correct data, the link then flip-flops between up and down
over a period of time.
The aforementioned bug can be readily observed by:
1. Enabling ftrace events for regmap and mdio
2. Polling BSMR PHY register for a connected port;
it should always read the same (e.g. 0x79ed)
3. Wait for step 2 to give a different value
Example command for step 2:
while true; do phytool read swp2/2/0x01; done
On my i.MX8MM, the above steps will yield a bogus value for the BSMR PHY
register within a matter of seconds. The interleaved register access it
then evident in the trace log:
kworker/3:4-70 [003] ....... 1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
phytool-16816 [002] ....... 1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
kworker/3:4-70 [003] ....... 1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
phytool-16816 [002] ....... 1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
kworker/3:4-70 [003] ....... 1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
phytool-16816 [002] ....... 1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
kworker/3:4-70 [003] ....... 1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
phytool-16816 [002] ....... 1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
kworker/3:4-70 [003] ....... 1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
phytool-16816 [002] ....... 1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
kworker/3:4-70 [003] ....... 1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
phytool-16816 [002] ....... 1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
phytool-16816 [002] ....... 1927.142641: mdio_access: SMI-0 read phy:0x02 reg:0x01 val:0x0000 <- ?!
kworker/3:4-70 [003] ....... 1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
kworker/3:4-70 [003] ....... 1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
kworker/3:4-70 [003] ....... 1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
kworker/3:4-70 [003] ....... 1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
kworker/3:4-70 [003] ....... 1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
kworker/3:4-70 [003] ....... 1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6
The kworker here is polling MIB counters for stats, as evidenced by the
register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). This
polling is performed every 3 seconds, but is just one example of such
unsynchronized access. In Arınç's case, the driver was not using the
switch IRQ, so the PHY subsystem was itself doing polling analogous to
phytool in the above example.
A test module was created [see second Link] to simulate such spurious
switch register accesses while performing indirect PHY register reads
and writes. Realtek was also consulted to confirm whether this is a
known issue or not. The conclusion of these lines of inquiry is as
follows:
1. Reading of PHY registers via indirect access will be aborted if,
after executing the read operation (via a write to the
INDIRECT_ACCESS_CTRL_REG), any register is accessed, other than
INDIRECT_ACCESS_STATUS_REG.
2. The PHY register indirect read is only complete when
INDIRECT_ACCESS_STATUS_REG reads zero.
3. The INDIRECT_ACCESS_DATA_REG, which is read to get the result of the
PHY read, will contain the result of the last successful read
operation. If there was spurious register access and the indirect
read was aborted, then this register is not guaranteed to hold
anything meaningful and the PHY read will silently fail.
4. PHY writes do not appear to be affected by this mechanism.
5. Other similar access routines, such as for MIB counters, although
similar to the PHY indirect access method, are actually table access.
Table access is not affected by spurious reads or writes of other
registers. However, concurrent table access is not allowed. Currently
this is protected via mib_lock, so there is nothing to fix.
The above statements are corroborated both via the test module and
through consultation with Realtek. In particular, Realtek states that
this is simply a property of the hardware design and is not a hardware
bug.
To fix this problem, one must guard against regmap access while the
PHY indirect register read is executing. Fix this by using the newly
introduced "nolock" regmap in all PHY-related functions, and by aquiring
the regmap mutex at the top level of the PHY register access callbacks.
Although no issue has been observed with PHY register _writes_, this
change also serializes the indirect access method there. This is done
purely as a matter of convenience and for reasons of symmetry.
Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Link: https://lore.kernel.org/netdev/CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/871qzwjmtv.fsf@bang-olufsen.dk/
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently there is no way for Realtek DSA subdrivers to serialize
consecutive regmap accesses. In preparation for a bugfix relating to
indirect PHY register access - which involves a series of regmap
reads and writes - add a facility for subdrivers to serialize their
regmap access.
Specifically, a mutex is added to the driver private data structure and
the standard regmap is initialized with custom lock/unlock ops which use
this mutex. Then, a "nolock" variant of the regmap is added, which is
functionally equivalent to the existing regmap except that regmap
locking is disabled. Functions that wish to serialize a sequence of
regmap accesses may then lock the newly introduced driver-owned mutex
before using the nolock regmap.
Doing things this way means that subdriver code that doesn't care about
serialized register access - i.e. the vast majority of code - needn't
worry about synchronizing register access with an external lock: it can
just continue to use the original regmap.
Another advantage of this design is that, while regmaps with locking
disabled do not expose a debugfs interface for obvious reasons, there
still exists the original regmap which does expose this interface. This
interface remains safe to use even combined with driver codepaths that
use the nolock regmap, because said codepaths will use the same mutex
to synchronize access.
With respect to disadvantages, it can be argued that having
near-duplicate regmaps is confusing. However, the naming is rather
explicit, and examples will abound.
Finally, while we are at it, rename realtek_smi_mdio_regmap_config to
realtek_smi_regmap_config. This makes it consistent with the naming
realtek_mdio_regmap_config in realtek-mdio.c.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
handler
The logic from switchdev_handle_port_obj_add_foreign() is directly
adapted from switchdev_handle_fdb_event_to_device(), which already
detects events on foreign interfaces and reoffloads them towards the
switchdev neighbors.
However, when we have a simple br0 <-> bond0 <-> swp0 topology and the
switchdev_handle_port_obj_add_foreign() gets called on bond0, we get
stuck into an infinite recursion:
1. bond0 does not pass check_cb(), so we attempt to find switchdev
neighbor interfaces. For that, we recursively call
__switchdev_handle_port_obj_add() for bond0's bridge, br0.
2. __switchdev_handle_port_obj_add() recurses through br0's lowers,
essentially calling __switchdev_handle_port_obj_add() for bond0
3. Go to step 1.
This happens because switchdev_handle_fdb_event_to_device() and
switchdev_handle_port_obj_add_foreign() are not exactly the same.
The FDB event helper special-cases LAG interfaces with its lag_mod_cb(),
so this is why we don't end up in an infinite loop - because it doesn't
attempt to treat LAG interfaces as potentially foreign bridge ports.
The problem is solved by looking ahead through the bridge's lowers to
see whether there is any switchdev interface that is foreign to the @dev
we are currently processing. This stops the recursion described above at
step 1: __switchdev_handle_port_obj_add(bond0) will not create another
call to __switchdev_handle_port_obj_add(br0). Going one step upper
should only happen when we're starting from a bridge port that has been
determined to be "foreign" to the switchdev driver that passes the
foreign_dev_check_cb().
Fixes: c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ever-vigilant Linux kernel test robot reminded us that
we need to use the correct include files to be sure that
all the build variations will work correctly. Adding the
vmalloc.h include takes care of declaring our use of vzalloc()
and vfree().
drivers/net/ethernet/pensando/ionic/ionic_lif.c:396:17: error: implicit
declaration of function 'vfree'; did you mean 'kvfree'?
drivers/net/ethernet/pensando/ionic/ionic_lif.c:531:21: warning:
assignment to 'struct ionic_desc_info *' from 'int' makes pointer from
integer without a cast
Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Link: https://lore.kernel.org/r/20220223015731.22025-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
tcp: take care of another syzbot issue
This is a minor issue: It took months for syzbot to find a C repro,
and even with it, I had to spend a lot of time to understand KFENCE
was a prereq. With the default kfence 500ms interval, I had to be
very patient to trigger the kernel warning and perform my analysis.
This series targets net-next tree, because I added a new generic helper
in the first patch, then fixed the issue in the second one.
They can be backported once proven solid.
====================
Link: https://lore.kernel.org/r/20220222032113.4005821-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len)
in skb_try_coalesce() [1]
I was able to root cause the issue to kfence.
When kfence is in action, the following assertion is no longer true:
int size = xxxx;
void *ptr1 = kmalloc(size, gfp);
void *ptr2 = kmalloc(size, gfp);
if (ptr1 && ptr2)
ASSERT(ksize(ptr1) == ksize(ptr2));
We attempted to fix these issues in the blamed commits, but forgot
that TCP was possibly shifting data after skb_unclone_keeptruesize()
has been used, notably from tcp_retrans_try_collapse().
So we not only need to keep same skb->truesize value,
we also need to make sure TCP wont fill new tailroom
that pskb_expand_head() was able to get from a
addr = kmalloc(...) followed by ksize(addr)
Split skb_unclone_keeptruesize() into two parts:
1) Inline skb_unclone_keeptruesize() for the common case,
when skb is not cloned.
2) Out of line __skb_unclone_keeptruesize() for the 'slow path'.
WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Modules linked in:
CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa
RSP: 0018:ffffc900063af268 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000
RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003
RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000
R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8
R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155
FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline]
tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630
tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914
tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025
tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947
tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719
sk_backlog_rcv include/net/sock.h:1037 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2779
release_sock+0x54/0x1b0 net/core/sock.c:3311
sk_wait_data+0x177/0x450 net/core/sock.c:2821
tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457
tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572
inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850
sock_recvmsg_nosec net/socket.c:948 [inline]
sock_recvmsg net/socket.c:966 [inline]
sock_recvmsg net/socket.c:962 [inline]
____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
___sys_recvmsg+0x127/0x200 net/socket.c:2674
__sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c4777efa751d ("net: add and use skb_unclone_keeptruesize() helper")
Fixes: 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have multiple places where this helper is convenient,
and plan using it in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All other skbs allocated for TCP tx are using MAX_TCP_HEADER already.
MAX_HEADER can be too small for some cases (like eBPF based encapsulation),
so this can avoid extra pskb_expand_head() in lower stacks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220222031115.4005060-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If client is unable to initiate a failover reset via H_VIOCTL hcall, then
it should schedule a failover reset as a last resort. Otherwise, there is
no need to do a last resort.
Fixes: 334c42414729 ("ibmvnic: improve failover sysfs entry")
Reported-by: Cris Forno <cforno12@outlook.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220221210545.115283-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add option to shift the clock by a specified number of nanoseconds.
The new argument -n will specify the number of nanoseconds to add to the
ptp clock. Since the API doesn't support negative shifts those needs to
be calculated by subtracting full seconds and adding a nanosecond offset.
Signed-off-by: Maciek Machnikowski <maciek@machnikowski.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20220221200637.125595-1-maciek@machnikowski.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a bridged port is not offloaded to the hardware - either because the
underlying driver does not implement the port_bridge_{join,leave} ops,
or because the operation failed - then its dp->bridge pointer will be
NULL when dsa_port_bridge_leave() is called. Avoid dereferncing NULL.
This fixes the following splat when removing a port from a bridge:
Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1
Call trace:
dsa_port_bridge_leave+0x8c/0x1e4
dsa_slave_changeupper+0x40/0x170
dsa_slave_netdevice_event+0x494/0x4d4
notifier_call_chain+0x80/0xe0
raw_notifier_call_chain+0x1c/0x24
call_netdevice_notifiers_info+0x5c/0xac
__netdev_upper_dev_unlink+0xa4/0x200
netdev_upper_dev_unlink+0x38/0x60
del_nbp+0x1b0/0x300
br_del_if+0x38/0x114
add_del_if+0x60/0xa0
br_ioctl_stub+0x128/0x2dc
br_ioctl_call+0x68/0xb0
dev_ifsioc+0x390/0x554
dev_ioctl+0x128/0x400
sock_do_ioctl+0xb4/0xf4
sock_ioctl+0x12c/0x4e0
__arm64_sys_ioctl+0xa8/0xf0
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x28/0x84
el0_svc+0x1c/0x50
el0t_64_sync_handler+0xa8/0xb0
el0t_64_sync+0x17c/0x180
Code: f9402f00 f0002261 f9401302 913cc021 (a9401404)
---[ end trace 0000000000000000 ]---
Fixes: d3eed0e57d5d ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
This issue was found with the help of Coccinelle and audited and fixed,
manually.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20220221173415.GA1149599@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vladimir Oltean reports that probing on DSA drivers that aren't yet
populating supported_interfaces now fails. Fix this by allowing
phylink to detect whether DSA actually provides an underlying
mac_select_pcs() implementation.
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Fixes: bde018222c6b ("net: dsa: add support for phylink mac_select_pcs()")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/E1nMCD6-00A0wC-FG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
30 seconds is too long interval especially if it used with ip -s l.
Reduce polling interval to 5 sec.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220221084129.3660124-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Whenever one of these functions pull all data from an skb in a frag_list,
use consume_skb() instead of kfree_skb() to avoid polluting drop
monitoring.
Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220220154052.1308469-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix for a subtle bug in the recent release_agent permission check
update
- Fix for a long-standing race condition between cpuset and cpu hotplug
- Comment updates
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix kernel-doc
cgroup-v1: Correct privileges check in release_agent writes
cgroup: clarify cgroup_css_set_fork()
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
|
|
Alexandra Winter says:
====================
s390/net: updates 2022-02-21
Just cleanup. No functional changes, as currently virt=phys in s390.
====================
Link: https://lore.kernel.org/r/20220221145633.3869621-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix virtual vs physical address confusion (which currently are the same).
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix virtual vs physical address confusion (which currently are the same).
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mutex_is_locked() tests whether the mutex is locked *by any task*, while
here we want to test if it is held *by the current task*. To avoid
false/missed WARNINGs, use lockdep_assert_is_held() and
lockdep_assert_is_not_held() instead, which do the right thing (though
they are a no-op if CONFIG_LOCKDEP=n).
Cc: stable@vger.kernel.org
Fixes: 2554a48f4437 ("selinux: measure state and policy capabilities")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Emails to Roger Quadros TI account bounce with:
550 Invalid recipient <rogerq@ti.com> (#5.1.1)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Roger Quadros <rogerq@kernel.org>
Acked-By: Vinod Koul <vkoul@kernel.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220221100701.48593-1-krzysztof.kozlowski@canonical.com
|
|
Emails to Yash Shah bounce with "The email account that you tried to
reach does not exist.", so drop him from all maintainer entries.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220214082349.162973-1-krzysztof.kozlowski@canonical.com
|
|
Fix the following W=1 kernel warnings:
kernel/cgroup/cpuset.c:3718: warning: expecting prototype for
cpuset_memory_pressure_bump(). Prototype was for
__cpuset_memory_pressure_bump() instead.
kernel/cgroup/cpuset.c:3568: warning: expecting prototype for
cpuset_node_allowed(). Prototype was for __cpuset_node_allowed()
instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Another thing making netns dismantles potentially very slow is located
in gro_cells_destroy(),
whenever cleanup_net() has to remove a device using gro_cells framework.
RTNL is not held at this stage, so synchronize_net()
is calling synchronize_rcu():
netdev_run_todo()
ip_tunnel_dev_free()
gro_cells_destroy()
synchronize_net()
synchronize_rcu() // Ouch.
This patch uses call_rcu(), and gave me a 25x performance improvement
in my tests.
cleanup_net() is no longer blocked ~10 ms per synchronize_rcu()
call.
In the case we could not allocate the memory needed to queue the
deferred free, use synchronize_rcu_expedited()
v2: made percpu_free_defer_callback() static
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220220041155.607637-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull ITER_PIPE fix from Al Viro:
"Fix for old sloppiness in pipe_buffer reuse"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lib/iov_iter: initialize "flags" in new pipe_buffer
|
|
The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.
The commit 24f600856418 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent
since it checked user_ns of the opener (may be different from the owning
user_ns of cgroup_ns).
Secondly, to avoid possibly confused deputy, the capability of the
opener must be checked.
Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
With recent fixes for the permission checking when moving a task into a cgroup
using a file descriptor to a cgroup's cgroup.procs file and calling write() it
seems a good idea to clarify CLONE_INTO_CGROUP permission checking with a
comment.
Cc: Tejun Heo <tj@kernel.org>
Cc: <cgroups@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
io_rsrc_ref_quiesce will unlock the uring while it waits for references to
the io_rsrc_data to be killed.
There are other places to the data that might add references to data via
calls to io_rsrc_node_switch.
There is a race condition where this reference can be added after the
completion has been signalled. At this point the io_rsrc_ref_quiesce call
will wake up and relock the uring, assuming the data is unused and can be
freed - although it is actually being used.
To fix this check in io_rsrc_ref_quiesce if a resource has been revived.
Reported-by: syzbot+ca8bf833622a1662745b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220222161751.995746-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Almost all fault/warning bits in pmbus status registers remain set even
after fault/warning condition are removed. As per pmbus specification
these faults must be cleared by user.
Modify hwmon behavior to clear fault/warning bit after fetching data if
fault/warning bit was set. This allows to get fresh data in next read.
Signed-off-by: Vikash Chandola <vikash.chandola@linux.intel.com>
Link: https://lore.kernel.org/r/20220222131253.2426834-1-vikash.chandola@linux.intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
If an attempt is made to a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd
...
Internal error: Oops: 96000021 [#1] PREEMPT SMP
...
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mutex_lock+0x18/0x60
lr : thermal_zone_device_update+0x40/0x2e0
sp : ffff800014c4fc60
x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790
x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000
x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd
x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000
x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040
x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd
x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd
Call trace:
mutex_lock+0x18/0x60
hwmon_notify_event+0xfc/0x110
0xffffdde1cb7a0a90
0xffffdde1cb7a0b7c
irq_thread_fn+0x2c/0xa0
irq_thread+0x134/0x240
kthread+0x178/0x190
ret_from_fork+0x10/0x20
Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02)
Jon Hunter reports that the exact call sequence is:
hwmon_notify_event()
--> hwmon_thermal_notify()
--> thermal_zone_device_update()
--> update_temperature()
--> mutex_lock()
The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report that the sensor was not attached to a
thermal zone but continue to register the hwmon device.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Fixes: 1597b374af222 ("hwmon: Add notification support")
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
iocb_bio_iopoll() expects iocb->private to be cleared before
releasing the bio.
We already do this in blkdev_bio_end_io(), but we forgot in the
recently added blkdev_bio_end_io_async().
Fixes: 54a88eb838d3 ("block: add single bio async direct IO helper")
Cc: asml.silence@gmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Russell King says:
====================
net: dsa: b53: convert to phylink_generic_validate() and mark as non-legacy
This series converts b53 to use phylink_generic_validate() and also
marks this driver as non-legacy.
Patch 1 cleans up an if() condition to be more readable before we
proceed with the conversion.
Patch 2 populates the supported_interfaces and mac_capabilities members
of phylink_config.
Patch 3 drops the use of phylink_helper_basex_speed() which is now not
necessary.
Patch 4 switches the driver to use phylink_generic_validate()
Patch 5 marks the driver as non-legacy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The B53 driver does not make use of the speed, duplex, pause or
advertisement in its phylink_mac_config() implementation, so it can be
marked as a non-legacy driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Switch the Broadcom b53 driver to using the phylink_generic_validate()
implementation by removing its own .phylink_validate method and
associated code.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Populate the supported interfaces and MAC capabilities for the Broadcom
B53 DSA switches in preparation to using these for the generic
validation functionality.
The interface modes are derived from:
- b53_serdes_phylink_validate()
- SRAB mux configuration
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I've stared at this if() statement for a while trying to work out if
it really does correspond with the comment above, and it does seem to.
However, let's make it more readable and phrase it in the same way as
the comment.
Also add a FIXME into the comment - we appear to deny Gigabit modes for
802.3z interface modes, but 802.3z interface modes only operate at
gigabit and above.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is fixing up the use without proper initialization in patch 5/5
-o-
Hi,
The following patchset contains Netfilter fixes for net:
1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
2) Fix incorrect flow action array size in nf_tables.
3) Unregister flowtable hooks from netns exit path.
4) Fix missing limit object release, from Florian Westphal.
5) Memleak in nf_tables object update path, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
stateful objects can be updated from the control plane.
The transaction logic allocates a temporary object for this purpose.
The ->init function was called for this object, so plain kfree() leaks
resources. We must call ->destroy function of the object.
nft_obj_destroy does this, but it also decrements the module refcount,
but the update path doesn't increment it.
To avoid special-casing the update object release, do module_get for
the update case too and release it via nft_obj_destroy().
Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This code dereferences "skb" after calling dev_kfree_skb().
Fixes: 2dc95a4d30ed ("net: Add dm9051 driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220221105440.GA10045@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In hsr, lockdep_is_held() is needed for rcu_dereference_bh_check().
But if lockdep is not enabled, lockdep_is_held() causes a build error:
ERROR: modpost: "lockdep_is_held" [net/hsr/hsr.ko] undefined!
Thus, this patch solved by adding lockdep_hsr_is_held(). This helper
function calls the lockdep_is_held() when lockdep is enabled, and returns 1
if not defined.
Fixes: e7f27420681f ("net: hsr: fix suspicious RCU usage warning in hsr_node_get_first()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Link: https://lore.kernel.org/r/20220220153250.5285-1-claudiajkang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Two small fixes and one hardware-id addition"
* tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Add terminator to gpiod_lookup_table
platform/x86: asus-wmi: Fix regression when probing for fan curve control
platform/x86: thinkpad_acpi: Add dual-fan quirk for T15g (2nd gen)
|
|
The functions copy_page_to_iter_pipe() and push_pipe() can both
allocate a new pipe_buffer, but the "flags" member initializer is
missing.
Fixes: 241699cd72a8 ("new iov_iter flavour: pipe-backed")
To: Alexander Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|