Age | Commit message (Collapse) | Author | Files | Lines |
|
Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use array_size() helper instead of the open-coded version in
copy_to_user(). These sorts of multiplication factors need
to be wrapped in array_size().
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use array_size() helper instead of the open-coded version in
copy_{from,to}_user(). These sorts of multiplication factors
need to be wrapped in array_size().
Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the firmware compatible features are enabled in PF driver
initialization process, but they are not disabled in PF driver
deinitialization process and firmware keeps these features in enabled
status.
In this case, if load an old PF driver (for example, in VM) which not
support the firmware compatible features, firmware will still send mailbox
message to PF when link status changed and PF will print
"un-supported mailbox message, code = 201".
To fix this problem, disable these firmware compatible features in PF
driver deinitialization process.
Fixes: ed8fb4b262ae ("net: hns3: add link change event report")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the rx vlan filter will always be disabled before selftest and
be enabled after selftest as the rx vlan filter feature is fixed on in
old device earlier than V3.
However, this feature is not fixed in some new devices and it can be
disabled by user. In this case, it is wrong if rx vlan filter is enabled
after selftest. So fix it.
Fixes: bcc26e8dc432 ("net: hns3: remove unused code in hns3_self_test()")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If unicast mac address table is full, and user add a new mac address, the
unicast promisc needs to be enabled for the new unicast mac address can be
used. So does the multicast promisc.
Now this feature has been implemented for PF, and VF should be implemented
too. When the mac table of VF is overflow, PF will enable promisc for this
VF.
Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, if function adds an existing unicast mac address, eventhough
driver will not add this address into hardware, but it will return 0 in
function hclge_add_uc_addr_common(). It will cause the state of this
unicast mac address is ACTIVE in driver, but it should be in TO-ADD state.
To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST
if mac address is existing, and delete two error log to avoid printing
them all the time after this modification.
Fixes: 72110b567479 ("net: hns3: return 0 and print warning when hit duplicate MAC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable
multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is
supposed to set when enable multiple TCs with ets. But
the driver mixed the flags when updating the tm configuration.
Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE
too, so remove the unnecessary limitation.
Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For destroy mqprio is irreversible in stack, so it's unnecessary
to rollback the tc configuration when destroy mqprio failed.
Otherwise, it may cause the configuration being inconsistent
between driver and netstack.
As the failure is usually caused by reset, and the driver will
restore the configuration after reset, so it can keep the
configuration being consistent between driver and hardware.
Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, in function hns3_nic_set_real_num_queue(), the
driver doesn't report the queue count and offset for disabled
tc. If user enables multiple TCs, but only maps user
priorities to partial of them, it may cause the queue range
of the unmapped TC being displayed abnormally.
Fix it by removing the tc enable checking, ensure the queue
count is not zero.
With this change, the tc_en is useless now, so remove it.
Fixes: a75a8efa00c5 ("net: hns3: Fix tc setup when netdev is first up")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hns3_nic_net_open() is not allowed to called repeatly, but there
is no checking for this. When doing device reset and setup tc
concurrently, there is a small oppotunity to call hns3_nic_net_open
repeatedly, and cause kernel bug by calling napi_enable twice.
The calltrace information is like below:
[ 3078.222780] ------------[ cut here ]------------
[ 3078.230255] kernel BUG at net/core/dev.c:6991!
[ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G O 5.14.0-rc4+ #1
[ 3078.269102] Hardware name: , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
[ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
[ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 3078.296168] pc : napi_enable+0x80/0x84
tc qdisc sho[w 3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]
[ 3078.314771] sp : ffff8000108abb20
[ 3078.319099] x29: ffff8000108abb20 x28: 0000000000000000 x27: ffff0820a8490300
[ 3078.329121] x26: 0000000000000001 x25: ffff08209cfc6200 x24: 0000000000000000
[ 3078.339044] x23: ffff0820a8490300 x22: ffff08209cd76000 x21: ffff0820abfe3880
[ 3078.349018] x20: 0000000000000000 x19: ffff08209cd76900 x18: 0000000000000000
[ 3078.358620] x17: 0000000000000000 x16: ffffc816e1727a50 x15: 0000ffff8f4ff930
[ 3078.368895] x14: 0000000000000000 x13: 0000000000000000 x12: 0000259e9dbeb6b4
[ 3078.377987] x11: 0096a8f7e764eb40 x10: 634615ad28d3eab5 x9 : ffffc816ad8885b8
[ 3078.387091] x8 : ffff08209cfc6fb8 x7 : ffff0820ac0da058 x6 : ffff0820a8490344
[ 3078.396356] x5 : 0000000000000140 x4 : 0000000000000003 x3 : ffff08209cd76938
[ 3078.405365] x2 : 0000000000000000 x1 : 0000000000000010 x0 : ffff0820abfe38a0
[ 3078.414657] Call trace:
[ 3078.418517] napi_enable+0x80/0x84
[ 3078.424626] hns3_reset_notify_up_enet+0x78/0xd0 [hns3]
[ 3078.433469] hns3_reset_notify+0x64/0x80 [hns3]
[ 3078.441430] hclge_notify_client+0x68/0xb0 [hclge]
[ 3078.450511] hclge_reset_rebuild+0x524/0x884 [hclge]
[ 3078.458879] hclge_reset_service_task+0x3c4/0x680 [hclge]
[ 3078.467470] hclge_service_task+0xb0/0xb54 [hclge]
[ 3078.475675] process_one_work+0x1dc/0x48c
[ 3078.481888] worker_thread+0x15c/0x464
[ 3078.487104] kthread+0x160/0x170
[ 3078.492479] ret_from_fork+0x10/0x18
[ 3078.498785] Code: c8027c81 35ffffa2 d50323bf d65f03c0 (d4210000)
[ 3078.506889] ---[ end trace 8ebe0340a1b0fb44 ]---
Once hns3_nic_net_open() is excute success, the flag
HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
flag, directly return when HNS3_NIC_STATE_DOWN is no set.
Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Matt Johnston says:
====================
Updates to MCTP core
This series adds timeouts for MCTP tags (a limited resource), and a few
other improvements to the MCTP core.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Should not occur but is a sanity check.
May help tracking down Trinity reported issue
https://lore.kernel.org/lkml/20210913030701.GA5926@xsang-OptiPlex-9020/
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A route's RTAX_MTU can be set in nested RTAX_METRICS
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Describe common flows and refcounting behaviour.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a future change, we'll want to provide a registration call for
mctp-specific devices. This requires us to have the networks established
before device driver inits, so run the core init as a subsys_initcall.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tag allocation, release and bind events are somewhat opaque outside
the kernel; this change adds a few tracepoints to assist in
instrumentation and debugging.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, a MCTP (local-eid,remote-eid,tag) tuple is allocated to a
socket on send, and only expires when the socket is closed.
This change introduces a tag timeout, freeing the tuple after a fixed
expiry - currently six seconds. This is greater than (but close to) the
max response timeout in upper-layer bindings.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we tie the struct mctp_dev lifetime to the underlying struct
net_device, and hold/put that device as a proxy for a separate mctp_dev
refcount. This works because we're not holding any references to the
mctp_dev that are different from the netdev lifetime.
In a future change we'll break that assumption though, as we'll need to
hold mctp_dev references in a workqueue, which might live past the
netdev unregister notification.
In order to support that, this change introduces a refcount on the
mctp_dev, currently taken by the net_device->mctp_ptr reference, and
released on netdev unregister events. We can then use this for future
references that might outlast the net device.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We will want to invalidate sk_keys in a future change, which will
require a boolean flag to mark invalidated items in the socket & net
namespace lists. We'll also need to take a reference to keys, held over
non-atomic contexts, so we need a refcount on keys also.
This change adds a validity flag (currently always true) and refcount to
struct mctp_sk_key. With a refcount on the keys, using RCU no longer
makes much sense; we have exact indications on the lifetime of keys. So,
we also change the RCU list traversal to a locked implementation.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We may need to receive packets addressed to the null EID (==0), but
addressed to us at the physical layer.
This change adds a lookup for local routes when we see a packet
addressed to EID 0, and a local phys address.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allowing TUN is useful for testing, to route packets to userspace or to
tunnel between machines.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The LAN8804 PHY has same features as that of LAN8814 PHY except that it
doesn't support 1588, SyncE or Q-USGMII.
This PHY is found inside the LAN966X switches.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ixgbe driver currently generates a NULL pointer dereference with
some machine (online cpus < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. Code is in
"ixgbe_set_rss_queues"".
Here's how the problem repeats itself:
Some machine (online cpus < 63), And user set num_queues to 63 through
ethtool. Code is in the "ixgbe_set_channels",
adapter->ring_feature[RING_F_FDIR].limit = count;
It becomes 63.
When user use xdp, "ixgbe_set_rss_queues" will set queues num.
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
And rss_i's value is from
f = &adapter->ring_feature[RING_F_FDIR];
rss_i = f->indices = f->limit;
So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem)
It leads to panic.
Call trace:
[exception RIP: ixgbe_xdp+368]
RIP: ffffffffc02a76a0 RSP: ffff9fe16202f8d0 RFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000000020 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000001c RDI: ffffffffa94ead90
RBP: ffff92f8f24c0c18 R8: 0000000000000000 R9: 0000000000000000
R10: ffff9fe16202f830 R11: 0000000000000000 R12: ffff92f8f24c0000
R13: ffff9fe16202fc01 R14: 000000000000000a R15: ffffffffc02a7530
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
7 [ffff9fe16202f8f0] dev_xdp_install at ffffffffa89fbbcc
8 [ffff9fe16202f920] dev_change_xdp_fd at ffffffffa8a08808
9 [ffff9fe16202f960] do_setlink at ffffffffa8a20235
10 [ffff9fe16202fa88] rtnl_setlink at ffffffffa8a20384
11 [ffff9fe16202fc78] rtnetlink_rcv_msg at ffffffffa8a1a8dd
12 [ffff9fe16202fcf0] netlink_rcv_skb at ffffffffa8a717eb
13 [ffff9fe16202fd40] netlink_unicast at ffffffffa8a70f88
14 [ffff9fe16202fd80] netlink_sendmsg at ffffffffa8a71319
15 [ffff9fe16202fdf0] sock_sendmsg at ffffffffa89df290
16 [ffff9fe16202fe08] __sys_sendto at ffffffffa89e19c8
17 [ffff9fe16202ff30] __x64_sys_sendto at ffffffffa89e1a64
18 [ffff9fe16202ff38] do_syscall_64 at ffffffffa84042b9
19 [ffff9fe16202ff50] entry_SYSCALL_64_after_hwframe at ffffffffa8c0008c
So I fix ixgbe_max_channels so that it will not allow a setting of queues
to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
take the smaller value of num_rx_queues and num_xdp_queues.
Fixes: 4a9b32f30f80 ("ixgbe: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
t-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-09-28
This series contains updates to ice driver only.
Dave adds support for QoS DSCP allowing for DSCP to TC mapping via APP
TLVs.
Ani adds enforcement of DSCP to only supported devices with the
introduction of a feature bitmap and corrects messaging of unsupported
modules based on link mode.
Jake refactors devlink info functions to be void as the functions no
longer return errors.
Jeff fixes a macro name to properly reflect the value.
Len Baker converts a kzalloc allocation to, the preferred, kcalloc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Subbaraya Sundeep <sbhatta@marvell.com>
====================
octeontx2: Add PTP support for VFs
PTP is a shared hardware block which can prepend
RX timestamps to packets before directing packets to
PFs or VFs and can notify the TX timestamps to PFs or VFs
via TX completion queue descriptors. Hence adding PTP
support for VFs is exactly similar to PFs with minimal changes.
This patchset adds that PTP support for VFs.
Patch 1 - When an interface is set in promisc/multicast
the same setting is not retained when changing mtu or channels.
This is due to toggling of the interface by driver but not
calling set_rx_mode in the down-up sequence. Since setting
an interface to multicast properly is required for ptp this is
addressed in this patch.
Patch 2 - Changes in VF driver for registering timestamping
ethtool ops and ndo_ioctl.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds PTP PHC support to NIX VF interfaces. This enables
a VF to run PTP master/slave instance. PTP block being a shared
hardware resource it is recommended to avoid running multiple
PTP instances in the system which will impact the PTP clock
accuracy.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Whenever the interface is brought up/down then set_rx_mode
function is called by the stack which enables promisc/allmulti
MCAM entries. But there are cases when driver brings
interface down and then up such as while changing number
of channels. In these cases promisc/allmulti MCAM entries
are left disabled as set_rx_mode callback is not called.
This patch enables these MCAM entries in all such cases.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
clang build kernel failed the selftest probe_user.
$ ./test_progs -t probe_user
$ ...
$ test_probe_user:PASS:get_kprobe_res 0 nsec
$ test_probe_user:FAIL:check_kprobe_res wrong kprobe res from probe read: 0.0.0.0:0
$ #94 probe_user:FAIL
The test attached to kernel function __sys_connect(). In net/socket.c, we have
int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
{
......
}
...
SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
int, addrlen)
{
return __sys_connect(fd, uservaddr, addrlen);
}
The gcc compiler (8.5.0) does not inline __sys_connect() in syscall entry
function. But latest clang trunk did the inlining. So the bpf program
is not triggered.
To make the test more reliable, let us kprobe the syscall entry function
instead. Note that x86_64, arm64 and s390 have syscall wrappers and they have
to be handled specially.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210929033000.3711921-1-yhs@fb.com
|
|
Previously with CONFIG_QRTR=m a separate ns.ko would be built which
wasn't done on purpose and should be included in qrtr.ko.
Rename qrtr.c to af_qrtr.c so we can build a qrtr.ko with both af_qrtr.c
and ns.c.
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-By: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20210928171156.6353-1-luca@z3ntu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fib_notifier.c hasn't use any macro or function declared
in net/netns/ipv4.h.
Thus, these files can be removed from fib_notifier.c safely
without affecting the compilation of the net/ipv4 module.
Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn>
Link: https://lore.kernel.org/r/20210928164011.1454-1-liumh1@shanghaitech.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sequence count bridge_mcast_querier::seq is protected by
net_bridge::multicast_lock but seqcount_init() does not associate the
seqcount with the lock. This leads to a warning on PREEMPT_RT because
preemption is still enabled.
Let seqcount_init() associate the seqcount with lock that protects the
write section. Remove lockdep_assert_held_once() because lockdep already checks
whether the associated lock is held.
Fixes: 67b746f94ff39 ("net: bridge: mcast: make sure querier port/address updates are consistent")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The second resource is optional which is only provided on the chipset
IPQ5018. But the blamed commit ignores that and if the resource is
not there it just fails.
the resource is used like this,
if (priv->eth_ldo_rdy) {
val = readl(priv->eth_ldo_rdy);
val |= BIT(0);
writel(val, priv->eth_ldo_rdy);
fsleep(IPQ_PHY_SET_DELAY_US);
}
This patch reverts that to still allow the second resource to be optional
because other SoC have the some MDIO controller and doesn't need to
second resource.
Fixes: fa14d03e014a ("net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()")
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210928134849.2092-1-caihuoqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kees Cook says:
====================
Hi,
In order to keep ahead of cases in the kernel where Control Flow Integrity
(CFI) may trip over function call casts, enabling -Wcast-function-type
is helpful. To that end, replace BPF_CAST_CALL() as it triggers warnings
with this option and is now one of the last places in the kernel in need
of fixing.
Thanks,
-Kees
v2:
- rebase to bpf-next
- add acks
v1: https://lore.kernel.org/lkml/20210927182700.2980499-1-keescook@chromium.org
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In order to keep ahead of cases in the kernel where Control Flow
Integrity (CFI) may trip over function call casts, enabling
-Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
various warnings and is one of the last places in the kernel
triggering this warning.
For actual function calls, replace BPF_CAST_CALL() with a typedef, which
captures the same details about the given function pointers.
This change results in no object code difference.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://github.com/KSPP/linux/issues/20
Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org
|
|
In order to keep ahead of cases in the kernel where Control Flow
Integrity (CFI) may trip over function call casts, enabling
-Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
various warnings and is one of the last places in the kernel triggering
this warning.
Most places using BPF_CAST_CALL actually just want a void * to perform
math on. It's not actually performing a call, so just use a different
helper to get the void *, by way of the new BPF_CALL_IMM() helper, which
can clean up a common copy/paste idiom as well.
This change results in no object code difference.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://github.com/KSPP/linux/issues/20
Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210928230946.4062144-2-keescook@chromium.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some few pin control fixes for the v5.15 kernel cycle. The most
critical is the AMD fixes.
- Fix wakeup interrupts in the AMD driver affecting AMD laptops.
- Fix parent irqspec translation in the Qualcomm SPMI GPIO driver.
- Fix deferred probe handling in the Rockchip driver, this is a
stopgap solution while we look for something more elegant.
- Add PM suspend callbacks to the Qualcomm SC7280 driver.
- Some minor doc fix (should have come in earlier, sorry)"
* tag 'pinctrl-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: sc7280: Add PM suspend callbacks
gpio/rockchip: fetch deferred output settings on probe
pinctrl/rockchip: add a queue for deferred pin output settings on probe
pinctrl: qcom: spmi-gpio: correct parent irqspec translation
pinctrl: amd: Handle wake-up interrupt
pinctrl: amd: Add irq field data
pinctrl: core: Remove duplicated word from devm_pinctrl_unregister()
|
|
Pull VFIO fixes from Alex Williamson:
- Fix vfio-ap leak on uninit (Jason Gunthorpe)
- Add missing prototype arg name (Colin Ian King)
* tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio:
vfio/ap_ops: Add missed vfio_uninit_group_dev()
vfio/pci: add missing identifier name in argument of function prototype
|
|
"?:" is a GNU C extension, some environment has warning flags for its
use, or even prohibit it directly. This patch avoid triggering these
problems by simply expand it to its full form, no functionality change.
Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210928184221.1545079-1-fallentree@fb.com
|
|
Andrii Nakryiko says:
====================
Implement opt-in stricter BPF program section name (SEC()) handling logic. For
a lot of supported ELF section names, enforce exact section name match with no
arbitrary characters added at the end. See patch #9 for more details.
To allow this, patches #2 through #4 clean up and preventively fix selftests,
normalizing existing SEC() usage across multiple selftests. While at it, those
patches also reduce the amount of remaining bpf_object__find_program_by_title()
uses, which should be completely removed soon, given it's an API with
ambiguous semantics and will be deprecated and eventually removed in libbpf 1.0.
Patch #1 also introduces SEC("tc") as an alias for SEC("classifier"). "tc" is
a better and less misleading name, so patch #3 replaces all classifier* uses
with nice and short SEC("tc").
Last patch is also fixing "sk_lookup/" definition to not require and not allow
extra "/blah" parts after it, which serve no meaning.
All the other patches are gradual internal libbpf changes to:
- allow this optional strict logic for ELF section name handling;
- allow new use case (for now for "struct_ops", but that could be extended
to, say, freplace definitions), in which it can be used stand-alone to
specify just type (SEC("struct_ops")), or also accept extra parameters
which can be utilized by libbpf to either get more data or double-check
valid use (e.g., SEC("struct_ops/dctcp_init") to specify desired
struct_ops operation that is supposed to be implemented);
- get libbpf's internal logic ready to allow other libraries and
applications to specify their custom handlers for ELF section name for BPF
programs. All the pieces are in place, the only thing preventing making
this as public libbpf API is reliance on internal type for specifying BPF
program load attributes. The work is planned to revamp related low-level
libbpf APIs, at which point it will be possible to just re-use such new
types for coordination between libbpf and custom handlers.
These changes are a part of libbpf 1.0 effort ([0]). They are also intended to
be applied on top of the previous preparatory series [1], so currently CI will
be failing to apply them to bpf-next until that patch set is landed. Once it
is landed, kernel-patches daemon will automatically retest this patch set.
[0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=547675&state=*
v3->v4:
- replace SEC("classifier*") with SEC("tc") (Daniel);
v2->v3:
- applied acks, addressed most feedback, added comments to new flags (Dave);
v1->v2:
- rebase onto latest bpf-next and resolve merge conflicts w/ Dave's changes.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Update "sk_lookup/" definition to be a stand-alone type specifier,
with backwards-compatible prefix match logic in non-libbpf-1.0 mode.
Currently in selftests all the "sk_lookup/<whatever>" uses just use
<whatever> for duplicated unique name encoding, which is redundant as
BPF program's name (C function name) uniquely and descriptively
identifies the intended use for such BPF programs.
With libbpf's SEC_DEF("sk_lookup") definition updated, switch existing
sk_lookup programs to use "unqualified" SEC("sk_lookup") section names,
with no random text after it.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-11-andrii@kernel.org
|
|
Implement strict ELF section name handling for BPF programs. It utilizes
`libbpf_set_strict_mode()` framework and adds new flag: LIBBPF_STRICT_SEC_NAME.
If this flag is set, libbpf will enforce exact section name matching for
a lot of program types that previously allowed just partial prefix
match. E.g., if previously SEC("xdp_whatever_i_want") was allowed, now
in strict mode only SEC("xdp") will be accepted, which makes SEC("")
definitions cleaner and more structured. SEC() now won't be used as yet
another way to uniquely encode BPF program identifier (for that
C function name is better and is guaranteed to be unique within
bpf_object). Now SEC() is strictly BPF program type and, depending on
program type, extra load/attach parameter specification.
Libbpf completely supports multiple BPF programs in the same ELF
section, so multiple BPF programs of the same type/specification easily
co-exist together within the same bpf_object scope.
Additionally, a new (for now internal) convention is introduced: section
name that can be a stand-alone exact BPF program type specificator, but
also could have extra parameters after '/' delimiter. An example of such
section is "struct_ops", which can be specified by itself, but also
allows to specify the intended operation to be attached to, e.g.,
"struct_ops/dctcp_init". Note, that "struct_ops_some_op" is not allowed.
Such section definition is specified as "struct_ops+".
This change is part of libbpf 1.0 effort ([0], [1]).
[0] Closes: https://github.com/libbpf/libbpf/issues/271
[1] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-10-andrii@kernel.org
|
|
Complete SEC() table refactoring towards unified form by rewriting
BPF_APROG_SEC and BPF_EAPROG_SEC definitions with
SEC_DEF(SEC_ATTACHABLE_OPT) (for optional expected_attach_type) and
SEC_DEF(SEC_ATTACHABLE) (mandatory expected_attach_type), respectively.
Drop BPF_APROG_SEC, BPF_EAPROG_SEC, and BPF_PROG_SEC_IMPL macros after
that, leaving SEC_DEF() macro as the only one used.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-9-andrii@kernel.org
|
|
Refactor ELF section handler definitions table to use a set of flags and
unified SEC_DEF() macro. This allows for more succinct and table-like
set of definitions, and allows to more easily extend the logic without
adding more verbosity (this is utilized in later patches in the series).
This approach is also making libbpf-internal program pre-load callback
not rely on bpf_sec_def definition, which demonstrates that future
pluggable ELF section handlers will be able to achieve similar level of
integration without libbpf having to expose extra types and APIs.
For starters, update SEC_DEF() definitions and make them more succinct.
Also convert BPF_PROG_SEC() and BPF_APROG_COMPAT() definitions to
a common SEC_DEF() use.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-8-andrii@kernel.org
|
|
Move closer to not relying on bpf_sec_def internals that won't be part
of public API, when pluggable SEC() handlers will be allowed. Drop
pre-calculated prefix length, and in various helpers don't rely on this
prefix length availability. Also minimize reliance on knowing
bpf_sec_def's prefix for few places where section prefix shortcuts are
supported (e.g., tp vs tracepoint, raw_tp vs raw_tracepoint).
Given checking some string for having a given string-constant prefix is
such a common operation and so annoying to be done with pure C code, add
a small macro helper, str_has_pfx(), and reuse it throughout libbpf.c
where prefix comparison is performed. With __builtin_constant_p() it's
possible to have a convenient helper that checks some string for having
a given prefix, where prefix is either string literal (or compile-time
known string due to compiler optimization) or just a runtime string
pointer, which is quite convenient and saves a lot of typing and string
literal duplication.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-7-andrii@kernel.org
|
|
Refactor internals of libbpf to allow adding custom SEC() handling logic
easily from outside of libbpf. To that effect, each SEC()-handling
registration sets mandatory program type/expected attach type for
a given prefix and can provide three callbacks called at different
points of BPF program lifetime:
- init callback for right after bpf_program is initialized and
prog_type/expected_attach_type is set. This happens during
bpf_object__open() step, close to the very end of constructing
bpf_object, so all the libbpf APIs for querying and updating
bpf_program properties should be available;
- pre-load callback is called right before BPF_PROG_LOAD command is
called in the kernel. This callbacks has ability to set both
bpf_program properties, as well as program load attributes, overriding
and augmenting the standard libbpf handling of them;
- optional auto-attach callback, which makes a given SEC() handler
support auto-attachment of a BPF program through bpf_program__attach()
API and/or BPF skeletons <skel>__attach() method.
Each callbacks gets a `long cookie` parameter passed in, which is
specified during SEC() handling. This can be used by callbacks to lookup
whatever additional information is necessary.
This is not yet completely ready to be exposed to the outside world,
mainly due to non-public nature of struct bpf_prog_load_params. Instead
of making it part of public API, we'll wait until the planned low-level
libbpf API improvements for BPF_PROG_LOAD and other typical bpf()
syscall APIs, at which point we'll have a public, probably OPTS-based,
way to fully specify BPF program load parameters, which will be used as
an interface for custom pre-load callbacks.
But this change itself is already a good first step to unify the BPF
program hanling logic even within the libbpf itself. As one example, all
the extra per-program type handling (sleepable bit, attach_btf_id
resolution, unsetting optional expected attach type) is now more obvious
and is gathered in one place.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-6-andrii@kernel.org
|
|
Normalize all the other non-conforming SEC() usages across all
selftests. This is in preparation for libbpf to start to enforce
stricter SEC() rules in libbpf 1.0 mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-5-andrii@kernel.org
|
|
Convert all SEC("classifier*") uses to a new and strict SEC("tc")
section name. In reference_tracking selftests switch from ambiguous
searching by program title (section name) to non-ambiguous searching by
name in some selftests, getting closer to completely removing
bpf_object__find_program_by_title().
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-4-andrii@kernel.org
|
|
Convert almost all SEC("xdp_blah") uses to strict SEC("xdp") to comply
with strict libbpf 1.0 logic of exact section name match for XDP program
types. There is only one exception, which is only tested through
iproute2 and defines multiple XDP programs within the same BPF object.
Given iproute2 still works in non-strict libbpf mode and it doesn't have
means to specify XDP programs by its name (not section name/title),
leave that single file alone for now until iproute2 gains lookup by
function/program name.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20210928161946.2512801-3-andrii@kernel.org
|