summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-26bnxt_en: allow VF config ops when PF is closedEdwin Peer1-4/+0
It is perfectly legal for the stack to query and configure VFs via PF NDOs while the NIC is administratively down. Remove the unnecessary check for the PF to be in open state. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26bnxt_en: allow promiscuous mode for trusted VFsEdwin Peer3-7/+11
Firmware previously only allowed promiscuous mode for VFs associated with a default VLAN. It is now possible to enable promiscuous mode for a VF having no VLAN configured provided that it is trusted. In such cases the VF will see all packets received by the PF, irrespective of destination MAC or VLAN. Note, it is necessary to query firmware at the time of bnxt_promisc_ok() instead of in bnxt_hwrm_func_qcfg() because the trusted status might be altered by the PF after the VF has been configured. This check must now also be deferred because the firmware call sleeps. Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26bnxt_en: Add support for fw managed link down feature.Michael Chan2-1/+3
In the current code, the driver will not shutdown the link during IFDOWN if there are still VFs sharing the port. Newer firmware will manage the link down decision when the port is shared by VFs, so we can just call firmware to shutdown the port unconditionally and let firmware make the final decision. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26bnxt_en: Add a new phy_flags field to the main driver structure.Michael Chan3-34/+22
Copy the phy related feature flags from the firmware call HWRM_PORT_PHY_QCAPS to this new field. We can also remove the flags field in the bnxt_test_info structure. It's cleaner to have all PHY related flags in one location, directly copied from the firmware. To keep the BNXT_PHY_CFG_ABLE() macro logic the same, we need to make a slight adjustment to check that it is a PF. Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26bnxt_en: report signal mode in link up messagesEdwin Peer1-3/+19
Firmware reports link signalling mode for certain speeds. In these cases, print the signalling modes in kernel log link up messages. Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26macvlan: Add nodst option to macvlan type sourceJethro Beekman2-5/+15
The default behavior for source MACVLAN is to duplicate packets to appropriate type source devices, and then do the normal destination MACVLAN flow. This patch adds an option to skip destination MACVLAN processing if any matching source MACVLAN device has the option set. This allows setting up a "catch all" device for source MACVLAN: create one or more devices with type source nodst, and one device with e.g. type vepa, and incoming traffic will be received on exactly one device. v2: netdev wants non-standard line length Signed-off-by: Jethro Beekman <kernel@jbeekman.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26Merge tag 'mlx5-updates-2021-04-21' of ↵David S. Miller20-434/+724
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-04-21 devlink external port attribute for SF (Sub-Function) port flavour This adds the support to instantiate Sub-Functions on external hosts E.g when Eswitch manager is enabled on the ARM SmarNic SoC CPU, users are now able to spawn new Sub-Functions on the Host server CPU. Parav Pandit Says: ================== This series introduces and uses external attribute for the SF port to indicate that a SF port belongs to an external controller. This is needed to generate unique phys_port_name when PF and SF numbers are overlapping between local and external controllers. For example two controllers 0 and 1, both of these controller have a SF. having PF number 0, SF number 77. Here, phys_port_name has duplicate entry which doesn't have controller number in it. Hence, add controller number optionally when a SF port is for an external controller. This extension is similar to existing PF and VF eswitch ports of the external controller. When a SF is for external controller an example view of external SF port and config sequence: On eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached phys_port_name construction: $ cat /sys/class/net/eth1/phys_port_name c1pf0sf77 Patch summary: First 3 patches prepares the eswitch to handle vports in more generic way using xarray to lookup vport from its unique vport number. Patch-1 returns maximum eswitch ports only when eswitch is enabled Patch-2 prepares eswitch to return eswitch max ports from a struct Patch-3 uses xarray for vport and representor lookup Patch-4 considers SF for an additioanl range of SF vports Patch-5 relies on SF hw table to check SF support Patch-6 extends SF devlink port attribute for external flag Patch-7 stores the per controller SF allocation attributes Patch-8 uses SF function id for filtering events Patch-9 uses helper for allocation and free Patch-10 splits hw table into per controller table and generic one Patch-11 extends sf table for additional range ================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26net: ethernet: ixp4xx: Support device tree probingLinus Walleij3-63/+150
This adds device tree probing to the IXP4xx ethernet driver. Add a platform data bool to tell us whether to register an MDIO bus for the device or not, as well as the corresponding NPE. We need to drop the memory region request as part of this since the OF core will request the memory for the device. Cc: Zoltan HERPAI <wigyori@uid0.hu> Cc: Raylynn Knight <rayknight@me.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26net: ethernet: ixp4xx: Retire ancient phy retrievealLinus Walleij1-5/+5
This driver was using a really dated way of obtaining the phy by printing a string and using it with phy_connect(). Switch to using more reasonable modern interfaces. Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26net: ethernet: ixp4xx: Add DT bindingsLinus Walleij1-0/+102
This adds device tree bindings for the IXP4xx ethernet controller with optional MDIO bridge. Cc: Zoltan HERPAI <wigyori@uid0.hu> Cc: Raylynn Knight <rayknight@me.com> Cc: devicetree@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26r8152: remove some bit operationsHayes Wang1-7/+7
Remove DELL_TB_RX_AGG_BUG and LENOVO_MACPASSTHRU flags of rtl8152_flags. They are only set when initializing and wouldn't be change. It is enough to record them with variables. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26hv_netvsc: Make netvsc/VF binding check both MAC and serial numberDexuan Cui1-2/+12
Currently the netvsc/VF binding logic only checks the PCI serial number. The Microsoft Azure Network Adapter (MANA) supports multiple net_device interfaces (each such interface is called a "vPort", and has its unique MAC address) which are backed by the same VF PCI device, so the binding logic should check both the MAC address and the PCI serial number. The change should not break any other existing VF drivers, because Hyper-V NIC SR-IOV implementation requires the netvsc network interface and the VF network interface have the same MAC address. Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Co-developed-by: Shachar Raindel <shacharr@microsoft.com> Signed-off-by: Shachar Raindel <shacharr@microsoft.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26ch_ktls: Remove redundant variable resultJiapeng Chong1-6/+4
Variable result is being assigned a value from a calculation however the variable is never read, so this redundant variable can be removed. Cleans up the following clang-analyzer warning: drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1488:2: warning: Value stored to 'pos' is never read [clang-analyzer-deadcode.DeadStores]. drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:876:3: warning: Value stored to 'pos' is never read [clang-analyzer-deadcode.DeadStores]. drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:36:3: warning: Value stored to 'start' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller69-866/+3141
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 69 non-merge commits during the last 22 day(s) which contain a total of 69 files changed, 3141 insertions(+), 866 deletions(-). The main changes are: 1) Add BPF static linker support for extern resolution of global, from Andrii. 2) Refine retval for bpf_get_task_stack helper, from Dave. 3) Add a bpf_snprintf helper, from Florent. 4) A bunch of miscellaneous improvements from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24net/mlx5: SF, Extend SF table for additional SF id rangeParav Pandit6-53/+140
Extended the SF table to cover additioanl SF id range of external controller. A user optionallly provides the external controller number when user wants to create SF on the external controller. An example on eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Split mlx5_sf_hw_table into two partsParav Pandit1-29/+58
Device has SF ids in two different contiguous ranges. One for the local controller and second for the external controller's PF. Each such range has its own maximum number of functions and base id. To allocate SF from either of the range, prepare code to split into range specific fields into its own structure. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Use helpers for allocation and freeParav Pandit1-37/+60
Use helper routines for SF id and SF table allocation and free so that subsequent patch can reuse it for multiple SF function id range. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Consider own vhca events of SF devicesParav Pandit1-1/+11
Vhca events on eswitch manager are received for all the functions on the NIC, including for SFs of external host PF controllers. While SF device handler is only interested in SF devices events related to its own PF. Hence, validate if the function belongs to self or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Store and use start function idParav Pandit1-2/+8
SF ids in the device are in two different contiguous ranges. One for the local controller and second for the external host controller. Prepare code to handle multiple start function id by storing it in the table. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24devlink: Extend SF port attributes to have external attributeParav Pandit3-3/+15
Extended SF port attributes to have optional external flag similar to PCI PF and VF port attributes. External atttibute is required to generate unique phys_port_name when PF number and SF number are overlapping between two controllers similar to SR-IOV VFs. When a SF is for external controller an example view of external SF port and config sequence. On eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached phys_port_name construction: $ cat /sys/class/net/eth1/phys_port_name c1pf0sf77 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: SF, Rely on hw table for SF devlink port allocationParav Pandit3-8/+9
Supporting SF allocation is currently checked at two places: (a) SF devlink port allocator and (b) SF HW table handler. Both layers are using HCA CAP to identify it using helper routine mlx5_sf_supported() and mlx5_sf_max_functions(). Instead, rely on the HW table handler to check if SF is supported or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Consider SF ports of host PFParav Pandit2-0/+56
Query SF vports count and base id of host PF from the firmware. Account these ports in the total port calculation whenever it is non zero. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Use xarray for vport number to vport and rep mappingParav Pandit11-323/+380
Currently vport number to vport and its representor are mapped using an array and an index. Vport numbers of different types of functions are not contiguous. Adding new such discontiguous range using index and number mapping is increasingly complex and hard to maintain. Hence, maintain an xarray of vport and rep whose lookup is done based on the vport number. Each VF and SF entry is marked with a xarray mark to identify the function type. Additionally PF and VF needs special handling for legacy inline mode. They are additionally marked as host function using additional HOST_FN mark. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Prepare to return total vports from eswitch structParav Pandit3-11/+14
Total vports are already stored during eswitch initialization. Instead of calculating everytime, read directly from eswitch. Additionally, host PF's SF vport information is available using QUERY_HCA_CAP command. It is not available through HCA_CAP of the eswitch manager PF. Hence, this patch prepares the return total eswitch vport count from the existing eswitch struct. This further helps to keep eswitch port counting macros and logic within eswitch. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24net/mlx5: E-Switch, Return eswitch max ports when eswitch is supportedParav Pandit3-16/+22
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag. When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch specific ACL namespaces. Instead, start honoring MLX5_ESWITCH flag and perform vport specific initialization only when vport count is non zero. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24bpf: Document the pahole release info related to libbpf in bpf_devel_QA.rstTiezhu Yang1-0/+13
pahole starts to use libbpf definitions and APIs since v1.13 after the commit 21507cd3e97b ("pahole: add libbpf as submodule under lib/bpf"). It works well with the git repository because the libbpf submodule will use "git submodule update --init --recursive" to update. Unfortunately, the default github release source code does not contain libbpf submodule source code and this will cause build issues, the tarball from https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ is same with github, you can get the source tarball with corresponding libbpf submodule codes from https://fedorapeople.org/~acme/dwarves This change documents the above issues to give more information so that we can get the tarball from the right place, early discussion is here: https://lore.kernel.org/bpf/2de4aad5-fa9e-1c39-3c92-9bb9229d0966@loongson.cn/ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/bpf/1619141010-12521-1-git-send-email-yangtiezhu@loongson.cn
2021-04-24phy: nxp-c45-tja11xx: add interrupt supportRadu Pirea (NXP OSS)1-0/+33
Added .config_intr and .handle_interrupt callbacks. Link event interrupt will trigger an interrupt every time when the link goes up or down. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24net/atm: Fix spelling mistake "requed" -> "requeued"Colin Ian King1-1/+1
There is a spelling mistake in a printk message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24selftests/net: bump timeout to 5 minutesPo-Hsu Lin2-0/+3
We found that with the latest mainline kernel (5.12.0-051200rc8) on some KVM instances / bare-metal systems, the following tests will take longer than the kselftest framework default timeout (45 seconds) to run and thus got terminated with TIMEOUT error: * xfrm_policy.sh - took about 2m20s * pmtu.sh - took about 3m5s * udpgso_bench.sh - took about 60s Bump the timeout setting to 5 minutes to allow them have a chance to finish. https://bugs.launchpad.net/bugs/1856010 Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24Merge branch 'mptcp-msg-flags'David S. Miller3-27/+99
Mat Martineau says: ==================== mptcp: Compatibility with common msg flags These patches from the MPTCP tree handle some of the msg flags that are typically used with TCP, to make it easier to adapt userspace programs for use with MPTCP. Patches 1, 2, and 4 add support for MSG_ERRQUEUE (no-op for now), MSG_TRUNC, and MSG_PEEK on the receive side. Patch 3 ignores unsupported msg flags for send and receive. Patch 5 adds a selftest for MSG_PEEK. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24selftests: mptcp: add a test case for MSG_PEEKYonglong Li2-8/+69
Extend mptcp_connect tool with MSG_PEEK support and add a test case in mptcp_connect.sh that checks the data received from/after recv() with MSG_PEEK. Acked-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24mptcp: add MSG_PEEK supportYonglong Li1-9/+13
This patch adds support for MSG_PEEK flag. Packets are not removed from the receive_queue if MSG_PEEK set in recv() system call. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24mptcp: ignore unsupported msg flagsPaolo Abeni1-4/+5
Currently mptcp_sendmsg() fails with EOPNOTSUPP if the user-space provides some unsupported flag. That is unexpected and may foul existing applications migrated to MPTCP, which expect a different behavior. Change the mentioned function to silently ignore the unsupported flags except MSG_FASTOPEN. This is the only flags currently not supported by MPTCP with user-space visible side-effects. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/162 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24mptcp: implement MSG_TRUNC supportPaolo Abeni1-7/+9
The mentioned flag is currently silenlty ignored. This change implements the TCP-like behaviour, dropping the pending data up to the specified length. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Sigend-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24mptcp: implement dummy MSG_ERRQUEUE supportPaolo Abeni1-0/+4
mptcp_recvmsg() currently silently ignores MSG_ERRQUEUE, returning input data instead of error cmsg. This change provides a dummy implementation for MSG_ERRQUEUE - always returns no data. That is consistent with the current lack of a suitable IP_RECVERR setsockopt() support. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-24Merge branch 'BPF static linker: support externs'Alexei Starovoitov17-355/+1942
Andrii Nakryiko says: ==================== Add BPF static linker support for extern resolution of global variables, functions, and BTF-defined maps. This patch set consists of 4 parts: - few patches are extending bpftool to simplify working with BTF dump; - libbpf object loading logic is extended to support __hidden functions and overriden (unused) __weak functions; also BTF-defined map parsing logic is refactored to be re-used by linker; - the crux of the patch set is BPF static linker logic extension to perform extern resolution for three categories: global variables, BPF sub-programs, BTF-defined maps; - a set of selftests that validate that all the combinations of extern/weak/__hidden are working as expected. See respective patches for more details. One aspect hasn't been addressed yet and is going to be resolved in the next patch set, but is worth mentioning. With BPF static linking of multiple .o files, dealing with static everything becomes more problematic for BPF skeleton and in general for any by name look up APIs. This is due to static entities are allowed to have non-unique name. Historically this was never a problem due to BPF programs were always confined to a single C file. That changes now and needs to be addressed. The thinking so far is for BPF static linker to prepend filename to each static variable and static map (which is currently not supported by libbpf, btw), so that they can be unambiguously resolved by (mostly) unique name. Mostly, because even filenames can be duplicated, but that should be rare and easy to address by wiser choice of filenames by users. Fortunately, static BPF subprograms don't suffer from this issues, as they are not independent entities and are neither exposed in BPF skeleton, nor is lookup-able by any of libbpf APIs (and there is little reason to do that anyways). This and few other things will be the topic of the next set of patches. Some tests rely on Clang fix ([0]), so need latest Clang built from main. [0] https://reviews.llvm.org/D100362 v2->v3: - allow only STV_DEFAULT and STV_HIDDEN ELF symbol visibility (Yonghong); - update selftests' README for required Clang 13 fix dependency (Alexei); - comments, typos, slight code changes (Yonghong, Alexei); v1->v2: - make map externs support full attribute list, adjust linked_maps selftest to demonstrate that typedef works now (though no shared header file was added to simplicity sake) (Alexei); - remove commented out parts from selftests and fix few minor code style issues; - special __weak map definition semantics not yet implemented and will be addressed in a follow up. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-04-24selftests/bpf: Document latest Clang fix expectations for linking testsAndrii Nakryiko1-0/+9
Document which fixes are required to generate correct static linking selftests. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-19-andrii@kernel.org
2021-04-24selftests/bpf: Add map linking selftestAndrii Nakryiko4-1/+191
Add selftest validating various aspects of statically linking BTF-defined map definitions. Legacy map definitions do not support extern resolution between object files. Some of the aspects validated: - correct resolution of extern maps against concrete map definitions; - extern maps can currently only specify map type and key/value size and/or type information; - weak concrete map definitions are resolved properly. Static map definitions are not yet supported by libbpf, so they are not explicitly tested, though manual testing showes that BPF linker handles them properly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-18-andrii@kernel.org
2021-04-24selftests/bpf: Add global variables linking selftestAndrii Nakryiko4-1/+154
Add selftest validating various aspects of statically linking global variables: - correct resolution of extern variables across .bss, .data, and .rodata sections; - correct handling of weak definitions; - correct de-duplication of repeating special externs (.kconfig, .ksyms). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-17-andrii@kernel.org
2021-04-24selftests/bpf: Add function linking selftestAndrii Nakryiko4-1/+190
Add selftest validating various aspects of statically linking functions: - no conflicts and correct resolution for name-conflicting static funcs; - correct resolution of extern functions; - correct handling of weak functions, both resolution itself and libbpf's handling of unused weak function that "lost" (it leaves gaps in code with no ELF symbols); - correct handling of hidden visibility to turn global function into "static" for the purpose of BPF verification. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-16-andrii@kernel.org
2021-04-24selftests/bpf: Omit skeleton generation for multi-linked BPF object filesAndrii Nakryiko1-1/+3
Skip generating individual BPF skeletons for files that are supposed to be linked together to form the final BPF object file. Very often such files are "incomplete" BPF object files, which will fail libbpf bpf_object__open() step, if used individually, thus failing BPF skeleton generation. This is by design, so skip individual BPF skeletons and only validate them as part of their linked final BPF object file and skeleton. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-15-andrii@kernel.org
2021-04-24selftests/bpf: Use -O0 instead of -Og in selftests buildsAndrii Nakryiko1-4/+4
While -Og is designed to work well with debugger, it's still inferior to -O0 in terms of debuggability experience. It will cause some variables to still be inlined, it will also prevent single-stepping some statements and otherwise interfere with debugging experience. So switch to -O0 which turns off any optimization and provides the best debugging experience. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-14-andrii@kernel.org
2021-04-24libbpf: Support extern resolution for BTF-defined maps in .maps sectionAndrii Nakryiko1-0/+132
Add extra logic to handle map externs (only BTF-defined maps are supported for linking). Re-use the map parsing logic used during bpf_object__open(). Map externs are currently restricted to always match complete map definition. So all the specified attributes will be compared (down to pining, map_flags, numa_node, etc). In the future this restriction might be relaxed with no backwards compatibility issues. If any attribute is mismatched between extern and actual map definition, linker will report an error, pointing out which one mismatches. The original intent was to allow for extern to specify attributes that matters (to user) to enforce. E.g., if you specify just key information and omit value, then any value fits. Similarly, it should have been possible to enforce map_flags, pinning, and any other possible map attribute. Unfortunately, that means that multiple externs can be only partially overlapping with each other, which means linker would need to combine their type definitions to end up with the most restrictive and fullest map definition. This requires an extra amount of BTF manipulation which at this time was deemed unnecessary and would require further extending generic BTF writer APIs. So that is left for future follow ups, if there will be demand for that. But the idea seems intresting and useful, so I want to document it here. Weak definitions are also supported, but are pretty strict as well, just like externs: all weak map definitions have to match exactly. In the follow up patches this most probably will be relaxed, with __weak map definitions being able to differ between each other (with non-weak definition always winning, of course). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-13-andrii@kernel.org
2021-04-24libbpf: Add linker extern resolution support for functions and global variablesAndrii Nakryiko1-59/+790
Add BPF static linker logic to resolve extern variables and functions across multiple linked together BPF object files. For that, linker maintains a separate list of struct glob_sym structures, which keeps track of few pieces of metadata (is it extern or resolved global, is it a weak symbol, which ELF section it belongs to, etc) and ties together BTF type info and ELF symbol information and keeps them in sync. With adding support for extern variables/funcs, it's now possible for some sections to contain both extern and non-extern definitions. This means that some sections may start out as ephemeral (if only externs are present and thus there is not corresponding ELF section), but will be "upgraded" to actual ELF section as symbols are resolved or new non-extern definitions are appended. Additional care is taken to not duplicate extern entries in sections like .kconfig and .ksyms. Given libbpf requires BTF type to always be present for .kconfig/.ksym externs, linker extends this requirement to all the externs, even those that are supposed to be resolved during static linking and which won't be visible to libbpf. With BTF information always present, static linker will check not just ELF symbol matches, but entire BTF type signature match as well. That logic is stricter that BPF CO-RE checks. It probably should be re-used by .ksym resolution logic in libbpf as well, but that's left for follow up patches. To make it unnecessary to rewrite ELF symbols and minimize BTF type rewriting/removal, ELF symbols that correspond to externs initially will be updated in place once they are resolved. Similarly for BTF type info, VAR/FUNC and var_secinfo's (sec_vars in struct bpf_linker) are staying stable, but types they point to might get replaced when extern is resolved. This might leave some left-over types (even though we try to minimize this for common cases of having extern funcs with not argument names vs concrete function with names properly specified). That can be addresses later with a generic BTF garbage collection. That's left for a follow up as well. Given BTF type appending phase is separate from ELF symbol appending/resolution, special struct glob_sym->underlying_btf_id variable is used to communicate resolution and rewrite decisions. 0 means underlying_btf_id needs to be appended (it's not yet in final linker->btf), <0 values are used for temporary storage of source BTF type ID (not yet rewritten), so -glob_sym->underlying_btf_id is BTF type id in obj-btf. But by the end of linker_append_btf() phase, that underlying_btf_id will be remapped and will always be > 0. This is the uglies part of the whole process, but keeps the other parts much simpler due to stability of sec_var and VAR/FUNC types, as well as ELF symbol, so please keep that in mind while reviewing. BTF-defined maps require some extra custom logic and is addressed separate in the next patch, so that to keep this one smaller and easier to review. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-12-andrii@kernel.org
2021-04-24libbpf: Tighten BTF type ID rewriting with error checkingAndrii Nakryiko1-0/+7
It should never fail, but if it does, it's better to know about this rather than end up with nonsensical type IDs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-11-andrii@kernel.org
2021-04-24libbpf: Extend sanity checking ELF symbols with externs validationAndrii Nakryiko1-9/+40
Add logic to validate extern symbols, plus some other minor extra checks, like ELF symbol #0 validation, general symbol visibility and binding validations. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-10-andrii@kernel.org
2021-04-24libbpf: Make few internal helpers available outside of libbpf.cAndrii Nakryiko3-18/+16
Make skip_mods_and_typedefs(), btf_kind_str(), and btf_func_linkage() helpers available outside of libbpf.c, to be used by static linker code. Also do few cleanups (error code fixes, comment clean up, etc) that don't deserve their own commit. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-9-andrii@kernel.org
2021-04-24libbpf: Factor out symtab and relos sanity checksAndrii Nakryiko1-106/+127
Factor out logic for sanity checking SHT_SYMTAB and SHT_REL sections into separate sections. They are already quite extensive and are suffering from too deep indentation. Subsequent changes will extend SYMTAB sanity checking further, so it's better to factor each into a separate function. No functional changes are intended. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-8-andrii@kernel.org
2021-04-24libbpf: Refactor BTF map definition parsingAndrii Nakryiko2-111/+178
Refactor BTF-defined maps parsing logic to allow it to be nicely reused by BPF static linker. Further, at least for BPF static linker, it's important to know which attributes of a BPF map were defined explicitly, so provide a bit set for each known portion of BTF map definition. This allows BPF static linker to do a simple check when dealing with extern map declarations. The same capabilities allow to distinguish attributes explicitly set to zero (e.g., __uint(max_entries, 0)) vs the case of not specifying it at all (no max_entries attribute at all). Libbpf is currently not utilizing that, but it could be useful for backwards compatibility reasons later. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-7-andrii@kernel.org
2021-04-24libbpf: Allow gaps in BPF program sections to support overriden weak functionsAndrii Nakryiko1-36/+22
Currently libbpf is very strict about parsing BPF program instruction sections. No gaps are allowed between sequential BPF programs within a given ELF section. Libbpf enforced that by keeping track of the next section offset that should start a new BPF (sub)program and cross-checks that by searching for a corresponding STT_FUNC ELF symbol. But this is too restrictive once we allow to have weak BPF programs and link together two or more BPF object files. In such case, some weak BPF programs might be "overridden" by either non-weak BPF program with the same name and signature, or even by another weak BPF program that just happened to be linked first. That, in turn, leaves BPF instructions of the "lost" BPF (sub)program intact, but there is no corresponding ELF symbol, because no one is going to be referencing it. Libbpf already correctly handles such cases in the sense that it won't append such dead code to actual BPF programs loaded into kernel. So the only change that needs to be done is to relax the logic of parsing BPF instruction sections. Instead of assuming next BPF (sub)program section offset, iterate available STT_FUNC ELF symbols to discover all available BPF subprograms and programs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210423181348.1801389-6-andrii@kernel.org