Age | Commit message (Collapse) | Author | Files | Lines |
|
Move htb related functions and data to a separated file for better
encapsulation.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following the change of the functions to be object like, change also
the names.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
As a step to make htb self-contained replace the passing of priv as a
parameter to htb function calls with members in the htb struct.
Full decoupling the htb from priv will require more work, so for now
leave the priv as one of the members in the htb struct, to be replaced
by channels in a future commit.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Move structure mlx5e_htb from the main driver include file "en.h" to be
hidden in qos.c where the qos functionality is implemented, forward
declare it for the rest of the driver and allocate it dynamically upon
user demand only.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
|
|
Preparation for dynamic allocation of the HTB struct.
The statistics should be preserved even when the struct is de-allocated.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5e_get_qos_sq is a part of the SQ lifecycle, so need be under the
title.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
HTB id fields are needed for selecting queue. Moving them to the
selq_params struct will simplify synchronization between control flow
and mlx5e_select_queues and will keep the IDs in the hot cacheline of
mlx5e_selq_params.
Replace mlx5e_selq_prepare() with separate functions that change subsets
of parameters, while keeping the rest.
This also will be useful to hide mlx5e_htb structure from the rest of the
driver in a later patch in this series.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
|
|
No need to expose all htb tc functions to the main driver file,
expose only the master htb tc function mlx5e_htb_setup_tc()
which selects the internal "now static" function to call.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
|
|
Keep mqprio_rl data to params and restore the configuration in case of
devlink reload.
Change the location of mqprio_rl resources cleanup so it will be done
also in reload flow.
Also, remove the rl pointer from the params, since this is dynamic object
and saved to priv.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
HW-GRO (SHAMPO) packet merger scheme implies header-data split in the
driver, report it through the ethtool interface.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Stephane Grosjean contributes a peak-usb cleanup and update series.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The CANFD-USB PCAN-USB FD interface undergoes an internal component
change that requires a slight modification of its drivers, which leads
them to dynamically use endpoint numbers provided by the interface
itself. In addition to a change in the calls to the USB functions
exported by the kernel, the detection of the USB interface dedicated
to CAN must also be modified, as some PEAK-System devices support
other interfaces than CAN.
Link: https://lore.kernel.org/all/20220719120632.26774-3-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
[mkl: add missing cpu_to_le16() conversion]
[mkl: fix networking block comment style]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The data structure returned from the USB device contains a number
flashed by the user and not the serial number of the device.
Link: https://lore.kernel.org/all/20220719120632.26774-2-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Mark the input prompt and data pointer as const.
Link: https://lore.kernel.org/all/20220719120632.26774-1-s.grosjean@peak-system.com
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
[mkl: mark data pointer as const, too; update commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The update is compatible/pure extension of 2.x IP core version
- new option for 2, 4, or 8 Tx buffers option during synthesis.
The 2.x version has fixed 4 Tx buffers. 3.x version default
is 4 as well
- new REG_TX_COMMAND_TXT_BUFFER_COUNT provides synthesis
choice. When read as 0 assume 2.x core with fixed 4 Tx buffers.
- new REG_ERR_CAPT_TS_BITS field to provide most significant
active/implemented timestamp bit. For 2.x read as zero,
assume value 63 is such case for 64 bit counter.
- new REG_MODE_RXBAM bit which controls automatic advance
to next word after Rx FIFO register read. Bit is set
to 1 by default after the core reset (REG_MODE_RST)
and value 1 has to be preserved for the normal ctucanfd
Linux driver operation. Even preceding driver version
resets core and then modifies only known/required MODE
register bits so backward and forward compatibility is
ensured.
See complete datasheet for time-triggered and other
updated capabilities
http://canbus.pages.fel.cvut.cz/ctucanfd_ip_core/doc/Datasheet.pdf
The fields related to ongoing Ondrej Ille's work
on fault tolerant version with parity protected buffers
and FIFOs are not included for now. Their inclusion will
be considered when design is settled and tested.
Link: https://lore.kernel.org/all/14a98ed1829121f0f3bde784f1aa533bc3cc7fe0.1658139843.git.pisa@cmp.felk.cvut.cz
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The comment referred to a status (warning) other than the one that was
being managed (active error).
Link: https://lore.kernel.org/all/20220716170112.2020291-1-dario.binacchi@amarulasolutions.com
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
We can't call close_candev() with a spin lock held, so release the lock
before calling it. After calling close_candev(), we can update the
fields of the private `struct can_priv' without having to acquire the
lock.
Fixes: c4e54b063f42f ("can: slcan: use CAN network device driver API")
Link: https://lore.kernel.org/linux-kernel/Ysrf1Yc5DaRGN1WE@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/all/20220715072951.859586-1-dario.binacchi@amarulasolutions.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Biju Das says:
====================
Add support for RZ/N1 SJA1000 CAN controller
This patch series aims to add support for RZ/N1 SJA1000 CAN controller.
The SJA1000 CAN controller on RZ/N1 SoC has some differences compared
to others like it has no clock divider register (CDR) support and it has
no HW loopback (HW doesn't see tx messages on rx), so introduced a new
compatible 'renesas,rzn1-sja1000' to handle these differences.
v3->v4:
* Updated bindings as per coding style used in example-schema.
* Entire entry in properties compatible declared as enum. Also Descriptions
do not bring any information,so removed it from compatible description.
* Used decimal values in nxp,tx-output-mode enums.
* Fixed indentaions in binding examples.
* Removed clock-names from bindings, as it is single clock.
* Optimized the code as per Vincent's suggestion.
* Updated clock handling as per bindings.
v2->v3:
* Added reg-io-width is a required property for technologic,sja1000 & renesas,rzn1-sja1000
* Removed enum type from nxp,tx-output-config and updated the description
for combination of TX0 and TX1.
* Updated the example for technologic,sja1000
v1->v2:
* Moved $ref: can-controller.yaml# to top along with if conditional to
avoid multiple mapping issues with the if conditional in the subsequent
patch.
* Added an example for RZ/N1D SJA1000 usage.
* Updated commit description for patch#2,#3 and #6
* Removed the quirk macro SJA1000_NO_HW_LOOPBACK_QUIRK
* Added prefix SJA1000_QUIRK_* for quirk macro.
* Replaced of_device_get_match_data->device_get_match_data.
* Added error handling on clk error path
* Started using "devm_clk_get_optional_enabled" for clk get,prepare and enable.
Ref:
[1] https://lore.kernel.org/linux-renesas-soc/20220701162320.102165-1-biju.das.jz@bp.renesas.com/T/#t
====================
Link: https://lore.kernel.org/all/20220710115248.190280-1-biju.das.jz@bp.renesas.com
[mkl: applying patches 1...5 only, as 6 depends
devm_clk_get_optional_enabled(), which is not in
net-next/master, yet]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Change the return type as void for SoC specific init function as it
always return 0.
Link: https://lore.kernel.org/all/20220710115248.190280-6-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This patch replaces of_match_device->device_get_match_data
to get pointer to device data.
Link: https://lore.kernel.org/all/20220710115248.190280-5-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
As per Chapter 6.5.16 of the RZ/N1 Peripheral Manual, The SJA1000
CAN controller does not support Clock Divider Register compared to
the reference Philips SJA1000 device.
This patch adds a device quirk to handle this difference.
Link: https://lore.kernel.org/all/20220710115248.190280-4-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add CAN binding documentation for Renesas RZ/N1 SoC.
The SJA1000 CAN controller on RZ/N1 SoC has some differences compared
to others like it has no clock divider register (CDR) support and it has
no HW loopback (HW doesn't see tx messages on rx), so introduced a new
compatible 'renesas,rzn1-sja1000' to handle these differences.
Link: https://lore.kernel.org/all/20220710115248.190280-3-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Convert the NXP SJA1000 CAN Controller Device Tree binding
documentation to json-schema.
Update the example to match reality.
Link: https://lore.kernel.org/all/20220710115248.190280-2-biju.das.jz@bp.renesas.com
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Marc Kleine-Budde says:
====================
can: slcan: checkpatch cleanups
This is a patch series consisting of various checkpatch cleanups for
the slcan driver.
====================
Link: https://lore.kernel.org/all/20220704125954.1587880-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Remove braces after if() for single statement blocks, also remove else
after return() in if() block.
Link: https://lore.kernel.org/all/20220704125954.1587880-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
All comparison to NULL could be written "!val", convert them to make
checkpatch happy.
Link: https://lore.kernel.org/all/20220704125954.1587880-5-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add and remove whitespace to make checkpatch happy.
Link: https://lore.kernel.org/all/20220704125954.1587880-4-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Convert the last printk(LEVEL ...) to pr_level().
Link: https://lore.kernel.org/all/20220704125954.1587880-3-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Convert all comments to network subsystem style comments.
Link: https://lore.kernel.org/all/20220704125954.1587880-2-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The snprintf() function returns the number of bytes which *would* have
been copied if there were no space. So, since this code does not check
the return value, there if the buffer was not large enough then there
would be a buffer overflow two lines later when it does:
actual = sl->tty->ops->write(sl->tty, sl->xbuff, n);
Use scnprintf() instead because that returns the number of bytes which
were actually copied.
Fixes: 52f9ac85b876 ("can: slcan: allow to send commands to the adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/all/YsVA9KoY/ZSvNGYk@kili
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The snprintf() function returns the number of bytes it *would* have
copied if there were enough space. So it can return > the
sizeof(gen->attach_target).
Fixes: 67234743736a ("libbpf: Generate loader program out of BPF ELF file.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/YtZ+oAySqIhFl6/J@kili
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The snprintf() function returns the number of bytes which *would*
have been copied if there were space. In other words, it can be
> sizeof(pin_path).
Fixes: c0fa1b6c3efc ("bpf: btf: Add BTF tests")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/YtZ+aD/tZMkgOUw+@kili
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add documentation for BPF_MAP_TYPE_HASH including kernel version
introduced, usage and examples. Document BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_LRU_PERCPU_HASH variations.
Note that this file is included in the BPF documentation by the glob in
Documentation/bpf/maps.rst
v3:
Fix typos reported by Stanislav Fomichev and Yonghong Song.
Add note about iteration and deletion as requested by Yonghong Song.
v2:
Describe memory allocation semantics as suggested by Stanislav Fomichev.
Fix u64 typo reported by Stanislav Fomichev.
Cut down usage examples to only show usage in context.
Updated patch description to follow style recommendation, reported by
Bagas Sanjaya.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220718125847.1390-1-donald.hunter@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test validating that libbpf adjusts (and reflects adjusted) ringbuf
size early, before bpf_object is loaded. Also make sure we can't
successfully resize ringbuf map after bpf_object is loaded.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2
of page_size) more eagerly: during open phase when initializing the map
and on explicit calls to bpf_map__set_max_entries().
Such approach allows user to check actual size of BPF ringbuf even
before it's created in the kernel, but also it prevents various edge
case scenarios where BPF ringbuf size can get out of sync with what it
would be in kernel. One of them (reported in [0]) is during an attempt
to pin/reuse BPF ringbuf.
Move adjust_ringbuf_sz() helper closer to its first actual use. The
implementation of the helper is unchanged.
Also make detection of whether bpf_object is already loaded more robust
by checking obj->loaded explicitly, given that map->fd can be < 0 even
if bpf_object is already loaded due to ability to disable map creation
with bpf_map__set_autocreate(map, false).
[0] Closes: https://github.com/libbpf/libbpf/pull/530
Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fix documentation for bpf_skb_pull_data() helper for
when len == 0.
Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220715193800.3940070-1-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Teach libbpf to fallback to tracefs mount point (/sys/kernel/tracing) if
debugfs (/sys/kernel/debug/tracing) isn't mounted.
Acked-by: Yonghong Song <yhs@fb.com>
Suggested-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.
LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html
Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrii Nakryiko says:
====================
Fix 32-bit overflow in value pointer calculations in BPF array map. And then
raise obsolete limit on array map value size. Add selftest making sure this is
working as intended.
v1->v2:
- fix broken patch #1 (no mask_index use in helper, as stated in commit
message; and add missing semicolon).
====================
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a simple big 16MB array and validate access to the very last byte of
it to make sure that kernel supports > KMALLOC_MAX_SIZE value_size for
BPF array maps (which are backing .bss in this case).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Syscall-side map_lookup_elem() and map_update_elem() used to use
kmalloc() to allocate temporary buffers of value_size, so
KMALLOC_MAX_SIZE limit on value_size made sense to prevent creation of
array map that won't be accessible through syscall interface.
But this limitation since has been lifted by relying on kvmalloc() in
syscall handling code. So remove KMALLOC_MAX_SIZE, which among other
things means that it's possible to have BPF global variable sections
(.bss, .data, .rodata) bigger than 8MB now. Keep the sanity check to
prevent trivial overflows like round_up(map->value_size, 8) and restrict
value size to <= INT_MAX (2GB).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF_MAP_TYPE_ARRAY is rounding value_size to closest multiple of 8 and
stores that as array->elem_size for various memory allocations and
accesses.
But the code tends to re-calculate round_up(map->value_size, 8) in
multiple places instead of using array->elem_size. Cleaning this up and
making sure we always use array->size to avoid duplication of this
(admittedly simple) logic for consistency.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If BPF array map is bigger than 4GB, element pointer calculation can
overflow because both index and elem_size are u32. Fix this everywhere
by forcing 64-bit multiplication. Extract this formula into separate
small helper and use it consistently in various places.
Speculative-preventing formula utilizing index_mask trick is left as is,
but explicit u64 casts are added in both places.
Fixes: c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220715053146.1291891-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The vlen bits in the BTF type of kind BTF_KIND_FUNC are used to convey the
linkage information for functions. The Linux kernel only supports
linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL at this time.
Signed-off-by: Indu Bhagat <indu.bhagat@oracle.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220714223310.1140097-1-indu.bhagat@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This particular ones is about having the following:
CONFIG_BPF_LSM=y
# CONFIG_CGROUP_BPF is not set
Also, add __maybe_unused to the args for the !CONFIG_NET cases.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220714185404.3647772-1-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrii Nakryiko says:
====================
Add SEC("ksyscall")/SEC("kretsyscall") sections and corresponding
bpf_program__attach_ksyscall() API that simplifies tracing kernel syscalls
through kprobe mechanism. Kprobing syscalls isn't trivial due to varying
syscall handler names in the kernel and various ways syscall argument are
passed, depending on kernel architecture and configuration. SEC("ksyscall")
allows user to not care about such details and just get access to syscall
input arguments, while libbpf takes care of necessary feature detection logic.
There are still more quirks that are not straightforward to hide completely
(see comments about mmap(), clone() and compat syscalls), so in such more
advanced scenarios user might need to fall back to plain SEC("kprobe")
approach, but for absolute majority of users SEC("ksyscall") is a big
improvement.
As part of this patch set libbpf adds two more virtual __kconfig externs, in
addition to existing LINUX_KERNEL_VERSION: LINUX_HAS_BPF_COOKIE and
LINUX_HAS_SYSCALL_WRAPPER, which let's libbpf-provided BPF-side code minimize
external dependencies and assumptions and let's user-space part of libbpf to
perform all the feature detection logic. This benefits USDT support code,
which now doesn't depend on BPF CO-RE for its functionality.
v1->v2:
- normalize extern variable-related warn and debug message formats (Alan);
rfc->v1:
- drop dependency on kallsyms and speed up SYSCALL_WRAPPER detection (Alexei);
- drop dependency on /proc/config.gz in bpf_tracing.h (Yaniv);
- add doc comment and ephasize mmap(), clone() and compat quirks that are
not supported (Ilya);
- use mechanism similar to LINUX_KERNEL_VERSION to also improve USDT code.
====================
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Convert few selftest that used plain SEC("kprobe") with arch-specific
syscall wrapper prefix to ksyscall/kretsyscall and corresponding
BPF_KSYSCALL macro. test_probe_user.c is especially benefiting from this
simplification.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
kretsyscall variants (for return kprobes) to allow users to kprobe
syscall functions in kernel. These special sections allow to ignore
complexities and differences between kernel versions and host
architectures when it comes to syscall wrapper and corresponding
__<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf
itself doesn't rely on /proc/config.gz for detecting this, see
BPF_KSYSCALL patch for how it's done internally).
Combined with the use of BPF_KSYSCALL() macro, this allows to just
specify intended syscall name and expected input arguments and leave
dealing with all the variations to libbpf.
In addition to SEC("ksyscall+") and SEC("kretsyscall+") add
bpf_program__attach_ksyscall() API which allows to specify syscall name
at runtime and provide associated BPF cookie value.
At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not
handle all the calling convention quirks for mmap(), clone() and compat
syscalls. It also only attaches to "native" syscall interfaces. If host
system supports compat syscalls or defines 32-bit syscalls in 64-bit
kernel, such syscall interfaces won't be attached to by libbpf.
These limitations may or may not change in the future. Therefore it is
recommended to use SEC("kprobe") for these syscalls or if working with
compat and 32-bit interfaces is required.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Improve BPF_KPROBE_SYSCALL (and rename it to shorter BPF_KSYSCALL to
match libbpf's SEC("ksyscall") section name, added in next patch) to use
__kconfig variable to determine how to properly fetch syscall arguments.
Instead of relying on hard-coded knowledge of whether kernel's
architecture uses syscall wrapper or not (which only reflects the latest
kernel versions, but is not necessarily true for older kernels and won't
necessarily hold for later kernel versions on some particular host
architecture), determine this at runtime by attempting to create
perf_event (with fallback to kprobe event creation through tracefs on
legacy kernels, just like kprobe attachment code is doing) for kernel
function that would correspond to bpf() syscall on a system that has
CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would try
'__x64_sys_bpf').
If host kernel uses syscall wrapper, syscall kernel function's first
argument is a pointer to struct pt_regs that then contains syscall
arguments. In such case we need to use bpf_probe_read_kernel() to fetch
actual arguments (which we do through BPF_CORE_READ() macro) from inner
pt_regs.
But if the kernel doesn't use syscall wrapper approach, input
arguments can be read from struct pt_regs directly with no probe reading.
All this feature detection is done without requiring /proc/config.gz
existence and parsing, and BPF-side helper code uses newly added
LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to keep in sync with
user-side feature detection of libbpf.
BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs that
define syscall function explicitly (e.g., SEC("kprobe/__x64_sys_bpf"))
and SEC("ksyscall") program added in the next patch (which are the same
kprobe program with added benefit of libbpf determining correct kernel
function name automatically).
Kretprobe and kretsyscall (added in next patch) programs don't need
BPF_KSYSCALL as they don't provide access to input arguments. Normal
BPF_KRETPROBE is completely sufficient and is recommended.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Exercise libbpf's logic for unknown __weak virtual __kconfig externs.
USDT selftests are already excercising non-weak known virtual extern
already (LINUX_HAS_BPF_COOKIE), so no need to add explicit tests for it.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220714070755.3235561-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|