Age | Commit message (Collapse) | Author | Files | Lines |
|
Add support for ASO action of type flow metering
on device that supports STEv1.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Hamdan Igbaria <hamdani@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use skb_inner_tcp_all_headers() instead of skb_tcp_all_headers() when
transmitting an encapsulated packet in mlx5e_tx_get_gso_ihs().
Fixes: 504148fedb85 ("net: add skb_[inner_]tcp_all_headers helpers")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, driver is setting default values to all timeouts during
function setup. The offending commit is using a timeout before
function setup, meaning: the timeout is 0 (or garbage), since no
value have been set.
This may result in failure to probe the driver:
mlx5_function_setup:1034:(pid 69850): Firmware over 4294967296 MS in pre-initializing state, aborting
probe_one:1591:(pid 69850): mlx5_init_one failed with error code -16
Hence, set default values to timeouts during tout_init()
Fixes: 37ca95e62ee2 ("net/mlx5: Increase FW pre-init timeout for health recovery")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Fix several issues in SMFS steering info dump:
- Fix outdated macro value for matcher mask in the SMFS debug dump format.
The existing value denotes the old format of the matcher mask, as it was
used during the early stages of development, and it results in wrong
parsing by the steering dump parser - wrong fields are shown in the
parsed output.
- Add the missing destination table to the dumped action.
The missing dest table handle breaks the ability to associate between
the "go to table" action and the actual table in the steering info.
Fixes: 9222f0b27da2 ("net/mlx5: DR, Add support for dumping steering info")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit limited log_max_qp to be 17 due to FW capabilities.
Recently, it turned out that there are old FW versions that supported
more than 17, so the cited commit caused a degradation.
Thus, set the maximum log_max_qp back to 18 as it was before the
cited commit.
Fixes: 7f839965b2d7 ("net/mlx5: Update log_max_qp value to be 17 at most")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
While extending available range of supported chains/prios referenced commit
also modified slow path rules to go to FT chain instead of actual slow FDB.
However neither of existing users of the MLX5_ATTR_FLAG_SLOW_PATH
flag (tunnel encap entries with invalid encap and flows with trap action)
need to match on FT chain. After bridge offload was implemented packets of
such flows can also be matched by bridge priority tables which is
undesirable. Restore slow path flows implementation to redirect packets to
slow_fdb.
Fixes: 278d51f24330 ("net/mlx5: E-Switch, Increase number of chains and priorities")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Before commit 76c31e5f7585 ("net/mlx5e: Use FW limitation for max MPW
WQEBBs"), the maximum size of MPWQE in WQEBBs was hardcoded as a driver
constant. That commit started using the firmware capability that can
further limit the size, however, it unintentionally changed a few
things:
1. The calculation of MLX5E_MAX_KLM_PER_WQE used the size in DS, which
was replaced by the size in WQEBBs, making the resulting value 4 times
smaller.
2. MLX5E_TX_MPW_MAX_WQEBBS used to be aligned to the cache line size
(either 64 or 128 bytes, i.e. 1 or 2 WQEBBs), but it's no longer the
case if the firmware capability is smaller than the driver maximum.
Fix both issues by using the correct units for MLX5E_MAX_KLM_PER_WQE and
by aligning mlx5e_get_sw_max_sq_mpw_wqebbs after taking the minimum.
Besides fixing the arithmetics in calculation of MLX5E_MAX_KLM_PER_WQE,
also use appropriate constants: `size of BSF * num of DS per WQEBB *
number of WQEBBs` (the calculation before the blamed commit) doesn't
make much sense to calculate the WQE size in bytes, so just use `size of
WQEBB * number of WQEBBs`.
While at it, replace the types that hold the number of WQEBBs by u8.
These values don't exceed 16, and it allows to fill holes in two
structs.
Fixes: 76c31e5f7585 ("net/mlx5e: Use FW limitation for max MPW WQEBBs")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
ICOSQ is used to post UMR WQEs for both regular RQ and XSK RQ. However,
space in ICOSQ is reserved only for the regular RQ, which may cause
ICOSQ overflows when using XSK (the most risk is on activating
channels).
This commit fixes the issue by reserving space for XSK UMR WQEs as well.
As XSK may be enabled without restarting the channel and recreating the
ICOSQ, this space is reserved unconditionally.
Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
MLX5E_MAX_RQ_NUM_MTTS should be the maximum value, so that
MLX5_MTT_OCTW(MLX5E_MAX_RQ_NUM_MTTS) fits into u16. The current value of
1 << 17 results in MLX5_MTT_OCTW(1 << 17) = 1 << 16, which doesn't fit
into u16. This commit replaces it with the maximum value that still
fits u16.
Fixes: 73281b78a37a ("net/mlx5e: Derive Striding RQ size from MTU")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit changed CT to use multi table actions post act infrastructure instead
of using it own post act infrastructure, this broke decap during VF tunnel offload
(Stack devices) with CT due to wrong match on in_port metadata in the post act table.
This changed only broke VF tunnel offload because it modify the packet in_port metadata
to be VF metadata and it isn't propagate the post act creation.
Fixed by modify post act rules to match only on fte_id and not match on in_port metadata
which isn't needed.
Fixes: a81283263bb0 ("net/mlx5e: Use multi table support for CT and sample actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
cipher/version
The driver reports whether TX/RX TLS device offloads are supported, but
not which ciphers/versions, these should be handled by returning
-EOPNOTSUPP when .tls_dev_add() is called.
Remove the WARN_ON kernel trace when the driver gets a request to
offload a cipher/version that is not supported as it is expected.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.
Fix the calculation so that the entire range will get migrated if
possible.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration")
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
Cc: <stable@vger.kernel.org> # v5.8+
|
|
The Parameterized Testing example contains a compilation error, as the
signature for the description helper function is void(*)(const struct
sha1_test_case *, char *), and the struct is non-const. This is
warned by Clang:
error: initialization of ‘void (*)(struct sha1_test_case *, char *)’
from incompatible pointer type ‘void (*)(const struct sha1_test_case *,
char *)’ [-Werror=incompatible-pointer-types]
33 | KUNIT_ARRAY_PARAM(sha1, cases, case_to_desc);
| ^~~~~~~~~~~~
../include/kunit/test.h:1339:70: note: in definition of macro
‘KUNIT_ARRAY_PARAM’
1339 | void
(*__get_desc)(typeof(__next), char *) = get_desc; \
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter, no known blockers for
the release.
Current release - regressions:
- wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix
taking the lock before its initialized
- Bluetooth: mgmt: fix double free on error path
Current release - new code bugs:
- eth: ice: fix tunnel checksum offload with fragmented traffic
Previous releases - regressions:
- tcp: md5: fix IPv4-mapped support after refactoring, don't take the
pure v6 path
- Revert "tcp: change pingpong threshold to 3", improving detection
of interactive sessions
- mld: fix netdev refcount leak in mld_{query | report}_work() due to
a race
- Bluetooth:
- always set event mask on suspend, avoid early wake ups
- L2CAP: fix use-after-free caused by l2cap_chan_put
- bridge: do not send empty IFLA_AF_SPEC attribute
Previous releases - always broken:
- ping6: fix memleak in ipv6_renew_options()
- sctp: prevent null-deref caused by over-eager error paths
- virtio-net: fix the race between refill work and close, resulting
in NAPI scheduled after close and a BUG()
- macsec:
- fix three netlink parsing bugs
- avoid breaking the device state on invalid change requests
- fix a memleak in another error path
Misc:
- dt-bindings: net: ethernet-controller: rework 'fixed-link' schema
- two more batches of sysctl data race adornment"
* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
stmmac: dwmac-mediatek: fix resource leak in probe
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net: ping6: Fix memleak in ipv6_renew_options().
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
sfc: disable softirqs for ptp TX
ptp: ocp: Select CRC16 in the Kconfig.
tcp: md5: fix IPv4-mapped support
virtio-net: fix the race between refill work and close
mptcp: Do not return EINPROGRESS when subflow creation succeeds
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
netfilter: nft_queue: only allow supported familes and hooks
...
|
|
Add support for NETIF_F_LOOPBACK. This feature can be set via:
$ ethtool -K eth0 loopback <on|off>
Feature can be useful for local data path tests.
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Instead of rather verbose comparison of current netdev->features bits vs
the incoming ones from user, let us compress them by a helper features
set that will be the result of netdev->features XOR features. This way,
current, extensive branches:
if (features & NETIF_F_BIT && !(netdev->features & NETIF_F_BIT))
set_feature(true);
else if (!(features & NETIF_F_BIT) && netdev->features & NETIF_F_BIT)
set_feature(false);
can become:
netdev_features_t changed = netdev->features ^ features;
if (changed & NETIF_F_BIT)
set_feature(!!(features & NETIF_F_BIT));
This is nothing new as currently several other drivers use this
approach, which I find much more convenient.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When trust is turned off for the VF, the expectation is that promiscuous
and allmulticast filters are removed. Currently default VSI filter is not
getting cleared in this flow.
Example:
ip link set enp236s0f0 vf 0 trust on
ip link set enp236s0f0v0 promisc on
ip link set enp236s0f0 vf 0 trust off
/* promiscuous mode is still enabled on VF0 */
Remove switch filters for both cases.
This commit fixes above behavior by removing default VSI filters and
allmulticast filters when vf-true-promisc-support is OFF.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.
Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.
Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.
Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().
Refactor unnecessarily passed function arguments.
To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
traffic is seen on both of them.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
update version number
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The ACC (automatic C-state conversion) feature was available on Sky Lake and
Cascade Lake Xeons (SKX and CLX), but it is not available on Ice Lake and
Sapphire Rapids Xeons (ICX and SPR). Therefore, stop decoding it for ICX and
SPR.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Sapphire Rapids Xeon (SPR) supports 2 flavors of PC6 - PC6N (non-retention) and
PC6R (retention). Before this patch we used ICX package C-state limits, which
was wrong, because ICX has only one PC6 flavor. With this patch, we use SKX PC6
limits for SPR, because they are the same.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The 'automatic_cstate_conversion_probe()' function has a too long 'if'
statement, convert it to a 'switch' statement in order to improve code
readability a bit.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Before this patch, SPR platform was considered identical to ICX platform. This
patch separates SPR support from ICX.
This patch is a preparation for adding SPR-specific package C-state limits
support.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
remove duplicate "the" in comment
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add initial support for Raptorlake model
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add support for ALDERLAKE_N platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.
Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
code cleanup only.
no functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
CPUID leaf 7 EDX now tells us if the processor has hybrid CPUs
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Update turbostat.8 to reflect new uncore frequency output (UncMHz)
Also, refresh examples.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
When CONFIG_INTEL_UNCORE_FREQ_CONTROL is effective,
(Linux 5.9 and later), print the current (and default)
min and max uncore frequency limits.
When that driver provides the current uncore frequency
(Linux 5.18 and later), print a UncMHz column
reflecting the current uncore frequency.
Note that UncMHz is an instantaneous sample, not an average.
eg.
$ sudo ./turbostat -S --show frequency
...
Uncore Frequency pkg0 die0: 800 - 3900 MHz (800 - 3900 MHz)
...
Avg_MHz Busy% Bzy_MHz TSC_MHz UncMHz
28 0.70 4049 3095 3900
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Currently if a fscanf fails then an early return leaks an open
file pointer. Fix this by fclosing the file before the return.
Detected using static analysis with cppcheck:
tools/power/x86/turbostat/turbostat.c:2039:3: error: Resource leak: fp [resourceLeak]
Fixes: eae97e053fe3 ("tools/power turbostat: Support thermal throttle count print")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Using strncmp for a single character comparison is overly complicated,
just use a simpler single character comparison instead. Also stops
static analyzers (such as cppcheck) from complaining about strncmp on
non-null terminated strings.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
It would be handy to have cmdline in turbostat output. For example,
according to the turbostat output, there are no C-states requested.
In this case the user is very curious if something like
intel_idle.max_cstate=0 was used, or may be idle=none too. It is
also curious whether things like intel_pstate=nohwp were used.
Print the boot command line accordingly:
turbostat version 21.05.04 - Len Brown <lenb@kernel.org>
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0+ root=UUID=
b42359ed-1e05-42eb-8757-6bf2a1c19070 ro quiet splash vt.handoff=7
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
RaptorLake is compatible with AlderLake.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The 82576 PTP implementation still uses .adjfreq instead of using the newer
.adjfine.
This implementation uses a pre-simplified calculation since the base
increment value for the 82576 is just 16 * 2^19. Converting this into
scaled_ppm is tricky, and makes the intent a bit less clear.
Simply convert to the normal flow of multiplying the base increment value
by the scaled_ppm and then dividing by 1000000ULL << 16. This can be
implemented using mul_u64_u64_div_u64 which can avoid the possible overflow
that might occur for large adjustments.
Use of .adjfine can improve the precision of small adjustments and gets us
one driver closer to removing the old implementation from the kernel
entirely.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Convert the ixgbe PTP frequency adjustment implementations from .adjfreq to
.adjfine. This allows using the scaled parts per million adjustment from
the PTP core and results in a more precise adjustment for small
corrections.
To avoid overflow, use mul_u64_u64_div_u64 to perform the calculation. On
X86 platforms, this will use instructions that perform the operations with
128bit intermediate values. For other architectures, the implementation
will limit the loss of precision as much as possible.
This change slightly improves the precision of frequency adjustments for
all ixgbe based devices, and gets us one driver closer to being able to
remove the older .adjfreq implementation from the kernel.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The i40e driver currently implements the .adjfreq handler for frequency
adjustments. This takes the adjustment parameter in parts per billion. The
PTP core supports .adjfine which provides an adjustment in scaled parts per
million. This has a higher resolution and can result in more precise
adjustments for small corrections.
Convert the existing .adjfreq implementation to the newer .adjfine
implementation. This is trivial since it just requires changing the divisor
from 1000000000ULL to (1000000ULL << 16) in the mul_u64_u64_div_u64 call.
This improves the precision of the adjustments and gets us one driver
closer to removing the old .adjfreq support from the kernel.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The i40e device has a different clock rate depending on the current link
speed. This requires using a different increment rate for the PTP clock
registers. For slower link speeds, the base increment value is larger.
Directly multiplying the larger increment value by the parts per billion
adjustment might overflow.
To avoid this, the i40e implementation defaults to using the lower
increment value and then multiplying the adjustment afterwards. This causes
a loss of precision for lower link speeds.
We can fix this by using mul_u64_u64_div_u64 instead of performing the
multiplications using standard C operations. On X86, this will use special
instructions that perform the multiplication and division with 128bit
intermediate values. For other architectures, the fallback implementation
will limit the loss of precision for large values. Small adjustments don't
overflow anyways and won't lose precision at all.
This allows first multiplying the base increment value and then performing
the adjustment calculation, since we no longer fear overflowing. It also
makes it easier to convert to the even more precise .adjfine implementation
in a following change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The PTP implementation for the e1000e driver uses the older .adjfreq
method. This method takes an adjustment in parts per billion. The newer
.adjfine implementation uses scaled_ppm. The use of scaled_ppm allows for
finer grained adjustments and is preferred over using the older
implementation.
Make use of mul_u64_u64_div_u64 in order to handle possible overflow of the
multiplication used to calculate the desired adjustment to the hardware
increment value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The e1000e_phc_adjfreq function validates that the input delta is within
the maximum range. This is already handled by the core PTP code and this is
a duplicate and thus unnecessary check. It also complicates refactoring to
use the newer .adjfine implementation, where the input is no longer
specified in parts per billion. Remove the range validation check.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The PTP frequency adjustment code needs to determine an appropriate
adjustment given an input scaled_ppm adjustment.
We calculate the adjustment to the register by multiplying the base
(nominal) increment value by the scaled_ppm and then dividing by the
scaled one million value.
For very large adjustments, this might overflow. To avoid this, both the
scaled_ppm and divisor values are downshifted.
We can avoid that on X86 architectures by using mul_u64_u64_div_u64. This
helper function will perform the multiplication and division with 128bit
intermediate values. We know that scaled_ppm is never larger than the
divisor so this operation will never result in an overflow.
This improves the accuracy of the calculations for large adjustment values
on X86. It is likely an improvement on other architectures as well because
the default implementation of mul_u64_u64_div_u64 is smarter than the
original approach taken in the ice code.
Additionally, this implementation is easier to read, using fewer local
variables and lines of code to implement.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.
Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94
Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.
cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown
So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.
Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0 2147483647
Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.
setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cgroup_update_dfl_csses() function updates css associations when a
cgroup's subtree_control file is modified. Any changes made to a cgroup's
subtree_control file, however, will only affect its descendants but not
the cgroup itself. So there is no point in migrating csses associated
with that cgroup. We can skip them instead.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This fixes the paths of x86 / arm efi-stub source files.
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessos.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220727140539.10021-1-jprvita@endlessos.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Update to commit 6c757e9f55f0 ("docs/scheduler:
fix unit error")
ddb21d27a6a5 ("docs/scheduler: Change unit of
cpu_time and rq_time to nanoseconds")
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Wu XiangCheng <bobwxc@email.cn>
Reviewed-by: Alex Shi <alexs@kernel.org>
Link: https://lore.kernel.org/r/3cb1c4c466dfa38d72a867dc6e2c833ceb69ecb7.1658983157.git.siyanteng@loongson.cn
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|