Age | Commit message (Collapse) | Author | Files | Lines |
|
Even if an IOMMU might be present for some PCI segment in the system,
that doesn't necessarily mean it provides translation for the device
we care about. It appears that what we care about here is specifically
whether DMA mapping ops involve any IOMMU overhead or not, so check for
translation actually being active for our device.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/7350f957944ecfce6cce90f422e3992a1f428775.1649166055.git.robin.murphy@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As noted in the original commit 685343fc3ba6 ("net: add
name_assign_type netdev attribute")
... when the kernel has given the interface a name using global
device enumeration based on order of discovery (ethX, wlanY, etc)
... are labelled NET_NAME_ENUM.
That describes this case, so set the default for the devices here to
NET_NAME_ENUM. Current popular network setup tools like systemd use
this only to warn if you're setting static settings on interfaces that
might change, so it is expected this only leads to better user
information, but not changing of interfaces, etc.
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Link: https://lore.kernel.org/r/20220406093635.1601506-1-iwienand@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BPF_USDT_ARG_REG_DEREF handling always reads 8 bytes, regardless of
the actual argument size. On little-endian the relevant argument bits
end up in the lower bits of val, and later on the code that handles
all the argument types expects them to be there.
On big-endian they end up in the upper bits of val, breaking that
expectation. Fix by right-shifting val on big-endian.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-3-iii@linux.ibm.com
|
|
Fix several typos and references to non-existing headers.
Also use __BYTE_ORDER__ instead of __BYTE_ORDER for consistency with
the rest of the bpf code - see commit 45f2bebc8079 ("libbpf: Fix
endianness detection in BPF_CORE_READ_BITFIELD_PROBED()") for
rationale).
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-2-iii@linux.ibm.com
|
|
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.
Link: https://github.com/iovisor/bcc/pull/3899
Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Increment rx_otherhost_dropped counter when packet dropped due to
mismatched dest MAC addr.
An example when this drop can occur is when manually crafting raw
packets that will be consumed by a user space application via a tap
device. For testing purposes local traffic was generated using trafgen
for the client and netcat to start a server
Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other
with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the
counter was incremented. (Also had to modify iproute2 to show the stat,
additional patch for that coming next.)
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
net: create a net/core/ internal header
We are adding stuff to netdevice.h which really should be
local to net/core/. Create a net/core/dev.h header and use it.
Minor cleanups precede.
====================
Link: https://lore.kernel.org/r/20220406213754.731066-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a number of functions and static variables used
under net/core/ but not from the outside. We currently
dump most of them into netdevice.h. That bad for many
reasons:
- netdevice.h is very cluttered, hard to figure out
what the APIs are;
- netdevice.h is very long;
- we have to touch netdevice.h more which causes expensive
incremental builds.
Create a header under net/core/ and move some declarations.
The new header is also a bit of a catch-all but that's
fine, if we create more specific headers people will
likely over-think where their declaration fit best.
And end up putting them in netdevice.h, again.
More work should be done on splitting netdevice.h into more
targeted headers, but that'd be more time consuming so small
steps.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have a bunch of functions which are only used under
net/core/ yet they get exported. Remove the exports.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following patch will hide that typedef. There seems to be
no strong reason for hyperv to use it, so let's not.
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As reported by Naresh:
perf build errors on i386 [1] on Linux next-20220407 [2]
usdt.c:1181:5: error: "__x86_64__" is not defined, evaluates to 0
[-Werror=undef]
1181 | #if __x86_64__
| ^~~~~~~~~~
usdt.c:1196:5: error: "__x86_64__" is not defined, evaluates to 0
[-Werror=undef]
1196 | #if __x86_64__
| ^~~~~~~~~~
cc1: all warnings being treated as errors
Use #ifdef instead of #if to avoid this.
Fixes: 4c59e584d158 ("libbpf: Add x86-specific USDT arg spec parsing logic")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407203842.3019904-1-andrii@kernel.org
|
|
link could be null but still dereference bpf_link__destroy(&link->link)
and it will lead to a null pointer access.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649299098-2069-1-git-send-email-baihaowen@meizu.com
|
|
Alan Maguire says:
====================
Follow-up series to [1] to address some suggestions from Andrii to
improve parsing and make it more robust (patches 1, 2) and to improve
validation of u[ret]probe firing by validating expected argument
and return values (patch 3).
[1] https://lore.kernel.org/bpf/164903521182.13106.12656654142629368774.git-patchwork-notify@kernel.org/
Changes since v1:
- split library name, auto-attach parsing into separate patches (Andrii, patches 1, 2)
- made str_has_sfx() static inline, avoided repeated strlen()s by storing lengths,
used strlen() instead of strnlen() (Andrii, patch 1)
- fixed sscanf() arg to use %li, switched logging to use "prog '%s'" format,
used direct strcmp() on probe_type instead of prefix check (Andrii, patch 2)
- switched auto-attach tests to log parameter/return values to be checked by
user-space side of tests. Needed to add pid filtering to avoid capturing
stray malloc()s (Andrii, patch 3)
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
uprobe/uretprobe tests don't do any validation of arguments/return values,
and without this we can't be sure we are attached to the right function,
or that we are indeed attached to a uprobe or uretprobe. To fix this
record argument and return value for auto-attached functions and ensure
these match expectations. Also need to filter by pid to ensure we do
not pick up stray malloc()s since auto-attach traces libc system-wide.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-4-git-send-email-alan.maguire@oracle.com
|
|
For uprobe auto-attach, the parsing can be simplified for the SEC()
name to a single sscanf(); the return value of the sscanf can then
be used to distinguish between sections that simply specify
"u[ret]probe" (and thus cannot auto-attach), those that specify
"u[ret]probe/binary_path:function+offset" etc.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-3-git-send-email-alan.maguire@oracle.com
|
|
In the process of doing path resolution for uprobe attach, libraries are
identified by matching a ".so" substring in the binary_path.
This matches a lot of patterns that do not conform to library.so[.version]
format, so instead match a ".so" _suffix_, and if that fails match a
".so." substring for the versioned library case.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-2-git-send-email-alan.maguire@oracle.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Correctly propagate coherence information for VMbus devices (Michael
Kelley)
- Disable balloon and memory hot-add on ARM64 temporarily (Boqun Feng)
- Use barrier to prevent reording when reading ring buffer (Michael
Kelley)
- Use virt_store_mb in favour of smp_store_mb (Andrea Parri)
- Fix VMbus device object initialization (Andrea Parri)
- Deactivate sysctl_record_panic_msg on isolated guest (Andrea Parri)
- Fix a crash when unloading VMbus module (Guilherme G. Piccoli)
* tag 'hyperv-fixes-signed-20220407' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
Drivers: hv: balloon: Disable balloon and hot-add accordingly
Drivers: hv: balloon: Support status report for larger page sizes
Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
PCI: hv: Propagate coherence from VMbus device to PCI device
Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
Drivers: hv: vmbus: Fix potential crash on module unload
Drivers: hv: vmbus: Fix initialization of device object in vmbus_device_register()
Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Another fixup to the fast_init/crng_init split, this time in how much
entropy is being credited, from Jan Varho.
- As discussed, we now opportunistically call try_to_generate_entropy()
in /dev/urandom reads, as a replacement for the reverted commit. I
opted to not do the more invasive wait_for_random_bytes() change at
least for now, preferring to do something smaller and more obvious
for the time being, but maybe that can be revisited as things evolve
later.
- Userspace can use FUSE or userfaultfd or simply move a process to
idle priority in order to make a read from the random device never
complete, which breaks forward secrecy, fixed by overwriting
sensitive bytes early on in the function.
- Jann Horn noticed that /dev/urandom reads were only checking for
pending signals if need_resched() was true, a bug going back to the
genesis commit, now fixed by always checking for signal_pending() and
calling cond_resched(). This explains various noticeable signal
delivery delays I've seen in programs over the years that do long
reads from /dev/urandom.
- In order to be more like other devices (e.g. /dev/zero) and to
mitigate the impact of fixing the above bug, which has been around
forever (users have never really needed to check the return value of
read() for medium-sized reads and so perhaps many didn't), we now
move signal checking to the bottom part of the loop, and do so every
PAGE_SIZE-bytes.
* tag 'random-5.18-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: check for signals every PAGE_SIZE chunk of /dev/[u]random
random: check for signal_pending() outside of need_resched() check
random: do not allow user to keep crng key around on stack
random: opportunistically initialize on /dev/urandom reads
random: do not split fast init input in add_hwgenerator_randomness()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
- Fix a compilation warning due to an uninitialized variable in
ata_sff_lost_interrupt(), from me.
- Fix invalid internal command tag handling in the sata_dwc_460ex
driver, from Christian.
- Disable READ LOG DMA EXT with Samsung 840 EVO SSDs as this command
causes the drives to hang, from Christian.
- Change the config option CONFIG_SATA_LPM_POLICY back to its original
name CONFIG_SATA_LPM_MOBILE_POLICY to avoid potential problems with
users losing their configuration (as discussed during the merge
window), from Mario.
* tag 'ata-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci: Rename CONFIG_SATA_LPM_POLICY configuration item back
ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
ata: sata_dwc_460ex: Fix crash due to OOB write
ata: libata-sff: Fix compilation warning in ata_sff_lost_interrupt()
|
|
Trade text size for rodata size and replace tons of nested if-elses
to the const mask match based structs. The almost entire
ice_find_dummy_packet() now becomes just one plain while-increment
loop. The order in ice_dummy_pkt_profiles[] should be same with the
if-elses order previously, as masks become less and less strict
through the array to follow the original code flow.
Apart from removing 80 locs of 4-level if-elses, it brings a solid
text size optimization:
add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function old new delta
ice_fill_adv_dummy_packet 289 291 +2
ice_adv_add_update_vsi_list 201 - -201
ice_add_adv_rule 2950 2093 -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data old new delta
ice_dummy_pkt_profiles - 672 +672
Total: Before=37895, After=38567, chg +1.77%
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Declarations of dummy/template packet headers and offsets can be
minified to improve readability and simplify adding new templates.
Move all the repetitive constructions into two macros and let them
do the name and type expansions.
Linewrap removal is yet another positive side effect.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_find_dummy_packet() contains a lot of boilerplate code and a
nice room for copy-paste mistakes.
Instead of passing 3 separate pointers back and forth to get packet
template (dummy) params, directly return a structure containing
them. Then, use a macro to compose compound literals and avoid code
duplication on return path.
Now, dummy packet type/name is needed only once to return a full
correct triple pkt-pkt_len-offsets, and those are all one-liners.
dummy_ipv4_gtpu_ipv4_packet_offsets is just moved around and renamed
(as well as dummy_ipv6_gtp_packet_offsets) with no function changes.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
A loop performing header modification according to the provided mask
in ice_fill_adv_dummy_packet() is very cryptic (and error-prone).
Replace two identical cast-deferences with a variable. Replace three
struct-member-array-accesses with a variable. Invert the condition,
reduce the indentation by one -> eliminate line wraps.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_adv_lkup_elem fields h_u and m_u are being accessed as raw u16
arrays in several places.
To reduce cast and braces burden, add permanent array-of-u16 aliases
with the same size as the `union ice_prot_hdr` itself via anonymous
unions to the actual struct declaration, and just access them
directly.
This:
- removes the need to cast the union to u16[] and then dereference
it each time -> reduces the horizon for potential bugs;
- improves -Warray-bounds coverage -- the array size is now known
at compilation time;
- addresses cppcheck complaints.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When a slip driver is detaching, the slip_close() will act to
cleanup necessary resources and sl->tty is set to NULL in
slip_close(). Meanwhile, the packet we transmit is blocked,
sl_tx_timeout() will be called. Although slip_close() and
sl_tx_timeout() use sl->lock to synchronize, we don`t judge
whether sl->tty equals to NULL in sl_tx_timeout() and the
null pointer dereference bug will happen.
(Thread 1) | (Thread 2)
| slip_close()
| spin_lock_bh(&sl->lock)
| ...
... | sl->tty = NULL //(1)
sl_tx_timeout() | spin_unlock_bh(&sl->lock)
spin_lock(&sl->lock); |
... | ...
tty_chars_in_buffer(sl->tty)|
if (tty->ops->..) //(2) |
... | synchronize_rcu()
We set NULL to sl->tty in position (1) and dereference sl->tty
in position (2).
This patch adds check in sl_tx_timeout(). If sl->tty equals to
NULL, sl_tx_timeout() will goto out.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20220405132206.55291-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when user adds a tc action and the action gets offloaded,
the user expects the HW stats to be counted also. This limits the
amount of supported offloaded filters, as HW counter resources may
be quite limited. Without counter assigned, the HW is capable to
carry much more filters.
To resolve the issue above, the following types of HW stats are
offloaded and supported by the driver:
any - current default, user does not care about the type.
delayed - polled from HW periodically.
disabled - no HW stats needed.
immediate - not supported.
Example:
tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x11 \
action drop
tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x12 \
action drop hw_stats disabled
tc filter add dev sw1p1 ingress proto ip flower skip_sw ip_proto 0x14 \
action drop hw_stats delayed
Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Link: https://lore.kernel.org/r/1649164814-18731-1-git-send-email-volodymyr.mytnyk@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
idev->addr_list needs to be protected by idev->lock. However, it is not
always possible to do so while iterating and performing actions on
inet6_ifaddr instances. For example, multiple functions (like
addrconf_{join,leave}_anycast) eventually call down to other functions
that acquire the idev->lock. The current code temporarily unlocked the
idev->lock during the loops, which can cause race conditions. Moving the
locks up is also not an appropriate solution as the ordering of lock
acquisition will be inconsistent with for example mc_lock.
This solution adds an additional field to inet6_ifaddr that is used
to temporarily add the instances to a temporary list while holding
idev->lock. The temporary list can then be traversed without holding
idev->lock. This change was done in two places. In addrconf_ifdown, the
list_for_each_entry_safe variant of the list loop is also no longer
necessary as there is no deletion within that specific loop.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf 2022-04-06
We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).
The main changes are:
1) rethook related fixes, from Jiri and Masami.
2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf: selftests: Test fentry tracing a struct_ops program
bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
rethook: Fix to use WRITE_ONCE() for rethook:: Handler
selftests/bpf: Fix warning comparing pointer to 0
bpf: Fix sparse warnings in kprobe_multi_resolve_syms
bpftool: Explicit errno handling in skeletons
====================
Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In 1448769c9cdb ("random: check for signal_pending() outside of
need_resched() check"), Jann pointed out that we previously were only
checking the TIF_NOTIFY_SIGNAL and TIF_SIGPENDING flags if the process
had TIF_NEED_RESCHED set, which meant in practice, super long reads to
/dev/[u]random would delay signal handling by a long time. I tried this
using the below program, and indeed I wasn't able to interrupt a
/dev/urandom read until after several megabytes had been read. The bug
he fixed has always been there, and so code that reads from /dev/urandom
without checking the return value of read() has mostly worked for a long
time, for most sizes, not just for <= 256.
Maybe it makes sense to keep that code working. The reason it was so
small prior, ignoring the fact that it didn't work anyway, was likely
because /dev/random used to block, and that could happen for pretty
large lengths of time while entropy was gathered. But now, it's just a
chacha20 call, which is extremely fast and is just operating on pure
data, without having to wait for some external event. In that sense,
/dev/[u]random is a lot more like /dev/zero.
Taking a page out of /dev/zero's read_zero() function, it always returns
at least one chunk, and then checks for signals after each chunk. Chunk
sizes there are of length PAGE_SIZE. Let's just copy the same thing for
/dev/[u]random, and check for signals and cond_resched() for every
PAGE_SIZE amount of data. This makes the behavior more consistent with
expectations, and should mitigate the impact of Jann's fix for the
age-old signal check bug.
---- test program ----
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <sys/random.h>
static unsigned char x[~0U];
static void handle(int) { }
int main(int argc, char *argv[])
{
pid_t pid = getpid(), child;
signal(SIGUSR1, handle);
if (!(child = fork())) {
for (;;)
kill(pid, SIGUSR1);
}
pause();
printf("interrupted after reading %zd bytes\n", getrandom(x, sizeof(x), 0));
kill(child, SIGTERM);
return 0;
}
Cc: Jann Horn <jannh@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Fix:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function ‘bnx2x_check_blocks_with_parity3’:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:4917:4: error: case label does not reduce to an integer constant
case AEU_INPUTS_ATTN_BITS_MCP_LATCHED_SCPAD_PARITY:
^~~~
See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220405151517.29753-4-bp@alien8.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.
Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321ff72c ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.
Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When invoking bpf_for_each_map_elem callback, we are passed a
PTR_TO_MAP_KEY, previously writes to this through helper may be allowed,
but the fix in previous patches is meant to prevent that case. The test
case tries to pass it as writable memory to helper, and fails test if it
succeeds to pass the verifier.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add two test cases, one pass read only map value pointer to global
func, which should be rejected. The same code checks it for kfunc, so
that is covered as well. Second one tries to use the missing check for
PTR_TO_MEM's MEM_RDONLY flag and tries to write to a read only memory
pointer. Without prior patches, both of these tests fail.
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
It is not permitted to write to PTR_TO_MAP_KEY, but the current code in
check_helper_mem_access would allow for it, reject this case as well, as
helpers taking ARG_PTR_TO_UNINIT_MEM also take PTR_TO_MAP_KEY.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The commit being fixed was aiming to disallow users from incorrectly
obtaining writable pointer to memory that is only meant to be read. This
is enforced now using a MEM_RDONLY flag.
For instance, in case of global percpu variables, when the BTF type is
not struct (e.g. bpf_prog_active), the verifier marks register type as
PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr
helpers. However, when passing such pointer to kfunc, global funcs, or
BPF helpers, in check_helper_mem_access, there is no expectation
MEM_RDONLY flag will be set, hence it is checked as pointer to writable
memory. Later, verifier sets up argument type of global func as
PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around
the limitations imposed by this flag.
This check will also cover global non-percpu variables that may be
introduced in kernel BTF in future.
Also, we update the log message for PTR_TO_BUF case to be similar to
PTR_TO_MEM case, so that the reason for error is clear to user.
Fixes: 34d3a78c681e ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When passing pointer to some map value to kfunc or global func, in
verifier we are passing meta as NULL to various functions, which uses
meta->raw_mode to check whether memory is being written to. Since some
kfunc or global funcs may also write to memory pointers they receive as
arguments, we must check for write access to memory. E.g. in some case
map may be read only and this will be missed by current checks.
However meta->raw_mode allows for uninitialized memory (e.g. on stack),
since there is not enough info available through BTF, we must perform
one call for read access (raw_mode = false), and one for write access
(raw_mode = true).
Fixes: e5069b9c23b3 ("bpf: Support pointers in global func args")
Fixes: d583691c47dc ("bpf: Introduce mem, size argument pair support for kfunc")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_map_value_size() uses num_possible_cpus() to determine map size, but
some of the tests only allocate enough memory for online cpus. This
results in out-of-bound writes in userspace during bpf(BPF_MAP_LOOKUP_ELEM)
syscalls in cases when number of online cpus is lower than the number of
possible cpus. Fix by switching from get_nprocs_conf() to
bpf_num_possible_cpus() when determining the number of processors in
these tests (test_progs/netcnt and test_cgroup_storage).
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406085408.339336-1-asavkov@redhat.com
|
|
There is a spelling mistake in a pr_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406080835.14879-1-colin.i.king@gmail.com
|
|
The function does not check that parsing_end is false after parsing
argument. Thus, if the final part of the argument is something like '4-',
which is invalid, parse_num_list() will discard it instead of returning
-EINVAL.
Before:
$ ./test_progs -n 2,4-
#2 atomic_bounds:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
After:
$ ./test_progs -n 2,4-
Failed to parse test numbers.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406003622.73539-1-ytcoode@gmail.com
|
|
Report connection tracking tuple direction in
bpf_skb_ct_lookup/bpf_xdp_ct_lookup helpers. Direction will be used to
implement snat/dnat through xdp ebpf program.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/aa1aaac89191cfc64078ecef36c0a48c302321b9.1648908601.git.lorenzo@kernel.org
|
|
The previous commit fixed support for dual-stack sockets in
bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
fixed functionality.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-2-maximmi@nvidia.com
|
|
bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.
On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:
1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.
2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.
This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.
Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
|
|
There is a same action when the variable is initialized
Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All remaining skbs should be released when myri10ge_xmit fails to
transmit a packet. Fix it within another skb_list_walk_safe.
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver for LAN Media WAN interfaces spews build warnings on
microblaze. The virt_to_bus() calls discard the volatile keyword.
The right thing to do would be to migrate this driver to a modern
DMA API but it seems unlikely anyone is actually using it.
There had been no fixes or functional changes here since
the git era begun.
Let's remove this driver, there isn't much changing in the APIs,
if users come forward we can apologize and revert.
Link: https://lore.kernel.org/all/20220321144013.440d7fc0@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
aqc111_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:
- The metadata array (desc_offset..desc_offset+2*pkt_count) can be out of bounds,
causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
endianness flip to corrupt data used by a cloned SKB that has already
been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
causing out-of-bounds heap data to be considered part of the SKB's
data.
Found doing variant analysis. Tested it with another driver (ax88179_178a), since
I don't have a aqc111 device to test it, but the code looks very similar.
Signed-off-by: Marcin Kozlowski <marcinguy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netdev_alloc_skb() has assigned ssi->netdev to skb->dev if successed,
no need to repeat assignment.
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"little-endian" has no specific content, use more helper function
of_property_read_bool() instead of of_get_property()
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.
Add a check in case build_skb() failed to allocate and return NULL.
The NULL return is handled correctly in callers to qede_build_skb().
Fixes: 8a8633978b842 ("qede: Add build_skb() support.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'
Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used.
Fixes: 4b340a5a726d ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|