Age | Commit message (Collapse) | Author | Files | Lines |
|
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was
moved to common x86 code.
No functional change intended.
Fixes: 37486135d3a7b ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The Linux TSC calibration procedure is subject to small variations
(its common to see +-1 kHz difference between reboots on a given CPU, for example).
So migrating a guest between two hosts with identical processor can fail, in case
of a small variation in calibrated TSC between them.
Without TSC scaling, the current kernel interface will either return an error
(if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
This change enables the following TSC tolerance check to
accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default).
/*
* Compute the variation in TSC rate which is acceptable
* within the range of tolerance and decide if the
* rate being applied is within that bounds of the hardware
* rate. If so, no scaling or compensation need be done.
*/
thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
use_scaling = 1;
}
NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20200616114741.GA298183@fuller.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Only MSR address range 0x800 through 0x8ff is architecturally reserved
and dedicated for accessing APIC registers in x2APIC mode.
Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix a typo in gue.h
Signed-off-by: Aiden Leong <aiden.leong@aibsd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Igor Russkikh says:
====================
net: atlantic: additional A2 features
This patchset adds more features to A2:
* half duplex rates;
* EEE;
* flow control;
* link partner capabilities reporting;
* phy loopback.
Feature-wise A2 is almost on-par with A1 save for WoL and filtering, which
will be submitted as separate follow-up patchset(s).
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the phy loopback support on A2.
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds link partner capabilities reporting support on A2.
In particular, the following capabilities are available for reporting:
* link rate;
* EEE;
* flow control.
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds flow control support on A2.
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds EEE support on A2.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Co-developed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes 2.5G baseX wrong usage/reporting, since it shouldn't have
been mixed with baseT.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for 10M/100M/1G half duplex rates, which are
supported by A2 in additional to full duplex rates supported by A1.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In RFC 8684, we don't need to send sndr_key in SYN package anymore, so drop
it.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
arg cannot be NULL since its already being dereferenced
before. Remove the redundant NULL check.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix check in ethtool_rx_flow_rule_create
Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert the mtk_eth_soc driver to use the finalised link parameters in
mac_link_up() rather than the parameters in mac_config().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>"
is deleted.
The function for this is addrconf_ifdown() in the addrconf_notify() and
it is called by notification, which is NETDEV_UNREGISTER.
But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER,
this proc file will be created again.
This recreated proc file will be deleted by netdev_wati_allrefs().
Before netdev_wait_allrefs() is called, creating a new HSR interface
routine can be executed and It tries to create a proc file but it will
find an un-deleted proc file.
At this point, it warns about it.
To avoid this situation, it can use ->dellink() instead of
->ndo_uninit() to release resources because ->dellink() is called
before NETDEV_UNREGISTER.
So, a proc file will not be recreated.
Test commands
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link set dummy0 mtu 1300
#SHELL1
while :
do
ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
done
#SHELL2
while :
do
ip link del hsr0
done
Splat looks like:
[ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered
[ 9888.981797][ C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430
[ 9888.981798][ C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x
[ 9888.981814][ C2] CPU: 2 PID: 2752 Comm: ip Tainted: G W 5.8.0-rc1+ #616
[ 9888.981815][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 9888.981816][ C2] RIP: 0010:proc_register+0x2d5/0x430
[ 9888.981818][ C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40
[ 9888.981819][ C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286
[ 9888.981821][ C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62
[ 9888.981822][ C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac
[ 9888.981823][ C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd
[ 9888.981824][ C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308
[ 9888.981825][ C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e
[ 9888.981827][ C2] FS: 00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 9888.981828][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9888.981829][ C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0
[ 9888.981830][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9888.981831][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9888.981832][ C2] Call Trace:
[ 9888.981833][ C2] ? snmp6_seq_show+0x180/0x180
[ 9888.981834][ C2] proc_create_single_data+0x7c/0xa0
[ 9888.981835][ C2] snmp6_register_dev+0xb0/0x130
[ 9888.981836][ C2] ipv6_add_dev+0x4b7/0xf60
[ 9888.981837][ C2] addrconf_notify+0x684/0x1ca0
[ 9888.981838][ C2] ? __mutex_unlock_slowpath+0xd0/0x670
[ 9888.981839][ C2] ? kasan_unpoison_shadow+0x30/0x40
[ 9888.981840][ C2] ? wait_for_completion+0x250/0x250
[ 9888.981841][ C2] ? inet6_ifinfo_notify+0x100/0x100
[ 9888.981842][ C2] ? dropmon_net_event+0x227/0x410
[ 9888.981843][ C2] ? notifier_call_chain+0x90/0x160
[ 9888.981844][ C2] ? inet6_ifinfo_notify+0x100/0x100
[ 9888.981845][ C2] notifier_call_chain+0x90/0x160
[ 9888.981846][ C2] register_netdevice+0xbe5/0x1070
[ ... ]
Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com
Fixes: e0a4b99773d3 ("hsr: use upper/lower device infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
Multicast improvement in Ocelot and Felix drivers
This series makes some basic multicast forwarding functionality work for
Felix DSA and for Ocelot switchdev. IGMP/MLD snooping in Felix is still
missing, and there are other improvements to be made in the general area
of multicast address filtering towards the CPU, but let's get these
hardware-specific fixes out of the way first.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current procedure for installing a multicast address is hardcoded
for IPv4. But, in the ocelot hardware, there are 3 different procedures
for IPv4, IPv6 and for regular L2 multicast.
For IPv6 (33-33-xx-xx-xx-xx), it's the same as for IPv4
(01-00-5e-xx-xx-xx), except that the destination port mask is stuffed
into first 2 bytes of the MAC address except into first 3 bytes.
For plain Ethernet multicast, there's no port-in-address stuffing going
on, instead the DEST_IDX (pointer to PGID) is used there, just as for
unicast. So we have to use one of the nonreserved multicast PGIDs that
the hardware has allocated for this purpose.
This patch classifies the type of multicast address based on its first
bytes, then redirects to one of the 3 different hardware procedures.
Note that this gives us a really better way of redirecting PTP frames
sent at 01-1b-19-00-00-00 to the CPU. Previously, Yangbo Lu tried to add
a trapping rule for PTP EtherType but got a lot of pushback:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
But right now, that isn't needed at all. The application stack (ptp4l)
does this for the PTP multicast addresses it's interested in (which are
configurable, and include 01-1b-19-00-00-00):
memset(&mreq, 0, sizeof(mreq));
mreq.mr_ifindex = index;
mreq.mr_type = PACKET_MR_MULTICAST;
mreq.mr_alen = MAC_LEN;
memcpy(mreq.mr_address, addr1, MAC_LEN);
err1 = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP, &mreq,
sizeof(mreq));
Into the kernel, this translates into a dev_mc_add on the switch network
interfaces, and our drivers know that it means they should translate it
into a host MDB address (make the CPU port be the destination).
Previously, this was broken because all mdb addresses were treated as
IPv4 (which 01-1b-19-00-00-00 obviously is not).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current iterators are impossible to understand at first glance
without switching back and forth between the definitions and their
actual use in the for loops.
So introduce some convenience names to help readability.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds the mdb hooks in felix and exports the mdb functions from
ocelot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When used in DSA mode (as seen in Felix), the DEST_IDX in the MAC table
should point to the PGID for the CPU port (PGID_CPU) and not for the
Ethernet port where the CPU queues are redirected to (also known as Node
Processor Interface - NPI).
Because for Felix this distinction shouldn't really matter (from DSA
perspective, the NPI port _is_ the CPU port), make the ocelot library
act upon the CPU port when NPI mode is enabled. This has no effect for
the mscc_ocelot driver for VSC7514, because that does not use NPI (and
ocelot->npi is -1).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ocelot hardware designers have made some hacks to support multicast
IPv4 and IPv6 addresses. Normally, the MAC table matches on MAC
addresses and the destination ports are selected through the DEST_IDX
field of the respective MAC table entry. The DEST_IDX points to a Port
Group ID (PGID) which contains the bit mask of ports that frames should
be forwarded to. But there aren't a lot of PGIDs (only 80 or so) and
there are clearly many more IP multicast addresses than that, so it
doesn't scale to use this PGID mechanism, so something else was done.
Since the first portion of the MAC address is known, the hack they did
was to use a single PGID for _flooding_ unknown IPv4 multicast
(PGID_MCIPV4 == 62), but for known IP multicast, embed the destination
ports into the first 3 bytes of the MAC address recorded in the MAC
table.
The VSC7514 datasheet explains it like this:
3.9.1.5 IPv4 Multicast Entries
MAC table entries with the ENTRY_TYPE = 2 settings are interpreted
as IPv4 multicast entries.
IPv4 multicasts entries match IPv4 frames, which are classified to
the specified VID, and which have DMAC = 0x01005Exxxxxx, where
xxxxxx is the lower 24 bits of the MAC address in the entry.
Instead of a lookup in the destination mask table (PGID), the
destination set is programmed as part of the entry MAC address. This
is shown in the following table.
Table 78: IPv4 Multicast Destination Mask
Destination Ports Record Bit Field
---------------------------------------------
Ports 10-0 MAC[34-24]
Example: All IPv4 multicast frames in VLAN 12 with MAC 01005E112233 are
to be forwarded to ports 3, 8, and 9. This is done by inserting the
following entry in the MAC table entry:
VALID = 1
VID = 12
MAC = 0x000308112233
ENTRY_TYPE = 2
DEST_IDX = 0
But this procedure is not at all what's going on in the driver. In fact,
the code that embeds the ports into the MAC address looks like it hasn't
actually been tested. This patch applies the procedure described in the
datasheet.
Since there are many other fixes to be made around multicast forwarding
until it works properly, there is no real reason for this patch to be
backported to stable trees, or considered a real fix of something that
should have worked.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove support for context switching between the guest's and host's
desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't
required for correct functionality, e.g. KVM intercepts reads and writes
to the MSR, and the latency effects of the settings controlled by the
MSR are not architecturally visible.
As a general rule, KVM should not allow the guest to control power
management settings unless explicitly enabled by userspace, e.g. see
KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2
can improve the performance of SMT siblings. A devious guest could
disable C0.2 so as to improve the performance of their workloads at the
detriment to workloads running in the host or on other VMs.
Wholesale removal of UMWAIT_CONTROL context switching also fixes a race
condition where updates from the host may cause KVM to enter the guest
with the incorrect value. Because updates are are propagated to all
CPUs via IPI (SMP function callback), the value in hardware may be
stale with respect to the cached value and KVM could enter the guest
with the wrong value in hardware. As above, the guest can't observe the
bad value, but it's a weird and confusing wart in the implementation.
Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
lists. Using the lists is only necessary for MSRs that are required for
correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
old hardware, or for MSRs that need to-the-uop precision, e.g. perf
related MSRs. For UMWAIT_CONTROL, the effects are only visible in the
kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in
vcpu_vmx_run(). Using the atomic lists is undesirable as they are more
expensive than direct RDMSR/WRMSR.
Furthermore, even if giving the guest control of the MSR is legitimate,
e.g. in pass-through scenarios, it's not clear that the benefits would
outweigh the overhead. E.g. saving and restoring an MSR across a VMX
roundtrip costs ~250 cycles, and if the guest diverged from the host
that cost would be paid on every run of the guest. In other words, if
there is a legitimate use case then it should be enabled by a new
per-VM capability.
Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
UMWAIT and UMONITOR.
Fixes: 6e3ba4abcea56 ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Cc: Jingqi Liu <jingqi.liu@intel.com>
Cc: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Andrii Nakryiko says:
====================
This patch set implements libbpf support for a second kind of special externs,
kernel symbols, in addition to existing Kconfig externs.
Right now, only untyped (const void) externs are supported, which, in
C language, allow only to take their address. In the future, with kernel BTF
getting type info about its own global and per-cpu variables, libbpf will
extend this support with BTF type info, which will allow to also directly
access variable's contents and follow its internal pointers, similarly to how
it's possible today in fentry/fexit programs.
As a first practical use of this functionality, bpftool gained ability to show
PIDs of processes that have open file descriptors for BPF map/program/link/BTF
object. It relies on iter/task_file BPF iterator program to extract this
information efficiently.
There was a bunch of bpftool refactoring (especially Makefile) necessary to
generalize bpftool's internal BPF program use. This includes generalization of
BPF skeletons support, addition of a vmlinux.h generation, extracting and
building minimal subset of bpftool for bootstrapping.
v2->v3:
- fix sec_btf_id check (Hao);
v1->v2:
- docs fixes (Quentin);
- dual GPL/BSD license for pid_inter.bpf.c (Quentin);
- NULL-init kcfg_data (Hao Luo);
rfc->v1:
- show pids, if supported by kernel, always (Alexei);
- switched iter output to binary to support showing process names;
- update man pages;
- fix few minor bugs in libbpf w.r.t. extern iteration.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add statements about bpftool being able to discover process info, holding
reference to BPF map, prog, link, or BTF. Show example output as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-10-andriin@fb.com
|
|
Add bpf_iter-based way to find all the processes that hold open FDs against
BPF object (map, prog, link, btf). bpftool always attempts to discover this,
but will silently give up if kernel doesn't yet support bpf_iter BPF programs.
Process name and PID are emitted for each process (task group).
Sample output for each of 4 BPF objects:
$ sudo ./bpftool prog show
2694: cgroup_device tag 8c42dee26e8cd4c2 gpl
loaded_at 2020-06-16T15:34:32-0700 uid 0
xlated 648B jited 409B memlock 4096B
pids systemd(1)
2907: cgroup_skb name egress tag 9ad187367cf2b9e8 gpl
loaded_at 2020-06-16T18:06:54-0700 uid 0
xlated 48B jited 59B memlock 4096B map_ids 2436
btf_id 1202
pids test_progs(2238417), test_progs(2238445)
$ sudo ./bpftool map show
2436: array name test_cgr.bss flags 0x400
key 4B value 8B max_entries 1 memlock 8192B
btf_id 1202
pids test_progs(2238417), test_progs(2238445)
2445: array name pid_iter.rodata flags 0x480
key 4B value 4B max_entries 1 memlock 8192B
btf_id 1214 frozen
pids bpftool(2239612)
$ sudo ./bpftool link show
61: cgroup prog 2908
cgroup_id 375301 attach_type egress
pids test_progs(2238417), test_progs(2238445)
62: cgroup prog 2908
cgroup_id 375344 attach_type egress
pids test_progs(2238417), test_progs(2238445)
$ sudo ./bpftool btf show
1202: size 1527B prog_ids 2908,2907 map_ids 2436
pids test_progs(2238417), test_progs(2238445)
1242: size 34684B
pids bpftool(2258892)
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-9-andriin@fb.com
|
|
Wrap source argument of BPF_CORE_READ family of macros into parentheses to
allow uses like this:
BPF_CORE_READ((struct cast_struct *)src, a, b, c);
Fixes: 7db3822ab991 ("libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200619231703.738941-8-andriin@fb.com
|
|
Adapt Makefile to support BPF skeleton generation beyond single profiler.bpf.c
case. Also add vmlinux.h generation and switch profiler.bpf.c to use it.
clang-bpf-global-var feature is extended and renamed to clang-bpf-co-re to
check for support of preserve_access_index attribute, which, together with BTF
for global variables, is the minimum requirement for modern BPF programs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-7-andriin@fb.com
|
|
Build minimal "bootstrap mode" bpftool to enable skeleton (and, later,
vmlinux.h generation), instead of building almost complete, but slightly
different (w/o skeletons, etc) bpftool to bootstrap complete bpftool build.
Current approach doesn't scale well (engineering-wise) when adding more BPF
programs to bpftool and other complicated functionality, as it requires
constant adjusting of the code to work in both bootstrapped mode and normal
mode.
So it's better to build only minimal bpftool version that supports only BPF
skeleton code generation and BTF-to-C conversion. Thankfully, this is quite
easy to accomplish due to internal modularity of bpftool commands. This will
also allow to keep adding new functionality to bpftool in general, without the
need to care about bootstrap mode for those new parts of bpftool.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-6-andriin@fb.com
|
|
Move functions that parse map and prog by id/tag/name/etc outside of
map.c/prog.c, respectively. These functions are used outside of those files
and are generic enough to be in common. This also makes heavy-weight map.c and
prog.c more decoupled from the rest of bpftool files and facilitates more
lightweight bootstrap bpftool variant.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-5-andriin@fb.com
|
|
Validate libbpf is able to handle weak and strong kernel symbol externs in BPF
code correctly.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-4-andriin@fb.com
|
|
Add support for another (in addition to existing Kconfig) special kind of
externs in BPF code, kernel symbol externs. Such externs allow BPF code to
"know" kernel symbol address and either use it for comparisons with kernel
data structures (e.g., struct file's f_op pointer, to distinguish different
kinds of file), or, with the help of bpf_probe_user_kernel(), to follow
pointers and read data from global variables. Kernel symbol addresses are
found through /proc/kallsyms, which should be present in the system.
Currently, such kernel symbol variables are typeless: they have to be defined
as `extern const void <symbol>` and the only operation you can do (in C code)
with them is to take its address. Such extern should reside in a special
section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong
vs weak semantics stays the same as with Kconfig externs. If symbol is not
found in /proc/kallsyms, this will be a failure for strong (non-weak) extern,
but will be defaulted to 0 for weak externs.
If the same symbol is defined multiple times in /proc/kallsyms, then it will
be error if any of the associated addresses differs. In that case, address is
ambiguous, so libbpf falls on the side of caution, rather than confusing user
with randomly chosen address.
In the future, once kernel is extended with variables BTF information, such
ksym externs will be supported in a typed version, which will allow BPF
program to read variable's contents directly, similarly to how it's done for
fentry/fexit input arguments.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-3-andriin@fb.com
|
|
Switch existing Kconfig externs to be just one of few possible kinds of more
generic externs. This refactoring is in preparation for ksymbol extern
support, added in the follow up patch. There are no functional changes
intended.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-2-andriin@fb.com
|
|
Syzbot reports an use-after-free in workqueue context:
BUG: KASAN: use-after-free in mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737
mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737
__smsc95xx_mdio_read drivers/net/usb/smsc95xx.c:217 [inline]
smsc95xx_mdio_read+0x583/0x870 drivers/net/usb/smsc95xx.c:278
check_carrier+0xd1/0x2e0 drivers/net/usb/smsc95xx.c:644
process_one_work+0x777/0xf90 kernel/workqueue.c:2274
worker_thread+0xa8f/0x1430 kernel/workqueue.c:2420
kthread+0x2df/0x300 kernel/kthread.c:255
It looks like that smsc95xx_unbind() is freeing the structures that are
still in use by the concurrently running workqueue callback. Thus switch
to using cancel_delayed_work_sync() to ensure the work callback really
is no longer active.
Reported-by: syzbot+29dc7d4ae19b703ff947@syzkaller.appspotmail.com
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: Offload TC action pedit munge tcp/udp sport/dport
Petr says:
On Spectrum-2 and Spectrum-3, it is possible to overwrite L4 port number of
a TCP or UDP packet in the ACL engine. That corresponds to the pedit munges
of tcp and udp sport resp. dport fields. Offload these munges on the
systems where they are supported.
The current offloading code assumes that all systems support the same set
of fields. This now changes, so in patch #1 first split handling of pedit
munges by chip type. The analysis of which packet field a given munge
describes is kept generic.
Patch #2 introduces the new flexible action fields. Patch #3 then adds the
new pedit fields, and dispatches on them on Spectrum>1.
Patch #4 adds a forwarding selftest for pedit dsfield, applicable to SW as
well as HW datapaths.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a test that checks that pedit adjusts port numbers of tcp and udp
packets.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Spectrum-2 supports an ACL action L4_PORT, which allows TCP and UDP source
and destination port number change. Offload suitable mangles to this
action.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add fields related to L4_PORT_ACTION, which is used for changing of TCP and
UDP port numbers.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Certain ACL actions are only available on some Spectrum revisions. In
particular, L4_PORT_ACTION is not available on Spectrum-1. Introduce a
new ops struct intended to hold these differences, mlxsw_sp_rulei_ops.
Prime it with a sole member, act_mangle_field, meant for handling of
pedit mangles.
Create two ops structures, one for Spectrum-1, the other for Spectrum-2
and above. Add callbacks for act_mangle_field and dispatch to the common
handler.
Invoke mlxsw_sp_rulei_ops.act_mangle_field from the field mangler
instead of calling the common handler directly.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The second commit cited below performed a cast of 'u32 buffsize' to
'(u16 *)' when calling mlxsw_sp_port_headroom_8x_adjust():
mlxsw_sp_port_headroom_8x_adjust(mlxsw_sp_port, (u16 *) &buffsize);
Colin noted that this will behave differently on big endian
architectures compared to little endian architectures.
Fix this by following Colin's suggestion and have the function accept
and return 'u32' instead of passing the current size by reference.
Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Fixes: 60833d54d56c ("mlxsw: spectrum: Adjust headroom buffers for 8x ports")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Suggested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Maxim Kochetkov says:
====================
Add Marvell 88E1340S, 88E1548P support
This patch series add new PHY id support.
Russell King asked to use single style for referencing functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for this new phy ID.
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for this new phy ID.
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The kernel in general does not use &func referencing format.
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Heiner Kallweit says:
====================
r8169: mark device as detached in PCI D3 and improve locking
Mark the netdevice as detached whenever parent is in PCI D3hot and not
accessible. This mainly applies to runtime-suspend state.
In addition take RTNL lock in suspend calls, this allows to remove
the driver-specific mutex and improve PM callbacks in general.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify rtl8169_runtime_resume() by calling rtl8169_resume().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the critical sections are protected with RTNL lock, we don't
need a separate mutex any longer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Most relevant ops (open, close, ethtool ops) are protected with RTNL
lock by net core. Make sure that such ops can't be interrupted by
e.g. (runtime-)suspending by taking the RTNL lock in suspend ops
and the PCI error handler.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Factor out bringing device up to a new function rtl8169_up(), similar
to rtl8169_down() for bringing the device down.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|