Age | Commit message (Collapse) | Author | Files | Lines |
|
Simple overlapping changes to linux/vermagic.h
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
|
|
Move the callback into the phylink_config structure, rather than
providing a callback to set this up.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Help end-users of the 'tc' command to see if the drivers ndo_setup_tc
function call fails. Troubleshooting when this happens is non-trivial
(see full process here[1]), and results in net_device getting assigned
the 'qdisc noop', which will drop all TX packets on the interface.
[1]: https://github.com/xdp-project/xdp-project/blob/master/areas/arm64/board_nxp_ls1088/nxp-board04-troubleshoot-qdisc.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just three changes:
* fix a wrong GFP_KERNEL in hwsim
* fix the debugfs mess after the mac80211 registration race fix
* suppress false-positive RCU list lockdep warnings
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function sta_info_get_by_idx() uses RCU list primitive.
It is called with local->sta_mtx held from mac80211/cfg.c.
Add lockdep expression to avoid any false positive RCU list warnings.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Link: https://lore.kernel.org/r/20200409082906.27427-1-madhuparnabhowmik10@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When fixing the initialization race, we neglected to account for
the fact that debugfs is initialized in wiphy_register(), and
some debugfs things went missing (or rather were rerooted to the
global debugfs root).
Fix this by adding debugfs entries only after wiphy_register().
This requires some changes in the rate control code since it
currently adds debugfs at alloc time, which can no longer be
done after the reordering.
Reported-by: Jouni Malinen <j@w1.fi>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Reported-by: Hauke Mehrtens <hauke@hauke-m.de>
Reported-by: Felix Fietkau <nbd@nbd.name>
Cc: stable@vger.kernel.org
Fixes: 52e04b4ce5d0 ("mac80211: fix race in ieee80211_register_hw()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/20200423111344.0e00d3346f12.Iadc76a03a55093d94391fc672e996a458702875d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When setting the meter rate to 4+Gbps, there is an
overflow, the meters don't work as expected.
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before invoking the ovs_meter_cmd_reply_stats, "meter"
was checked, so don't check it agin in that function.
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't allow user to create meter unlimitedly, which may cause
to consume a large amount of kernel memory. The max number
supported is decided by physical memory and 20K meters as default.
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In kernel datapath of Open vSwitch, there are only 1024
buckets of meter in one datapath. If installing more than
1024 (e.g. 8192) meters, it may lead to the performance drop.
But in some case, for example, Open vSwitch used as edge
gateway, there should be 20K at least, where meters used for
IP address bandwidth limitation.
[Open vSwitch userspace datapath has this issue too.]
For more scalable meter, this patch use meter array instead of
hash tables, and expand/shrink the array when necessary. So we
can install more meters than before in the datapath.
Introducing the struct *dp_meter_instance, it's easy to
expand meter though changing the *ti point in the struct
*dp_meter_table.
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove unnecassary casts in the argument to kfree.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
x25_lapb_receive_frame() invokes x25_get_neigh(), which returns a
reference of the specified x25_neigh object to "nb" with increased
refcnt.
When x25_lapb_receive_frame() returns, local variable "nb" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one path of
x25_lapb_receive_frame(). When pskb_may_pull() returns false, the
function forgets to decrease the refcnt increased by x25_get_neigh(),
causing a refcnt leak.
Fix this issue by calling x25_neigh_put() when pskb_may_pull() returns
false.
Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Normal there should be checked for nla_put_in6_addr like other
usage in net.
Detected by CoverityScan, CID# 1461639
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gro_flush_timeout and napi_defer_hard_irqs can be read
from napi_complete_done() while other cpus write the value,
whithout explicit synchronization.
Use READ_ONCE()/WRITE_ONCE() to annotate the races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer")
we added the ability to arm one high resolution timer, that we used
to keep not-complete packets in GRO engine a bit longer, hoping that further
frames might be added to them.
Since then, we added the napi_complete_done() interface, and commit
364b6055738b ("net: busy-poll: return busypolling status to drivers")
allowed drivers to avoid re-arming NIC interrupts if we made a promise
that their NAPI poll() handler would be called in the near future.
This infrastructure can be leveraged, thanks to a new device parameter,
which allows to arm the napi hrtimer, instead of re-arming the device
hard IRQ.
We have noticed that on some servers with 32 RX queues or more, the chit-chat
between the NIC and the host caused by IRQ delivery and re-arming could hurt
throughput by ~20% on 100Gbit NIC.
In contrast, hrtimers are using local (percpu) resources and might have lower
cost.
The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
than gro_flush_timeout (/sys/class/net/ethX/)
By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
This patch does not change the prior behavior of gro_flush_timeout
if used alone : NIC hard irqs should be rearmed as before.
One concrete usage can be :
echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
If at least one packet is retired, then we will reset napi counter
to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
of the queue.
On busy queues, this should avoid NIC hard IRQ, while before this patch IRQ
avoidance was only possible if napi->poll() was exhausting its budget
and not call napi_complete_done().
This feature also can be used to work around some non-optimal NIC irq
coalescing strategies.
Having the ability to insert XX usec delays between each napi->poll()
can increase cache efficiency, since we increase batch sizes.
It also keeps serving cpus not idle too long, reducing tail latencies.
Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gro_cells lib is used by different encapsulating netdevices, such as
geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
CPU tag is a sort of "encapsulation", and we can use the same mechs to
greatly improve overall DSA performance.
skbs are passed to the GRO layer after removing CPU tags, so we don't
need any new packet offload types as it was firstly proposed by me in
the first GRO-over-DSA variant [1].
The size of struct gro_cells is sizeof(void *), so hot struct
dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
remain in one 32-byte cacheline.
The other positive side effect is that drivers for network devices
that can be shipped as CPU ports of DSA-driven switches can now use
napi_gro_frags() to pass skbs to kernel. Packets built that way are
completely non-linear and are likely being dropped without GRO.
This was tested on to-be-mainlined-soon Ethernet driver that uses
napi_gro_frags(), and the overall performance was on par with the
variant from [1], sometimes even better due to minimal overhead.
net.core.gro_normal_batch tuning may help to push it to the limit
on particular setups and platforms.
iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
on 1.2 GHz MIPS board:
5.7-rc2 baseline:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 9.00 GBytes 644 Mbits/sec 413 sender
[ 5] 0.00-120.00 sec 8.99 GBytes 644 Mbits/sec receiver
Iface RX packets TX packets
eth0 7097731 7097702
port0 426050 6671829
port1 6671681 425862
port1.218 6671677 425851
With this patch:
[ID] Interval Transfer Bitrate Retr
[ 5] 0.00-120.01 sec 12.2 GBytes 870 Mbits/sec 122 sender
[ 5] 0.00-120.00 sec 12.2 GBytes 870 Mbits/sec receiver
Iface RX packets TX packets
eth0 9474792 9474777
port0 455200 353288
port1 9019592 455035
port1.218 353144 455024
v2:
- Add some performance examples in the commit message;
- No functional changes.
[1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/
Signed-off-by: Alexander Lobakin <bloodyreaper@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.
This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs. In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull nfsd fixes from Chuck Lever:
"The first set of 5.7-rc fixes for NFS server issues.
These were all unresolved at the time the 5.7 window opened, and
needed some additional time to ensure they were correctly addressed.
They are ready now.
At the moment I know of one more urgent issue regarding the NFS
server. A fix has been tested and is under review. I expect to send
one more pull request, containing this fix (which now consists of 3
patches).
Fixes:
- Address several use-after-free and memory leak bugs
- Prevent a backchannel livelock"
* tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
svcrdma: Fix leak of svc_rdma_recv_ctxt objects
svcrdma: Fix trace point use-after-free race
SUNRPC: Fix backchannel RPC soft lockups
SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge
nfsd: memory corruption in nfsd4_lock()
|
|
A user reported [0] hitting the WARN_ON in fib_info_nh:
[ 8633.839816] ------------[ cut here ]------------
[ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
...
[ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
...
[ 8633.839848] RSP: 0018:ffffb04d407f7d00 EFLAGS: 00010286
[ 8633.839850] RAX: 0000000000000000 RBX: ffff9460b9897ee8 RCX: 00000000000000fe
[ 8633.839851] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000000
[ 8633.839852] RBP: ffff946076049850 R08: 0000000059263a83 R09: ffff9460840e4000
[ 8633.839853] R10: 0000000000000014 R11: 0000000000000000 R12: ffffb04d407f7dc0
[ 8633.839854] R13: ffffffffa4ce3240 R14: 0000000000000000 R15: ffff9460b7681f60
[ 8633.839857] FS: 00007fcac2e02700(0000) GS:ffff9460bdc00000(0000) knlGS:0000000000000000
[ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8633.839859] CR2: 00007f27beb77e28 CR3: 0000000077734000 CR4: 00000000000006f0
[ 8633.839867] Call Trace:
[ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890
[ 8633.839873] ip_route_output_key_hash+0x5e/0x80
[ 8633.839876] ip_route_output_flow+0x1a/0x50
[ 8633.839878] __ip4_datagram_connect+0x154/0x310
[ 8633.839880] ip4_datagram_connect+0x28/0x40
[ 8633.839882] __sys_connect+0xd6/0x100
...
The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.
Add test case that covers the affected code path.
[0] https://github.com/FRRouting/frr/issues/6089
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The NetLabel Tools project has moved from http://netlabel.sf.net to a
GitHub project. Update to directly refer to the new home for the tools.
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The data fin flag is set only via a DSS option, but
mptcp_incoming_options() copies it unconditionally from the
provided RX options.
Since we do not clear all the mptcp sock RX options in a
socket free/alloc cycle, we can end-up with a stray data_fin
value while parsing e.g. MPC packets.
That would lead to mapping data corruption and will trigger
a few WARN_ON() in the RX path.
Instead of adding a costly memset(), fetch the data_fin flag
only for DSS packets - when we always explicitly initialize
such bit at option parsing time.
Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the following coccicheck warning:
net/caif/caif_dev.c:410:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:445:2-13: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:145:1-12: WARNING: Assignment of 0/1 to bool
variable
net/caif/caif_dev.c:223:1-12: WARNING: Assignment of 0/1 to bool
variable
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When starting shutdown in sctp_sf_do_dupcook_a(), get the value for
SHUTDOWN Cumulative TSN Ack from the new association, which is
reconstructed from the cookie, instead of the old association, which
the peer doesn't have anymore.
Otherwise the SHUTDOWN is either ignored or replied to with an ABORT
by the peer because CTSN Ack doesn't match the peer's Initial TSN.
Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle
the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them
at the same time and in the correct order. This bundling was broken by
commit 4ff40b86262b ("sctp: set chunk transport correctly when it's a
new asoc"), which assigns a transport for the COOKIE-ACK, but not for
the SHUTDOWN.
Fix this by passing a reference to the COOKIE-ACK chunk as an argument
to sctp_sf_do_9_2_start_shutdown() and onward to
sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same
transport as the COOKIE-ACK chunk, which allows them to be bundled.
In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was
previously unused. Now that we're taking it into use, it must be a
valid pointer to a chunk, or NULL. There is only one call site where
it's not, in sctp_sf_autoclose_timer_expire(). Fix that too.
Fixes: 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no reason to fail the probing of the switch if the MTU couldn't
be configured correctly (either the switch port itself, or the host
port) for whatever reason. MTU-sized traffic probably won't work, sure,
but we can still probably limp on and support some form of communication
anyway, which the users would probably appreciate more.
Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->sk does not always point to a full blown socket,
we need to use sk_fullsock() before accessing fields which
only make sense on full socket.
BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630
CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
__kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
kasan_report+0x33/0x50 mm/kasan/common.c:625
report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170
__dev_xmit_skb net/core/dev.c:3710 [inline]
__dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021
neigh_hh_output include/net/neighbour.h:499 [inline]
neigh_output include/net/neighbour.h:508 [inline]
ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117
__ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143
ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176
dst_output include/net/dst.h:435 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280
tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521
tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916
inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline]
reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763
call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405
expire_timers kernel/time/timer.c:1450 [inline]
__run_timers kernel/time/timer.c:1774 [inline]
__run_timers kernel/time/timer.c:1741 [inline]
run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787
__do_softirq+0x26c/0x9f7 kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x192/0x1d0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:546 [inline]
smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792
Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40
RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0
RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8
RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012
R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006
cipher_crypt_one crypto/cipher.c:75 [inline]
crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82
crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830
crypto_shash_update+0xc4/0x120 crypto/shash.c:119
shash_ahash_update+0xa3/0x110 crypto/shash.c:246
crypto_ahash_update include/crypto/hash.h:547 [inline]
hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x308/0x7e0 net/socket.c:2362
___sys_sendmsg+0x100/0x170 net/socket.c:2416
__sys_sendmmsg+0x195/0x480 net/socket.c:2506
__do_sys_sendmmsg net/socket.c:2535 [inline]
__se_sys_sendmmsg net/socket.c:2532 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45c829
Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829
RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4
Fixes: 4b15c7075352 ("net/sched: Make etf report drops on error_queue")
Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add tracepoint support for QRTR with NS as the first candidate. Later on
this can be extended to core QRTR and transport drivers.
The trace_printk() used in NS has been replaced by tracepoints.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/ipv6/ila/ila_xlat.c:604:0: warning: macro "ILA_HASH_TABLE_SIZE" is not used [-Wunused-macros]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the act_ct SW offload in flowtable, The counter of the conntrack
entry will never update. So update the nf_conn_acct conuter in act_ct
flowtable software offload.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by
xfrm code to tell other skb handlers that the packet has been passed
through the xfrm output functions. Simplify the code and just always
set them rather than conditionally based on netfilter enabled thus
making the flag available for other users.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 018d26fcd12a ("cgroup, netclassid: periodically release file_lock
on classid") added a second cond_resched to write_classid indirectly by
update_classid_task. Remove the one in write_classid.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Get rid of linux/vermagic.h includes, so that MODULE_ARCH_VERMAGIC from
the arch header arch/x86/include/asm/module.h won't be redefined.
In file included from ./include/linux/module.h:30,
from drivers/net/ethernet/3com/3c515.c:56:
./arch/x86/include/asm/module.h:73: warning: "MODULE_ARCH_VERMAGIC"
redefined
73 | # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
|
In file included from drivers/net/ethernet/3com/3c515.c:25:
./include/linux/vermagic.h:28: note: this is the location of the
previous definition
28 | #define MODULE_ARCH_VERMAGIC ""
|
Fixes: 6bba2e89a88c ("net/3com: Delete driver and module versions from 3com drivers")
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Shannon Nelson <snelson@pensando.io> # ionic
Acked-by: Sebastian Reichel <sre@kernel.org> # power
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan.
2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The variable rc is being assigned with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't need them, as we can use the current ingress opt
data instead. Setting them in syn_recv_sock() may causes
inconsistent mptcp socket status, as per previous commit.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If multiple CPUs races on the same req_sock in syn_recv_sock(),
flipping such field can cause inconsistent child socket status.
When racing, the CPU losing the req ownership may still change
the mptcp request socket mp_capable flag while the CPU owning
the request is cloning the socket, leaving the child socket with
'is_mptcp' set but no 'mp_capable' flag.
Such socket will stay with 'conn' field cleared, heading to oops
in later mptcp callback.
Address the issue tracking the fallback status in a local variable.
Fixes: 58b09919626b ("mptcp: create msk early")
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following splat can occur during self test:
BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808
subflow_data_ready+0x156/0x160
tcp_child_process+0x6a3/0xb30
tcp_v4_rcv+0x2231/0x3730
ip_protocol_deliver_rcu+0x5c/0x860
ip_local_deliver_finish+0x220/0x360
ip_local_deliver+0x1c8/0x4e0
ip_rcv_finish+0x1da/0x2f0
ip_rcv+0xd0/0x3c0
__netif_receive_skb_one_core+0xf5/0x160
__netif_receive_skb+0x27/0x1c0
process_backlog+0x21e/0x780
net_rx_action+0x35f/0xe90
do_softirq+0x4c/0x50
[..]
This occurs when accessing subflow_ctx->conn.
Problem is that tcp_child_process() calls listen sockets'
sk_data_ready() notification, but it doesn't hold the listener
lock. Another cpu calling close() on the listener will then cause
transition of refcount to 0.
Fixes: 58b09919626bf ("mptcp: create msk early")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an interface is executing a self test, put the interface into
operative status testing.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to speed, duplex and dorment, report the testing status
in sysfs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 2863 defines the operational state testing. Add support for this
state, both as a IF_LINK_MODE_ and __LINK_STATE_.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
problem found by syzbot an unfortunate logic error meant that it
also broke IPV6_ADDRFORM.
Rearrange the checks so that the earlier test is just one of the series
of checks made before moving the socket from IPv6 to IPv4.
Fixes: b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation")
Signed-off-by: John Haxby <john.haxby@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread
To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.
Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A case of warning was reported by syzbot.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x35 kernel/panic.c:582
report_bug+0x27b/0x2f0 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:175 [inline]
fixup_bug arch/x86/kernel/traps.c:170 [inline]
do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
RSP: 0018:ffffc90005487208 EFLAGS: 00010246
RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000
RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005
RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79
R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
___sys_sendmsg+0x100/0x170 net/socket.c:2416
__sys_sendmsg+0xec/0x1b0 net/socket.c:2449
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
and to quiesce it, unregister NFPROTO_IPV6 hook instead of NFPROTO_INET
in case of failing to register NFPROTO_IPV4 hook.
Reported-by: syzbot <syzbot+33e06702fd6cffc24c40@syzkaller.appspotmail.com>
Fixes: d164385ec572 ("netfilter: nat: add inet family nat support")
Cc: Florian Westphal <fw@strlen.de>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We need to set sk_state to CLOSED, else we will get following:
IPv4: Attempt to release TCP socket in state 3 00000000b95f109e
IPv4: Attempt to release TCP socket in state 10 00000000b95f109e
First one is from inet_sock_destruct(), second one from
mptcp_sk_clone failure handling. Setting sk_state to CLOSED isn't
enough, we also need to orphan sk so it has DEAD flag set.
Otherwise, a very similar warning is printed from inet_sock_destruct().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following snippet (replicated from syzkaller reproducer) generates
warning: "IPv4: Attempt to release TCP socket in state 1".
int main(void) {
struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20,
.sin_addr.s_addr = 0x010000e0, };
struct sockaddr_in sin2 = { .sin_family = 2,
.sin_addr.s_addr = 0x0100007f, };
struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20,
.sin_addr.s_addr = 0x0100007f, };
int r0 = socket(0x2, 0x1, 0x106);
int r1 = socket(0x2, 0x1, 0x106);
bind(r1, (void *)&sin1, sizeof(sin1));
connect(r1, (void *)&sin2, sizeof(sin2));
listen(r1, 3);
return connect(r0, (void *)&sin3, 0x4d);
}
Reason is that the newly generated mptcp socket is closed via the ulp
release of the tcp listener socket when its accept backlog gets purged.
To fix this, delay setting the ESTABLISHED state until after userspace
calls accept and via mptcp specific destructor.
Fixes: 58b09919626bf ("mptcp: create msk early")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes it impossible that cmpri or cmpre values are set to the
value 16 which is not possible, because these are 4 bit values. We
currently run in an overflow when assigning the value 16 to it.
According to the standard a value of 16 can be interpreted as a full
elided address which isn't possible to set as compression value. A reason
why this cannot be set is that the current ipv6 header destination address
should never show up inside the segments of the rpl header. In this case we
run in a overflow and the address will have no compression at all. Means
cmpri or compre is set to 0.
As we handle cmpri and cmpre sometimes as unsigned char or 4 bit value
inside the rpl header the current behaviour ends in an invalid header
format. This patch simple use the best compression method if we ever run
into the case that the destination address is showed up inside the rpl
segments. We avoid the overflow handling and the rpl header is still valid,
even when we have the destination address inside the rpl segments.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tipc_rcv() invokes tipc_node_find() twice, which returns a reference of
the specified tipc_node object to "n" with increased refcnt.
When tipc_rcv() returns or a new object is assigned to "n", the original
local reference of "n" becomes invalid, so the refcount should be
decreased to keep refcount balanced.
The issue happens in some paths of tipc_rcv(), which forget to decrease
the refcnt increased by tipc_node_find() and will cause a refcnt leak.
Fix this issue by calling tipc_node_put() before the original object
pointed by "n" becomes invalid.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tipc_crypto_rcv() invokes tipc_aead_get(), which returns a reference of
the tipc_aead object to "aead" with increased refcnt.
When tipc_crypto_rcv() returns, the original local reference of "aead"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.
The issue happens in one error path of tipc_crypto_rcv(). When TIPC
message decryption status is EINPROGRESS or EBUSY, the function forgets
to decrease the refcnt increased by tipc_aead_get() and causes a refcnt
leak.
Fix this issue by calling tipc_aead_put() on the error path when TIPC
message decryption status is EINPROGRESS or EBUSY.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|