Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 74e98eb085889b0d2d4908f59f6e00026063014f upstream.
There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.
It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:
[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0 EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS: 00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048] ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247] 0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438] 1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 79462ad02e861803b3840cc782248c7359451cd9 upstream.
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: open-code U8_MAX]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 7d267278a9ece963d77eefec61630223fce08c6c upstream.
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit f648f807f61e64d247d26611e34cc97e4ed03401 upstream.
Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not
resetting the association overall_error_count. This allowed the association
to better enforce assoc.max_retrans limit.
However, the same issue still exists when the association is in SHUTDOWN_RECEIVED
state. In this state, HB-ACKs will continue to reset the overall_error_count
for the association would extend the lifetime of association unnecessarily.
This patch solves this by resetting the overall_error_count whenever the current
state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we
end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
states, but they are not really impacted because we disable Heartbeats in those
states.
Fixes: Commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit ba51b6be38c122f7dab40965b4397aaf6188a464 upstream.
Hit the following splat testing VRF change for ipsec:
[ 113.475692] ===============================
[ 113.476194] [ INFO: suspicious RCU usage. ]
[ 113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
[ 113.477545] -------------------------------
[ 113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section!
[ 113.479288]
[ 113.479288] other info that might help us debug this:
[ 113.479288]
[ 113.480207]
[ 113.480207] rcu_scheduler_active = 1, debug_locks = 1
[ 113.480931] 2 locks held by setkey/6829:
[ 113.481371] #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213
[ 113.482509] #1: (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e
[ 113.483509]
[ 113.483509] stack backtrace:
[ 113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
[ 113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[ 113.486845] 0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962
[ 113.487732] ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154
[ 113.488628] 0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8
[ 113.489525] Call Trace:
[ 113.489813] [<ffffffff81518af2>] dump_stack+0x4c/0x65
[ 113.490389] [<ffffffff81086962>] ? console_unlock+0x3d6/0x405
[ 113.491039] [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103
[ 113.491735] [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47
[ 113.492442] [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8
[ 113.493077] [<ffffffff81064268>] __might_sleep+0x6c/0x82
[ 113.493681] [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24
[ 113.494508] [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f
[ 113.495149] [<ffffffff814012b5>] skb_clone+0x64/0x80
[ 113.495712] [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff
[ 113.496380] [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e
[ 113.497024] [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1
[ 113.497653] [<ffffffff814e9770>] pfkey_process+0x162/0x17e
[ 113.498274] [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213
In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
the RCU lock.
Since pfkey_broadcast takes the RCU lock the allocation argument is
pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
The one call outside of rcu can be done with GFP_KERNEL.
Fixes: 7f6b9dbd5afbd ("af_key: locking change")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 468b732b6f76b138c0926eadf38ac88467dcd271 upstream.
"len" is a signed integer. We check that len is not negative, so it
goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.
I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
Fixes: a8c879a7ee98 ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 4b31814d20cbe5cd4ccf18089751e77a04afe4f2 upstream.
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 0848f6428ba3a2e42db124d41ac6f548655735bf upstream.
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 4479004e6409087d1b4986881dc98c6c15dffb28 upstream.
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec upstream.
Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.
The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first. This causes funky races which leads
to double-free.
This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.
Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 4fabb59449aa44a585b3603ffdadd4c5f4d0c033 upstream.
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 2c17d27c36dcce2b6bf689f41a46b9e909877c21 upstream.
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
CPU 1 CPU 2
skb->dev: no reference
process_backlog:__skb_dequeue
process_backlog:local_irq_enable
on_each_cpu for
flush_backlog => IPI(hardirq): flush_backlog
- packet not found in backlog
CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections
netdev_run_todo,
rcu_barrier: no
ongoing callbacks
__netif_receive_skb_core:rcu_read_lock
- too late
free dev
process packet for freed dev
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust context
- no need to change "goto unlock" to "goto out"]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit e9e4dd3267d0c5234c5c0f47440456b10875dec9 upstream.
commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.
Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.
Thanks to Eric W. Biederman for the test script and his
valuable feedback!
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 4f7d2cdfdde71ffe962399b7020c674050329423 upstream.
Jason Gunthorpe reported that since commit c02db8c6290b ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].
Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.
Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.
Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).
Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop unsupported attributes
- Use ndo_set_vf_tx_rate operation, not ndo_set_vf_rate]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit a84b69cb6e0a41e86bc593904faa6def3b957343 upstream.
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 82cd003a77173c91b9acad8033fb7931dac8d751 upstream.
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
Fixes: http://tracker.ceph.com/issues/2759
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 754bc547f0a79f7568b5b81c7fc0a8d044a6571a upstream.
When a port goes through a link down/up the multicast router configuration
is not restored.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 upstream.
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- adjust context
- fanout_demux_lb() returns a pointer]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit f98f4514d07871da7a113dd9e3e330743fd70ae4 upstream.
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: use ACCESS_ONCE() instead of READ_ONCE()]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 2dab80a8b486f02222a69daca6859519e05781d9 upstream.
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 upstream.
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4:
- use global spinlock instead of per-namespace lock]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 88de6af24f2b48b06c514d3c3d0a8f22fafe30bd upstream.
req->rq_private_buf isn't initialised when xprt_setup_backchannel calls
xprt_free_allocation.
Fixes: fb7a0b9addbdb ("nfs41: New backchannel helper routines")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit d079abd181950a44cdf31daafd1662388a6c4d2e upstream.
Too many spaces were introduced in commit 63adc6fb8ac0 ("pktgen: cleanup
checkpatch warnings"), thus misaligning "src_min:" to other columns.
Fixes: 63adc6fb8ac0 ("pktgen: cleanup checkpatch warnings")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit beb39db59d14990e401e235faf66a6b9b31240b0 upstream.
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit d496f7842aada20c61e6044b3395383fa972872c upstream.
A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit aff09ce303f83bd370772349238482ae422a2341 upstream.
Currently bridge can silently drop ipv4 fragments.
If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
br_nf_pre_routing defragments incoming ipv4 fragments
but nfct check in br_nf_dev_queue_xmit does not allow re-fragment combined
packet back, and therefore it is dropped in br_dev_queue_push_xmit without
incrementing of any failcounters
It seems the only way to hit the ip_fragment code in the bridge xmit
path is to have a fragment list whose reassembled fragments go over
the mtu. This only happens if nf_defrag is enabled. Thanks to
Florian Westphal for providing feedback to clarify this.
Defragmentation ipv4 is required not only in conntracks but at least in
TPROXY target and socket match, therefore #ifdef is changed from
NF_CONNTRACK_IPV4 to NF_DEFRAG_IPV4
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Kirill Tkhai <ktkhai@odin.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
Based on 08adb7dabd4874cc5666b4490653b26534702ce0 upstream.
We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
was expected.
We tested it through the recvmsg01 testcase come from LTP testsuit. It set
msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
unexpected (errno 22 is expected):
recvmsg01 4 TFAIL : invalid socket length ; returned -1 (expected -1),
errno 14 (expected 22)
Linux mainline has no this bug for commit 08adb7dab fixes it accidentally.
However, it is too large and complex to be backported to LTS 3.10.
Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
error if msg_sys->msg_namelen was negative, which changed the behaviors
of recvmsg and sendmsg syscall in a lib32 system:
Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
was invalid and then syscall returned -EINVAL, which is correct.
And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
fail and wants to return -EINVAL, however, the outer syscall will return
-EFAULT directly, which is unexpected.
This patch gets the return value of get_compat_msghdr() as well as
copy_msghdr_from_user(), then returns this expected value if
get_compat_msghdr() fails.
Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Hanbing Xu <xuhanbing@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit a134f083e79fb4c3d0a925691e732c56911b4326 upstream.
If we don't do that, then the poison value is left in the ->pprev
backlink.
This can cause crashes if we do a disconnect, followed by a connect().
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 upstream.
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit c4c832f89dc468cf11dc0dd17206bace44526651 upstream.
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737deea1
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d1398983f ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d1398983f ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 1d7c49037b12016e7056b9f2c990380e2187e766 upstream.
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737deea1
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d1398983f ("bridge: add NTF_USE support")
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d1398983f ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 47cc84ce0c2fe75c99ea5963c4b5704dd78ead54 upstream.
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.
This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.
This only happens when there is a MLDv2 querier in the network.
The fix will only break out of the loop when there is a failure adding a
multicast address.
The mdb before the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
After the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 47b4e1fc4972cc43a19121bc2608a60aef3bf216 upstream.
Remove checking tailroom when adding IV as it uses only
headroom, and move the check to the ICV generation that
actually needs the tailroom.
In other case I hit such warning and datapath don't work,
when testing:
- IBSS + WEP
- ath9k with hw crypt enabled
- IPv6 data (ping6)
WARNING: CPU: 3 PID: 13301 at net/mac80211/wep.c:102 ieee80211_wep_add_iv+0x129/0x190 [mac80211]()
[...]
Call Trace:
[<ffffffff817bf491>] dump_stack+0x45/0x57
[<ffffffff8107746a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8107755a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc09ae109>] ieee80211_wep_add_iv+0x129/0x190 [mac80211]
[<ffffffffc09ae7ab>] ieee80211_crypto_wep_encrypt+0x6b/0xd0 [mac80211]
[<ffffffffc09d3fb1>] invoke_tx_handlers+0xc51/0xf30 [mac80211]
[...]
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4: s/IEEE80211_WEP/_WEP/g]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit f30bf2a5cac6c60ab366c4bc6db913597bf4d6ab upstream.
Fix memory leak introduced in commit a0840e2e165a ("IPVS: netns,
ip_vs_ctl local vars moved to ipvs struct."):
unreferenced object 0xffff88005785b800 (size 2048):
comm "(-localed)", pid 1434, jiffies 4294755650 (age 1421.089s)
hex dump (first 32 bytes):
bb 89 0b 83 ff ff ff ff b0 78 f0 4e 00 88 ff ff .........x.N....
04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8262ea8e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811fba74>] __kmalloc_track_caller+0x244/0x430
[<ffffffff811b88a0>] kmemdup+0x20/0x50
[<ffffffff823276b7>] ip_vs_control_net_init+0x1f7/0x510
[<ffffffff8231d630>] __ip_vs_init+0x100/0x250
[<ffffffff822363a1>] ops_init+0x41/0x190
[<ffffffff82236583>] setup_net+0x93/0x150
[<ffffffff82236cc2>] copy_net_ns+0x82/0x140
[<ffffffff810ab13d>] create_new_namespaces+0xfd/0x190
[<ffffffff810ab49a>] unshare_nsproxy_namespaces+0x5a/0xc0
[<ffffffff810833e3>] SyS_unshare+0x173/0x310
[<ffffffff8265cbd7>] system_call_fastpath+0x12/0x6f
[<ffffffffffffffff>] 0xffffffffffffffff
Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
There's a check for ip6_null_entry, but it's not enough if the config
CONFIG_IPV6_MULTIPLE_TABLES is selected. Blackhole or prohibited entries
should also be ignored.
This path is for kernel before v3.6, as there's a commit b94f1c0
use icmpv6_notify() instead of rt6_redirect() and rt6_redirect has
been deleted.
The oops as follow:
[exception RIP: do_raw_write_lock+12]
RIP: ffffffff8122c42c RSP: ffff880666e45820 RFLAGS: 00010282
RAX: ffff8801207bffd8 RBX: 0000000000000018 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880666e45898 RDI: 0000000000000018
RBP: ffff880666e45830 R8: 000000000000001e R9: 0000000006000000
R10: ffff88011796b8a0 R11: 0000000000000004 R12: ffff88010391ed00
R13: 0000000000000000 R14: ffff880666e45898 R15: ffff88011796b890
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
[ffff880666e45838] _raw_write_lock_bh at ffffffff81450b39
[ffff880666e45858] __ip6_ins_rt at ffffffff813ed8c1
[ffff880666e45888] ip6_ins_rt at ffffffff813eef58
[ffff880666e458b8] rt6_redirect at ffffffff813f0b84
[ffff880666e45958] ndisc_rcv at ffffffff813f95d8
[ffff880666e45a08] icmpv6_rcv at ffffffff814000e8
[ffff880666e45ae8] ip6_input_finish at ffffffff813e43bb
[ffff880666e45b38] ip6_input at ffffffff813e4b08
[ffff880666e45b68] ipv6_rcv at ffffffff813e4969
[ffff880666e45bc8] __netif_receive_skb at ffffffff8135158a
[ffff880666e45c38] dev_gro_receive at ffffffff81351cb0
[ffff880666e45c78] napi_gro_receive at ffffffff81351fc5
[ffff880666e45cb8] tg3_rx at ffffffffa0bfb354 [tg]
[ffff880666e45d88] tg3_poll_work at ffffffffa0c07857 [tg]
[ffff880666e45e18] tg3_poll_msix at ffffffffa0c07d1b [tg]
[ffff880666e45e68] net_rx_action at ffffffff81352219
[ffff880666e45ec8] __do_softirq at ffffffff8103e5a1
[ffff880666e45f38] call_softirq at ffffffff81459c4c
[ffff880666e45f50] do_softirq at ffffffff8100413d
[ffff880666e45f80] do_IRQ at ffffffff81003cce
This happened when ip6_route_redirect found a rt which was set
blackhole, the rt had a NULL rt6i_table argument which is accessed by
__ip6_ins_rt() when trying to lock rt6i_table->tb6_lock caused a BUG:
"BUG: unable to handle kernel NULL pointer"
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
|
|
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.
The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov
for noticing that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream.
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
[lizf: Backported to 3.4: drop some hunks as there are fewer skb_gso_segment()
users in 3.4]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 92e5dfc34cf39c20ae1087bd5e676238b5d0dfac upstream.
Fix return check typo.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 788211d81bfdf9b6a547d0530f206ba6ee76b107 upstream.
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:
* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free
The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.
Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit d079535d5e1bf5e2e7c856bae2483414ea21e137 upstream.
In case we move the whole dev group to another netns,
we should call for_each_netdev_safe(), otherwise we get
a soft lockup:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ip:798]
irq event stamp: 255424
hardirqs last enabled at (255423): [<ffffffff81a2aa95>] restore_args+0x0/0x30
hardirqs last disabled at (255424): [<ffffffff81a2ad5a>] apic_timer_interrupt+0x6a/0x80
softirqs last enabled at (255422): [<ffffffff81079ebc>] __do_softirq+0x2c1/0x3a9
softirqs last disabled at (255417): [<ffffffff8107a190>] irq_exit+0x41/0x95
CPU: 0 PID: 798 Comm: ip Not tainted 4.0.0-rc4+ #881
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800d1b88000 ti: ffff880119530000 task.ti: ffff880119530000
RIP: 0010:[<ffffffff810cad11>] [<ffffffff810cad11>] debug_lockdep_rcu_enabled+0x28/0x30
RSP: 0018:ffff880119533778 EFLAGS: 00000246
RAX: ffff8800d1b88000 RBX: 0000000000000002 RCX: 0000000000000038
RDX: 0000000000000000 RSI: ffff8800d1b888c8 RDI: ffff8800d1b888c8
RBP: ffff880119533778 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000b5c2 R12: 0000000000000246
R13: ffff880119533708 R14: 00000000001d5a40 R15: ffff88011a7d5a40
FS: 00007fc01315f740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f367a120988 CR3: 000000011849c000 CR4: 00000000000007f0
Stack:
ffff880119533798 ffffffff811ac868 ffffffff811ac831 ffffffff811ac828
ffff8801195337c8 ffffffff811ac8c9 ffff8801195339b0 ffff8801197633e0
0000000000000000 ffff8801195339b0 ffff8801195337d8 ffffffff811ad2d7
Call Trace:
[<ffffffff811ac868>] rcu_read_lock+0x37/0x6e
[<ffffffff811ac831>] ? rcu_read_unlock+0x5f/0x5f
[<ffffffff811ac828>] ? rcu_read_unlock+0x56/0x5f
[<ffffffff811ac8c9>] __fget+0x2a/0x7a
[<ffffffff811ad2d7>] fget+0x13/0x15
[<ffffffff811be732>] proc_ns_fget+0xe/0x38
[<ffffffff817c7714>] get_net_ns_by_fd+0x11/0x59
[<ffffffff817df359>] rtnl_link_get_net+0x33/0x3e
[<ffffffff817df3d7>] do_setlink+0x73/0x87b
[<ffffffff810b28ce>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff81a2aa95>] ? retint_restore_args+0xe/0xe
[<ffffffff817e0301>] rtnl_newlink+0x40c/0x699
[<ffffffff817dffe0>] ? rtnl_newlink+0xeb/0x699
[<ffffffff81a29246>] ? _raw_spin_unlock+0x28/0x33
[<ffffffff8143ed1e>] ? security_capable+0x18/0x1a
[<ffffffff8107da51>] ? ns_capable+0x4d/0x65
[<ffffffff817de5ce>] rtnetlink_rcv_msg+0x181/0x194
[<ffffffff817de407>] ? rtnl_lock+0x17/0x19
[<ffffffff817de407>] ? rtnl_lock+0x17/0x19
[<ffffffff817de44d>] ? __rtnl_unlock+0x17/0x17
[<ffffffff818327c6>] netlink_rcv_skb+0x4d/0x93
[<ffffffff817de42f>] rtnetlink_rcv+0x26/0x2d
[<ffffffff81830f18>] netlink_unicast+0xcb/0x150
[<ffffffff8183198e>] netlink_sendmsg+0x501/0x523
[<ffffffff8115cba9>] ? might_fault+0x59/0xa9
[<ffffffff817b5398>] ? copy_from_user+0x2a/0x2c
[<ffffffff817b7b74>] sock_sendmsg+0x34/0x3c
[<ffffffff817b7f6d>] ___sys_sendmsg+0x1b8/0x255
[<ffffffff8115c5eb>] ? handle_pte_fault+0xbd5/0xd4a
[<ffffffff8100a2b0>] ? native_sched_clock+0x35/0x37
[<ffffffff8109e94b>] ? sched_clock_local+0x12/0x72
[<ffffffff8109eb9c>] ? sched_clock_cpu+0x9e/0xb7
[<ffffffff810cadbf>] ? rcu_read_lock_held+0x3b/0x3d
[<ffffffff811ac1d8>] ? __fcheck_files+0x4c/0x58
[<ffffffff811ac946>] ? __fget_light+0x2d/0x52
[<ffffffff817b8adc>] __sys_sendmsg+0x42/0x60
[<ffffffff817b8b0c>] SyS_sendmsg+0x12/0x1c
[<ffffffff81a29e32>] system_call_fastpath+0x12/0x17
Fixes: e7ed828f10bd8 ("netlink: support setting devgroup parameters")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 496fcc294daab18799e190c0264863d653588d1f upstream.
As HT/VHT depend heavily on QoS/WMM, it's not a good idea to
let userspace add clients that have HT/VHT but not QoS/WMM.
Since it does so in certain cases we've observed (client is
using HT IEs but not QoS/WMM) just ignore the HT/VHT info at
this point and don't pass it down to the drivers which might
unconditionally use it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4:
- adjust context
- 3.4 doesn't support VHT]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit aa75ebc275b2a91b193654a177daf900ad6703f0 upstream.
Some APs experience problems when working with
U-APSD. Decreasing the probability of that
happening by using legacy mode for all ACs but VO
isn't enough.
Cisco 4410N originally forced us to enable VO by
default only because it treated non-VO ACs as
legacy.
However some APs (notably Netgear R7000) silently
reclassify packets to different ACs. Since u-APSD
ACs require trigger frames for frame retrieval
clients would never see some frames (e.g. ARP
responses) or would fetch them accidentally after
a long time.
It makes little sense to enable u-APSD queues by
default because it needs userspace applications to
be aware of it to actually take advantage of the
possible additional powersavings. Implicitly
depending on driver autotrigger frame support
doesn't make much sense.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit d6a4ed6fe0a0d4790941e7f13e56630b8b9b053d upstream.
Some APs experience problems when working with U-APSD. Decrease the
probability of that happening by using legacy mode for all ACs but VO.
The AP that caused us troubles was a Cisco 4410N. It ignores our
setting, and always treats non-VO ACs as legacy.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit d0c22119f574b851e63360c6b8660fe9593bbc3c upstream.
The mesh forwarding path was not checking that data
frames were protected when running an encrypted network;
add the necessary check.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 969439016d2cf61fef53a973d7e6d2061c3793b1 upstream.
When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
this can lead to a skb_under_panic due to missing skb initialisations.
Add the missing initialisations at the CAN skbuff creation times on driver
level (rx path) and in the network layer (tx path).
Reported-by: Austin Schuh <austin@peloton-tech.com>
Reported-by: Daniel Steer <daniel.steer@mclaren.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[lizf: Backported to 3.4:
- adjust context
- drop changes to alloc_canfd_skb(), as there's no such function]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 528c943f3bb919aef75ab2fff4f00176f09a4019 upstream.
ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).
Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.
Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 1711fd9addf214823b993468567cab1f8254fc51 upstream.
POLL_OUT isn't what callers of ->poll() are expecting to see; it's
actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap
bit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 2c3fbe3cf28fbd7001545a92a83b4f8acfd9fa36 upstream.
In case an infinite timeout (0) is requested, the irda wait_until_sent
implementation would use a zero poll timeout rather than the default
200ms.
Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 9c1c98a3bb7b7593b60264b9a07e001e68b46697 upstream.
The current minstrel_ht rate control behavior is somewhat optimistic in
trying to find optimum TX rate. While this is usually fine for normal
Data frames, there are cases where a more conservative set of retry
parameters would be beneficial to make the connection more robust.
EAPOL frames are critical to the authentication and especially the
EAPOL-Key message 4/4 (the last message in the 4-way handshake) is
important to get through to the AP. If that message is lost, the only
recovery mechanism in many cases is to reassociate with the AP and start
from scratch. This can often be avoided by trying to send the frame with
more conservative rate and/or with more link layer retries.
In most cases, minstrel_ht is currently using the initial EAPOL-Key
frames for probing higher rates and this results in only five link layer
transmission attempts (one at high(ish) MCS and four at MCS0). While
this works with most APs, it looks like there are some deployed APs that
may have issues with the EAPOL frames using HT MCS immediately after
association. Similarly, there may be issues in cases where the signal
strength or radio environment is not good enough to be able to get
frames through even at couple of MCS 0 tries.
The best approach for this would likely to be to reduce the TX rate for
the last rate (3rd rate parameter in the set) to a low basic rate (say,
6 Mbps on 5 GHz and 2 or 5.5 Mbps on 2.4 GHz), but doing that cleanly
requires some more effort. For now, we can start with a simple one-liner
that forces the minimum rate to be used for EAPOL frames similarly how
the TX rate is selected for the IEEE 802.11 Management frames. This does
result in a small extra latency added to the cases where the AP would be
able to receive the higher rate, but taken into account how small number
of EAPOL frames are used, this is likely to be insignificant. A future
optimization in the minstrel_ht design can also allow this patch to be
reverted to get back to the more optimized initial TX rate.
It should also be noted that many drivers that do not use minstrel as
the rate control algorithm are already doing similar workarounds by
forcing the lowest TX rate to be used for EAPOL frames.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[lizf: Backported to 3.4: adjust the if statement]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|
|
commit 78296c97ca1fd3b104f12e1f1fbc06c46635990b upstream.
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.
Lets add an additional parameter to make sure storage is valid long
enough.
While we are at it, adds some const qualifiers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9b76 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
|