Age | Commit message (Collapse) | Author | Files | Lines |
|
When arp is off on a device, and ioctl(SIOCGARP) is queried,
a buggy answer is given with MAC address of the device, instead
of the mac address of the destination/gateway.
We filter out NUD_NOARP neighbours for /proc/net/arp,
we must do the same for SIOCGARP ioctl.
Tested:
lpaa23:~# ./arp 10.246.7.190
MAC=00:01:e8:22:cb:1d // correct answer
lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# cat /proc/net/arp # check arp table is now 'empty'
IP address HW type Flags HW address Mask Device
lpaa23:~# ./arp 10.246.7.190
MAC=00:1a:11:c3:0d:7f // buggy answer before patch (this is eth0 mac)
After patch :
lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# ./arp 10.246.7.190
ioctl(SIOCGARP) failed: No such device or address
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Vytautas Valancius <valas@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This notification causes the FIB to be updated, which is not needed
because the address already exists, and more importantly it may undo
intentional changes that were made to the FIB after the address was
originally added. (As a point of comparison, when an address becomes
deprecated because its preferred lifetime expired, a notification on
this chain is not generated.)
The motivation for this commit is fixing an incompatibility between
DHCP clients which set and update the address lifetime according to
the lease, and a commercial VPN client which replaces kernel routes
in a way that outbound traffic is sent only through the tunnel (and
disconnects if any further route changes are detected via netlink).
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These should be handled only by the respective STP which is in control.
They become problematic for devices with limited resources with many
ports because the hold_timer is per port and fires each second and the
hello timer fires each 2 seconds even though it's global. While in
user-space STP mode these timers are completely unnecessary so it's better
to keep them off.
Also ensure that when the bridge is up these timers are started only when
running with kernel STP.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix a situation where the client uses the wrong (zero) stateid.
- Fix a memory leak in nfs_do_recoalesce
Bugfixes:
- Plug a memory leak when ->prepare_layoutcommit fails
- Fix an Oops in the NFSv4 open code
- Fix a backchannel deadlock
- Fix a livelock in sunrpc when sendmsg fails due to low memory
availability
- Don't revalidate the mapping if both size and change attr are up to
date
- Ensure we don't miss a file extension when doing pNFS
- Several fixes to handle NFSv4.1 sequence operation status bits
correctly
- Several pNFS layout return bugfixes"
* tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
nfs: Fix an oops caused by using other thread's stack space in ASYNC mode
nfs: plug memory leak when ->prepare_layoutcommit fails
SUNRPC: Report TCP errors to the caller
sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable.
NFSv4.2: handle NFS-specific llseek errors
NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce()
NFS: Fix a memory leak in nfs_do_recoalesce
NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE
NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability
NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised
NFS: Don't revalidate the mapping if both size and change attr are up to date
NFSv4/pnfs: Ensure we don't miss a file extension
NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked
SUNRPC: xprt_complete_bc_request must also decrement the free slot count
SUNRPC: Fix a backchannel deadlock
pNFS: Don't throw out valid layout segments
pNFS: pnfs_roc_drain() fix a race with open
pNFS: Fix races between return-on-close and layoutreturn.
pNFS: pnfs_roc_drain should return 'true' when sleeping
pNFS: Layoutreturn must invalidate all existing layout segments.
...
|
|
When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}. However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased. Ultimately, this
caused the deletion of the interface to fail with the following message:
unregister_netdevice: waiting for dummy0 to become free. Usage count = 1
This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.
Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases')
Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node. As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it. As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.
The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The networking layer does not reliably report the distinction between
a non-block write failing because:
1/ the queue is too full already and
2/ a memory allocation attempt failed.
The distinction is important because in the first case it is
appropriate to retry as soon as the socket reports that it is
writable, and in the second case a small delay is required as the
socket will most likely report as writable but kmalloc could still
fail.
sk_stream_wait_memory() exhibits this distinction nicely, setting
'vm_wait' if a small wait is needed. However in the non-blocking case
it always returns -EAGAIN no matter the cause of the failure. This
-EAGAIN call get all the way to sunrpc.
The sunrpc layer expects EAGAIN to indicate the first cause, and
ENOBUFS to indicate the second. Various documentation suggests that
this is not unreasonable, but does not guarantee the desired error
codes.
The result of getting -EAGAIN when -ENOBUFS is expected is that the
send is tried again in a tight loop and soft lockups are reported.
so: add tests after calls to xs_sendpages() to translate -EAGAIN into
-ENOBUFS if the socket is writable. This cannot happen inside
xs_sendpages() as the test for "is socket writable" is different
between TCP and UDP.
With this change, the tight loop retrying xs_sendpages() becomes a
loop which only retries every 250ms, and so will not trigger a
soft-lockup warning.
It is possible that the write did fail because the queue was too full
and by the time xs_sendpages() completed, the queue was writable
again. In this case an extra 250ms delay is inserted that isn't
really needed. This circumstance suggests a degree of congestion so a
delay is not necessarily a bad thing, and it can only cause a single
250ms delay, not a series of them.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.
sk_wait_data waits for sk_receive_queue not empty, but in this case,
the receive queue is not empty, but does not contain any skb that we
can use.
Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Reported-by: Enrico Scholz <rh-bugzilla@ensc.de>
Reported-by: Dan Searle <dan@censornet.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-07-23
Here's another one-patch pull request for 4.2 which targets a potential
NULL pointer dereference in the LE Security Manager code that can be
triggered by using older user space tools. The issue has been there
since 4.0 so there's the appropriate "Cc: stable" in place.
Let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags
race conditions with the evictor and use a participation test for the
evictor list, when we're at that point (after inet_frag_kill) in the
timer there're 2 possible cases:
1. The evictor added the entry to its evictor list while the timer was
waiting for the chainlock
or
2. The timer unchained the entry and the evictor won't see it
In both cases we should be able to see list_evictor correctly due
to the sync on the chainlock.
Joint work with Florian Westphal.
Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Frank reports 'NMI watchdog: BUG: soft lockup' errors when
load is high. Instead of (potentially) unbounded restarts of the
eviction process, just skip to the next entry.
One caveat is that, when a netns is exiting, a timer may still be running
by the time inet_evict_bucket returns.
We use the frag memory accounting to wait for outstanding timers,
so that when we free the percpu counter we can be sure no running
timer will trip over it.
Reported-and-tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Followup patch will call it after inet_frag_queue was freed, so q->net
doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would).
Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 65ba1f1ec0eff ("inet: frags: fix a race between inet_evict_bucket
and inet_frag_kill") describes the bug, but the fix doesn't work reliably.
Problem is that ->flags member can be set on other cpu without chainlock
being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED
bit after we put the element on the evictor private list.
We can crash when walking the 'private' evictor list since an element can
be deleted from list underneath the evictor.
Join work with Nikolay Alexandrov.
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Johan Schuijt <johan@transip.nl>
Tested-by: Frank Schreuder <fschreuder@transip.nl>
Signed-off-by: Nikolay Alexandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
(as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net:
sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].
Imho, it was not a good idea, and we should just revert that message
for a couple of reasons:
1) It's uapi and therefore set in stone forever.
2) To be able to run on older and newer kernels, an SCTP application
would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
SCTP_RCVINFO support, so that on older kernels, it can make use
of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
In my (limited) experience, a lot of SCTP appliances are migrating
to newer kernels only ve(ee)ry slowly.
3) Some people don't have the chance to change their applications,
f.e. due to proprietary legacy stuff. So, they'll hit this warning
in fast path and are stuck with older kernels.
But i.e. due to point 1) I really fail to see the benefit of a warning.
So just revert that for now, the issue was reported up Jamal.
[1] http://thread.gmane.org/gmane.linux.network/321960/
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Tuexen <tuexen@fh-muenster.de>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since slave_changelink support was added there have been a few race
conditions when using br_setport() since some of the port functions it
uses require the bridge lock. It is very easy to trigger a lockup due to
some internal spin_lock() usage without bh disabled, also it's possible to
get the bridge into an inconsistent state.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 3ac636b8591c ("bridge: implement rtnl_link_ops->slave_changelink")
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains ten Netfilter/IPVS fixes, they are:
1) Address refcount leak when creating an expectation from the ctnetlink
interface.
2) Fix bug splat in the IDLETIMER target related to sysfs, from Dmitry
Torokhov.
3) Resolve panic for unreachable route in IPVS with locally generated
traffic in the output path, from Alex Gartrell.
4) Fix wrong source address in rare cases for tunneled traffic in IPVS,
from Julian Anastasov.
5) Fix crash if scheduler is changed via ipvsadm -E, again from Julian.
6) Make sure skb->sk is unset for forwarded traffic through IPVS, again from
Alex Gartrell.
7) Fix crash with IPVS sync protocol v0 and FTP, from Julian.
8) Reset sender cpu for forwarded traffic in IPVS, also from Julian.
9) Allocate template conntracks through kmalloc() to resolve netns dependency
problems with the conntrack kmem_cache.
10) Fix zones with expectations that clash using the same tuple, from Joe
Stringer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In dev_queue_xmit() net_cls protected with rcu-bh.
[ 270.730026] ===============================
[ 270.730029] [ INFO: suspicious RCU usage. ]
[ 270.730033] 4.2.0-rc3+ #2 Not tainted
[ 270.730036] -------------------------------
[ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[ 270.730041] other info that might help us debug this:
[ 270.730043] rcu_scheduler_active = 1, debug_locks = 1
[ 270.730045] 2 locks held by dhclient/748:
[ 270.730046] #0: (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960
[ 270.730085] #1: (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960
[ 270.730090] stack backtrace:
[ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 270.730100] 0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[ 270.730103] ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[ 270.730105] ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[ 270.730108] Call Trace:
[ 270.730121] [<ffffffff817ad487>] dump_stack+0x4c/0x65
[ 270.730148] [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120
[ 270.730153] [<ffffffff816a62d2>] task_cls_state+0x92/0xa0
[ 270.730158] [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[ 270.730164] [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0
[ 270.730166] [<ffffffff816ab573>] tc_classify+0x33/0x90
[ 270.730170] [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb]
[ 270.730172] [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960
[ 270.730174] [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960
[ 270.730176] [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20
[ 270.730185] [<ffffffff81787770>] dev_queue_xmit+0x10/0x20
[ 270.730187] [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760
[ 270.730190] [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0
[ 270.730203] [<ffffffff81665245>] ? sock_def_readable+0x5/0x190
[ 270.730210] [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40
[ 270.730216] [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640
[ 270.730219] [<ffffffff8165f367>] sock_sendmsg+0x47/0x50
[ 270.730221] [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0
[ 270.730232] [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0
[ 270.730234] [<ffffffff811fe5b8>] vfs_write+0xb8/0x190
[ 270.730236] [<ffffffff811fe8c2>] SyS_write+0x52/0xb0
[ 270.730239] [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise the skbuff related structures are not correctly
refcount'ed.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.
This patch solves the problem as follows:
- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.
- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.
For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:
tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.
Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull virtio/vhost fixes from Michael Tsirkin:
"Bugfixes and documentation fixes.
Igor's patch that allows users to tweak memory table size is
borderline, but it does fix known crashes, so I merged it"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: add max_mem_regions module parameter
vhost: extend memory regions allocation to vmalloc
9p/trans_virtio: reset virtio device on remove
virtio/s390: rename drivers/s390/kvm -> drivers/s390/virtio
MAINTAINERS: separate section for s390 virtio drivers
virtio: define virtio_pci_cfg_cap in header.
virtio: Fix typecast of pointer in vring_init()
virtio scsi: fix unused variable warning
vhost: use binary search instead of linear in find_region()
virtio_net: document VIRTIO_NET_CTRL_GUEST_OFFLOADS
|
|
The l2cap_conn->smp pointer may be NULL for various valid reasons where SMP has
failed to initialize properly. One such scenario is when crypto support is
missing, another when the adapter has been powered on through a legacy method.
The smp_conn_security() function should have the appropriate check for this
situation to avoid NULL pointer dereferences.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.0+
|
|
Pull networking fixes from David Miller:
1) Don't use shared bluetooth antenna in iwlwifi driver for management
frames, from Emmanuel Grumbach.
2) Fix device ID check in ath9k driver, from Felix Fietkau.
3) Off by one in xen-netback BUG checks, from Dan Carpenter.
4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.
5) Fix races in setting peeked bit flag in SKBs during datagram
receive. If it's shared we have to clone it otherwise the value can
easily be corrupted. Fix from Herbert Xu.
6) Revert fec clock handling change, causes regressions. From Fabio
Estevam.
7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
Cong.
8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
from WANG Cong and Konstantin Khlebnikov.
9) Memory leak in act_bpf packet action, from Alexei Starovoitov.
10) ARM bpf JIT bug fixes from Nicolas Schichan.
11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
Michael S Tsirkin.
12) Destruction of bond with different ARP header types not handled
correctly, fix from Nikolay Aleksandrov.
13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
regressions because the GRO packets created cannot be processed
properly on the GSO side if we forward the frame. From Herbert Xu.
14) TCCR update race and other fixes to ravb driver from Sergei
Shtylyov.
15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.
16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.
17) Make sure AF_PACKET sees properly IP headers in defragmented frames
(via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.
18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
ravb: fix ring memory allocation
net: phy: dp83867: Fix warning check for setting the internal delay
openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
netlink: don't hold mutex in rcu callback when releasing mmapd ring
ARM: net: fix vlan access instructions in ARM JIT.
ARM: net: handle negative offsets in BPF JIT.
ARM: net: fix condition for load_order > 0 when translating load instructions.
tcp: suppress a division by zero warning
drivers: net: cpsw: remove tx event processing in rx napi poll
inet: frags: fix defragmented packet's IP header for af_packet
net: mvneta: fix refilling for Rx DMA buffers
stmmac: fix setting of driver data in stmmac_dvr_probe
sched: cls_flow: fix panic on filter replace
sched: cls_flower: fix panic on filter replace
sched: cls_bpf: fix panic on filter replace
net/mdio: fix mdio_bus_match for c45 PHY
net: ratelimit warnings about dst entry refcount underflow or overflow
caif: fix leaks and race in caif_queue_rcv_skb()
qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
ravb: fix race updating TCCR
...
|
|
Calling xprt_complete_bc_request() effectively causes the slot to be allocated,
so it needs to decrement the backchannel free slot count as well.
Fixes: 0d2a970d0ae5 ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
xprt_alloc_bc_request() cannot call xprt_free_bc_request() without
deadlocking, since it already holds the xprt->bc_pa_lock.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 0d2a970d0ae55 ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].
Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().
The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kirill A. Shutemov says:
This simple test-case trigers few locking asserts in kernel:
int main(int argc, char **argv)
{
unsigned int block_size = 16 * 4096;
struct nl_mmap_req req = {
.nm_block_size = block_size,
.nm_block_nr = 64,
.nm_frame_size = 16384,
.nm_frame_nr = 64 * block_size / 16384,
};
unsigned int ring_size;
int fd;
fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
exit(1);
if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
exit(1);
ring_size = req.nm_block_nr * req.nm_block_size;
mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
return 0;
}
+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
#0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
#1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
#2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
<IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
[<ffffffff81085bed>] __might_sleep+0x4d/0x90
[<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
[<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[<ffffffff817e484d>] __sk_free+0x1d/0x160
[<ffffffff817e49a9>] sk_free+0x19/0x20
[..]
Cong Wang says:
We can't hold mutex lock in a rcu callback, [..]
Thomas Graf says:
The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Morton reported following warning on one ARM build
with gcc-4.4 :
net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero
Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.
Remove the warning by using a temporary variable.
Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following test case causes a NULL pointer dereference in cls_flow:
tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
flow hash keys mark action drop
To be more precise, actually two different panics are fixed, the first
occurs because tcf_exts_init() is not called on the newly allocated
filter when we do a replace. And the second panic uncovered after that
happens since the arguments of list_replace_rcu() are swapped, the old
element needs to be the first argument and the new element the second.
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following test case causes a NULL pointer dereference in cls_flower:
tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
flower eth_type ipv6 action ok flowid 1:1
The problem is that commit 77b9900ef53a ("tc: introduce Flower classifier")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following test case causes a NULL pointer dereference in cls_bpf:
FOO="1,6 0 0 4294967295,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
bpf bytecode "$FOO" flowid 1:1 action drop
The problem is that commit 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Some fixes for the current cycle:
1. Arik introduced an rtnl-locked regulatory API to be able
to differentiate between place do/don't have the RTNL;
this fixes missing locking in some of the code paths
2. Two small mesh bugfixes from Bob, one to avoid treating
a certain malformed over-the-air frame and one to avoid
sending a garbage field over the air.
3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya.
4. A fix for a powersave vs. aggregation teardown race, from Michal.
5. Thomas reduced the loglevel of CRDA messages to avoid spamming
the kernel log with mostly irrelevant information.
6. Tom fixed a dangling debugfs directory pointer that could cause
crashes if subsequent addition of the same interface to debugfs
failed for some reason.
7. A fix from myself for a list corruption issue in mac80211 during
combined interface shutdown/removal - shut down interfaces first
and only then remove them to avoid that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1) If sk_filter() is applied, skb was leaked (not freed)
2) Testing SOCK_DEAD twice is racy :
packet could be freed while already queued.
3) Remove obsolete comment about caching skb->len
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit:
Add gro callbacks to sit_offload") because it generates packets
that cannot be handled even by our own GSO.
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:
====================
IPVS Fixes for v4.2
please consider this fix for v4.2.
For reasons that are not clear to me it is a bumper crop.
It seems to me that they are all relevant to stable.
Please let me know if you need my help to get the fixes into stable.
* ipvs: fix ipv6 route unreach panic
This problem appears to be present since IPv6 support was added to
IPVS in v2.6.28.
* ipvs: skb_orphan in case of forwarding
This appears to resolve a problem resulting from a side effect of
41063e9dd119 ("ipv4: Early TCP socket demux.") which was included in v3.6.
* ipvs: do not use random local source address for tunnels
This appears to resolve a problem introduced by
026ace060dfe ("ipvs: optimize dst usage for real server") in v3.10.
* ipvs: fix crash if scheduler is changed
This appears to resolve a problem introduced by
ceec4c381681 ("ipvs: convert services to rcu") in v3.10.
Julian has provided backports of the fix:
* [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed
http://www.spinics.net/lists/lvs-devel/msg04008.html
* [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed
http://www.spinics.net/lists/lvs-devel/msg04007.html
Please let me know how you would like to handle guiding these
backports into stable.
* ipvs: fix crash with sync protocol v0 and FTP
This appears to resolve a problem introduced by
749c42b620a9 ("ipvs: reduce sync rate with time thresholds") in v3.5
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Quoting Daniel Borkmann:
"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.
Minimal example:
ip netns add foo
ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
ip netns del foo
What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.
Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.
This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."
Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.
Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.
Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The RTNL is required to check for IR-relaxation conditions that allow
more channels to beacon. Export an RTNL locked version of reg_can_beacon
and use it where possible in AP/STA interface type flows, where
IR-relaxation may be applicable.
Fixes: 06f207fc5418 ("cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA")
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Although mesh_rx_plink_frame() already checks that frames have enough
bytes for the action code plus another two bytes for capability/reason
code, it doesn't take into account that confirm frames also have an
additional two-byte aid. As a result, a corrupt frame could cause a
subsequent subtraction to wrap around to ill effect. Add another
check for this case.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
According to 802.11-2012 8.5.16.3.2 AID comes directly after the
capability bytes in mesh peering confirm frames. The existing
code, however, was adding a 2 byte offset to this location,
resulting in garbage data going out over the air. Remove the
offset to fix it.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
With a basic Linux userspace, the messages "Calling CRDA to update
world regulatory domain" appears 10 times after boot every second or
so, followed by a final "Exceeded CRDA call max attempts. Not calling
CRDA". For those of us not having the corresponding userspace parts,
having those messages repeatedly displayed at boot time is a bit
annoying, so this commit reduces their log level to pr_debug().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If the hardware is unregistered while interfaces are up, mac80211 will
unregister all interfaces, which in turns causes mac80211 to be called
again to remove them all from the driver and eventually shut down the
hardware.
During this shutdown, however, it's currently already unsafe to iterate
the list of interfaces atomically, as the list is manipulated in an
unsafe manner. This puts an undue burden on the driver - it must stop
all its activities before calling ieee80211_unregister_hw(), while in
the normal stop path it can do all cleanup in the stop method. If, for
example, it's using the iteration during RX for some reason, it would
have to stop RX before unregistering to avoid crashes.
Fix this problem by closing all interfaces before unregistering them.
This will cause the driver stop to have completed before we manipulate
the interface list, and after the driver is stopped *and* has called
ieee80211_unregister_hw() it really musn't be iterating any more as
the memory will be freed as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If for any reason we're in the middle of PS-polling or awake after
TX due to dynamic powersave while going to suspend, go back to save
power. This might cause a response frame to get lost, but since we
can't really wait for it while going to suspend that's still better
than not enabling powersave which would cause higher power usage
during (and possibly even after) suspend.
Note that this really only affects the very few drivers that use
the powersave implementation in mac80211.
Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com>
[rewrite misleading commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When acting as AP and a PS-Poll frame is received
associated station is marked as one in a Service
Period. This state is kept until Tx status for
released frame is reported. While a station is in
Service Period PS-Poll frames are ignored.
However if PS-Poll was received during A-MPDU
teardown it was possible to have the to-be
released frame re-queued back to pending queue.
In such case the frame was stripped of 2 important
flags:
(a) IEEE80211_TX_CTL_NO_PS_BUFFER
(b) IEEE80211_TX_STATUS_EOSP
Stripping of (a) led to the frame that was to be
released to be queued back to ps_tx_buf queue. If
station remained to use only PS-Poll frames the
re-queued frame (and new ones) was never actually
transmitted because mac80211 would ignore
subsequent PS-Poll frames due to station being in
Service Period. There was nothing left to clear
the Service Period bit (no xmit -> no tx status ->
no SP end), i.e. the AP would have the station
stuck in Service Period. Beacon TIM would
repeatedly prompt station to poll for frames but
it would get none.
Once (a) is not stripped (b) becomes important
because it's the main condition to clear the
Service Period bit of the station when Tx status
for the released frame is reported back.
This problem was observed with ath9k acting as P2P
GO in some testing scenarios but isn't limited to
it. AP operation with mac80211 based Tx A-MPDU
control combined with clients using PS-Poll frames
is subject to this race.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
Cc: stable@kernel.org
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
prog->bpf_ops is populated when act_bpf is used with classic BPF and
prog->bpf_name is optionally used with extended BPF.
Fix memory leak when act_bpf is released.
Fixes: d23b8ad8ab23 ("tc: add BPF based action")
Fixes: a8cb5f556b56 ("act_bpf: add initial eBPF support for actions")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|