summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-12-03vhost: fix skb leak in handle_rx()Wei Xu1-10/+10
Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge branch 'bnxt_en-fixes'David S. Miller2-29/+31
Michael Chan says: ==================== bnxt_en: Fixes. A shutdown fix for SMARTNIC, 2 fixes related to TC Flower vxlan filters, and the last one fixes an out-of-scope variable when sending short firmware messages. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()Vasundhara Volam1-1/+1
short_input variable is assigned to another data pointer which is referred out of its scope. Fix it by moving short_input definition to the beginning of bnxt_hwrm_do_send_msg() function. No failure has been reported so far due to this issue. Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03bnxt_en: fix dst/src fid for vxlan encap/decap actionsSathya Perla1-22/+26
For flows that involve a vxlan encap action, the vxlan sock interface may be specified as the outgoing interface. The driver must resolve the outgoing PF interface used by this socket and use the dst_fid of the PF in the hwrm_cfa_encap_record_alloc cmd. Similarily for flows that have a vxlan decap action, the fid of the incoming PF interface must be used as the src_fid in the hwrm_cfa_decap_filter_alloc cmd. Fixes: 8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03bnxt_en: wildcard smac while creating tunnel decap filterSunil Challa1-5/+2
While creating a decap filter the tunnel smac need not (and must not) be specified as we cannot ascertain the neighbor in the recv path. 'ttl' match is also not needed for the decap filter and must be wild-carded. Fixes: f484f6782e01 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdownRay Jui1-1/+2
The current 'bnxt_shutdown' implementation only invokes 'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in the path of power off (SYSTEM_POWER_OFF). While this may work in most cases, it does not work in the smart NIC case, when Linux 'reboot' command is initiated from the Linux that runs on the ARM cores of the NIC card. In this particular case, Linux 'reboot' results in a system 'L3' level reset where the entire ARM and associated subsystems are being reset, but at the same time, Nitro core is being kept in sane state (to allow external PCIe connected servers to continue to work). Without properly shutting down RoCE and freeing all associated resources, it results in the ARM core to hang immediately after the 'reboot' By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the above issue Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.") Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01Merge branch 'sfp-phylink-fixes'David S. Miller2-11/+31
Russell King says: ==================== SFP/phylink fixes Here are four phylink fixes: - the "options" is a big-endian value, we must test the bits taking the endian-ness into account. - improve the handling of RX_LOS polarity, taking no RX_LOS polarity bits set to mean there is no RX_LOS functionality provided. - do not report modules that require the address mode switching as supporting SFF8472. - ensure that the mac_link_down() function is called when phylink_stop() is called. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01phylink: ensure we take the link down when phylink_stop() is calledRussell King1-0/+1
Ensure that we tell the MAC to take the link down when phylink_stop() is called, and that this completes prior to phylink_stop() returns. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: warn about modules requiring address change sequenceRussell King1-1/+7
We do not support SFP modules which require the address change sequence as detailed by SFF 8472 revision 1.22 section 8.9. Warn when these modules are inserted, and treat them as SFF8079 modules for ethtool. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: improve RX_LOS handlingRussell King1-11/+22
There are two bits in the option word for the RX_LOS signal. One reports that the RX_LOS signal is active high, the other reports that it is active low. When both or neither are set, the result is not well defined in the specification. Rather than assuming that neither set means normal RX_LOS, take this as meaning no RX_LOS signal available, thereby ignoring the signal. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sfp: fix RX_LOS signal handlingRussell King1-3/+5
The options word is a be16 quantity, so we need to test the flags having converted the endian-ness. Convert the flag bits to be16, which can be optimised by the compiler, rather than converting a variable at runtime. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01net: phy-micrel: check return code in flp center functionMax Uvarov1-2/+4
Fix obvious typo that first return value is set but not checked. Signed-off-by: Max Uvarov <muvarov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()Tommi Rantala1-4/+0
Remove the second tipc_rcv() call in tipc_udp_recv(). We have just checked that the bearer is not up, and calling tipc_rcv() with a bearer that is not up leads to a TIPC div-by-zero crash in tipc_node_calculate_timer(). The crash is rare in practice, but can happen like this: We're enabling a bearer, but it's not yet up and fully initialized. At the same time we receive a discovery packet, and in tipc_udp_recv() we end up calling tipc_rcv() with the not-yet-initialized bearer, causing later the div-by-zero crash in tipc_node_calculate_timer(). Jon Maloy explains the impact of removing the second tipc_rcv() call: "link setup in the worst case will be delayed until the next arriving discovery messages, 1 sec later, and this is an acceptable delay." As the tipc_rcv() call is removed, just leave the function via the rcu_out label, so that we will kfree_skb(). [ 12.590450] Own node address <1.1.1>, network identity 1 [ 12.668088] divide error: 0000 [#1] SMP [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1 [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000 [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246 [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000 [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600 [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001 [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8 [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800 [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000 [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0 [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 12.712627] Call Trace: [ 12.713390] <IRQ> [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc] [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc] [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc] [ 12.717396] ? minmax_running_min+0x2f/0x60 [ 12.718248] ? dst_alloc+0x4c/0xa0 [ 12.718964] ? tcp_ack+0xaf1/0x10b0 [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc] [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc] [ 12.721459] ? dst_alloc+0x4c/0xa0 [ 12.722130] udp_queue_rcv_skb+0x264/0x490 [ 12.722924] __udp4_lib_rcv+0x21e/0x990 [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0 [ 12.724442] ? tcp_v4_rcv+0x958/0xa40 [ 12.725039] udp_rcv+0x1a/0x20 [ 12.725587] ip_local_deliver_finish+0x97/0x1d0 [ 12.726323] ip_local_deliver+0xaf/0xc0 [ 12.726959] ? ip_route_input_noref+0x19/0x20 [ 12.727689] ip_rcv_finish+0xdd/0x3b0 [ 12.728307] ip_rcv+0x2ac/0x360 [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90 [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0 [ 12.730274] __netif_receive_skb+0x1d/0x60 [ 12.730953] ? __netif_receive_skb+0x1d/0x60 [ 12.731637] netif_receive_skb_internal+0x37/0xd0 [ 12.732371] napi_gro_receive+0xc7/0xf0 [ 12.732920] receive_buf+0x3c3/0xd40 [ 12.733441] virtnet_poll+0xb1/0x250 [ 12.733944] net_rx_action+0x23e/0x370 [ 12.734476] __do_softirq+0xc5/0x2f8 [ 12.734922] irq_exit+0xfa/0x100 [ 12.735315] do_IRQ+0x4f/0xd0 [ 12.735680] common_interrupt+0xa2/0xa2 [ 12.736126] </IRQ> [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10 [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000 [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88 [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000 [ 12.741831] default_idle+0x2a/0x100 [ 12.742323] arch_cpu_idle+0xf/0x20 [ 12.742796] default_idle_call+0x28/0x40 [ 12.743312] do_idle+0x179/0x1f0 [ 12.743761] cpu_startup_entry+0x1d/0x20 [ 12.744291] start_secondary+0x112/0x120 [ 12.744816] secondary_startup_64+0xa5/0xa5 [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0 [ 12.748555] ---[ end trace 1399ab83390650fd ]--- [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 12.751215] Rebooting in 60 seconds.. Fixes: c9b64d492b1f ("tipc: add replicast peer discovery") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01tcp/dccp: block bh before arming time_wait timerEric Dumazet2-0/+12
Maciej Żenczykowski reported some panics in tcp_twsk_destructor() that might be caused by the following bug. timewait timer is pinned to the cpu, because we want to transition timwewait refcount from 0 to 4 in one go, once everything has been initialized. At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") was merged, TCP was always running from BH habdler. After commit 5413d1babe8f ("net: do not block BH while processing socket backlog") we definitely can run tcp_time_wait() from process context. We need to block BH in the critical section so that the pinned timer has still its purpose. This bug is more likely to happen under stress and when very small RTO are used in datacenter flows. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01Merge branch 'sctp-prsctp-chunk-fixes'David S. Miller3-7/+26
Xin Long says: ==================== sctp: a couple of fixes for chunks abandoned in prsctp Now when abandoning chunks in prsctp, it doesn't consider for frags in one msg, which would cause peer can never receive the whole frags for one msg to get them reassembled, these pieces of this msg will stay in the reasm queue forever and block the following chunks' receiving. This patchset is to fix them in patch 2 and 3, and also fix another issue for prsctp in patch 1. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: do not abandon the other frags in unsent outq if one msg has ↵Xin Long2-1/+6
outstanding frags Now for the abandoned chunks in unsent outq, it would just free the chunks. Because no tsn is assigned to them yet, there's no need to send fwd tsn to peer, unlike for the abandoned chunks in sent outq. The problem is when parts of the msg have been sent and the other frags are still in unsent outq, if they are abandoned/dropped, the peer would never get this msg reassembled. So these frags in unsent outq can't be dropped if this msg already has outstanding frags. This patch does the check in sctp_chunk_abandoned and sctp_prsctp_prune_unsent. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: abandon the whole msg if one part of a fragmented message is abandonedXin Long3-5/+17
As rfc3758#section-3.1 demands: A3) When a TSN is "abandoned", if it is part of a fragmented message, all other TSN's within that fragmented message MUST be abandoned at the same time. Besides, if it couldn't handle this, the rest frags would never get assembled in peer side. This patch supports it by adding abandoned flag in sctp_datamsg, when one chunk is being abandoned, set chunk->msg->abandoned as well. Next time when checking for abandoned, go checking chunk->msg->abandoned first. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01sctp: only update outstanding_bytes for transmitted queue when doing ↵Xin Long1-2/+4
prsctp_prune Now outstanding_bytes is only increased when appending chunks into one packet and sending it at 1st time, while decreased when it is about to move into retransmit queue. It means outstanding_bytes value is already decreased for all chunks in retransmit queue. However sctp_prsctp_prune_sent is a common function to check the chunks in both transmitted and retransmit queue, it decrease outstanding_bytes when moving a chunk into abandoned queue from either of them. It could cause outstanding_bytes underflow, as it also decreases it's value for the chunks in retransmit queue. This patch fixes it by only updating outstanding_bytes for transmitted queue when pruning queues for prsctp prio policy, the same fix is also needed in sctp_check_transmitted. Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: dsa: bcm_sf2: Set correct CHAIN_ID and slice number maskFlorian Fainelli1-2/+2
When configuring an IPv6 address mask, we should use SLICE_NUM_MASK as the mask in order to make sure all bits are masked by the hardware. Also, we want matching entries to have a CHAIN_ID value set to the same value as the rule index we return to user-space for convenience, so fix that too. Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules") Fixes: dd8eff68343d ("net: dsa: bcm_sf2: Allow matching arbitrary IPv6 masks/lengths") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30sit: update frag_off infoHangbin Liu1-0/+1
After parsing the sit netlink change info, we forget to update frag_off in ipip6_tunnel_update(). Fix it by assigning frag_off with new value. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30tcp: remove buggy call to tcp_v6_restore_cb()Eric Dumazet1-1/+0
tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as used in TCP stack. MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif() and thus TCP_SKB_CB(skb)->header.h6 This patch probably fixes RST packets sent on behalf of a timewait md5 ipv6 socket. Before Florian patch, tcp_v6_restore_cb() was needed before jumping to no_tcp_socket label. Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30act_sample: get rid of tcf_sample_cleanup_rcu()Cong Wang2-12/+3
Similar to commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu"), TC actions don't need to respect RCU grace period, because it is either just detached from tc filter (standalone case) or it is removed together with tc filter (bound case) in which case RCU grace period is already respected at filter layer. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30Merge tag 'rxrpc-fixes-20171129' of ↵David S. Miller5-27/+35
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes Here are three patches for AF_RXRPC. One removes some whitespace, one fixes terminal ACK generation and the third makes a couple of places actually use the timeout value just determined rather than ignoring it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: stmmac: dwmac-sun8i: fix allwinner,leds-active-low handlingCorentin Labbe1-2/+1
The driver expect "allwinner,leds-active-low" to be in PHY node, but the binding doc expect it to be in MAC node. Since all board DT use it also in MAC node, the driver need to search allwinner,leds-active-low in MAC node. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30skbuff: Grammar s/are can/can/, s/change/changes/Geert Uytterhoeven1-2/+1
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30net: mvpp2: allocate zeroed tx descriptorsYan Markman1-1/+1
Reserved and unused fields in the Tx descriptors should be 0. The PPv2 driver doesn't clear them at run-time (for performance reasons) but these descriptors aren't zeroed when allocated, which can lead to unpredictable behaviors. This patch fixes this by using dma_zalloc_coherent instead of dma_alloc_coherent. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Yan Markman <ymarkman@marvell.com> [Antoine: commit message] Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-30Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds15-132/+270
Pull nfsd fixes from Bruce Fields: "I screwed up my merge window pull request; I only sent half of what I meant to. There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2. Highlights: - Fixes from Trond for some races in the NFSv4 state code. - Fix from Naofumi Honda for a typo in the blocked lock notificiation code - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces" * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits) lockd: fix "list_add double add" caused by legacy signal interface nlm_shutdown_hosts_net() cleanup race of nfsd inetaddr notifiers vs nn->nfsd_serv change race of lockd inetaddr notifiers vs nlmsvc_rqst change SUNRPC: make cache_detail structures const NFSD: make cache_detail structures const sunrpc: make the function arg as const nfsd: check for use of the closed special stateid nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat lockd: lost rollback of set_grace_period() in lockd_down_net() lockd: added cleanup checks in exit_net hook grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class lockd: remove net pointer from messages nfsd: remove net pointer from debug messages nfsd: Fix races with check_stateid_generation() nfsd: Ensure we check stateid validity in the seqid operation checks nfsd: Fix race in lock stateid creation nfsd4: move find_lock_stateid nfsd: Ensure we don't recognise lock stateids after freeing them ...
2017-11-30Merge tag 'for-4.15-rc2-tag' of ↵Linus Torvalds19-135/+314
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected some fixes in since the pre-merge window freeze. There's technically only one regression fix for 4.15, but the rest seems important and candidates for stable. - fix missing flush bio puts in error cases (is serious, but rarely happens) - fix reporting stat::st_blocks for buffered append writes - fix space cache invalidation - fix out of bound memory access when setting zlib level - fix potential memory corruption when fsync fails in the middle - fix crash in integrity checker - incremetnal send fix, path mixup for certain unlink/rename combination - pass flags to writeback so compressed writes can be throttled properly - error handling fixes" * tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: incremental send, fix wrong unlink path after renaming file btrfs: tree-checker: Fix false panic for sanity test Btrfs: fix list_add corruption and soft lockups in fsync btrfs: Fix wild memory access in compression level parser btrfs: fix deadlock when writing out space cache btrfs: clear space cache inode generation always Btrfs: fix reported number of inode blocks after buffered append writes Btrfs: move definition of the function btrfs_find_new_delalloc_bytes Btrfs: bail out gracefully rather than BUG_ON btrfs: dev_alloc_list is not protected by RCU, use normal list_del btrfs: add missing device::flush_bio puts btrfs: Fix transaction abort during failure in btrfs_rm_dev_item Btrfs: add write_flags for compression bio
2017-11-30Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds1-0/+1
Pull Microblaze fix from Michal Simek: "Add missing header to mmu_context_mm.h" * tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: add missing include to mmu_context_mm.h
2017-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds1-1/+1
Pull sparc fix from David Miller: "Sparc T4 and later cpu bootup regression fix" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix boot on T4 and later.
2017-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds72-582/+1182
Pull networking fixes from David Miller: 1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones missed one spot. From Zhu Yanjun. 2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg. 3) Fix checksum offloading in thunderx driver, from Sunil Goutham. 4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger. 5) Fix use after free of packet headers in TIPC, from Jon Maloy. 6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva. 7) Tunneling fixes in mlxsw driver, from Petr Machata. 8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike Maloney. 9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric Dumazet. 10) Fix regression in sch_sfq.c due to one of the timer_setup() conversions. From Paolo Abeni. 11) SCTP does list_for_each_entry() using wrong struct member, fix from Xin Long. 12) Don't use big endian netlink attribute read for IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin Long. 13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing adding filters there. From Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) ethernet: dwmac-stm32: Fix copyright net: via: via-rhine: use %p to format void * address instead of %x net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit myri10ge: Update MAINTAINERS net: sched: cbq: create block for q->link.block atm: suni: remove extraneous space to fix indentation atm: lanai: use %p to format kernel addresses instead of %x VSOCK: Don't set sk_state to TCP_CLOSE before testing it atm: fore200e: use %pK to format kernel addresses instead of %x ambassador: fix incorrect indentation of assignment statement vxlan: use __be32 type for the param vni in __vxlan_fdb_delete bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM sctp: use right member as the param of list_for_each_entry sch_sfq: fix null pointer dereference at timer expiration cls_bpf: don't decrement net's refcount when offload fails net/packet: fix a race in packet_bind() and packet_notifier() packet: fix crash in fanout_demux_rollover() sctp: remove extern from stream sched sctp: force the params with right types for sctp csum apis sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail ...
2017-11-29sparc64: Fix boot on T4 and later.David S. Miller1-1/+1
If we don't put the NG4fls.o object into the same part of the link as the generic sparc64 objects for fls() and __fls() then the relocation in the branch we use for patching will not fit. Move NG4fls.o into lib-y to fix this problem. Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above") Signed-off-by: David S. Miller <davem@davemloft.net> Reported-by: Anatoly Pugachev <matorola@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com>
2017-11-29vsprintf: don't use 'restricted_pointer()' when not restrictingLinus Torvalds1-0/+2
Instead, just fall back on the new '%p' behavior which hashes the pointer. Otherwise, '%pK' - that was intended to mark a pointer as restricted - just ends up leaking pointers that a normal '%p' wouldn't leak. Which just make the whole thing pointless. I suspect we should actually get rid of '%pK' entirely, and make it just work as '%p' regardless, but this is the minimal obvious fix. People who actually use 'kptr_restrict' should weigh in on which behavior they want. Cc: Tobin Harding <me@tobin.cc> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29kallsyms: take advantage of the new '%px' formatLinus Torvalds3-13/+7
The conditional kallsym hex printing used a special fixed-width '%lx' output (KALLSYM_FMT) in preparation for the hashing of %p, but that series ended up adding a %px specifier to help with the conversions. Use it, and avoid the "print pointer as an unsigned long" code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29Merge tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linuxLinus Torvalds5-95/+248
Pull printk pointer hashing update from Tobin Harding: "Here is the patch set that implements hashing of printk specifier %p. First we have two clean up patches then we do the hashing. Hashing is done via the SipHash algorithm. The next patch adds printk specifier %px for printing pointers when we _really_ want to see the address i.e %px is functionally equivalent to %lx. Final patch in the set fixes KASAN since we break it by hashing %p. For the record here is the justification for the series: Currently there exist approximately 14 000 places in the Kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information about the Kernel layout in memory. Many of these calls are stale, instead of fixing every call we hash the address by default before printing. We then add %px to provide a way to print the actual address. Although this is achievable using %lx, using %px will assist us if we ever want to change pointer printing behaviour. %px is more uniquely grep'able (there are already >50 000 uses of %lx). The added advantage of hashing %p is that security is now opt-out, if you _really_ want the address you have to work a little harder and use %px. This will of course break some users, forcing code printing needed addresses to be updated" [ I do expect this to be an annoyance, and a number of %px users to be added for debuggability. But nobody is willing to audit existing %p users for information leaks, and a number of places really only use the pointer as an object identifier rather than really 'I need the address'. IOW - sorry for the inconvenience, but it's the least inconvenient of the options. - Linus ] * tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux: kasan: use %px to print addresses instead of %p vsprintf: add printk specifier %px printk: hash addresses printed with %p vsprintf: refactor %pK code out of pointer() docs: correct documentation for %pK
2017-11-29Revert "mm, thp: Do not make pmd/pud dirty without a reason"Linus Torvalds5-24/+16
This reverts commit 152e93af3cfe2d29d8136cc0a02a8612507136ee. It was a nice cleanup in theory, but as Nicolai Stange points out, we do need to make the page dirty for the copy-on-write case even when we didn't end up making it writable, since the dirty bit is what we use to check that we've gone through a COW cycle. Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29ethernet: dwmac-stm32: Fix copyrightBenjamin Gaignard1-2/+2
Uniformize STMicroelectronics copyrights header Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> CC: Alexandre Torgue <alexandre.torgue@st.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29net: via: via-rhine: use %p to format void * address instead of %xColin Ian King1-2/+2
Don't use %x and casting to print out an address, instead use %p and remove the casting. Cleans up smatch warnings: drivers/net/ethernet/via/via-rhine.c:998 rhine_init_one_common() warn: argument 4 to %lx specifier is cast from pointer Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29rxrpc: Fix variable overwriteGustavo A. R. Silva2-2/+2
Values assigned to both variable resend_at and ack_at are overwritten before they can be used. The correct fix here is to add 'now' to the previously computed value in resend_at and ack_at. Addresses-Coverity-ID: 1462262 Addresses-Coverity-ID: 1462263 Addresses-Coverity-ID: 1462264 Fixes: beb8e5e4f38c ("rxrpc: Express protocol timeouts in terms of RTT") Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.uk Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-29net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bitGeert Uytterhoeven1-0/+1
On 64-bit (e.g. powerpc64/allmodconfig): drivers/net/ethernet/xilinx/ll_temac_main.c: In function 'temac_start_xmit_done': drivers/net/ethernet/xilinx/ll_temac_main.c:633:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] dev_kfree_skb_irq((struct sk_buff *)cur_p->app4); ^ cdmac_bd.app4 is u32, so it is too small to hold a kernel pointer. Note that several other fields in struct cdmac_bd are also too small to hold physical addresses on 64-bit platforms. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29rxrpc: Fix ACK generation from the connection event processorDavid Howells1-21/+29
Repeat terminal ACKs and now terminal ACKs are now generated from the connection event processor rather from call handling as this allows us to discard client call structures as soon as possible and free up the channel for a follow on call. However, in ACKs so generated, the additional information trailer is malformed because the padding that's meant to be in the middle isn't included in what's transmitted. Fix it so that the 3 bytes of padding are included in the transmission. Further, the trailer is misaligned because of the padding, so assigment to the u16 and u32 fields inside it might cause problems on some arches, so fix this by breaking the padding and the trailer out of the packed struct. (This also deals with potential compiler weirdies where some of the nested structs are packed and some aren't). The symptoms can be seen in wireshark as terminal DUPLICATE or IDLE ACK packets in which the Max MTU, Interface MTU and rwind fields have weird values and the Max Packets field is apparently missing. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-29rxrpc: Clean up whitespaceDavid Howells3-4/+4
Clean up some whitespace from rxrpc. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-29myri10ge: Update MAINTAINERSHyong-Youb Kim1-2/+2
Change the maintainer to Chris Lee who has access to Myricom hardware and can test/review. Update the website URL. Signed-off-by: Hyong-Youb Kim <hykim@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29kasan: use %px to print addresses instead of %pTobin C. Harding1-4/+4
Pointers printed with %p are now hashed by default. Kasan needs the actual address. We can use the new printk specifier %px for this purpose. Use %px instead of %p to print addresses. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29vsprintf: add printk specifier %pxTobin C. Harding3-2/+36
printk specifier %p now hashes all addresses before printing. Sometimes we need to see the actual unmodified address. This can be achieved using %lx but then we face the risk that if in future we want to change the way the Kernel handles printing of pointers we will have to grep through the already existent 50 000 %lx call sites. Let's add specifier %px as a clear, opt-in, way to print a pointer and maintain some level of isolation from all the other hex integer output within the Kernel. Add printk specifier %px to print the actual unmodified address. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29printk: hash addresses printed with %pTobin C. Harding3-46/+155
Currently there exist approximately 14 000 places in the kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information regarding the Kernel layout in memory. Many of these calls are stale, instead of fixing every call lets hash the address by default before printing. This will of course break some users, forcing code printing needed addresses to be updated. Code that _really_ needs the address will soon be able to use the new printk specifier %px to print the address. For what it's worth, usage of unadorned %p can be broken down as follows (thanks to Joe Perches). $ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c 1084 arch 20 block 10 crypto 32 Documentation 8121 drivers 1221 fs 143 include 101 kernel 69 lib 100 mm 1510 net 40 samples 7 scripts 11 security 166 sound 152 tools 2 virt Add function ptr_to_id() to map an address to a 32 bit unique identifier. Hash any unadorned usage of specifier %p and any malformed specifiers. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29vsprintf: refactor %pK code out of pointer()Tobin C. Harding1-43/+54
Currently code to handle %pK is all within the switch statement in pointer(). This is the wrong level of abstraction. Each of the other switch clauses call a helper function, pK should do the same. Refactor code out of pointer() to new function restricted_pointer(). Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29docs: correct documentation for %pKTobin C. Harding1-2/+1
Current documentation indicates that %pK prints a leading '0x'. This is not the case. Correct documentation for printk specifier %pK. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29Merge branch 'linus' of ↵Linus Torvalds5-38/+66
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - avoid potential bogus alignment for some AEAD operations - fix crash in algif_aead - avoid sleeping in softirq context with async af_alg * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: skcipher - Fix skcipher_walk_aead_common crypto: af_alg - remove locking in async callback crypto: algif_aead - skip SGL entries with NULL page
2017-11-29net: sched: cbq: create block for q->link.blockJiri Pirko1-1/+8
q->link.block is not initialized, that leads to EINVAL when one tries to add filter there. So initialize it properly. This can be reproduced by: $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1 Reported-by: Jaroslav Aster <jaster@redhat.com> Reported-by: Ivan Vecera <ivecera@redhat.com> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>