summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-09-01Revert "net: sched: act: add extack for lookup callback"Cong Wang17-36/+21
This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback"). This extack is never used after 6 months... In fact, it can be just set in the caller, right after ->lookup(). Cc: Alexander Aring <aring@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller8-52/+108
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-01 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus. 2) BPF verifier improvements by giving each register its own liveness chain which allows to simplify and getting rid of skip_callee() logic, from Edward. 3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap and percpu lru hashmap. Also add generic percpu formatted print on bpftool so the same can be dumped there, from Yonghong. 4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and TCP_SAVED_SYN options to allow reflection of tos/tclass from received SYN packet, from Nikita. 5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2 interaction and removal of incorrect shutdown() calls, from John. 6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01xsk: i40e: get rid of useless struct xdp_umem_propsMagnus Karlsson5-28/+18
This commit gets rid of the structure xdp_umem_props. It was there to be able to break a dependency at one point, but this is no longer needed. The values in the struct are instead stored directly in the xdp_umem structure. This simplifies the xsk code as well as af_xdp zero-copy drivers and as a bonus gets rid of one internal header file. The i40e driver is also adapted to the new interface in this commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xsk: remove unnecessary assignmentPrashant Bhole1-2/+0
Since xdp_umem_query() was added one assignment of bpf.command was missed from cleanup. Removing the assignment statement. Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockoptNikita V. Shirokov1-4/+21
Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This would allow for bpf program to build logic based on data from ingress SYN packet (e.g. doing tcp's tos/ tclass reflection (see sample prog)) and do it transparently from userspace program point of view. Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xdp: remove redundant variable 'headroom'Colin Ian King1-2/+1
Variable 'headroom' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable ‘headroom’ set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30xsk: include XDP meta data in AF_XDP framesBjörn Töpel1-6/+18
Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did not include XDP meta data in the data buffers copied out to the user application. In this commit, we check if meta data is available, and if so, it is prepended to the frame. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30Merge tag 'mac80211-next-for-davem-2018-08-29' of ↵David S. Miller12-67/+229
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Only a few changes at this point: * new channels in 60 GHz * clarify (average) ACK signal reporting API * expose ieee80211_send_layer2_update() for all drivers * start/stop mac80211's TXQs properly when required * avoid regulatory restore with IE ignoring * spelling: contidion -> condition * fully implement WFA Multi-AP backhaul ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-30net/tls: Calculate nsg for zerocopy path without skb_cow_data.Doron Roberts-Kedes1-1/+79
decrypt_skb fails if the number of sg elements required to map it is greater than MAX_SKB_FRAGS. nsg must always be calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy case. The new function skb_nsg calculates the number of scatterlist elements required to map the skb without the extra overhead of skb_cow_data. This patch reduces memcpy by 50% on my encrypted NBD benchmarks. Reported-by: Vakul Garg <Vakul.garg@nxp.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Tested-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-30ip: fail fast on IP defrag errorsPeter Oskolkov1-9/+12
The current behavior of IP defragmentation is inconsistent: - some overlapping/wrong length fragments are dropped without affecting the queue; - most overlapping fragments cause the whole frag queue to be dropped. This patch brings consistency: if a bad fragment is detected, the whole frag queue is dropped. Two major benefits: - fail fast: corrupted frag queues are cleared immediately, instead of by timeout; - testing of overlapping fragments is now much easier: any kind of random fragment length mutation now leads to the frag queue being discarded (IP packet dropped); before this patch, some overlaps were "corrected", with tests not seeing expected packet drops. Note that in one case (see "if (end&7)" conditional) the current behavior is preserved as there are concerns that this could be legitimate padding. Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-30ethtool: drop get_settings and set_settings callbacksMichal Kubecek1-123/+35
Since [gs]et_settings ethtool_ops callbacks have been deprecated in February 2016, all in tree NIC drivers have been converted to provide [gs]et_link_ksettings() and out of tree drivers have had enough time to do the same. Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-30net/ncsi: remove duplicated include from ncsi-netlink.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29xsk: expose xdp_umem_get_{data,dma} to driversBjörn Töpel1-10/+0
Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h, so that the upcoming zero-copy implementation in the Ethernet drivers can utilize them. Also, supply some dummy function implementations for CONFIG_XDP_SOCKETS=n configs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: export xdp_rxq_info_unreg_mem_modelBjörn Töpel1-2/+13
Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model, so it can be used from netdev drivers. Also, add additional checks for the memory type. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPYBjörn Töpel1-0/+39
This commit adds proper MEM_TYPE_ZERO_COPY support for convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might make sense to implement a more sophisticated thread-safe alloc/free scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is required in the fast-path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29cfg80211: clarify frames covered by average ACK signal reportBalaji Pothunoori2-6/+7
Modify the API to include all ACK frames in average ACK signal strength reporting, not just ACKs for data frames. Make exposing the data conditional on implementing the extended feature flag. This is how it was really implemented in mac80211, update the code there to use the new defines and clean up some of the setting code. Keep nl80211.h source compatibility by keeping the old names. Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> [rewrite commit log, change compatibility to be old=new instead of the other way around, update kernel-doc, roll in mac80211 changes, make mac80211 depend on valid bit instead of HW flag] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-29mac80211: add missing WFA Multi-AP backhaul STA Rx requirementSathishkumar Muruganandam1-1/+1
The current mac80211 WDS (4-address mode) can be used to cover most of the Multi-AP requirements for Data frames per the WFA Multi-AP Specification v1.0. When configuring AP/STA interfaces in 4-address mode, they are able to function as fronthaul AP/backhaul STA of Multi-AP device complying below Tx, Rx requirements except one missing STA Rx requirement added by this patch. Multi-AP specification section 14.1 describes the following requirements: Transmitter requirements ------------------------ 1. Fronthaul AP i) When DA!=RA of backhaul STA, must use 4-address format ii) When DA==RA of backhaul STA, shall use either 3-address or 4-address format with RA updated with STA MAC (mac80211 support 4-address format via AP/VLAN interface) 2. Backhaul STA i) When SA!=TA of backhaul STA, must use 4-address format ii) When SA==TA of backhaul STA, shall use either 3-address or 4-address format with RA updated with AP MAC (mac80211 support 4-address format via use_4addr) Receiver requirements --------------------- 1. Fronthaul AP i) When SA!=TA of backhaul STA, must support receiving 4-address format frames ii) When SA==TA of backhaul STA, must support receiving both 3-address and 4-address format frames (mac80211 support both 3-addr & 4-addr via AP/VLAN interface) 2. Backhaul STA i) When DA!=RA of backhaul STA, must support receiving 4-address format frames ii) When DA==RA of backhaul STA, must support receiving both 3-address and 4-address format frames (mac80211 support only receiving 4-address format via use_4addr) This patch addresses the above Rx requirement (ii) for backhaul STA to receive unicast (DA==RA) 3-address frames in addition to 4-address frames. The current design doesn't accept 3-address frames when configured in 4-address mode (use_4addr). Hence add a check to allow 3-address frames when DA==RA of backhaul STA (adhering to Table 9-26 of IEEE Std 802.11™-2016). This case was tested with a bridged station interface when associated with a non-mac80211 based vendor AP implementation using 3-address frames for WDS. STA was able to support the Multi-AP Rx requirement when DA==RA. No issues, no loops seen when tested with mac80211 based AP as well. Verified and confirmed all other Tx and Rx requirements of AP and STA for Multi-AP respectively. They all work using the current mac80211-WDS design. Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28cfg80211: Add support for 60GHz band channels 5 and 6Alexei Avshalom Lazar3-5/+5
The current support in the 60GHz band is for channels 1-4. Add support for channels 5 and 6. This requires enlarging ieee80211_channel.center_freq from u16 to u32. Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28mac80211: add stop/start logic for software TXQsManikanta Pubbisetty4-7/+121
Sometimes, it is required to stop the transmissions momentarily and resume it later; stopping the txqs becomes very critical in scenarios where the packet transmission has to be ceased completely. For example, during the hardware restart, during off channel operations, when initiating CSA(upon detecting a radar on the DFS channel), etc. The TX queue stop/start logic in mac80211 works well in stopping the TX when drivers make use of netdev queues, i.e, when Qdiscs in network layer take care of traffic scheduling. Since the devices implementing wake_tx_queue can run without Qdiscs, packets will be handed to mac80211 directly without queueing them in the netdev queues. Also, mac80211 does not invoke any of the netif_stop_*/netif_wake_* APIs if wake_tx_queue is implemented. Since the queues are not stopped in this case, transmissions can continue and this will impact negatively on the operation of the wireless device. For example, During hardware restart, we stop the netdev queues so that packets are not sent to the driver. Since ath10k implements wake_tx_queue, TX queues will not be stopped and packets might reach the hardware while it is restarting; this can make hardware unresponsive and the only possible option for recovery is to reboot the entire system. There is another problem to this, it is observed that the packets were sent on the DFS channel for a prolonged duration after radar detection impacting the channel closing time. We can still invoke netif stop/wake APIs when wake_tx_queue is implemented but this could lead to packet drops in network layer; adding stop/start logic for software TXQs in mac80211 instead makes more sense; the change proposed adds the same in mac80211. Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28cfg80211: Avoid regulatory restore when COUNTRY_IE_IGNORE is setRajeev Kumar Sirasanagandla1-0/+46
When REGULATORY_COUNTRY_IE_IGNORE is set, __reg_process_hint_country_ie() ignores the country code change request from __cfg80211_connect_result() via regulatory_hint_country_ie(). After Disconnect, similar to above, country code should not be reset to world when country IE ignore is set. But this is violated and restore of regulatory settings is invoked by cfg80211_disconnect_work via regulatory_hint_disconnect(). To address this, avoid regulatory restore from regulatory_hint_disconnect() when COUNTRY_IE_IGNORE is set. Note: Currently, restore_regulatory_settings() takes care of clearing beacon hints. But in the proposed change, regulatory restore is avoided. Therefore, explicitly clear beacon hints when DISABLE_BEACON_HINTS is not set. Signed-off-by: Rajeev Kumar Sirasanagandla <rsirasan@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28cfg80211/mac80211: make ieee80211_send_layer2_update a public functionDedy Lansky2-46/+47
Make ieee80211_send_layer2_update() a common function so other drivers can re-use it. Signed-off-by: Dedy Lansky <dlansky@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-28rfkill: fix spelling mistake contidion to conditionRichard Guy Briggs1-2/+2
This came about while trying to determine if there would be any pattern match on contid, a new audit container identifier internal variable. This was the only one. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds30-265/+142
Pull networking fixes from David Miller: 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks. 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong Wang. 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel Borkmann. 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent attackers using it as a side-channel. From Eric Dumazet. 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva. 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin Liu. 7) hns and hns3 bug fixes from Huazhong Tan. 8) Fix RIF leak in mlxsw driver, from Ido Schimmel. 9) iova range check fix in vhost, from Jason Wang. 10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend. 11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng. 12) Memory exposure in TCA_U32_SEL handling, from Kees Cook. 13) TCP BBR congestion control fixes from Kevin Yang. 14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger. 15) qed driver fixes from Tomer Tayar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) net: sched: Fix memory exposure from short TCA_U32_SEL qed: fix spelling mistake "comparsion" -> "comparison" vhost: correctly check the iova range when waking virtqueue qlge: Fix netdev features configuration. net: macb: do not disable MDIO bus at open/close time Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency" net: macb: Fix regression breaking non-MDIO fixed-link PHYs mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge i40e: fix condition of WARN_ONCE for stat strings i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled ixgbe: fix driver behaviour after issuing VFLR ixgbe: Prevent unsupported configurations with XDP ixgbe: Replace GFP_ATOMIC with GFP_KERNEL igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback() igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init() igb: Use an advanced ctx descriptor for launchtime e1000: ensure to free old tx/rx rings in set_ringparam() e1000: check on netif_running() before calling e1000_up() ixgb: use dma_zalloc_coherent instead of allocator/memset ice: Trivial formatting fixes ...
2018-08-27net: sched: Fix memory exposure from short TCA_U32_SELKees Cook1-2/+8
Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink policy, so max length isn't enforced, only minimum. This means nkeys (from userspace) was being trusted without checking the actual size of nla_len(), which could lead to a memory over-read, and ultimately an exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within a namespace. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-26Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds1-10/+6
Pull IDA updates from Matthew Wilcox: "A better IDA API: id = ida_alloc(ida, GFP_xxx); ida_free(ida, id); rather than the cumbersome ida_simple_get(), ida_simple_remove(). The new IDA API is similar to ida_simple_get() but better named. The internal restructuring of the IDA code removes the bitmap preallocation nonsense. I hope the net -200 lines of code is convincing" * 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits) ida: Change ida_get_new_above to return the id ida: Remove old API test_ida: check_ida_destroy and check_ida_alloc test_ida: Convert check_ida_conv to new API test_ida: Move ida_check_max test_ida: Move ida_check_leaf idr-test: Convert ida_check_nomem to new API ida: Start new test_ida module target/iscsi: Allocate session IDs from an IDA iscsi target: fix session creation failure handling drm/vmwgfx: Convert to new IDA API dmaengine: Convert to new IDA API ppc: Convert vas ID allocation to new IDA API media: Convert entity ID allocation to new IDA API ppc: Convert mmu context allocation to new IDA API Convert net_namespace to new IDA API cb710: Convert to new IDA API rsxx: Convert to new IDA API osd: Convert to new IDA API sd: Convert to new IDA API ...
2018-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-4/+9
Daniel Borkmann says: ==================== pull-request: bpf 2018-08-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix BPF sockmap and tls where we get a hang in do_tcp_sendpages() when sndbuf is full due to missing calls into underlying socket's sk_write_space(), from John. 2) Two BPF sockmap fixes to reject invalid parameters on map creation and to fix a map element miscount on allocation failure. Another fix for BPF hash tables to use per hash table salt for jhash(), from Daniel. 3) Fix for bpftool's command line parsing in order to terminate on bad arguments instead of keeping looping in some border cases, from Quentin. 4) Fix error value of xdp_umem_assign_dev() in order to comply with expected bind ops error codes, from Prashant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-6/+6
Merge yet more updates from Andrew Morton: - the rest of MM - various misc fixes and tweaks * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) mm: Change return type int to vm_fault_t for fault handlers lib/fonts: convert comments to utf-8 s390: ebcdic: convert comments to UTF-8 treewide: convert ISO_8859-1 text comments to utf-8 drivers/gpu/drm/gma500/: change return type to vm_fault_t docs/core-api: mm-api: add section about GFP flags docs/mm: make GFP flags descriptions usable as kernel-doc docs/core-api: split memory management API to a separate file docs/core-api: move *{str,mem}dup* to "String Manipulation" docs/core-api: kill trailing whitespace in kernel-api.rst mm/util: add kernel-doc for kvfree mm/util: make strndup_user description a kernel-doc comment fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds treewide: correct "differenciate" and "instanciate" typos fs/afs: use new return type vm_fault_t drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t mm: soft-offline: close the race against page allocation mm: fix race on soft-offlining free huge pages namei: allow restricted O_CREAT of FIFOs and regular files hfs: prevent crash on exit from failed search ...
2018-08-24treewide: convert ISO_8859-1 text comments to utf-8Arnd Bergmann2-6/+6
Almost all files in the kernel are either plain text or UTF-8 encoded. A couple however are ISO_8859-1, usually just a few characters in a C comments, for historic reasons. This converts them all to UTF-8 for consistency. Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Simon Horman <horms@verge.net.au> [IPVS portion] Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [IIO] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Rob Herring <robh@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rob Herring <robh+dt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-24Merge tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds19-49/+75
Pull NFS client updates from Anna Schumaker: "These patches include adding async support for the v4.2 COPY operation. I think Bruce is planning to send the server patches for the next release, but I figured we could get the client side out of the way now since it's been in my tree for a while. This shouldn't cause any problems, since the server will still respond with synchronous copies even if the client requests async. Features: - Add support for asynchronous server-side COPY operations Stable bufixes: - Fix an off-by-one in bl_map_stripe() (v3.17+) - NFSv4 client live hangs after live data migration recovery (v4.9+) - xprtrdma: Fix disconnect regression (v4.18+) - Fix locking in pnfs_generic_recover_commit_reqs (v4.14+) - Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+) Other bugfixes and cleanups: - Optimizations and fixes involving NFS v4.1 / pNFS layout handling - Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking - Immediately reschedule writeback when the server replies with an error - Fix excessive attribute revalidation in nfs_execute_ok() - Add error checking to nfs_idmap_prepare_message() - Use new vm_fault_t return type - Return a delegation when reclaiming one that the server has recalled - Referrals should inherit proto setting from parents - Make rpc_auth_create_args a const - Improvements to rpc_iostats tracking - Fix a potential reference leak when there is an error processing a callback - Fix rmdir / mkdir / rename nlink accounting - Fix updating inode change attribute - Fix error handling in nfsn4_sp4_select_mode() - Use an appropriate work queue for direct-write completion - Don't busy wait if NFSv4 session draining is interrupted" * tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits) pNFS: Remove unwanted optimisation of layoutget pNFS/flexfiles: ff_layout_pg_init_read should exit on error pNFS: Treat RECALLCONFLICT like DELAY... pNFS: When updating the stateid in layoutreturn, also update the recall range NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence() NFSv4: Fix locking in pnfs_generic_recover_commit_reqs NFSv4: Fix a typo in nfs4_init_channel_attrs() NFSv4: Don't busy wait if NFSv4 session draining is interrupted NFS recover from destination server reboot for copies NFS add a simple sync nfs4_proc_commit after async COPY NFS handle COPY ERR_OFFLOAD_NO_REQS NFS send OFFLOAD_CANCEL when COPY killed NFS export nfs4_async_handle_error NFS handle COPY reply CB_OFFLOAD call race NFS add support for asynchronous COPY NFS COPY xdr handle async reply NFS OFFLOAD_CANCEL xdr NFS CB_OFFLOAD xdr NFS: Use an appropriate work queue for direct-write completion NFSv4: Fix error handling in nfs4_sp4_select_mode() ...
2018-08-24Merge tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds10-111/+140
Pull nfsd updates from Bruce Fields: "Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from multi-homed servers. The only new feature is a minor bit of protocol (change_attr_type) which the client doesn't even use yet. Other than that, various bugfixes and cleanup" * tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits) sunrpc: Add comment defining gssd upcall API keywords nfsd: Remove callback_cred nfsd: Use correct credential for NFSv4.0 callback with GSS sunrpc: Extract target name into svc_cred sunrpc: Enable the kernel to specify the hostname part of service principals sunrpc: Don't use stack buffer with scatterlist rpc: remove unneeded variable 'ret' in rdma_listen_handler nfsd: use true and false for boolean values nfsd: constify write_op[] fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id NFSD: Handle full-length symlinks NFSD: Refactor the generic write vector fill helper svcrdma: Clean up Read chunk path svcrdma: Avoid releasing a page in svc_xprt_release() nfsd: Mark expected switch fall-through sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data' nfsd: fix leaked file lock with nfs exported overlayfs nfsd: don't advertise a SCSI layout for an unsupported request_queue nfsd: fix corrupted reply to badly ordered compound nfsd: clarify check_op_ordering ...
2018-08-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-23/+25
Pull more rdma updates from Jason Gunthorpe: "This is the SMC cleanup promised, a randconfig regression fix, and kernel oops fix. Summary: - Switch SMC over to rdma_get_gid_attr and remove the compat - Fix a crash in HFI1 with some BIOS's - Fix a randconfig failure" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/ucm: fix UCM link error IB/hfi1: Invalid NUMA node information can cause a divide by zero RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr
2018-08-23net/ipv6: init ip6 anycast rt->dst.input as ip6_inputHangbin Liu1-1/+1
Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path") forgot to handle anycast route and init anycast rt->dst.input to ip6_forward. Fix it by setting anycast rt->dst.input back to ip6_input. Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23tcp_bbr: apply PROBE_RTT cwnd cap even if acked==0Kevin Yang1-2/+2
This commit fixes a corner case where TCP BBR would enter PROBE_RTT mode but not reduce its cwnd. If a TCP receiver ACKed less than one full segment, the number of delivered/acked packets was 0, so that bbr_set_cwnd() would short-circuit and exit early, without cutting cwnd to the value we want for PROBE_RTT. The fix is to instead make sure that even when 0 full packets are ACKed, we do apply all the appropriate caps, including the cap that applies in PROBE_RTT mode. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23tcp_bbr: in restart from idle, see if we should exit PROBE_RTTKevin Yang1-0/+4
This patch fix the case where BBR does not exit PROBE_RTT mode when it restarts from idle. When BBR restarts from idle and if BBR is in PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its full value. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23tcp_bbr: add bbr_check_probe_rtt_done() helperKevin Yang1-16/+18
This patch add a helper function bbr_check_probe_rtt_done() to 1. check the condition to see if bbr should exit probe_rtt mode; 2. process the logic of exiting probe_rtt mode. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT stateEric Dumazet1-0/+6
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally to send some control packets. 1) RST packets, through tcp_v4_send_reset() 2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack() These packets assert IP_DF, and also use the hashed IP ident generator to provide an IPv4 ID number. Geoff Alexander reported this could be used to build off-path attacks. These packets should not be fragmented, since their size is smaller than IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment, regardless of inner IPID. We really can use zero IPID, to address the flaw, and as a bonus, avoid a couple of atomic operations in ip_idents_reserve() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Geoff Alexander <alexandg@cs.unm.edu> Tested-by: Geoff Alexander <alexandg@cs.unm.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23addrconf: reduce unnecessary atomic allocationsCong Wang1-3/+3
All the 3 callers of addrconf_add_mroute() assert RTNL lock, they don't take any additional lock either, so it is safe to convert it to GFP_KERNEL. Same for sit_add_v4_addrs(). Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23sch_cake: Fix TC filter flow override and expand it to hosts as wellToke Høiland-Jørgensen1-4/+19
The TC filter flow mapping override completely skipped the call to cake_hash(); however that meant that the internal state was not being updated, which ultimately leads to deadlocks in some configurations. Fix that by passing the overridden flow ID into cake_hash() instead so it can react appropriately. In addition, the major number of the class ID can now be set to override the host mapping in host isolation mode. If both host and flow are overridden (or if the respective modes are disabled), flow dissection and hashing will be skipped entirely; otherwise, the hashing will be kept for the portions that are not set by the filter. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23net/ncsi: Fixup .dumpit message flags and ID check in Netlink handlerSamuel Mendoza-Jonas1-2/+2
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI flag, causing additional package information after the first to be lost. Also fixup a sanity check in ncsi_write_package_info() to reject out of range package IDs. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-23sunrpc: Add comment defining gssd upcall API keywordsChuck Lever1-0/+17
During review, it was found that the target, service, and srchost keywords are easily conflated. Add an explainer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Extract target name into svc_credChuck Lever1-25/+45
NFSv4.0 callback needs to know the GSS target name the client used when it established its lease. That information is available from the GSS context created by gssproxy. Make it available in each svc_cred. Note this will also give us access to the real target service principal name (which is typically "nfs", but spec does not require that). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Enable the kernel to specify the hostname part of service principalsChuck Lever1-3/+17
A multi-homed NFS server may have more than one "nfs" key in its keytab. Enable the kernel to pick the key it wants as a machine credential when establishing a GSS context. This is useful for GSS-protected NFSv4.0 callbacks, which are required by RFC 7530 S3.3.3 to use the same principal as the service principal the client used when establishing its lease. A complementary modification to rpc.gssd is required to fully enable this feature. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-23sunrpc: Don't use stack buffer with scatterlistLaura Abbott1-3/+9
Fedora got a bug report from NFS: kernel BUG at include/linux/scatterlist.h:143! ... RIP: 0010:sg_init_one+0x7d/0x90 .. make_checksum+0x4e7/0x760 [rpcsec_gss_krb5] gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5] gss_marshal+0x126/0x1a0 [auth_rpcgss] ? __local_bh_enable_ip+0x80/0xe0 ? call_transmit_status+0x1d0/0x1d0 [sunrpc] call_transmit+0x137/0x230 [sunrpc] __rpc_execute+0x9b/0x490 [sunrpc] rpc_run_task+0x119/0x150 [sunrpc] nfs4_run_exchange_id+0x1bd/0x250 [nfsv4] _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4] nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4] nfs4_discover_server_trunking+0x80/0x270 [nfsv4] nfs4_init_client+0x16e/0x240 [nfsv4] ? nfs_get_client+0x4c9/0x5d0 [nfs] ? _raw_spin_unlock+0x24/0x30 ? nfs_get_client+0x4c9/0x5d0 [nfs] nfs4_set_client+0xb2/0x100 [nfsv4] nfs4_create_server+0xff/0x290 [nfsv4] nfs4_remote_mount+0x28/0x50 [nfsv4] mount_fs+0x3b/0x16a vfs_kern_mount.part.35+0x54/0x160 nfs_do_root_mount+0x7f/0xc0 [nfsv4] nfs4_try_mount+0x43/0x70 [nfsv4] ? get_nfs_version+0x21/0x80 [nfs] nfs_fs_mount+0x789/0xbf0 [nfs] ? pcpu_alloc+0x6ca/0x7e0 ? nfs_clone_super+0x70/0x70 [nfs] ? nfs_parse_mount_options+0xb40/0xb40 [nfs] mount_fs+0x3b/0x16a vfs_kern_mount.part.35+0x54/0x160 do_mount+0x1fd/0xd50 ksys_mount+0xba/0xd0 __x64_sys_mount+0x21/0x30 do_syscall_64+0x60/0x1f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack allocated buffer with a scatterlist. Convert the buffer for rc4salt to be dynamically allocated instead. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258 Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-22tls: possible hang when do_tcp_sendpages hits sndbuf is full caseJohn Fastabend1-2/+7
Currently, the lower protocols sk_write_space handler is not called if TLS is sending a scatterlist via tls_push_sg. However, normally tls_push_sg calls do_tcp_sendpage, which may be under memory pressure, that in turn may trigger a wait via sk_wait_event. Typically, this happens when the in-flight bytes exceed the sdnbuf size. In the normal case when enough ACKs are received sk_write_space() will be called and the sk_wait_event will be woken up allowing it to send more data and/or return to the user. But, in the TLS case because the sk_write_space() handler does not wake up the events the above send will wait until the sndtimeo is exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a hang to the user (especially this impatient user). To fix this pass the sk_write_space event to the lower layers sk_write_space event which in the TCP case will wake any pending events. I observed the above while integrating sockmap and ktls. It initially appeared as test_sockmap (modified to use ktls) occasionally hanging. To reliably reproduce this reduce the sndbuf size and stress the tls layer by sending many 1B sends. This results in every byte needing a header and each byte individually being sent to the crypto layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-22Convert net_namespace to new IDA APIMatthew Wilcox1-10/+6
Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-08-21xsk: fix return value of xdp_umem_assign_dev()Prashant Bhole1-2/+2
s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev(). This function's return value is directly returned by xsk_bind(). EOPNOTSUPP is bind()'s possible return value. Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-21act_ife: fix a potential deadlockCong Wang1-13/+21
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls add_metainfo() which calls find_ife_oplist() which acquires the same lock again. Deadlock! Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops as an additional parameter and let its callers to decide how to find it. For use_all_metadata(), it already has ops, no need to find it again, just call __add_metainfo() directly. And, as ife_mod_lock is only needed for find_ife_oplist(), this means we can make non-atomic allocation for populate_metalist() now. Fixes: 817e9f2c5c26 ("act_ife: acquire ife_mod_lock before reading ifeoplist") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21act_ife: move tcfa_lock down to where necessaryCong Wang1-25/+13
The only time we need to take tcfa_lock is when adding a new metainfo to an existing ife->metalist. We don't need to take tcfa_lock so early and so broadly in tcf_ife_init(). This means we can always take ife_mod_lock first, avoid the reverse locking ordering warning as reported by Vlad. Reported-by: Vlad Buslov <vladbu@mellanox.com> Tested-by: Vlad Buslov <vladbu@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21Revert "net: sched: act_ife: disable bh when taking ife_mod_lock"Cong Wang1-10/+10
This reverts commit 42c625a486f3 ("net: sched: act_ife: disable bh when taking ife_mod_lock"), because what ife_mod_lock protects is absolutely not touched in rate est timer BH context, they have no race. A better fix is following up. Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21net_sched: remove list_head from tc_actionCong Wang2-4/+1
After commit 90b73b77d08e, list_head is no longer needed. Now we just need to convert the list iteration to array iteration for drivers. Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>