summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2018-10-11net/ipv6: Make ipv6_route_table_template staticDavid Ahern1-2/+0
ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller7-18/+230
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller10-51/+103
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for matching on ipsec policy already set in the route, from Florian Westphal. 2) Split set destruction into deactivate and destroy phase to make it fit better into the transaction infrastructure, also from Florian. This includes a patch to warn on imbalance when setting the new activate and deactivate interfaces. 3) Release transaction list from the workqueue to remove expensive synchronize_rcu() from configuration plane path. This speeds up configuration plane quite a bit. From Florian Westphal. 4) Add new xfrm/ipsec extension, this new extension allows you to match for ipsec tunnel keys such as source and destination address, spi and reqid. From Máté Eckl and Florian Westphal. 5) Add secmark support, this includes connsecmark too, patches from Christian Gottsche. 6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng. One follow up patch to calm a clang warning for this one, from Nathan Chancellor. 7) Flush conntrack entries based on layer 3 family, from Kristian Evensen. 8) New revision for cgroups2 to shrink the path field. 9) Get rid of obsolete need_conntrack(), as a result from recent demodularization works. 10) Use WARN_ON instead of BUG_ON, from Florian Westphal. 11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian. 12) Remove superfluous check for timeout netlink parser and dump functions in layer 4 conntrack helpers. 13) Unnecessary redundant rcu read side locks in NAT redirect, from Taehee Yoo. 14) Pass nf_hook_state structure to error handlers, patch from Florian Westphal. 15) Remove ->new() interface from layer 4 protocol trackers. Place them in the ->packet() interface. From Florian. 16) Place conntrack ->error() handling in the ->packet() interface. Patches from Florian Westphal. 17) Remove unused parameter in the pernet initialization path, also from Florian. 18) Remove additional parameter to specify layer 3 protocol when looking up for protocol tracker. From Florian. 19) Shrink array of layer 4 protocol trackers, from Florian. 20) Check for linear skb only once from the ALG NAT mangling codebase, from Taehee Yoo. 21) Use rhashtable_walk_enter() instead of deprecated rhashtable_walk_init(), also from Taehee. 22) No need to flush all conntracks when only one single address is gone, from Tan Hu. 23) Remove redundant check for NAT flags in flowtable code, from Taehee Yoo. 24) Use rhashtable_lookup() instead of rhashtable_lookup_fast() from netfilter codebase, since rcu read lock side is already assumed in this path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update fib dumps for strict data checkingDavid Ahern1-0/+2
Add helper to check netlink message for route dumps. If the strict flag is set the dump request is expected to have an rtmsg struct as the header. All elements of the struct are expected to be 0 with the exception of rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX set. Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute, and ip6mr_rtm_dumproute to call this helper if strict data checking is enabled. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Add new socket option to enable strict checking on dumpsDavid Ahern2-0/+2
Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace can use via setsockopt to request strict checking of headers and attributes on dump requests. To get dump features such as kernel side filtering based on data in the header or attributes appended to the dump request, userspace must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero value. Since the netlink sock and its flags are private to the af_netlink code, the strict checking flag is passed to dump handlers via a flag in the netlink_callback struct. For old userspace on new kernel there is no impact as all of the data checks in later patches are wrapped in a check on the new strict flag. For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump. Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter checking on the NEW, DEL, and SET commands. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Add strict version of nlmsg_parse and nla_parseDavid Ahern1-0/+17
nla_parse is currently lenient on message parsing, allowing type to be 0 or greater than max expected and only logging a message "netlink: %d bytes leftover after parsing attributes in process `%s'." if the netlink message has unknown data at the end after parsing. What this could mean is that the header at the front of the attributes is actually wrong and the parsing is shifted from what is expected. Add a new strict version that actually fails with EINVAL if there are any bytes remaining after the parsing loop completes, if the atttrbitue type is 0 or greater than max expected. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Add extack message to nlmsg_parse for invalid header lengthDavid Ahern1-1/+3
Give a user a reason why EINVAL is returned in nlmsg_parse. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Pass extack to dump handlersDavid Ahern1-0/+1
Declare extack in netlink_dump and pass to dump handlers via netlink_callback. Add any extack message after the dump_done_errno allowing error messages to be returned. This will be useful when strict checking is done on dump requests, returning why the dump fails EINVAL. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08dt-bindings: net: vsc8531: add two additional LED modes for VSC8584Quentin Schulz1-0/+2
The VSC8584 (and most likely other PHYs in the same generation) has two additional LED modes that can be picked, so let's add them. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08bpf: add verifier callback to get stack usage info for offloaded progsQuentin Monnet2-0/+2
In preparation for BPF-to-BPF calls in offloaded programs, add a new function attribute to the struct bpf_prog_offload_ops so that drivers supporting eBPF offload can hook at the end of program verification, and potentially extract information collected by the verifier. Implement a minimal callback (returning 0) in the drivers providing the structs, namely netdevsim and nfp. This will be useful in the nfp driver, in later commits, to extract the number of subprograms as well as the stack depth for those subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-49/+52
2018-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman1-0/+18
Dave writes: "Networking fixes: 1) Fix truncation of 32-bit right shift in bpf, from Jann Horn. 2) Fix memory leak in wireless wext compat, from Stefan Seyfried. 3) Use after free in cfg80211's reg_process_hint(), from Yu Zhao. 4) Need to cancel pending work when unbinding in smsc75xx otherwise we oops, also from Yu Zhao. 5) Don't allow enslaving a team device to itself, from Ido Schimmel. 6) Fix backwards compat with older userspace for rtnetlink FDB dumps. From Mauricio Faria. 7) Add validation of tc policy netlink attributes, from David Ahern. 8) Fix RCU locking in rawv6_send_hdrinc(), from Wei Wang." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) net: mvpp2: Extract the correct ethtype from the skb for tx csum offload ipv6: take rcu lock in rawv6_send_hdrinc() net: sched: Add policy validation for tc attributes rtnetlink: fix rtnl_fdb_dump() for ndmsg header yam: fix a missing-check bug net: bpfilter: Fix type cast and pointer warnings net: cxgb3_main: fix a missing-check bug bpf: 32-bit RSH verification must truncate input before the ALU op net: phy: phylink: fix SFP interface autodetection be2net: don't flip hw_features when VXLANs are added/deleted net/packet: fix packet drop as of virtio gso net: dsa: b53: Keep CPU port as tagged in all VLANs openvswitch: load NAT helper bnxt_en: get the reduced max_irqs by the ones used by RDMA bnxt_en: free hwrm resources, if driver probe fails. bnxt_en: Fix enables field in HWRM_QUEUE_COS2BW_CFG request bnxt_en: Fix VNIC reservations on the PF. team: Forbid enslaving team device to itself net/usb: cancel pending work when unbinding smsc75xx mlxsw: spectrum: Delete RIF when VLAN device is removed ...
2018-10-06Merge branch 'akpm'Greg Kroah-Hartman6-0/+28
* akpm: mm: madvise(MADV_DODUMP): allow hugetlbfs pages ocfs2: fix locking for res->tracking and dlm->tracking_list mm/vmscan.c: fix int overflow in callers of do_shrink_slab() mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly mm/vmstat.c: fix outdated vmstat_text proc: restrict kernel stack dumps to root mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes mm/migrate.c: split only transparent huge pages when allocation fails ipc/shm.c: use ERR_CAST() for shm_lock() error return mm/gup_benchmark: fix unsigned comparison to zero in __gup_benchmark_ioctl mm, thp: fix mlocking THP page with migration enabled ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() hugetlb: take PMD sharing into account when flushing tlb/caches mm: migration: fix migration of huge PMD shared pages
2018-10-06mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizesAnshuman Khandual4-0/+8
ARM64 architecture also supports 32MB and 512MB HugeTLB page sizes. This just adds mmap() system call argument encoding for them. Link: http://lkml.kernel.org/r/1537841300-6979-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-06mm: migration: fix migration of huge PMD shared pagesMike Kravetz2-0/+20
The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result is data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-06Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman2-33/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: These fixes address a rather involved performance regression between v4.17->v4.19 in the sched/numa auto-balancing code. Since distros really need this fix we accelerated it to sched/urgent for a faster upstream merge. NUMA scheduling and balancing performance is now largely back to v4.17 levels, without reintroducing the NUMA placement bugs that v4.18 and v4.19 fixed. Many thanks to Srikar Dronamraju, Mel Gorman and Jirka Hladky, for reporting, testing, re-testing and solving this rather complex set of bugs." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/numa: Migrate pages to local nodes quicker early in the lifetime of a task mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration sched/numa: Avoid task migration for small NUMA improvement mm/migrate: Use spin_trylock() while resetting rate limit sched/numa: Limit the conditions where scan period is reset sched/numa: Reset scan rate whenever task moves across nodes sched/numa: Pass destination CPU as a parameter to migrate_task_rq sched/numa: Stop multiple tasks from moving to the CPU at the same time
2018-10-06dt-bindings: add constants for Microsemi Ocelot SerDes driverQuentin Schulz1-0/+12
The Microsemi Ocelot has multiple SerDes and requires that the SerDes be muxed accordingly to the hardware representation. Let's add a constant for each SerDes available in the Microsemi Ocelot. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-06phy: add QSGMII and PCIE modesQuentin Schulz1-0/+2
Prepare for upcoming phys that'll handle QSGMII or PCIe. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-06net: mscc: ocelot: simplify register access for PLL5 configurationQuentin Schulz1-0/+74
Since HSIO address space can be accessed by different drivers, let's simplify the register address definitions so that it can be easily used by all drivers and put the register address definition in the include/soc/mscc/ocelot_hsio.h header file. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-06net: mscc: ocelot: move the HSIO header to include/socQuentin Schulz1-0/+785
Since HSIO address space can be used by different drivers (PLL, SerDes muxing, temperature sensor), let's move it somewhere it can be included by all drivers. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05udp: gro behind static keyWillem de Bruijn1-0/+2
Avoid the socket lookup cost in udp_gro_receive if no socket has a udp tunnel callback configured. udp_sk(sk)->gro_receive requires a registration with setup_udp_tunnel_sock, which enables the static key. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net_sched: convert idrinfo->lock from spinlock to a mutexCong Wang1-2/+2
In commit ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock") we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking with the spinlock held. Unfortunately, this causes a lot of troubles here: 1. tcf_chain_destroy() could be called right after we queue the work but before the work runs. This is a use-after-free. 2. The chain refcnt is already 0, we can't even just hold it again. We can check refcnt==1 but it is ugly. 3. The chain with refcnt 0 is still visible in its block, which means it could be still found and used! 4. The block has a refcnt too, we can't hold it without introducing a proper API either. We can make it working but the end result is ugly. Instead of wasting time on reviewing it, let's just convert the troubling spinlock to a mutex, which allows us to use non-atomic allocations too. Fixes: ec3ed293e766 ("net_sched: change tcf_del_walker() to take idrinfo->lock") Reported-by: Ido Schimmel <idosch@idosch.org> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05ethtool: don't allow disabling queues with umem installedJakub Kicinski1-0/+7
We already check the RSS indirection table does not use queues which would be disabled by channel reconfiguration. Make sure user does not try to disable queues which have a UMEM and zero-copy AF_XDP socket installed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-05net: add umem reference in netdev{_rx}_queueMagnus Karlsson1-0/+6
These references to the umem will be used to store information on what kind of AF_XDP umem that is bound to a queue id, if any. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-05net/packet: fix packet drop as of virtio gsoJianfeng Tan1-0/+18
When we use raw socket as the vhost backend, a packet from virito with gso offloading information, cannot be sent out in later validaton at xmit path, as we did not set correct skb->protocol which is further used for looking up the gso function. To fix this, we set this field according to virito hdr information. Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: Move free of dst_metrics to helperDavid Ahern1-0/+9
Move the refcounting and potential free of dst metrics associated for ipv4 and ipv6 to a common helper. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: common metrics init helper for dst_entryDavid Ahern1-0/+12
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set. Move the common initialization code to a helper and use for both protocols. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: Move free of fib_metrics to helperDavid Ahern1-0/+6
Move the refcounting and potential free of dst metrics associated with a fib entry to a helper and use it in both ipv4 and ipv6. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: common metrics init helper for FIB entriesDavid Ahern1-2/+2
Consolidate initialization of ipv4 and ipv6 metrics when fib entries are created into a single helper, ip_fib_metrics_init, that handles the call to ip_metrics_convert. If no metrics are defined for the fib entry, then the metrics is set to dst_default_metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: sched: remove unused helpersJakub Kicinski1-10/+0
tcf_block_dev() doesn't seem to be used anywhere in the tree. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05Merge tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drmGreg Kroah-Hartman1-2/+3
Dave writes: "amdgpu and two core fixes Two fixes for amdgpu: one corrects a use of process->mm one fix for display code race condition that can result in a crash Two core fixes: One for a use-after-free in the leasing code One for a cma/fbdev crash." * tag 'drm-fixes-2018-10-05' of git://anongit.freedesktop.org/drm/drm: drm/amdkfd: Fix incorrect use of process->mm drm/amd/display: Signal hw_done() after waiting for flip_done() drm/cma-helper: Fix crash in fbdev error path drm: fix use-after-free read in drm_mode_create_lease_ioctl()
2018-10-04tc: Add support for configuring the taprio schedulerVinicius Costa Gomes1-0/+46
This traffic scheduler allows traffic classes states (transmission allowed/not allowed, in the simplest case) to be scheduled, according to a pre-generated time sequence. This is the basis of the IEEE 802.1Qbv specification. Example configuration: tc qdisc replace dev enp3s0 parent root handle 100 taprio \ num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 \ base-time 1528743495910289987 \ sched-entry S 01 300000 \ sched-entry S 02 300000 \ sched-entry S 04 300000 \ clockid CLOCK_TAI The configuration format is similar to mqprio. The main difference is the presence of a schedule, built by multiple "sched-entry" definitions, each entry has the following format: sched-entry <CMD> <GATE MASK> <INTERVAL> The only supported <CMD> is "S", which means "SetGateStates", following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK> is a bitmask where each bit is a associated with a traffic class, so bit 0 (the least significant bit) being "on" means that traffic class 0 is "active" for that schedule entry. <INTERVAL> is a time duration in nanoseconds that specifies for how long that state defined by <CMD> and <GATE MASK> should be held before moving to the next entry. This schedule is circular, that is, after the last entry is executed it starts from the first one, indefinitely. The other parameters can be defined as follows: - base-time: specifies the instant when the schedule starts, if 'base-time' is a time in the past, the schedule will start at base-time + (N * cycle-time) where N is the smallest integer so the resulting time is greater than "now", and "cycle-time" is the sum of all the intervals of the entries in the schedule; - clockid: specifies the reference clock to be used; The parameters should be similar to what the IEEE 802.1Q family of specification defines. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04devlink: Add generic parameter msix_vec_per_pf_minVasundhara Volam1-0/+4
msix_vec_per_pf_min - This param sets the number of minimal MSIX vectors required for the device initialization. This value is set in the device which limits MSIX vectors per PF. Cc: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04devlink: Add generic parameter msix_vec_per_pf_maxVasundhara Volam1-0/+4
msix_vec_per_pf_max - This param sets the number of MSIX vectors that the device requests from the host on driver initialization. This value is set in the device which is applicable per PF. Cc: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04devlink: Add generic parameter ignore_ariVasundhara Volam1-0/+4
ignore_ari - Device ignores ARI(Alternate Routing ID) capability, even when platforms has the support and creates same number of partitions when platform does not support ARI capability. Cc: Jiri Pirko <jiri@mellanox.com> Cc: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04Merge tag 'ovl-fixes-4.19-rc7' of ↵Greg Kroah-Hartman1-14/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Miklos writes: "overlayfs fixes for 4.19-rc7 This update fixes a couple of regressions in the stacked file update added in this cycle, as well as some older bugs uncovered by syzkaller. There's also one trivial naming change that touches other parts of the fs subsystem." * tag 'ovl-fixes-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix format of setxattr debug ovl: fix access beyond unterminated strings ovl: make symbol 'ovl_aops' static vfs: swap names of {do,vfs}_clone_file_range() ovl: fix freeze protection bypass in ovl_clone_file_range() ovl: fix freeze protection bypass in ovl_write_iter() ovl: fix memory leak on unlink of indexed file
2018-10-04Merge tag 'mlx5-updates-2018-10-03' of ↵David S. Miller3-5/+14
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2018-10-03 mlx5 core driver and ethernet netdev updates, please note there is a small devlink releated update to allow extack argument to eswitch operations. From Eli Britstein, 1) devlink: Add extack argument to the eswitch related operations 2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks 3) net/mlx5e: Add extack messages for TC offload failures From Eran Ben Elisha, 4) mlx5e: Add counter for aRFS rule insertion failures From Feras Daoud 5) Fast teardown support for mlx5 device This change introduces the enhanced version of the "Force teardown" that allows SW to perform teardown in a faster way without the need to reclaim all the FW pages. Fast teardown provides the following advantages: 1- Fix a FW race condition that could cause command timeout 2- Avoid moving to polling mode 3- Close the vport to prevent PCI ACK to be sent without been scatter to memory ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04Merge tag 'rxrpc-next-20181004' of ↵David S. Miller1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Development Here are some development patches for AF_RXRPC. The most significant points are: (1) Change the tracepoint that indicates a packet has been transmitted into one that indicates a packet is about to be transmitted. Without this, the response tracepoint may occur first if the round trip is fast enough. (2) Sort out AFS address list handling to better enforce maximum capacity to use helper functions to fill them and to do an insertion sort to order them. This is here to make (3) easier. (3) Keep AF_INET addresses as AF_INET addresses rather than converting them to AF_INET6 in both AF_RXRPC and kAFS. I hadn't realised that a UDP6 socket would just call down into UDP4 if given an AF_INET address. (4) Allow the timestamp on the first DATA packet of a reply to be retrieved by a kernel service. This will give the kAFS a more accurate base from which to calculate the callback promise expiration. (5) Allow the rxrpc protocol epoch value to be retrieved from an incoming call. This will allow kAFS to determine if the fileserver restarted and if two addresses apparently assigned to the same fileserver actually are different boxes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04dns: Allow the dns resolver to retrieve a server setDavid Howells2-3/+117
Allow the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. In terms of communication with userspace, "srv=1" is added to the callout string (the '1' indicating the maximum data version supported by the kernel) to ask the userspace side for this. If the userspace side doesn't recognise it, it will ignore the option and return the usual text address list. If the userspace side does recognise it, it will return some binary data that begins with a zero byte that would cause the string parsers to give an error. The second byte contains the version of the data in the blob (this may be between 1 and the version specified in the callout data). The remainder of the payload is version-specific. In version 1, the payload looks like (note that this is packed): u8 Non-string marker (ie. 0) u8 Content (0 => Server list) u8 Version (ie. 1) u8 Source (eg. DNS_RECORD_FROM_DNS_SRV) u8 Status (eg. DNS_LOOKUP_GOOD) u8 Number of servers foreach-server { u16 Name length (LE) u16 Priority (as per SRV record) (LE) u16 Weight (as per SRV record) (LE) u16 Port (LE) u8 Source (eg. DNS_RECORD_FROM_NSS) u8 Status (eg. DNS_LOOKUP_GOT_NOT_FOUND) u8 Protocol (eg. DNS_SERVER_PROTOCOL_UDP) u8 Number of addresses char[] Name (not NUL-terminated) foreach-address { u8 Family (AF_INET{,6}) union { u8[4] ipv4_addr u8[16] ipv6_addr } } } This can then be used to fetch a whole cell's VL-server configuration for AFS, for example. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04rxrpc: Allow the reply time to be obtained on a client callDavid Howells1-0/+1
Allow the epoch value to be queried on a server connection. This is in the rxrpc header of every packet for use in routing and is derived from the client's state. It's also not supposed to change unless the client gets restarted. AFS can make use of this information to deduce whether a fileserver has been restarted because the fileserver makes client calls to the filesystem driver's cache manager to send notifications (ie. callback breaks) about conflicting changes from other clients. These convey the fileserver's own epoch value back to the filesystem. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04rxrpc: Allow the reply time to be obtained on a client callDavid Howells1-0/+3
Allow the timestamp on the sk_buff holding the first DATA packet of a reply to be queried. This can then be used as a base for the expiry time calculation on the callback promise duration indicated by an operation result. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller14-29/+52
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net' overlapped the renaming of a netlink attribute in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04net/mlx5: Add Fast teardown supportFeras Daoud2-2/+8
Today mlx5 devices support two teardown modes: 1- Regular teardown 2- Force teardown This change introduces the enhanced version of the "Force teardown" that allows SW to perform teardown in a faster way without the need to reclaim all the pages. Fast teardown provides the following advantages: 1- Fix a FW race condition that could cause command timeout 2- Avoid moving to polling mode 3- Close the vport to prevent PCI ACK to be sent without been scatter to memory Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-04devlink: Add extack for eswitch operationsEli Britstein1-3/+6
Add extack argument to the eswitch related operations. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-04Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman8-18/+10
David writes: "Networking fixes: 1) Prefix length validation in xfrm layer, from Steffen Klassert. 2) TX status reporting fix in mac80211, from Andrei Otcheretianski. 3) Fix hangs due to TX_DROP in mac80211, from Bob Copeland. 4) Fix DMA error regression in b43, from Larry Finger. 5) Add input validation to xenvif_set_hash_mapping(), from Jan Beulich. 6) SMMU unmapping fix in hns driver, from Yunsheng Lin. 7) Bluetooh crash in unpairing on SMP, from Matias Karhumaa. 8) WoL handling fixes in the phy layer, from Heiner Kallweit. 9) Fix deadlock in bonding, from Mahesh Bandewar. 10) Fill ttl inherit infor in vxlan driver, from Hangbin Liu. 11) Fix TX timeouts during netpoll, from Michael Chan. 12) RXRPC layer fixes from David Howells. 13) Another batch of ndo_poll_controller() removals to deal with excessive resource consumption during load. From Eric Dumazet. 14) Fix a specific TIPC failure secnario, from LUU Duc Canh. 15) Really disable clocks in r8169 during suspend so that low power states can actually be reached. 16) Fix SYN backlog lockdep issue in tcp and dccp, from Eric Dumazet. 17) Fix RCU locking in netpoll SKB send, which shows up in bonding, from Dave Jones. 18) Fix TX stalls in r8169, from Heiner Kallweit. 19) Fix locksup in nfp due to control message storms, from Jakub Kicinski. 20) Various rmnet bug fixes from Subash Abhinov Kasiviswanathan and Sean Tranchetti. 21) Fix use after free in ip_cmsg_recv_dstaddr(), from Eric Dumazet." * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (122 commits) ixgbe: check return value of napi_complete_done() sctp: fix fall-through annotation r8169: always autoneg on resume ipv4: fix use-after-free in ip_cmsg_recv_dstaddr() net: qualcomm: rmnet: Fix incorrect allocation flag in receive path net: qualcomm: rmnet: Fix incorrect allocation flag in transmit net: qualcomm: rmnet: Skip processing loopback packets net: systemport: Fix wake-up interrupt race during resume rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096 bonding: fix warning message inet: make sure to grab rcu_read_lock before using ireq->ireq_opt nfp: avoid soft lockups under control message storm declance: Fix continuation with the adapter identification message net: fec: fix rare tx timeout r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO tun: napi flags belong to tfile tun: initialize napi_mutex unconditionally tun: remove unused parameters bond: take rcu lock in netpoll_send_skb_on_dev rtnetlink: Fail dump if target netnsid is invalid ...
2018-10-03virtchnl: Added support to exchange additional speed valuesYashaswini Raghuram Prathivadi Bhayankaram1-0/+15
Introduced a new virtchnl capability flag and a struct to support exchange of additional supported speeds. Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-03Merge tag 'media/v4.19-3' of ↵Greg Kroah-Hartman1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Mauro writes: "media fixes for v4.19-rc6" * tag 'media/v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: v4l: event: Prevent freeing event subscriptions while accessed
2018-10-03media: v4l: event: Prevent freeing event subscriptions while accessedSakari Ailus1-0/+4
The event subscriptions are added to the subscribed event list while holding a spinlock, but that lock is subsequently released while still accessing the subscription object. This makes it possible to unsubscribe the event --- and freeing the subscription object's memory --- while the subscription object is simultaneously accessed. Prevent this by adding a mutex to serialise the event subscription and unsubscription. This also gives a guarantee to the callback ops that the add op has returned before the del op is called. This change also results in making the elems field less special: subscriptions are only added to the event list once they are fully initialised. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: stable@vger.kernel.org # for 4.14 and up Fixes: c3b5b0241f62 ("V4L/DVB: V4L: Events: Add backend") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-10-03netfilter: xt_quota: fix the behavior of xt_quota moduleChenbo Feng1-3/+5
A major flaw of the current xt_quota module is that quota in a specific rule gets reset every time there is a rule change in the same table. It makes the xt_quota module not very useful in a table in which iptables rules are changed at run time. This fix introduces a new counter that is visible to userspace as the remaining quota of the current rule. When userspace restores the rules in a table, it can restore the counter to the remaining quota instead of resetting it to the full quota. Signed-off-by: Chenbo Feng <fengc@google.com> Suggested-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-03tls: Add support for inplace records encryptionVakul Garg1-0/+1
Presently, for non-zero copy case, separate pages are allocated for storing plaintext and encrypted text of records. These pages are stored in sg_plaintext_data and sg_encrypted_data scatterlists inside record structure. Further, sg_plaintext_data & sg_encrypted_data are passed to cryptoapis for record encryption. Allocating separate pages for plaintext and encrypted text is inefficient from both required memory and performance point of view. This patch adds support of inplace encryption of records. For non-zero copy case, we reuse the pages from sg_encrypted_data scatterlist to copy the application's plaintext data. For the movement of pages from sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new function move_to_plaintext_sg(). This function add pages into sg_plaintext_data from sg_encrypted_data scatterlists. tls_do_encryption() is modified to pass the same scatterlist as both source and destination into aead_request_set_crypt() if inplace crypto has been enabled. A new ariable 'inplace_crypto' has been introduced in record structure to signify whether the same scatterlist can be used. By default, the inplace_crypto is enabled in get_rec(). If zero-copy is used (i.e. plaintext data is not copied), inplace_crypto is set to '0'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>