summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)AuthorFilesLines
2017-09-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-3/+90
Merge updates from Andrew Morton: - various misc bits - DAX updates - OCFS2 - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits) mm,fork: introduce MADV_WIPEONFORK x86,mpx: make mpx depend on x86-64 to free up VMA flag mm: add /proc/pid/smaps_rollup mm: hugetlb: clear target sub-page last when clearing huge page mm: oom: let oom_reap_task and exit_mmap run concurrently swap: choose swap device according to numa node mm: replace TIF_MEMDIE checks by tsk_is_oom_victim mm, oom: do not rely on TIF_MEMDIE for memory reserves access z3fold: use per-cpu unbuddied lists mm, swap: don't use VMA based swap readahead if HDD is used as swap mm, swap: add sysfs interface for VMA based swap readahead mm, swap: VMA based swap readahead mm, swap: fix swap readahead marking mm, swap: add swap readahead hit statistics mm/vmalloc.c: don't reinvent the wheel but use existing llist API mm/vmstat.c: fix wrong comment selftests/memfd: add memfd_create hugetlbfs selftest mm/shmem: add hugetlbfs support to memfd_create() mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas() ...
2017-09-07mm/shmem: add hugetlbfs support to memfd_create()Mike Kravetz1-0/+24
This patch came out of discussions in this e-mail thread: http://lkml.kernel.org/r/1499357846-7481-1-git-send-email-mike.kravetz%40oracle.com The Oracle JVM team is developing a new garbage collection model. This new model requires multiple mappings of the same anonymous memory. One straight forward way to accomplish this is with memfd_create. They can use the returned fd to create multiple mappings of the same memory. The JVM today has an option to use (static hugetlb) huge pages. If this option is specified, they would like to use the same garbage collection model requiring multiple mappings to the same memory. Using hugetlbfs, it is possible to explicitly mount a filesystem and specify file paths in order to get an fd that can be used for multiple mappings. However, this introduces additional system admin work and coordination. Ideally they would like to get a hugetlbfs fd without requiring explicit mounting of a filesystem. Today, mmap and shmget can make use of hugetlbfs without explicitly mounting a filesystem. The patch adds this functionality to memfd_create. Add a new flag MFD_HUGETLB to memfd_create() that will specify the file to be created resides in the hugetlbfs filesystem. This is the generic hugetlbfs filesystem not associated with any specific mount point. As with other system calls that request hugetlbfs backed pages, there is the ability to encode huge page size in the flag arguments. hugetlbfs does not support sealing operations, therefore specifying MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL. Of course, the memfd_man page would need updating if this type of functionality moves forward. Link: http://lkml.kernel.org/r/1502149672-7759-2-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07userfaultfd: provide pid in userfault msg - add feat unionAndrea Arcangeli1-1/+3
No ABI change, but this will make it more explicit to software that ptid is only available if requested by passing UFFD_FEATURE_THREAD_ID to UFFDIO_API. The fact it's a union will also self document it shouldn't be taken for granted there's a tpid there. Link: http://lkml.kernel.org/r/20170802165145.22628-7-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Alexey Perevalov <a.perevalov@samsung.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07userfaultfd: provide pid in userfault msgAlexey Perevalov1-3/+7
It could be useful for calculating downtime during postcopy live migration per vCPU. Side observer or application itself will be informed about proper task's sleep during userfaultfd processing. Process's thread id is being provided when user requeste it by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features. Link: http://lkml.kernel.org/r/20170802165145.22628-6-aarcange@redhat.com Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07mm: userfaultfd: add feature to request for a signal deliveryPrakash Sangappa1-1/+9
In some cases, userfaultfd mechanism should just deliver a SIGBUS signal to the faulting process, instead of the page-fault event. Dealing with page-fault event using a monitor thread can be an overhead in these cases. For example applications like the database could use the signaling mechanism for robustness purpose. Database uses hugetlbfs for performance reason. Files on hugetlbfs filesystem are created and huge pages allocated using fallocate() API. Pages are deallocated/freed using fallocate() hole punching support. These files are mmapped and accessed by many processes as shared memory. The database keeps track of which offsets in the hugetlbfs file have pages allocated. Any access to mapped address over holes in the file, which can occur due to bugs in the application, is considered invalid and expect the process to simply receive a SIGBUS. However, currently when a hole in the file is accessed via the mapped address, kernel/mm attempts to automatically allocate a page at page fault time, resulting in implicitly filling the hole in the file. This may not be the desired behavior for applications like the database that want to explicitly manage page allocations of hugetlbfs files. Using userfaultfd mechanism with this support to get a signal, database application can prevent pages from being allocated implicitly when processes access mapped address over holes in the file. This patch adds UFFD_FEATURE_SIGBUS feature to userfaultfd mechnism to request for a SIGBUS signal. See following for previous discussion about the database requirement leading to this proposal as suggested by Andrea. http://www.spinics.net/lists/linux-mm/msg129224.html Link: http://lkml.kernel.org/r/1501552446-748335-2-git-send-email-prakash.sangappa@oracle.com Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07mm: shm: use new hugetlb size encoding definitionsMike Kravetz1-2/+29
Use the common definitions from hugetlb_encode.h header file for encoding hugetlb size definitions in shmget system call flags. In addition, move these definitions from the internal (kernel) to user (uapi) header file. Link: http://lkml.kernel.org/r/1501527386-10736-4-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07mm: arch: consolidate mmap hugetlb size encodingsMike Kravetz1-0/+22
A non-default huge page size can be encoded in the flags argument of the mmap system call. The definitions for these encodings are in arch specific header files. However, all architectures use the same values. Consolidate all the definitions in the primary user header file (uapi/linux/mman.h). Include definitions for all known huge page sizes. Use the generic encoding definitions in hugetlb_encode.h as the basis for these definitions. Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds19-31/+457
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-06Merge tag 'dlm-4.14' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This set includes a bunch of minor code cleanups that have accumulated, probably from code analyzers people like to run. There is one nice fix that avoids some socket leaks by switching to use sock_create_lite()" * tag 'dlm-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: use sock_create_lite inside tcp_accept_from_sock uapi linux/dlm_netlink.h: include linux/dlmconstants.h dlm: avoid double-free on error path in dlm_device_{register,unregister} dlm: constify kset_uevent_ops structure dlm: print log message when cluster name is not set dlm: Delete an unnecessary variable initialisation in dlm_ls_start() dlm: Improve a size determination in two functions dlm: Use kcalloc() in two functions dlm: Use kmalloc_array() in make_member_array() dlm: Delete an error message for a failed memory allocation in dlm_recover_waiters_pre() dlm: Improve a size determination in dlm_recover_waiters_pre() dlm: Use kcalloc() in dlm_scan_waiters() dlm: Improve a size determination in table_seq_start() dlm: Add spaces for better code readability dlm: Replace six seq_puts() calls by seq_putc() dlm: Make dismatch error message more clear dlm: Fix kernel memory disclosure
2017-09-06Merge tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+1
Pull XFS updates from Darrick Wong: "Here are the changes for xfs for 4.14. Most of these are cleanups and fixes for bad behavior, as we're mostly focusing on improving reliablity this cycle (read: there's potentially a lot of stuff on the horizon for 4.15 so better to spend a few weeks killing other bugs now). Summary: - Write unmount record for a ro mount to avoid unnecessary log replay - Clean up orphaned inodes when mounting fs readonly - Resubmit inode log items when buffer writeback fails to avoid umount hang - Fix log recovery corruption problems when log headers wrap around the end - Avoid infinite loop searching for free inodes when inode counters are wrong - Evict inodes involved with log redo so that we don't leak them later - Fix a potential race between reclaim and inode cluster freeing - Refactor the inode joining code w.r.t. transaction rolling & deferred ops - Fix a bug where the log doesn't properly deal with dirty buffers that are about to become ordered buffers - Fix the extent swap code to deal with making dirty buffers ordered properly - Consolidate page fault handlers - Refactor the incore extent manipulation functions to use the iext abstractions instead of directly modifying with extent data - Disable crashy chattr +/-x until we fix it - Don't allow us to set S_DAX for v2 inodes - Various cleanups - Clarify some documentation - Fix a problem where fsync and a log commit race to send the disk a flush command, resulting in a small window where power fail data loss could occur - Simplify some rmap operations in the fcollapse code - Fix some use-after-free problems in async writeback" * tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits) xfs: use kmem_free to free return value of kmem_zalloc xfs: open code end_buffer_async_write in xfs_finish_page_writeback xfs: don't set v3 xflags for v2 inodes xfs: fix compiler warnings fsmap: fix documentation of FMR_OF_LAST xfs: simplify the rmap code in xfs_bmse_merge xfs: remove unused flags arg from xfs_file_iomap_begin_delay xfs: fix incorrect log_flushed on fsync xfs: disable per-inode DAX flag xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents xfs: move some code around inside xfs_bmap_shift_extents xfs: use xfs_iext_get_extent in xfs_bmap_first_unused xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert xfs: add a xfs_iext_update_extent helper xfs: consolidate the various page fault handlers iomap: return VM_FAULT_* codes from iomap_page_mkwrite xfs: relog dirty buffers during swapext bmbt owner change ...
2017-09-05Merge tag 'char-misc-4.14-rc1' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver update for 4.14-rc1. Lots of different stuff in here, it's been an active development cycle for some reason. Highlights are: - updated binder driver, this brings binder up to date with what shipped in the Android O release, plus some more changes that happened since then that are in the Android development trees. - coresight updates and fixes - mux driver file renames to be a bit "nicer" - intel_th driver updates - normal set of hyper-v updates and changes - small fpga subsystem and driver updates - lots of const code changes all over the driver trees - extcon driver updates - fmc driver subsystem upadates - w1 subsystem minor reworks and new features and drivers added - spmi driver updates Plus a smattering of other minor driver updates and fixes. All of these have been in linux-next with no reported issues for a while" * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits) ANDROID: binder: don't queue async transactions to thread. ANDROID: binder: don't enqueue death notifications to thread todo. ANDROID: binder: Don't BUG_ON(!spin_is_locked()). ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl ANDROID: binder: push new transactions to waiting threads. ANDROID: binder: remove proc waitqueue android: binder: Add page usage in binder stats android: binder: fixup crash introduced by moving buffer hdr drivers: w1: add hwmon temp support for w1_therm drivers: w1: refactor w1_slave_show to make the temp reading functionality separate drivers: w1: add hwmon support structures eeprom: idt_89hpesx: Support both ACPI and OF probing mcb: Fix an error handling path in 'chameleon_parse_cells()' MCB: add support for SC31 to mcb-lpc mux: make device_type const char: virtio: constify attribute_group structures. Documentation/ABI: document the nvmem sysfs files lkdtm: fix spelling mistake: "incremeted" -> "incremented" perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file nvmem: include linux/err.h from header ...
2017-09-05Merge tag 'tty-4.14-rc1' of ↵Linus Torvalds1-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the big tty/serial driver update for 4.14-rc1. Well, not all that big, just a number of small serial driver fixes, and a new serial driver. Also in here are some much needed goldfish tty driver (emulator) fixes to try to get that codebase under control. All of these have been in linux-next for a while with no reported issues" * tag 'tty-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (94 commits) tty: goldfish: Implement support for kernel 'earlycon' parameter tty: goldfish: Use streaming DMA for r/w operations on Ranchu platforms tty: goldfish: Refactor constants to better reflect their nature serial: 8250_port: Remove useless NULL checks earlycon: initialise baud field of earlycon device structure tty: hvcs: make ktermios const pty: show associative slave of ptmx in fdinfo tty: n_gsm: Add compat_ioctl tty: hvcs: constify vio_device_id tty: hvc_vio: constify vio_device_id tty: mips_ejtag_fdc: constify mips_cdmm_device_id Introduce 8250_men_mcb mcb: introduce mcb_get_resource() serial: imx: Avoid post-PIO cleanup if TX DMA is started tty: serial: imx: disable irq after suspend serial: 8250_uniphier: add suspend/resume support serial: 8250_uniphier: use CHAR register for canary to detect power-off serial: 8250_uniphier: fix serial port index in private data serial: 8250: of: Add new port type for MediaTek BTIF controller on MT7622/23 SoC dt-bindings: serial: 8250: Add MediaTek BTIF controller bindings ...
2017-09-05Merge tag 'usb-4.14-rc1' of ↵Linus Torvalds1-0/+31
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY driver updates from Greg KH: "Here is the large USB and PHY driver update for 4.14-rc1. Not all that exciting, a few new PHY drivers, the usual mess of gadget driver updates and fixes, and of course, xhci updates to try to tame that beast. A number of usb-serial updates and other small fixes all over the USB driver tree are in here as well. Full details are in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'usb-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (171 commits) usbip: vhci-hcd: make vhci_hc_driver const usb: phy: Avoid unchecked dereference warning usb: imx21-hcd: make imx21_hc_driver const usb: host: make ehci_fsl_overrides const and __initconst dt-bindings: mt8173-mtu3: add generic compatible and rename file dt-bindings: mt8173-xhci: add generic compatible and rename file usb: xhci-mtk: add generic compatible string usbip: auto retry for concurrent attach USB: serial: option: simplify 3 D-Link device entries USB: serial: option: add support for D-Link DWM-157 C1 usb: core: usbport: fix "BUG: key not in .data" when lockdep is enabled usb: chipidea: usb2: check memory allocation failure usb: Add device quirk for Logitech HD Pro Webcam C920-C usb: misc: lvstest: add entry to place port in compliance mode usb: xhci: Support enabling of compliance mode for xhci 1.1 usb:xhci:Fix regression when ATI chipsets detected usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard usb: gadget: make snd_pcm_hardware const usb: common: use of_property_read_bool() USB: core: constify vm_operations_struct ...
2017-09-04Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: - Introduce the ORC unwinder, which can be enabled via CONFIG_ORC_UNWINDER=y. The ORC unwinder is a lightweight, Linux kernel specific debuginfo implementation, which aims to be DWARF done right for unwinding. Objtool is used to generate the ORC unwinder tables during build, so the data format is flexible and kernel internal: there's no dependency on debuginfo created by an external toolchain. The ORC unwinder is almost two orders of magnitude faster than the (out of tree) DWARF unwinder - which is important for perf call graph profiling. It is also significantly simpler and is coded defensively: there has not been a single ORC related kernel crash so far, even with early versions. (knock on wood!) But the main advantage is that enabling the ORC unwinder allows CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel measurably: With frame pointers disabled, GCC does not have to add frame pointer instrumentation code to every function in the kernel. The kernel's .text size decreases by about 3.2%, resulting in better cache utilization and fewer instructions executed, resulting in a broad kernel-wide speedup. Average speedup of system calls should be roughly in the 1-3% range - measurements by Mel Gorman [1] have shown a speedup of 5-10% for some function execution intense workloads. The main cost of the unwinder is that the unwinder data has to be stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel config - which is a modest cost on modern x86 systems. Given how young the ORC unwinder code is it's not enabled by default - but given the performance advantages the plan is to eventually make it the default unwinder on x86. See Documentation/x86/orc-unwinder.txt for more details. - Remove lguest support: its intended role was that of a temporary proof of concept for virtualization, plus its removal will enable the reduction (removal) of the paravirt API as well, so Rusty agreed to its removal. (Juergen Gross) - Clean up and fix FSGS related functionality (Andy Lutomirski) - Clean up IO access APIs (Andy Shevchenko) - Enhance the symbol namespace (Jiri Slaby) * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) objtool: Handle GCC stack pointer adjustment bug x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone() x86/fpu/math-emu: Add ENDPROC to functions x86/boot/64: Extract efi_pe_entry() from startup_64() x86/boot/32: Extract efi_pe_entry() from startup_32() x86/lguest: Remove lguest support x86/paravirt/xen: Remove xen_patch() objtool: Fix objtool fallthrough detection with function padding x86/xen/64: Fix the reported SS and CS in SYSCALL objtool: Track DRAP separately from callee-saved registers objtool: Fix validate_branch() return codes x86: Clarify/fix no-op barriers for text_poke_bp() x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs selftests/x86/fsgsbase: Test selectors 1, 2, and 3 x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common x86/asm: Fix UNWIND_HINT_REGS macro for older binutils x86/asm/32: Fix regs_get_register() on segment registers x86/xen/64: Rearrange the SYSCALL entries x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads ...
2017-09-04Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-4/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Add branch type profiling/tracing support. (Jin Yao) - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of physical memory addresses, where the PMU supports it. (Kan Liang) - Export some PMU capability details in the new /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi Kleen) - Aux data fixes and updates (Will Deacon) - kprobes fixes and updates (Masami Hiramatsu) - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan) On the tooling side, here's a (limited!) list of highlights - there were many other changes that I could not list, see the shortlog and git history for details: UI improvements: - Implement a visual marker for fused x86 instructions in the annotate TUI browser, available now in 'perf report', more work needed to have it available as well in 'perf top' (Jin Yao) Further explanation from one of Jin's patches: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne 29 │ │↓ jmp 43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) That means the cmpl+je is a fused instruction pair and they should be considered together. - Record the branch type and then show statistics and info about in callchain entries (Jin Yao) Example from one of Jin's patches: # perf record -g -j any,save_type # perf report --branch-history --stdio --no-children 38.50% div.c:45 [.] main div | ---main div.c:42 (RET CROSS_2M cycles:2) compute_flag div.c:28 (cycles:2) compute_flag div.c:27 (RET CROSS_2M cycles:1) rand rand.c:28 (cycles:1) rand rand.c:28 (RET CROSS_2M cycles:1) __random random.c:298 (cycles:1) __random random.c:297 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (RET CROSS_2M cycles:9) namespaces support: - Add initial support for namespaces, using setns to access files in namespaces, grabbing their build-ids, etc. (Krister Johansen) perf trace enhancements: - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace' (Arnaldo Carvalho de Melo) - Add initial 'clone' syscall args beautifier in 'perf trace' (Arnaldo Carvalho de Melo) - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace' (Arnaldo Carvalho de Melo) - Beautifiers for the 'cmd' arg of several ioctl types, including: sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de Melo) - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data' CTF conversion, allowing CTF trace visualization tools to show callchains and to resolve symbols (Geneviève Bastien) - Beautify the fcntl syscall, which is an interesting one in the sense that infrastructure had to be put in place to change the formatters of some arguments according to the value in a previous one, i.e. cmd dictates how arg and the syscall return will be formatted. (Arnaldo Carvalho de Melo perf stat enhancements: - Use group read for event groups in 'perf stat', reducing overhead when groups are defined in the event specification, i.e. when using {} to enclose a list of events, asking them to be read at the same time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa) pipe mode improvements: - Process tracing data in 'perf annotate' pipe mode (David Carrillo-Cisneros) - Add header record types to pipe-mode, now this command: $ perf record -o - -e cycles sleep 1 | perf report --stdio --header Will show the same as in non-pipe mode, i.e. involving a perf.data file (David Carrillo-Cisneros) Vendor specific hardware event support updates/enhancements: - Update POWER9 vendor events tables (Sukadev Bhattiprolu) - Add POWER9 PMU events Sukadev (Bhattiprolu) - Support additional POWER8+ PVR in PMU mapfile (Shriya) - Add Skylake server uncore JSON vendor events (Andi Kleen) - Support exporting Intel PT data to sqlite3 with python perf scripts, this is in addition to the postgresql support that was already there (Adrian Hunter)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits) perf symbols: Fix plt entry calculation for ARM and AARCH64 perf probe: Fix kprobe blacklist checking condition perf/x86: Fix caps/ for !Intel perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR perf/core, pt, bts: Get rid of itrace_started perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments tools headers: Sync cpu features kernel ABI headers with tooling headers perf tools: Pass full path of FEATURES_DUMP perf tools: Robustify detection of clang binary tools lib: Allow external definition of CC, AR and LD perf tools: Allow external definition of flex and bison binary names tools build tests: Don't hardcode gcc name perf report: Group stat values on global event id perf values: Zero value buffers perf values: Fix allocation check perf values: Fix thread index bug perf report: Add dump_read function perf record: Set read_format for inherit_stat perf c2c: Fix remote HITM detection for Skylake perf tools: Fix static build with newer toolchains ...
2017-09-04netlink: add NLM_F_NONREC flag for deletion requestsPablo Neira Ayuso1-0/+3
In the last NFWS in Faro, Portugal, we discussed that netlink is lacking the semantics to request non recursive deletions, ie. do not delete an object iff it has child objects that hang from this parent object that the user requests to be deleted. We need this new flag to solve a problem for the iptables-compat backward compatibility utility, that runs iptables commands using the existing nf_tables netlink interface. Specifically, custom chains in iptables cannot be deleted if there are rules in it, however, nf_tables allows to remove any chain that is populated with content. To sort out this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC flag to obtain the same semantics that iptables provides. This new flag should only be used for deletion requests. Note this new flag value overlaps with the existing: * NLM_F_ROOT for get requests. * NLM_F_REPLACE for new requests. However, those flags should not ever be used in deletion requests. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds1-2/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnad: "The main RCU related changes in this cycle were: - Removal of spin_unlock_wait() - SRCU updates - RCU torture-test updates - RCU Documentation updates - Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant - Miscellaneous RCU fixes - CPU-hotplug fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) arch: Remove spin_unlock_wait() arch-specific definitions locking: Remove spin_unlock_wait() generic definitions drivers/ata: Replace spin_unlock_wait() with lock/unlock pair ipc: Replace spin_unlock_wait() with lock/unlock pair exit: Replace spin_unlock_wait() with lock/unlock pair completion: Replace spin_unlock_wait() with lock/unlock pair doc: Set down RCU's scheduling-clock-interrupt needs doc: No longer allowed to use rcu_dereference on non-pointers doc: Add RCU files to docbook-generation files doc: Update memory-barriers.txt for read-to-write dependencies doc: Update RCU documentation membarrier: Provide expedited private command rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() rcu: Add warning to rcu_idle_enter() for irqs enabled rcu: Make rcu_idle_enter() rely on callers disabling irqs rcu: Add assertions verifying blocked-tasks list rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() rcu: Add TPS() protection for _rcu_barrier_trace strings rcu: Use idle versions of swait to make idle-hack clear swait: Add idle variants which don't contribute to load average ...
2017-09-04netfilter: nft_limit: add stateful object typePablo M. Bermudo Garay1-1/+2
Register a new limit stateful object type into the stateful object infrastructure. Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04netfilter: xt_hashlimit: add rate match modeVishwanath Pai1-6/+30
This patch adds a new feature to hashlimit that allows matching on the current packet/byte rate without rate limiting. This can be enabled with a new flag --hashlimit-rate-match. The match returns true if the current rate of packets is above/below the user specified value. The main difference between the existing algorithm and the new one is that the existing algorithm rate-limits the flow whereas the new algorithm does not. Instead it *classifies* the flow based on whether it is above or below a certain rate. I will demonstrate this with an example below. Let us assume this rule: iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain If the packet rate is 15/s, the existing algorithm would ACCEPT 10 packets every second and send 5 packets to "new_chain". But with the new algorithm, as long as the rate of 15/s is sustained, all packets will continue to match and every packet is sent to new_chain. This new functionality will let us classify different flows based on their current rate, so that further decisions can be made on them based on what the current rate is. This is how the new algorithm works: We divide time into intervals of 1 (sec/min/hour) as specified by the user. We keep track of the number of packets/bytes processed in the current interval. After each interval we reset the counter to 0. When we receive a packet for match, we look at the packet rate during the current interval and the previous interval to make a decision: if [ prev_rate < user and cur_rate < user ] return Below else return Above Where cur_rate is the number of packets/bytes seen in the current interval, prev is the number of packets/bytes seen in the previous interval and 'user' is the rate specified by the user. We also provide flexibility to the user for choosing the time interval using the option --hashilmit-interval. For example the user can keep a low rate like x/hour but still keep the interval as small as 1 second. To preserve backwards compatibility we have to add this feature in a new revision, so I've created revision 3 for hashlimit. The two new options we add are: --hashlimit-rate-match --hashlimit-rate-interval I have updated the help text to add these new options. Also added a few tests for the new options. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Reviewed-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+12
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, updates to the conntrack core, enhancements for nf_tables, conversion of netfilter hooks from linked list to array to improve memory locality and asorted improvements for the Netfilter codebase. More specifically, they are: 1) Add expection to hashes after timer initialization to prevent access from another CPU that walks on the hashes and calls del_timer(), from Florian Westphal. 2) Don't update nf_tables chain counters from hot path, this is only used by the x_tables compatibility layer. 3) Get rid of nested rcu_read_lock() calls from netfilter hook path. Hooks are always guaranteed to run from rcu read side, so remove nested rcu_read_lock() where possible. Patch from Taehee Yoo. 4) nf_tables new ruleset generation notifications include PID and name of the process that has updated the ruleset, from Phil Sutter. 5) Use skb_header_pointer() from nft_fib, so we can reuse this code from the nf_family netdev family. Patch from Pablo M. Bermudo. 6) Add support for nft_fib in nf_tables netdev family, also from Pablo. 7) Use deferrable workqueue for conntrack garbage collection, to reduce power consumption, from Patch from Subash Abhinov Kasiviswanathan. 8) Add nf_ct_expect_iterate_net() helper and use it. From Florian Westphal. 9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian. 10) Drop references on conntrack removal path when skbuffs has escaped via nfqueue, from Florian. 11) Don't queue packets to nfqueue with dying conntrack, from Florian. 12) Constify nf_hook_ops structure, from Florian. 13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter. 14) Add nla_strdup(), from Phil Sutter. 15) Rise nf_tables objects name size up to 255 chars, people want to use DNS names, so increase this according to what RFC 1035 specifies. Patch series from Phil Sutter. 16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook registration on demand, suggested by Eric Dumazet, patch from Florian. 17) Remove unused variables in compat_copy_entry_from_user both in ip_tables and arp_tables code. Patch from Taehee Yoo. 18) Constify struct nf_conntrack_l4proto, from Julia Lawall. 19) Constify nf_loginfo structure, also from Julia. 20) Use a single rb root in connlimit, from Taehee Yoo. 21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo. 22) Use audit_log() instead of open-coding it, from Geliang Tang. 23) Allow to mangle tcp options via nft_exthdr, from Florian. 24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes a fix for a miscalculation of the minimal length. 25) Simplify branch logic in h323 helper, from Nick Desaulniers. 26) Calculate netlink attribute size for conntrack tuple at compile time, from Florian. 27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure. From Florian. 28) Remove holes in nf_conntrack_l4proto structure, so it becomes smaller. From Florian. 29) Get rid of print_tuple() indirection for /proc conntrack listing. Place all the code in net/netfilter/nf_conntrack_standalone.c. Patch from Florian. 30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is off. From Florian. 31) Constify most nf_conntrack_{l3,l4}proto helper functions, from Florian. 32) Fix broken indentation in ebtables extensions, from Colin Ian King. 33) Fix several harmless sparse warning, from Florian. 34) Convert netfilter hook infrastructure to use array for better memory locality, joint work done by Florian and Aaron Conole. Moreover, add some instrumentation to debug this. 35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once per batch, from Florian. 36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian. 37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao. 38) Remove unused code in the generic protocol tracker, from Davide Caratti. I think I will have material for a second Netfilter batch in my queue if time allow to make it fit in this merge window. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-04Merge tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-1/+36
Pull drm updates from Dave Airlie: "This is the main drm pull request for 4.14 merge window. I'm sending this early, as my continuing journey into fatherhood is occurring really soon now, I'm going to be mostly useless for the next couple of weeks, though I may be able to read email, I doubt I'll be doing much patch applications or git sending. If anything urgent pops up I've asked Daniel/Jani/Alex/Sean to try and direct stuff towards you. Outside drm changes: Some rcar-du updates that touch the V4L tree, all acks should be in place. It adds one export to the radix tree code for new i915 use case. There are some minor AGP cleanups (don't see that too often). Changes to the vbox driver in staging to avoid breaking compilation. Summary: core: - Atomic helper fixes - Atomic UAPI fixes - Add YCBCR 4:2:0 support - Drop set_busid hook - Refactor fb_helper locking - Remove a bunch of internal APIs - Add a bunch of better default handlers - Format modifier/blob plane property added - More internal header refactoring - Make more internal API names consistent - Enhanced syncobj APIs (wait/signal/reset/create signalled) bridge: - Add Synopsys Designware MIPI DSI host bridge driver tiny: - Add Pervasive Displays RePaper displays - Add support for LEGO MINDSTORMS EV3 LCD i915: - Lots of GEN10/CNL support patches - drm syncobj support - Skylake+ watermark refactoring - GVT vGPU 48-bit ppgtt support - GVT performance improvements - NOA change ioctl - CCS (color compression) scanout support - GPU reset improvements amdgpu: - Initial hugepage support - BO migration logic rework - Vega10 improvements - Powerplay fixes - Stop reprogramming the MC - Fixes for ACP audio on stoney - SR-IOV fixes/improvements - Command submission overhead improvements amdkfd: - Non-dGPU upstreaming patches - Scratch VA ioctl - Image tiling modes - Update PM4 headers for new firmware - Drop all BUG_ONs. nouveau: - GP108 modesetting support. - Disable MSI on big endian. vmwgfx: - Add fence fd support. msm: - Runtime PM improvements exynos: - NV12MT support - Refactor KMS drivers imx-drm: - Lock scanout channel to improve memory bw - Cleanups etnaviv: - GEM object population fixes tegra: - Prep work for Tegra186 support - PRIME mmap support sunxi: - HDMI support improvements - HDMI CEC support omapdrm: - HDMI hotplug IRQ support - Big driver cleanup - OMAP5 DSI support rcar-du: - vblank fixes - VSP1 updates arcgpu: - Minor fixes stm: - Add STM32 DSI controller driver dw_hdmi: - Add support for Rockchip RK3399 - HDMI CEC support atmel-hlcdc: - Add 8-bit color support vc4: - Atomic fixes - New ioctl to attach a label to a buffer object - HDMI CEC support - Allow userspace to dictate rendering order on submit ioctl" * tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux: (1074 commits) drm/syncobj: Add a signal ioctl (v3) drm/syncobj: Add a reset ioctl (v3) drm/syncobj: Add a syncobj_array_find helper drm/syncobj: Allow wait for submit and signal behavior (v5) drm/syncobj: Add a CREATE_SIGNALED flag drm/syncobj: Add a callback mechanism for replace_fence (v3) drm/syncobj: add sync obj wait interface. (v8) i915: Use drm_syncobj_fence_get drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2) drm/syncobj: Rename fence_get to find_fence drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate drm/vmwgfx: Bump the version for fence FD support drm/vmwgfx: Add export fence to file descriptor support drm/vmwgfx: Add support for imported Fence File Descriptor drm/vmwgfx: Prepare to support fence fd drm/vmwgfx: Fix incorrect command header offset at restart drm/vmwgfx: Support the NOP_ERROR command drm/vmwgfx: Restart command buffers after errors drm/vmwgfx: Move irq bottom half processing to threads drm/vmwgfx: Don't use drm_irq_[un]install ...
2017-09-04Merge branch 'for-linus' of ↵Linus Torvalds2-18/+31
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc fixes from Al Viro: "Loose ends and regressions from the last merge window. Strictly speaking, only binfmt_flat thing is a build regression per se - the rest is 'only sparse cares about that' stuff" [ This came in before the 4.13 release and could have gone there, but it was late in the release and nothing seemed critical enough to care, so I'm pulling it in the 4.14 merge window instead - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: binfmt_flat: fix arch/m32r and arch/microblaze flat_put_addr_at_rp() compat_hdio_ioctl: Fix a declaration <linux/uaccess.h>: Fix copy_in_user() declaration annotate RWF_... flags teach SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE to handle __bitwise arguments
2017-09-02tcp_diag: report TCP MD5 signing keys and addressesIvan Delalande2-0/+10
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is not possible to retrieve these from the kernel once they have been configured on sockets. Signed-off-by: Ivan Delalande <colona@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-40/+0
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01fsmap: fix documentation of FMR_OF_LASTDarrick J. Wong1-1/+1
The FMR_OF_LAST flag is set on the last fsmap record being returned for the dataset requested, contrary to what the header file says. Fix the docs to reflect the behavior of all fsmap implementations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-09-01ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctlColin Cross1-0/+14
The BINDER_GET_NODE_DEBUG_INFO ioctl will return debug info on a node. Each successive call reusing the previous return value will return the next node. The data will be used by libmemunreachable to mark the pointers with kernel references as reachable. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-01bpf: Add mark and priority to sock options that can be setDavid Ahern1-0/+2
Add socket mark and priority to fields that can be set by ebpf program when a socket is created. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01devlink: Add IPv6 header for dpipeArkadi Sharshevsky1-0/+5
This will be used by the IPv6 host table which will be introduced in the following patches. The fields in the header are added per-use. This header is global and can be reused by many drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01annotate RWF_... flagsChristoph Hellwig2-18/+31
[AV: added missing annotations in syscalls.h/compat.h] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-08-30net: arp: Add support for raw IP deviceSubash Abhinov Kasiviswanathan1-0/+1
Define the raw IP type. This is needed for raw IP net devices like rmnet. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30net: ether: Add support for multiplexing and aggregation typeSubash Abhinov Kasiviswanathan1-0/+3
Define the Qualcomm multiplexing and aggregation (MAP) ether type 0x00F9. This is needed for receiving data in the MAP protocol like RMNET. This is not an officially registered ID. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30tcp: Revert "tcp: remove header prediction"Florian Westphal1-0/+2
This reverts commit 45f119bf936b1f9f546a0b139c5b56f9bb2bdc78. Eric Dumazet says: We found at Google a significant regression caused by 45f119bf936b1f9f546a0b139c5b56f9bb2bdc78 tcp: remove header prediction In typical RPC (TCP_RR), when a TCP socket receives data, we now call tcp_ack() while we used to not call it. This touches enough cache lines to cause a slowdown. so problem does not seem to be HP removal itself but the tcp_ack() call. Therefore, it might be possible to remove HP after all, provided one finds a way to elide tcp_ack for most cases. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30ether: add NSH ethertypeJiri Benc1-0/+1
The NSH draft says: An IEEE EtherType, 0x894F, has been allocated for NSH. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-30if_ether: add forces ife lfb typeAlexander Aring1-0/+1
This patch adds the forces IFE lfb type according to IEEE registered ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more information. Since there exists the IFE subsystem it can be used there. This patch also use the correct word "ForCES" instead of "FoRCES" which is a spelling error inside the IEEE ethertype specification. Signed-off-by: Alexander Aring <aring@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29perf/core, x86: Add PERF_SAMPLE_PHYS_ADDRKan Liang1-1/+3
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar1-3/+0
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-28serial: 8250: of: Add new port type for MediaTek BTIF controller on ↵Sean Wang1-0/+3
MT7622/23 SoC MediaTek BTIF controller is the serial interface similar to UART but it works only as the digital device which is mainly used to communicate with the connectivity module called CONNSYS inside the SoC which could be mostly found on those MediaTek SoCs with Bluetooth feature such as MT7622 and MT7623 SoCs. And the controller is made as being compatible with the 8250 register layout with extra registers such as DMA enablement so it tends to be integrated with reusing 8250 OF driver. However, DMA mode is not being supported yet in the current driver. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28serial: Fix port type numbering for TI DA8xxAndy Shevchenko1-1/+3
The UAPI has a global list of unique numbers for different port types. The commit a2d6a987bfe4 ("serial: 8250: Add new port type for TI DA8xx/66AK2x") introduced a new port type and brought the collision with two other port types. Reuse 95 for it instead. Fixes: a2d6a987bfe4 ("serial: 8250: Add new port type for TI DA8xx/66AK2x") Cc: David Lechner <david@lechnology.com> Cc: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28serial: Remove unused port typeAndy Shevchenko1-3/+0
PORT_MFD is not in use since commit 1bd187de5364 ("x86, intel-mid: remove Intel MID specific serial support") Remove leftover. Fixes: 1bd187de5364 ("x86, intel-mid: remove Intel MID specific serial support") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28serial: pch_uart: Make port type explicitAndy Shevchenko1-1/+4
It used to be a gap in port definitions after PORT_MAX_8250. Since the new drivers are coming the gap become shorter and shorter until the commit a2d6a987bfe4 ("serial: 8250: Add new port type for TI DA8xx/66AK2x") completely removed it. So, while type here is just a formality, make things a little bit more explicit for this driver and move port types to UAPI header. Note, it uses two types for now. Fixes: fddceb8b5399 ("tty: 8250: Add 64byte UART support for FSL platforms") Cc: Priyanka Jain <Priyanka.Jain@freescale.com> Cc: Poonam Aggrwal <poonam.aggrwal@freescale.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28bpf: sockmap, remove STRPARSER map_flags and add multi-map supportJohn Fastabend1-3/+0
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a specific use case where we want to have BPF parse program disabled on an entry in a sockmap. However, Alexei found the API a bit cumbersome and I agreed. Lets remove the STRPARSER flag and support the use case by allowing socks to be in multiple maps. This allows users to create two maps one with programs attached and one without. When socks are added to maps they now inherit any programs attached to the map. This is a nice generalization and IMO improves the API. The API rules are less ambiguous and do not need a flag: - When a sock is added to a sockmap we have two cases, i. The sock map does not have any attached programs so we can add sock to map without inheriting bpf programs. The sock may exist in 0 or more other maps. ii. The sock map has an attached BPF program. To avoid duplicate bpf programs we only add the sock entry if it does not have an existing strparser/verdict attached, returning -EBUSY if a program is already attached. Otherwise attach the program and inherit strparser/verdict programs from the sock map. This allows for socks to be in a multiple maps for redirects and inherit a BPF program from a single map. Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY} flags. In the original patch I tried to be extra clever and only update map entries when necessary. Now I've decided the complexity is not worth it. If users constantly update an entry with the same sock for no reason (i.e. update an entry without actually changing any parameters on map or sock) we still do an alloc/release. Using this and allowing multiple entries of a sock to exist in a map the logic becomes much simpler. Note: Now that multiple maps are supported the "maps" pointer called when a socket is closed becomes a list of maps to remove the sock from. To keep the map up to date when a sock is added to the sockmap we must add the map/elem in the list. Likewise when it is removed we must remove it from the list. This results in searching the per psock list on delete operation. On TCP_CLOSE events we walk the list and remove the psock from all map/entry locations. I don't see any perf implications in this because at most I have a psock in two maps. If a psock were to be in many maps its possibly this might be noticeable on delete but I can't think of a reason to dup a psock in many maps. The sk_callback_lock is used to protect read/writes to the list. This was convenient because in all locations we were taking the lock anyways just after working on the list. Also the lock is per sock so in normal cases we shouldn't see any contention. Suggested-by: Alexei Starovoitov <ast@kernel.org> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28bpf: convert sockmap field attach_bpf_fd2 to typeJohn Fastabend1-6/+3
In the initial sockmap API we provided strparser and verdict programs using a single attach command by extending the attach API with a the attach_bpf_fd2 field. However, if we add other programs in the future we will be adding a field for every new possible type, attach_bpf_fd(3,4,..). This seems a bit clumsy for an API. So lets push the programs using two new type fields. BPF_SK_SKB_STREAM_PARSER BPF_SK_SKB_STREAM_VERDICT This has the advantage of having a readable name and can easily be extended in the future. Updates to samples and sockmap included here also generalize tests slightly to support upcoming patch for multiple map support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28libnvdimm: clean up command definitionsDan Williams1-37/+0
Remove the command payloads that do not have an associated libnvdimm ioctl. I.e. remove the payloads that would only ever be carried in the ND_CMD_CALL envelope. This prevents userspace from growing unnecessary dependencies on this kernel header when userspace already has everything it needs to craft and send these commands. Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-28Merge 4.13-rc7 into char-misc-nextGreg Kroah-Hartman1-3/+0
We want the binder fix in here as well for testing and merge issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-26ipv6: sr: add support for encapsulation of L2 framesDavid Lebrun1-4/+14
This patch implements the L2 frame encapsulation mechanism, referred to as T.Encaps.L2 in the SRv6 specifications [1]. A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only accepts packets with an existing MAC header (i.e., it will not work for locally generated packets). The resulting packet looks like IPv6 -> SRH -> Ethernet -> original L3 payload. The next header field of the SRH is set to NEXTHDR_NONE. [1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01 Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25perf/x86: Fix data source decoding for SkylakeAndi Kleen1-2/+28
Skylake changed the encoding of the PEBS data source field. Some combinations are not available anymore, but some new cases e.g. for L4 cache hit are added. Fix up the conversion table for Skylake, similar as had been done for Nehalem. On Skylake server the encoding for L4 actually means persistent memory. Handle this case too. To properly describe it in the abstracted perf format I had to add some new fields. Since a hit can have only one level add a new field that is an enumeration, not a bit field to describe the level. It can describe any level. Some numbers are also used to describe PMEM and LFB. Also add a new generic remote flag that can be combined with the generic level to signify a remote cache. And there is an extension field for the snoop indication to handle the Forward state. I didn't add a generic flag for hops because it's not needed for Skylake. I changed the existing encodings for older CPUs to also fill in the new level and remote fields. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24devlink: Add IPv4 header for dpipeArkadi Sharshevsky1-0/+5
This will be used by the IPv4 host table which will be introduced in the following patches. This header is global and can be reused by many drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24devlink: Add Ethernet header for dpipeArkadi Sharshevsky1-0/+8
This will be used by the IPv4 host table which will be introduced in the following patches. This header is global and can be reused by many drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24x86/lguest: Remove lguest supportJuergen Gross1-2/+2
Lguest seems to be rather unused these days. It has seen only patches ensuring it still builds the last two years and its official state is "Odd Fixes". Remove it in order to be able to clean up the paravirt code. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: boris.ostrovsky@oracle.com Cc: lguest@lists.ozlabs.org Cc: rusty@rustcorp.com.au Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24Revert "loop: support 4k physical blocksize"Omar Sandoval1-3/+0
There's some stuff still up in the air, let's not get stuck with a subpar ABI. I'll follow up with something better for 4.14. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>