summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-02Merge tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds6-116/+45
Pull VFIO updates from Alex Williamson: - Mask INTx from user if pdev->irq is zero (Alexey Kardashevskiy) - Capability helper cleanup (Alex Williamson) - Allow mmaps overlapping MSI-X vector table with region capability exposing this feature (Alexey Kardashevskiy) - mdev static cleanups (Xiongwei Song) * tag 'vfio-v4.16-rc1' of git://github.com/awilliam/linux-vfio: vfio: mdev: make a couple of functions and structure vfio_mdev_driver static vfio-pci: Allow mapping MSIX BAR vfio: Simplify capability helper vfio-pci: Mask INTx if a device is not capabable of enabling it
2018-02-02Merge tag 'trace-v4.16' of ↵Linus Torvalds6-27/+80
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "There's not much changes for the tracing system this release. Mostly small clean ups and fixes. The biggest change is to how bprintf works. bprintf is used by trace_printk() to just save the format and args of a printf call, and the formatting is done when the trace buffer is read. This is done to keep the formatting out of the fast path (this was recommended by you). The issue is when arguments are de-referenced. If a pointer is saved, and the format has something like "%*pbl", when the buffer is read, it will de-reference the argument then. The problem is if the data no longer exists. This can cause the kernel to oops. The fix for this was to make these de-reference pointes do the formatting at the time it is called (the fast path), as this guarantees that the data exists (and doesn't change later)" * tag 'trace-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: vsprintf: Do not have bprintf dereference pointers ftrace: Mark function tracer test functions noinline/noclone trace_uprobe: Display correct offset in uprobe_events tracing: Make sure the parsed string always terminates with '\0' tracing: Clear parser->idx if only spaces are read tracing: Detect the string nul character when parsing user input string
2018-02-01Merge branch 'KASAN-read_word_at_a_time'Linus Torvalds3-15/+16
Merge KASAN word-at-a-time fixups from Andrey Ryabinin. The word-at-a-time optimizations have caused headaches for KASAN, since the whole point is that we access byte streams in bigger chunks, and KASAN can be unhappy about the potential extra access at the end of the string. We used to have a horrible hack in dcache, and then people got complaints from the strscpy() case. This fixes it all up properly, by adding an explicit helper for the "access byte stream one word at a time" case. * emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>: fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports" fs/dcache: Use read_word_at_a_time() in dentry_string_cmp() lib/strscpy: Shut up KASAN false-positives in strscpy() compiler.h: Add read_word_at_a_time() function. compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
2018-02-01fs: dcache: Revert "manually unpoison dname after allocation to shut up ↵Andrey Ryabinin1-5/+0
kasan's reports" This reverts commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd. It's no longer needed since dentry_string_cmp() now uses read_word_at_a_time() to avoid kasan's reports. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()Andrey Ryabinin1-1/+1
dentry_string_cmp() performs the word-at-a-time reads from 'cs' and may read slightly more than it was requested in kmallac(). Normally this would make KASAN to report out-of-bounds access, but this was workarounded by commit df4c0e36f1b1 ("fs: dcache: manually unpoison dname after allocation to shut up kasan's reports"). This workaround is not perfect, since it allows out-of-bounds access to dentry's name for all the code, not just in dentry_string_cmp(). So it would be better to use read_word_at_a_time() instead and revert commit df4c0e36f1b1. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01lib/strscpy: Shut up KASAN false-positives in strscpy()Andrey Ryabinin1-1/+1
strscpy() performs the word-at-a-time optimistic reads. So it may may access the memory past the end of the object, which is perfectly fine since strscpy() doesn't use that (past-the-end) data and makes sure the optimistic read won't cross a page boundary. Use new read_word_at_a_time() to shut up the KASAN. Note that this potentially could hide some bugs. In example bellow, stscpy() will copy more than we should (1-3 extra uninitialized bytes): char dst[8]; char *src; src = kmalloc(5, GFP_KERNEL); memset(src, 0xff, 5); strscpy(dst, src, 8); Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01compiler.h: Add read_word_at_a_time() function.Andrey Ryabinin1-0/+8
Sometimes we know that it's safe to do potentially out-of-bounds access because we know it won't cross a page boundary. Still, KASAN will report this as a bug. Add read_word_at_a_time() function which is supposed to be used in such cases. In read_word_at_a_time() KASAN performs relaxed check - only the first byte of access is validated. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()Andrey Ryabinin1-8/+6
Instead of having two identical __read_once_size_nocheck() functions with different attributes, consolidate all the difference in new macro __no_kasan_or_inline and use it. No functional changes. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01Merge tag 'kconfig-v4.16' of ↵Linus Torvalds20-5065/+548
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig updates from Masahiro Yamada: "A pretty big batch of Kconfig updates. I have to mention the lexer and parser of Kconfig are now built from real .l and .y sources. So, flex and bison are the requirement for building the kernel. Both of them (unlike gperf) have been stable for a long time. This change has been tested several weeks in linux-next, and I did not receive any problem report about this. Summary: - add checks for mistakes, like the choice default is not in choice, help is doubled - document data structure and complex code - fix various memory leaks - change Makefile to build lexer and parser instead of using pre-generated C files - drop 'boolean' keyword, which is equivalent to 'bool' - use default 'yy' prefix and remove unneeded Make variables - fix gettext() check for xconfig - announce that oldnoconfig will be finally removed - make 'Selected by:' and 'Implied by' readable in help and search result - hide silentoldconfig from 'make help' to stop confusing people - fix misc things and cleanups" * tag 'kconfig-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (37 commits) kconfig: Remove silentoldconfig from help and docs; fix kconfig/conf's help kconfig: make "Selected by:" and "Implied by:" readable kconfig: announce removal of oldnoconfig if used kconfig: fix make xconfig when gettext is missing kconfig: Clarify menu and 'if' dependency propagation kconfig: Document 'if' flattening logic kconfig: Clarify choice dependency propagation kconfig: Document SYMBOL_OPTIONAL logic kbuild: remove unnecessary LEX_PREFIX and YACC_PREFIX kconfig: use default 'yy' prefix for lexer and parser kconfig: make conf_unsaved a local variable of conf_read() kconfig: make xfgets() really static kconfig: make input_mode static kconfig: Warn if there is more than one help text kconfig: drop 'boolean' keyword kconfig: use bool instead of boolean for type definition attributes, again kconfig: Remove menu_end_entry() kconfig: Document important expression functions kconfig: Document automatic submenu creation code kconfig: Fix choice symbol expression leak ...
2018-02-01Merge tag 'kbuild-misc-v4.16' of ↵Linus Torvalds10-115/+565
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild misc updates from Masahiro Yamada: - add snap-pkg target to create Linux kernel snap package - make out-of-tree creation of source packages fail correctly - improve and fix several semantic patches * tag 'kbuild-misc-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Coccinelle: coccicheck: fix typo Coccinelle: memdup: drop spurious line Coccinelle: kzalloc-simple: Rename kzalloc-simple to zalloc-simple Coccinelle: ifnullfree: Trim the warning reported in report mode Coccinelle: alloc_cast: Add more memory allocating functions to the list Coccinelle: array_size: report even if include is missing Coccinelle: kzalloc-simple: Add all zero allocating functions kbuild: pkg: make out-of-tree rpm/deb-pkg build immediately fail scripts/package: snap-pkg target
2018-02-01Merge tag 'kbuild-v4.16' of ↵Linus Torvalds3-142/+101
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - terminate the build correctly in case of fixdep errors - clean up fixdep - suppress packed-not-aligned warnings from GCC-8 - fix W= handling for extra DTC warnings * tag 'kbuild-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: fix W= option checks for extra DTC warnings Kbuild: suppress packed-not-aligned warning for default setting only fixdep: use existing helper to check modular CONFIG options fixdep: refactor parse_dep_file() fixdep: move global variables to local variables of main() fixdep: remove unneeded memcpy() in parse_dep_file() fixdep: factor out common code for reading files fixdep: use malloc() and read() to load dep_file to buffer fixdep: remove unnecessary <arpa/inet.h> inclusion fixdep: exit with error code in error branches of do_config_file()
2018-02-01Merge tag 'devicetree-for-4.16' of ↵Linus Torvalds91-285/+151
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - Convert to use memblock_virt_alloc in DT code which supports bootmem arches. With this we can remove the arch specific early_init_dt_alloc_memory_arch() functions. - Enable running the DT unittests on UML - Use SPDX license tags on DT files - Fix early FDT kconfig ifdef logic - Clean-up unittest Makefile - Fix function comment for of_irq_parse_raw - Add missing documentation for linux,initrd-{start,end} properties - Clean-up of binding examples using uppercase hex - Add trivial devices W83773G and Infineon TLV493D-A1B6 - Add missing STM32 SoC bindings - Various small binding doc fixes * tag 'devicetree-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (23 commits) xtensa: remove arch specific early DT functions x86: remove arch specific early_init_dt_alloc_memory_arch nios2: remove arch specific early_init_dt_alloc_memory_arch mips: remove arch specific early_init_dt_alloc_memory_arch metag: remove arch specific early DT functions cris: remove arch specific early DT functions libfdt: remove unnecessary include directive from <linux/libfdt.h> of: unittest: refactor Makefile of/fdt: use memblock_virt_alloc for early alloc of: Use SPDX license tag for DT files of/fdt: Fix #ifdef dependency of early flattree declarations dt-bindings: h8300 clocksource: correct spelling of pulse dt-bindings: imx6q-pcie: Add required property for i.MX6SX mmc: Don't reference Linux-specific OF_GPIO_ACTIVE_LOW flag in DT binding dt-bindings: Use lower case hex in unit-addresses dt-bindings: display: panel: Fix compatible string for Toshiba LT089AC29000 dt-bindings: Add Infineon TLV493D-A1B6 dt-bindings: mailbox: ti,message-manager: Fix interrupt name error dt-bindings: chosen: Document linux,initrd-{start,end} dt-bindings: arm: document supported STM32 SoC family ...
2018-02-01Merge branch 'for-linus' of ↵Linus Torvalds65-1746/+491
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer updates from Dmitry Torokhov: - evdev interface has been adjusted to extend the life of timestamps on 32 bit systems to the year of 2108 - Synaptics RMI4 driver's PS/2 guest handling ha beed updated to improve chances of detecting trackpoints on the pass-through port - mms114 touchcsreen controller driver has been updated to support generic device properties and work with mms152 cntrollers - Goodix driver now supports generic touchscreen properties - couple of drivers for AVR32 architecture are gone as the architecture support has been removed from the kernel - gpio-tilt driver has been removed as there are no mainline users and the driver itself is using legacy APIs and relies on platform data - MODULE_LINECSE/MODULE_VERSION cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits) Input: goodix - use generic touchscreen_properties Input: mms114 - fix typo in definition Input: mms114 - use BIT() macro instead of explicit shifting Input: mms114 - replace mdelay with msleep Input: mms114 - add support for mms152 Input: mms114 - drop platform data and use generic APIs Input: mms114 - mark as direct input device Input: mms114 - do not clobber interrupt trigger Input: edt-ft5x06 - fix error handling for factory mode on non-M06 Input: stmfts - set IRQ_NOAUTOEN to the irq flag Input: auo-pixcir-ts - delete an unnecessary return statement Input: auo-pixcir-ts - remove custom log for a failed memory allocation Input: da9052_tsi - remove unused mutex Input: docs - use PROPERTY_ENTRY_U32() directly Input: synaptics-rmi4 - log when we create a guest serio port Input: synaptics-rmi4 - unmask F03 interrupts when port is opened Input: synaptics-rmi4 - do not delete interrupt memory too early Input: ad7877 - use managed resource allocations Input: stmfts,s6sy671 - add SPDX identifier Input: remove atmel-wm97xx touchscreen driver ...
2018-02-01Merge tag 'char-misc-4.16-rc1' of ↵Linus Torvalds150-1142/+14288
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big pull request for char/misc drivers for 4.16-rc1. There's a lot of stuff in here. Three new driver subsystems were added for various types of hardware busses: - siox - slimbus - soundwire as well as a new vboxguest subsystem for the VirtualBox hypervisor drivers. There's also big updates from the FPGA subsystem, lots of Android binder fixes, the usual handful of hyper-v updates, and lots of other smaller driver updates. All of these have been in linux-next for a long time, with no reported issues" * tag 'char-misc-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (155 commits) char: lp: use true or false for boolean values android: binder: use VM_ALLOC to get vm area android: binder: Use true and false for boolean values lkdtm: fix handle_irq_event symbol for INT_HW_IRQ_EN EISA: Delete error message for a failed memory allocation in eisa_probe() EISA: Whitespace cleanup misc: remove AVR32 dependencies virt: vbox: Add error mapping for VERR_INVALID_NAME and VERR_NO_MORE_FILES soundwire: Fix a signedness bug uio_hv_generic: fix new type mismatch warnings uio_hv_generic: fix type mismatch warnings auxdisplay: img-ascii-lcd: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE uio_hv_generic: add rescind support uio_hv_generic: check that host supports monitor page uio_hv_generic: create send and receive buffers uio: document uio_hv_generic regions doc: fix documentation about uio_hv_generic vmbus: add monitor_id and subchannel_id to sysfs per channel vmbus: fix ABI documentation uio_hv_generic: use ISR callback method ...
2018-02-01Merge tag 'driver-core-4.16-rc1' of ↵Linus Torvalds140-1047/+918
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of "big" driver core patches for 4.16-rc1. The majority of the work here is in the firmware subsystem, with reworks to try to attempt to make the code easier to handle in the long run, but no functional change. There's also some tree-wide sysfs attribute fixups with lots of acks from the various subsystem maintainers, as well as a handful of other normal fixes and changes. And finally, some license cleanups for the driver core and sysfs code. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits) device property: Define type of PROPERTY_ENRTY_*() macros device property: Reuse property_entry_free_data() device property: Move property_entry_free_data() upper firmware: Fix up docs referring to FIRMWARE_IN_KERNEL firmware: Drop FIRMWARE_IN_KERNEL Kconfig option USB: serial: keyspan: Drop firmware Kconfig options sysfs: remove DEBUG defines sysfs: use SPDX identifiers drivers: base: add coredump driver ops sysfs: add attribute specification for /sysfs/devices/.../coredump test_firmware: fix missing unlock on error in config_num_requests_store() test_firmware: make local symbol test_fw_config static sysfs: turn WARN() into pr_warn() firmware: Fix a typo in fallback-mechanisms.rst treewide: Use DEVICE_ATTR_WO treewide: Use DEVICE_ATTR_RO treewide: Use DEVICE_ATTR_RW sysfs.h: Use octal permissions component: add debugfs support bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate ...
2018-02-01Merge tag 'staging-4.16-rc1' of ↵Linus Torvalds819-22474/+14848
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big Staging and IIO driver patches for 4.16-rc1. There is the normal amount of new IIO drivers added, like all releases. The networking IPX and the ncpfs filesystem are moved into the staging tree, as they are on their way out of the kernel due to lack of use anymore. The visorbus subsystem finall has started moving out of the staging tree to the "real" part of the kernel, and the most and fsl-mc codebases are almost ready to move out, that will probably happen for 4.17-rc1 if all goes well. Other than that, there is a bunch of license header cleanups in the tree, along with the normal amount of coding style churn that we all know and love for this codebase. I also got frustrated at the Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting huge chunks of it that were never even being used. Full details of everything is in the shortlog. All of these patches have been in linux-next for a while with no reported issues" * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits) staging: rtlwifi: remove redundant initialization of 'cfg_cmd' staging: rtl8723bs: remove a couple of redundant initializations staging: comedi: reformat lines to 80 chars or less staging: lustre: separate a connection destroy from free struct kib_conn Staging: rtl8723bs: Use !x instead of NULL comparison Staging: rtl8723bs: Remove dead code Staging: rtl8723bs: Change names to conform to the kernel code staging: ccree: Fix missing blank line after declaration staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd' staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST staging: fbtft: remove unused FB_TFT_SSD1325 kconfig staging: comedi: dt2811: remove redundant initialization of 'ns' staging: wilc1000: fix alignments to match open parenthesis staging: wilc1000: removed unnecessary defined enums typedef staging: wilc1000: remove unnecessary use of parentheses staging: rtl8192u: remove redundant initialization of 'timeout' staging: sm750fb: fix CamelCase for dispSet var staging: lustre: lnet/selftest: fix compile error on UP build staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons staging: rts5208: Fix "seg_no" calculation in reset_ms_card() ...
2018-02-01Merge tag 'tty-4.16-rc1' of ↵Linus Torvalds43-350/+546
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/staging driver updates from Greg KH: "Here is the big tty/serial driver update for 4.16-rc1. The usual number of various serial driver fixes and updates to try to get them to work with crazy hardware configurations (seriously, how many different ways are hardware engineers going to come up with to hook up a simple UART?) There is also some serdev bugfixes and updates, as well as a smattering of other small fixes in here. All have been in the linux-next tree for a while, with no reported issues" * tag 'tty-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits) tty: serial: exar: Relocate sleep wake-up handling tty: fix data race between tty_init_dev and flush of buf serial: imx: fix endless loop during suspend serial: core: mark port as initialized after successful IRQ change serdev: only match serdev devices serdev: do not generate modaliases for controllers serial: mxs-auart: don't use GPIOF_* with gpiod_get_direction serial: 8250_dw: Revert "Improve clock rate setting" MAINTAINERS: Add myself as designated reviewer for 8250_dw gpio: serial: max310x: Support open-drain configuration for GPIOs serdev: Fix serdev_uevent failure on ACPI enumerated serdev-controllers serial: 8250_ingenic: Parse earlycon options serial: 8250_ingenic: Add support for the JZ4770 SoC serial: core: Make uart_parse_options take const char* argument serial: 8250_of: fix return code when probe function fails to get reset serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS serial: 8250_uniphier: fix error return code in uniphier_uart_probe() tty: n_gsm: Allow ADM response in addition to UA for control dlci tty: omap-serial: Fix initial on-boot RTS GPIO level tty: serial: jsm: Add one check against NULL pointer dereference ...
2018-02-01Merge tag 'usb-4.16-rc1' of ↵Linus Torvalds160-1454/+3875
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/PHY updates from Greg KH: "Here is the big USB and PHY driver update for 4.16-rc1. Along with the normally expected XHCI, MUSB, and Gadget driver patches, there are some PHY driver fixes, license cleanups, sysfs attribute cleanups, usbip changes, and a raft of other smaller fixes and additions. Full details are in the shortlog. All of these have been in the linux-next tree for a long time with no reported issues" * tag 'usb-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (137 commits) USB: serial: pl2303: new device id for Chilitag USB: misc: fix up some remaining DEVICE_ATTR() usages USB: musb: fix up one odd DEVICE_ATTR() usage USB: atm: fix up some remaining DEVICE_ATTR() usage USB: move many drivers to use DEVICE_ATTR_WO USB: move many drivers to use DEVICE_ATTR_RO USB: move many drivers to use DEVICE_ATTR_RW USB: misc: chaoskey: Use true and false for boolean values USB: storage: remove old wording about how to submit a change USB: storage: remove invalid URL from drivers usb: ehci-omap: don't complain on -EPROBE_DEFER when no PHY found usbip: list: don't list devices attached to vhci_hcd usbip: prevent bind loops on devices attached to vhci_hcd USB: serial: remove redundant initializations of 'mos_parport' usb/gadget: Fix "high bandwidth" check in usb_gadget_ep_match_desc() usb: gadget: compress return logic into one line usbip: vhci_hcd: update 'status' file header and format USB: serial: simple: add Motorola Tetra driver CDC-ACM: apply quirk for card reader usb: option: Add support for FS040U modem ...
2018-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ideLinus Torvalds1-1/+0
Pull small IDE cleanup from David Miller. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide: ide: remove duplicated assignment to 'cursg'
2018-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds11-4/+3164
Pull sparc updates from David Miller: "Of note is the addition of a driver for the Data Analytics Accelerator, and some small cleanups" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: oradax: Fix return value check in dax_attach() sparc: vDSO: remove an extra tab sparc64: drop unneeded compat include sparc64: Oracle DAX driver sparc64: Oracle DAX infrastructure
2018-02-01Merge branch 'for-linus' of ↵Linus Torvalds37-863/+856
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "Bug fixes, small improvements and one notable change: the system call table and the unistd.h header are now generated automatically with a shell script from a text file" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/decompressor: discard __ksymtab and .eh_frame sections s390: fix handling of -1 in set{,fs}[gu]id16 syscalls s390/tools: generate header files in arch/s390/include/generated/ s390/syscalls: use generated syscall_table.h and unistd.h header files s390/syscalls: add Makefile to generate system call header files s390/syscalls: add syscalltbl script s390/syscalls: add system call table s390/decompressor: swap .text and .rodata.compressed sections s390/sclp: fix .data section specification s390/ipl: avoid usage of __section(.data) s390/head: replace hard coded values with constants s390/disassembler: add generated gen_opcode_table tool to .gitignore s390: remove bogus system call table entries s390/kprobes: remove duplicate includes s390/dasd: Remove dead return code checks s390/dasd: Simplify code s390/vdso: revise CFI annotations of vDSO functions s390/kernel: emit CFI data in .debug_frame and discard .eh_frame sections
2018-02-01Coccinelle: coccicheck: fix typoJulia Lawall1-1/+1
Correct spelling of "coccinelle". Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-01Merge branch 'next' into for-linusDmitry Torokhov65-1746/+491
Prepare input updates for 4.16 merge window.
2018-02-01Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds67-2006/+3887
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-02-01Merge branch 'work.vmci' of ↵Linus Torvalds1-133/+46
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vmci iov_iter updates from Al Viro: "Get rid of "is it an iovec or an entire array?" flags in vmxi - just use iov_iter. Simplifies the living hell out of that code..." * 'work.vmci' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vmci: the same on the send side... vmci: simplify qp_dequeue_locked() vmci: get rid of qp_memcpy_from_queue() vmci: fix buf_size in case of iovec-based accesses
2018-02-01Merge branch 'work.whack-a-mole' of ↵Linus Torvalds7-7/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull asm/uaccess.h whack-a-mole from Al Viro: "It's linux/uaccess.h, damnit... Oh, well - eventually they'll stop cropping up..." * 'work.whack-a-mole' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: asm-prototypes.h: use linux/uaccess.h, not asm/uaccess.h riscv: use linux/uaccess.h, not asm/uaccess.h... ppc: for put_user() pull linux/uaccess.h, not asm/uaccess.h
2018-02-01Merge branch 'work.dcache' of ↵Linus Torvalds1-7/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Neil Brown's d_move()/d_path() race fix" * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: close race between getcwd() and d_move()
2018-02-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds118-1863/+2607
Merge updates from Andrew Morton: - misc fixes - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) mm: remove PG_highmem description tools, vm: new option to specify kpageflags file mm/swap.c: make functions and their kernel-doc agree mm, memory_hotplug: fix memmap initialization mm: correct comments regarding do_fault_around() mm: numa: do not trap faults on shared data section pages. hugetlb, mbind: fall back to default policy if vma is NULL hugetlb, mempolicy: fix the mbind hugetlb migration mm, hugetlb: further simplify hugetlb allocation API mm, hugetlb: get rid of surplus page accounting tricks mm, hugetlb: do not rely on overcommit limit during migration mm, hugetlb: integrate giga hugetlb more naturally to the allocation path mm, hugetlb: unify core page allocation accounting and initialization mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes mm/memcontrol.c: make local symbol static mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd() include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer mm/compaction.c: fix comment for try_to_compact_pages() mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it zsmalloc: use U suffix for negative literals being shifted ...
2018-02-01mm: remove PG_highmem descriptionMiles Chen1-5/+0
Commit cbe37d093707 ("[PATCH] mm: remove PG_highmem") removed PG_highmem to save a page flag. So the description of PG_highmem is no longer needed. Link: http://lkml.kernel.org/r/1517391212-2950-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01tools, vm: new option to specify kpageflags fileDavid Rientjes1-7/+21
page-types currently hardcodes /proc/kpageflags as the file to parse. This works when using the tool to examine the state of pageflags on the same system, but does not allow storing a snapshot of pageflags at a given time to debug issues nor on a different system. This allows the user to specify a saved version of kpageflags with a new page-types -F option. [akpm@linux-foundation.org: add "filename" to fix usage() string] [rientjes@google.com: fix layout] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301840050.140969@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801301458180.153857@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/swap.c: make functions and their kernel-doc agreeRandy Dunlap1-6/+5
Fix some basic kernel-doc notation in mm/swap.c: - for function lru_cache_add_anon(), make its kernel-doc function name match its function name and change colon to hyphen following the function name - for function pagevec_lookup_entries(), change the function parameter name from nr_pages to nr_entries since that is more descriptive of what the parameter actually is and then it matches the kernel-doc comments also Fix function kernel-doc to match the change in commit 67fd707f4681: - drop the kernel-doc notation for @nr_pages from pagevec_lookup_range() and correct the function description for that change Link: http://lkml.kernel.org/r/3b42ee3e-04a9-a6ca-6be4-f00752a114fe@infradead.org Fixes: 67fd707f4681 ("mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, memory_hotplug: fix memmap initializationMichal Hocko1-8/+14
Bharata has noticed that onlining a newly added memory doesn't increase the total memory, pointing to commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") as a culprit. This commit has changed the way how the memory for memmaps is initialized and moves it from the allocation time to the initialization time. This works properly for the early memmap init path. It doesn't work for the memory hotplug though because we need to mark page as reserved when the sparsemem section is created and later initialize it completely during onlining. memmap_init_zone is called in the early stage of onlining. With the current code it calls __init_single_page and as such it clears up the whole stage and therefore online_pages_range skips those pages. Fix this by skipping mm_zero_struct_page in __init_single_page for memory hotplug path. This is quite uggly but unifying both early init and memory hotplug init paths is a large project. Make sure we plug the regression at least. Link: http://lkml.kernel.org/r/20180130101141.GW21609@dhcp22.suse.cz Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm: correct comments regarding do_fault_around()William Kucharski1-11/+11
There are multiple comments surrounding do_fault_around that memtion fault_around_pages() and fault_around_mask(), two routines that do not exist. These comments should be reworded to reference fault_around_bytes, the value which is used to determine how much do_fault_around() will attempt to read when processing a fault. These comments should have been updated when fault_around_pages() and fault_around_mask() were removed in commit aecd6f44266c ("mm: close race between do_fault_around() and fault_around_bytes_set()"). Fixes: aecd6f44266c1 ("mm: close race between do_fault_around() and fault_around_bytes_set()") Link: http://lkml.kernel.org/r/302D0B14-C7E9-44C6-8BED-033F9ACBD030@oracle.com Signed-off-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Larry Bassel <larry.bassel@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm: numa: do not trap faults on shared data section pages.Henry Willard1-0/+5
Workloads consisting of a large number of processes running the same program with a very large shared data segment may experience performance problems when numa balancing attempts to migrate the shared cow pages. This manifests itself with many processes or tasks in TASK_UNINTERRUPTIBLE state waiting for the shared pages to be migrated. The program listed below simulates the conditions with these results when run with 288 processes on a 144 core/8 socket machine. Average throughput Average throughput Average throughput with numa_balancing=0 with numa_balancing=1 with numa_balancing=1 without the patch with the patch --------------------- --------------------- --------------------- 2118782 2021534 2107979 Complex production environments show less variability and fewer poorly performing outliers accompanied with a smaller number of processes waiting on NUMA page migration with this patch applied. In some cases, %iowait drops from 16%-26% to 0. // SPDX-License-Identifier: GPL-2.0 /* * Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved. */ #include <sys/time.h> #include <stdio.h> #include <wait.h> #include <sys/mman.h> int a[1000000] = {13}; int main(int argc, const char **argv) { int n = 0; int i; pid_t pid; int stat; int *count_array; int cpu_count = 288; long total = 0; struct timeval t1, t2 = {(argc > 1 ? atoi(argv[1]) : 10), 0}; if (argc > 2) cpu_count = atoi(argv[2]); count_array = mmap(NULL, cpu_count * sizeof(int), (PROT_READ|PROT_WRITE), (MAP_SHARED|MAP_ANONYMOUS), 0, 0); if (count_array == MAP_FAILED) { perror("mmap:"); return 0; } for (i = 0; i < cpu_count; ++i) { pid = fork(); if (pid <= 0) break; if ((i & 0xf) == 0) usleep(2); } if (pid != 0) { if (i == 0) { perror("fork:"); return 0; } for (;;) { pid = wait(&stat); if (pid < 0) break; } for (i = 0; i < cpu_count; ++i) total += count_array[i]; printf("Total %ld\n", total); munmap(count_array, cpu_count * sizeof(int)); return 0; } gettimeofday(&t1, 0); timeradd(&t1, &t2, &t1); while (timercmp(&t2, &t1, <)) { int b = 0; int j; for (j = 0; j < 1000000; j++) b += a[j]; gettimeofday(&t2, 0); n++; } count_array[i] = n; return 0; } This patch changes change_pte_range() to skip shared copy-on-write pages when called from change_prot_numa(). NOTE: change_prot_numa() is nominally called from task_numa_work() and queue_pages_test_walk(). task_numa_work() is the auto NUMA balancing path, and queue_pages_test_walk() is part of explicit NUMA policy management. However, queue_pages_test_walk() only calls change_prot_numa() when MPOL_MF_LAZY is specified and currently that is not allowed, so change_prot_numa() is only called from auto NUMA balancing. In the case of explicit NUMA policy management, shared pages are not migrated unless MPOL_MF_MOVE_ALL is specified, and MPOL_MF_MOVE_ALL depends on CAP_SYS_NICE. Currently, there is no way to pass information about MPOL_MF_MOVE_ALL to change_pte_range. This will have to be fixed if MPOL_MF_LAZY is enabled and MPOL_MF_MOVE_ALL is to be honored in lazy migration mode. task_numa_work() skips the read-only VMAs of programs and shared libraries. Link: http://lkml.kernel.org/r/1516751617-7369-1-git-send-email-henry.willard@oracle.com Signed-off-by: Henry Willard <henry.willard@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01hugetlb, mbind: fall back to default policy if vma is NULLMichal Hocko3-6/+7
Dan Carpenter has noticed that mbind migration callback (new_page) can get a NULL vma pointer and choke on it inside alloc_huge_page_vma which relies on the VMA to get the hstate. We used to BUG_ON this case but the BUG_+ON has been removed recently by "hugetlb, mempolicy: fix the mbind hugetlb migration". The proper way to handle this is to get the hstate from the migrated page and rely on huge_node (resp. get_vma_policy) do the right thing with null VMA. We are currently falling back to the default mempolicy in that case which is in line what THP path is doing here. Link: http://lkml.kernel.org/r/20180110104712.GR1732@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01hugetlb, mempolicy: fix the mbind hugetlb migrationMichal Hocko3-19/+22
do_mbind migration code relies on alloc_huge_page_noerr for hugetlb pages. alloc_huge_page_noerr uses alloc_huge_page which is a highlevel allocation function which has to take care of reserves, overcommit or hugetlb cgroup accounting. None of that is really required for the page migration because the new page is only temporal and either will replace the original page or it will be dropped. This is essentially as for other migration call paths and there shouldn't be any reason to handle mbind in a special way. The current implementation is even suboptimal because the migration might fail just because the hugetlb cgroup limit is reached, or the overcommit is saturated. Fix this by making mbind like other hugetlb migration paths. Add a new migration helper alloc_huge_page_vma as a wrapper around alloc_huge_page_nodemask with additional mempolicy handling. alloc_huge_page_noerr has no more users and it can go. Link: http://lkml.kernel.org/r/20180103093213.26329-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, hugetlb: further simplify hugetlb allocation APIMichal Hocko1-37/+43
Hugetlb allocator has several layer of allocation functions depending and the purpose of the allocation. There are two allocators depending on whether the page can be allocated from the page allocator or we need a contiguous allocator. This is currently opencoded in alloc_fresh_huge_page which is the only path that might allocate giga pages which require the later allocator. Create alloc_fresh_huge_page which hides this implementation detail and use it in all callers which hardcoded the buddy allocator path (__hugetlb_alloc_buddy_huge_page). This shouldn't introduce any funtional change because both migration and surplus allocators exlude giga pages explicitly. While we are at it let's do some renaming. The current scheme is not consistent and overly painfull to read and understand. Get rid of prefix underscores from most functions. There is no real reason to make names longer. * alloc_fresh_huge_page is the new layer to abstract underlying allocator * __hugetlb_alloc_buddy_huge_page becomes shorter and neater alloc_buddy_huge_page. * Former alloc_fresh_huge_page becomes alloc_pool_huge_page because we put the new page directly to the pool * alloc_surplus_huge_page can drop the opencoded prep_new_huge_page code as it uses alloc_fresh_huge_page now * others lose their excessive prefix underscores to make names shorter [dan.carpenter@oracle.com: fix double unlock bug in alloc_surplus_huge_page()] Link: http://lkml.kernel.org/r/20180109200559.g3iz5kvbdrz7yydp@mwanda Link: http://lkml.kernel.org/r/20180103093213.26329-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, hugetlb: get rid of surplus page accounting tricksMichal Hocko1-39/+23
alloc_surplus_huge_page increases the pool size and the number of surplus pages opportunistically to prevent from races with the pool size change. See commit d1c3fb1f8f29 ("hugetlb: introduce nr_overcommit_hugepages sysctl") for more details. The resulting code is unnecessarily hairy, cause code duplication and doesn't allow to share the allocation paths. Moreover pool size changes tend to be very seldom so optimizing for them is not really reasonable. Simplify the code and allow to allocate a fresh surplus page as long as we are under the overcommit limit and then recheck the condition after the allocation and drop the new page if the situation has changed. This should provide a reasonable guarantee that an abrupt allocation requests will not go way off the limit. If we consider races with the pool shrinking and enlarging then we should be reasonably safe as well. In the first case we are off by one in the worst case and the second case should work OK because the page is not yet visible. We can waste CPU cycles for the allocation but that should be acceptable for a relatively rare condition. Link: http://lkml.kernel.org/r/20180103093213.26329-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, hugetlb: do not rely on overcommit limit during migrationMichal Hocko3-18/+99
hugepage migration relies on __alloc_buddy_huge_page to get a new page. This has 2 main disadvantages. 1) it doesn't allow to migrate any huge page if the pool is used completely which is not an exceptional case as the pool is static and unused memory is just wasted. 2) it leads to a weird semantic when migration between two numa nodes might increase the pool size of the destination NUMA node while the page is in use. The issue is caused by per NUMA node surplus pages tracking (see free_huge_page). Address both issues by changing the way how we allocate and account pages allocated for migration. Those should temporal by definition. So we mark them that way (we will abuse page flags in the 3rd page) and update free_huge_page to free such pages to the page allocator. Page migration path then just transfers the temporal status from the new page to the old one which will be freed on the last reference. The global surplus count will never change during this path but we still have to be careful when migrating a per-node suprlus page. This is now handled in move_hugetlb_state which is called from the migration path and it copies the hugetlb specific page state and fixes up the accounting when needed Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better reflect its purpose. The new allocation routine for the migration path is __alloc_migrate_huge_page. The user visible effect of this patch is that migrated pages are really temporal and they travel between NUMA nodes as per the migration request: Before migration /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0 After /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0 with the previous implementation, both nodes would have nr_hugepages:1 until the page is freed. Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, hugetlb: integrate giga hugetlb more naturally to the allocation pathMichal Hocko1-41/+14
Gigantic hugetlb pages were ingrown to the hugetlb code as an alien specie with a lot of special casing. The allocation path is not an exception. Unnecessarily so to be honest. It is true that the underlying allocator is different but that is an implementation detail. This patch unifies the hugetlb allocation path that a prepares fresh pool pages. alloc_fresh_gigantic_page basically copies alloc_fresh_huge_page logic so we can move everything there. This will simplify set_max_huge_pages which doesn't have to care about what kind of huge page we allocate. Link: http://lkml.kernel.org/r/20180103093213.26329-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm, hugetlb: unify core page allocation accounting and initializationMichal Hocko1-32/+29
Patch series "mm, hugetlb: allocation API and migration improvements" Motivation: this is a follow up for [3] for the allocation API and [4] for the hugetlb migration. It wasn't really easy to split those into two separate patch series as they share some code. My primary motivation to touch this code is to make the gigantic pages migration working. The giga pages allocation code is just too fragile and hacked into the hugetlb code now. This series tries to move giga pages closer to the first class citizen. We are not there yet but having 5 patches is quite a lot already and it will already make the code much easier to follow. I will come with other changes on top after this sees some review. The first two patches should be trivial to review. The third patch changes the way how we migrate huge pages. Newly allocated pages are a subject of the overcommit check and they participate surplus accounting which is quite unfortunate as the changelog explains. This patch doesn't change anything wrt. giga pages. Patch #4 removes the surplus accounting hack from __alloc_surplus_huge_page. I hope I didn't miss anything there and a deeper review is really due there. Patch #5 finally unifies allocation paths and giga pages shouldn't be any special anymore. There is also some renaming going on as well. This patch (of 6): hugetlb allocator has two entry points to the page allocator - alloc_fresh_huge_page_node - __hugetlb_alloc_buddy_huge_page The two differ very subtly in two aspects. The first one doesn't care about HTLB_BUDDY_* stats and it doesn't initialize the huge page. prep_new_huge_page is not used because it not only initializes hugetlb specific stuff but because it also put_page and releases the page to the hugetlb pool which is not what is required in some contexts. This makes things more complicated than necessary. Simplify things by a) removing the page allocator entry point duplicity and only keep __hugetlb_alloc_buddy_huge_page and b) make prep_new_huge_page more reusable by removing the put_page which moves the page to the allocator pool. All current callers are updated to call put_page explicitly. Later patches will add new callers which won't need it. This patch shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20180103093213.26329-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Reale <ar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytesAndrey Ryabinin1-36/+6
mem_cgroup_resize_[memsw]_limit() tries to free only 32 (SWAP_CLUSTER_MAX) pages on each iteration. This makes it practically impossible to decrease limit of memory cgroup. Tasks could easily allocate back 32 pages, so we can't reduce memory usage, and once retry_count reaches zero we return -EBUSY. Easy to reproduce the problem by running the following commands: mkdir /sys/fs/cgroup/memory/test echo $$ >> /sys/fs/cgroup/memory/test/tasks cat big_file > /dev/null & sleep 1 && echo $((100*1024*1024)) > /sys/fs/cgroup/memory/test/memory.limit_in_bytes -bash: echo: write error: Device or resource busy Instead of relying on retry_count, keep retrying the reclaim until the desired limit is reached or fail if the reclaim doesn't make any progress or a signal is pending. Link: http://lkml.kernel.org/r/20180119132544.19569-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/memcontrol.c: make local symbol staticChristopher Díaz Riveros1-1/+1
Fix the following sparse warning: mm/memcontrol.c:1097:14: warning: symbol 'memcg1_stats' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20180118193327.14200-1-chrisadr@gentoo.org Signed-off-by: Christopher Díaz Riveros <chrisadr@gentoo.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()Ralph Campbell1-3/+1
The variable 'entry' is used before being initialized in hmm_vma_walk_pmd(). No bad effect (beside performance hit) so !non_swap_entry(0) evaluate to true which trigger a fault as if CPU was trying to access migrated memory and migrate memory back from device memory to regular memory. This function (hmm_vma_walk_pmd()) is called when a device driver tries to populate its own page table. For migrated memory it should not happen as the device driver should already have populated its page table correctly during the migration. Only case I can think of is multi-GPU where a second GPU triggers migration back to regular memory. Again this would just result in a performance hit, nothing bad would happen. Link: http://lkml.kernel.org/r/20180122185759.26286-1-jglisse@redhat.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM ↵Petr Tesarik2-3/+15
mem_map pointer The comment is confusing. On the one hand, it refers to 32-bit alignment (struct page alignment on 32-bit platforms), but this would only guarantee that the 2 lowest bits must be zero. On the other hand, it claims that at least 3 bits are available, and 3 bits are actually used. This is not broken, because there is a stronger alignment guarantee, just less obvious. Let's fix the comment to make it clear how many bits are available and why. Although memmap arrays are allocated in various places, the resulting pointer is encoded eventually, so I am adding a BUG_ON() here to enforce at runtime that all expected bits are indeed available. I have also added a BUILD_BUG_ON to check that PFN_SECTION_SHIFT is sufficient, because this part of the calculation can be easily checked at build time. [ptesarik@suse.com: v2] Link: http://lkml.kernel.org/r/20180125100516.589ea6af@ezekiel.suse.cz Link: http://lkml.kernel.org/r/20180119080908.3a662e6f@ezekiel.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemi Wang <kemi.wang@intel.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/compaction.c: fix comment for try_to_compact_pages()Yang Shi1-1/+1
"mode" argument is not used by try_to_compact_pages() and sub functions anymore, it has been replaced by "prio". Fix the comment to explain the use of "prio" argument. Link: http://lkml.kernel.org/r/1515801336-20611-1-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but ↵Oscar Salvador1-0/+2
nothing uses it static struct page_ext_operations *page_ext_ops[] always contains debug_guardpage_ops, static struct page_ext_operations *page_ext_ops[] = { &debug_guardpage_ops, #ifdef CONFIG_PAGE_OWNER &page_owner_ops, #endif ... } but for it to work, CONFIG_DEBUG_PAGEALLOC must be enabled first. If someone has CONFIG_PAGE_EXTENSION, but has none of its users, eg: (CONFIG_PAGE_OWNER, CONFIG_DEBUG_PAGEALLOC, CONFIG_IDLE_PAGE_TRACKING), we can shrink page_ext_init() to a simple retq. $ size vmlinux (before patch) text data bss dec hex filename 14356698 5681582 1687748 21726028 14b834c vmlinux $ size vmlinux (after patch) text data bss dec hex filename 14356008 5681538 1687748 21725294 14b806e vmlinux On the other hand, it might does not even make sense, since if someone enables CONFIG_PAGE_EXTENSION, I would expect him to enable also at least one of its users. Link: http://lkml.kernel.org/r/20180105130235.GA21241@techadventures.net Signed-off-by: Oscar Salvador <osalvador@techadventures.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01zsmalloc: use U suffix for negative literals being shiftedNick Desaulniers1-1/+1
Fix warning about shifting unsigned literals being undefined behavior. Link: http://lkml.kernel.org/r/1515642078-4259-1-git-send-email-nick.desaulniers@gmail.com Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/page_owner.c: clean up init_pages_in_zone()Oscar Salvador1-9/+7
Remove two redundant assignments in init_pages_in_zone(). [osalvador@techadventures.net: v3] Link: http://lkml.kernel.org/r/20180117124513.GA876@techadventures.net [akpm@linux-foundation.org: coding style tweaks] Link: http://lkml.kernel.org/r/20180110084355.GA22822@techadventures.net Signed-off-by: Oscar Salvador <osalvador@techadventures.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01mm/page_alloc.c: fix typos in commentsShile Zhang1-3/+3
Link: http://lkml.kernel.org/r/1515485774-4768-1-git-send-email-zhangshile@gmail.com Signed-off-by: Shile Zhang <zhangshile@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>