From 1d4a9c17d4d204a159139361e8d4db7f9f267879 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Sun, 22 Feb 2015 21:16:49 -0800 Subject: PM / sleep: add configurable delay for pm_test When CONFIG_PM_DEBUG=y, we provide a sysfs file (/sys/power/pm_test) for selecting one of a few suspend test modes, where rather than entering a full suspend state, the kernel will perform some subset of suspend steps, wait 5 seconds, and then resume back to normal operation. This mode is useful for (among other things) observing the state of the system just before entering a sleep mode, for debugging or analysis purposes. However, a constant 5 second wait is not sufficient for some sorts of analysis; for example, on an SoC, one might want to use external tools to probe the power states of various on-chip controllers or clocks. This patch turns this 5 second delay into a configurable module parameter, so users can determine how long to wait in this pseudo-suspend state before resuming the system. Example (wait 30 seconds); # echo 30 > /sys/module/suspend/parameters/pm_test_delay # echo core > /sys/power/pm_test # time echo mem > /sys/power/state ... [ 17.583625] suspend debug: Waiting for 30 second(s). ... real 0m30.381s user 0m0.017s sys 0m0.080s Signed-off-by: Brian Norris Acked-by: Pavel Machek Reviewed-by: Kevin Cernekee Acked-by: Florian Fainelli Signed-off-by: Rafael J. Wysocki --- Documentation/kernel-parameters.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..8b1fa5e129ac 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3462,6 +3462,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. improve throughput, but will also increase the amount of memory reserved for use by the client. + suspend.pm_test_delay= + [SUSPEND] + Sets the number of seconds to remain in a suspend test + mode before resuming the system (see + /sys/power/pm_test). Only available when CONFIG_PM_DEBUG + is set. Default value is 5. + swapaccount=[0|1] [KNL] Enable accounting of swap in memory resource controller if no parameter or 1 is given or disable -- cgit v1.2.3 From d2af1ad73e7a22ed3e04374896fee0eb300c05c3 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 20 Jan 2015 23:54:59 -0800 Subject: documentation: Update rcutree.kthread_prio for grace-period kthread use Now that the rcutree.kthread_prio kernel boot parameter also controls the priority of the grace-period kthreads, update the documentation to reflect this change. Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..d913e3b4bf0d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2991,11 +2991,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted. value is one, and maximum value is HZ. rcutree.kthread_prio= [KNL,BOOT] - Set the SCHED_FIFO priority of the RCU - per-CPU kthreads (rcuc/N). This value is also - used for the priority of the RCU boost threads - (rcub/N). Valid values are 1-99 and the default - is 1 (the least-favored priority). + Set the SCHED_FIFO priority of the RCU per-CPU + kthreads (rcuc/N). This value is also used for + the priority of the RCU boost threads (rcub/N) + and for the RCU grace-period kthreads (rcu_bh, + rcu_preempt, and rcu_sched). If RCU_BOOST is + set, valid values are 1-99 and the default is 1 + (the least-favored priority). Otherwise, when + RCU_BOOST is not set, valid values are 0-99 and + the default is zero (non-realtime operation). rcutree.rcu_nocb_leader_stride= [KNL] Set the number of NOCB kthread groups, which -- cgit v1.2.3 From 37745d281069682d901f00c0121949a7d224195f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 22 Jan 2015 18:24:08 -0800 Subject: rcu: Provide diagnostic option to slow down grace-period initialization Grace-period initialization normally proceeds quite quickly, so that it is very difficult to reproduce races against grace-period initialization. This commit therefore allows grace-period initialization to be artificially slowed down, increasing race-reproduction probability. A pair of new Kconfig parameters are provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies of slowdown to apply. A boot-time parameter named rcutree.gp_init_delay allows boot-time delay to be specified. By default, no delay will be applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set. Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 6 ++++++ kernel/rcu/tree.c | 10 ++++++++++ lib/Kconfig.debug | 24 ++++++++++++++++++++++++ 3 files changed, 40 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..94de410ec341 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2968,6 +2968,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Set maximum number of finished RCU callbacks to process in one batch. + rcutree.gp_init_delay= [KNL] + Set the number of jiffies to delay each step of + RCU grace-period initialization. This only has + effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT is + set. + rcutree.rcu_fanout_leaf= [KNL] Increase the number of CPUs assigned to each leaf rcu_node structure. Useful for very large diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 3b7e4133ca99..b42001fd55fb 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -160,6 +160,12 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp); static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO; module_param(kthread_prio, int, 0644); +/* Delay in jiffies for grace-period initialization delays. */ +static int gp_init_delay = IS_ENABLED(CONFIG_RCU_TORTURE_TEST_SLOW_INIT) + ? CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY + : 0; +module_param(gp_init_delay, int, 0644); + /* * Track the rcutorture test sequence number and the update version * number within a given test. The rcutorture_testseq is incremented @@ -1769,6 +1775,10 @@ static int rcu_gp_init(struct rcu_state *rsp) raw_spin_unlock_irq(&rnp->lock); cond_resched_rcu_qs(); ACCESS_ONCE(rsp->gp_activity) = jiffies; + if (IS_ENABLED(CONFIG_RCU_TORTURE_TEST_SLOW_INIT) && + gp_init_delay > 0 && + !(rsp->gpnum % (rcu_num_nodes * 10))) + schedule_timeout_uninterruptible(gp_init_delay); } mutex_unlock(&rsp->onoff_mutex); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c5cefb3c009c..feee8dab441e 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1257,6 +1257,30 @@ config RCU_TORTURE_TEST_RUNNABLE Say N here if you want the RCU torture tests to start only after being manually enabled via /proc. +config RCU_TORTURE_TEST_SLOW_INIT + bool "Slow down RCU grace-period initialization to expose races" + depends on RCU_TORTURE_TEST + help + This option makes grace-period initialization block for a + few jiffies between initializing each pair of consecutive + rcu_node structures. This helps to expose races involving + grace-period initialization, in other words, it makes your + kernel less stable. It can also greatly increase grace-period + latency, especially on systems with large numbers of CPUs. + This is useful when torture-testing RCU, but in almost no + other circumstance. + + Say Y here if you want your system to crash and hang more often. + Say N if you want a sane system. + +config RCU_TORTURE_TEST_SLOW_INIT_DELAY + int "How much to slow down RCU grace-period initialization" + range 0 5 + default 0 + help + This option specifies the number of jiffies to wait between + each rcu_node structure initialization. + config RCU_CPU_STALL_TIMEOUT int "RCU CPU stall timeout in seconds" depends on RCU_STALL_COMMON -- cgit v1.2.3 From fed6cefe3b6e862dcc74d07324478caa07e84eaf Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Thu, 5 Feb 2015 11:44:41 +0100 Subject: x86/efi: Add a "debug" option to the efi= cmdline ... and hide the memory regions dump behind it. Make it default-off. Signed-off-by: Borislav Petkov Link: http://lkml.kernel.org/r/20141209095843.GA3990@pd.tnic Acked-by: Laszlo Ersek Acked-by: Dave Young Signed-off-by: Matt Fleming --- Documentation/kernel-parameters.txt | 3 ++- arch/x86/platform/efi/efi.c | 5 ++++- include/linux/efi.h | 1 + 3 files changed, 7 insertions(+), 2 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..01aa47d3b6ab 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1036,7 +1036,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Format: {"off" | "on" | "skip[mbr]"} efi= [EFI] - Format: { "old_map", "nochunk", "noruntime" } + Format: { "old_map", "nochunk", "noruntime", "debug" } old_map [X86-64]: switch to the old ioremap-based EFI runtime services mapping. 32-bit still uses this one by default. @@ -1044,6 +1044,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. boot stub, as chunking can cause problems with some firmware implementations. noruntime : disable EFI runtime services support + debug: enable misc debug output efi_no_storage_paranoia [EFI; X86] Using this parameter you can use more than 50% of diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index dbc8627a5cdf..e859d56ce9f8 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -491,7 +491,8 @@ void __init efi_init(void) if (efi_memmap_init()) return; - print_efi_memmap(); + if (efi_enabled(EFI_DBG)) + print_efi_memmap(); } void __init efi_late_init(void) @@ -939,6 +940,8 @@ static int __init arch_parse_efi_cmdline(char *str) { if (parse_option_str(str, "old_map")) set_bit(EFI_OLD_MEMMAP, &efi.flags); + if (parse_option_str(str, "debug")) + set_bit(EFI_DBG, &efi.flags); return 0; } diff --git a/include/linux/efi.h b/include/linux/efi.h index cf7e431cbc73..af5be0368dec 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -942,6 +942,7 @@ extern int __init efi_setup_pcdp_console(char *); #define EFI_64BIT 5 /* Is the firmware 64-bit? */ #define EFI_PARAVIRT 6 /* Access is via a paravirt interface */ #define EFI_ARCH_1 7 /* First arch-specific bit */ +#define EFI_DBG 8 /* Print additional debug info at runtime */ #ifdef CONFIG_EFI /* -- cgit v1.2.3 From ec776ef6bbe1734c29cd6bd05219cd93b2731bd4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 1 Apr 2015 09:12:18 +0200 Subject: x86/mm: Add support for the non-standard protected e820 type Various recent BIOSes support NVDIMMs or ADR using a non-standard e820 memory type, and Intel supplied reference Linux code using this type to various vendors. Wire this e820 table type up to export platform devices for the pmem driver so that we can use it in Linux. Based on earlier work from: Dave Jiang Dan Williams Includes fixes for NUMA regions from Boaz Harrosh. Tested-by: Ross Zwisler Signed-off-by: Christoph Hellwig Acked-by: Dan Williams Cc: Andrew Morton Cc: Andy Lutomirski Cc: Boaz Harrosh Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Jens Axboe Cc: Jens Axboe Cc: Keith Busch Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Thomas Gleixner Cc: linux-nvdimm@ml01.01.org Link: http://lkml.kernel.org/r/1427872339-6688-2-git-send-email-hch@lst.de [ Minor cleanups. ] Signed-off-by: Ingo Molnar --- Documentation/kernel-parameters.txt | 6 +++++ arch/x86/Kconfig | 10 +++++++ arch/x86/include/uapi/asm/e820.h | 10 +++++++ arch/x86/kernel/Makefile | 1 + arch/x86/kernel/e820.c | 26 +++++++++++++----- arch/x86/kernel/pmem.c | 53 +++++++++++++++++++++++++++++++++++++ 6 files changed, 100 insertions(+), 6 deletions(-) create mode 100644 arch/x86/kernel/pmem.c (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..c87122dd790f 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. or memmap=0x10000$0x18690000 + memmap=nn[KMG]!ss[KMG] + [KNL,X86] Mark specific memory as protected. + Region of memory to be used, from ss to ss+nn. + The memory region may be marked as e820 type 12 (0xc) + and is NVDIMM or ADR memory. + memory_corruption_check=0/1 [X86] Some BIOSes seem to corrupt the first 64k of memory when doing things like suspend/resume. diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index b7d31ca55187..9e3bcd6f4a48 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1430,6 +1430,16 @@ config ILLEGAL_POINTER_VALUE source "mm/Kconfig" +config X86_PMEM_LEGACY + bool "Support non-standard NVDIMMs and ADR protected memory" + help + Treat memory marked using the non-standard e820 type of 12 as used + by the Intel Sandy Bridge-EP reference BIOS as protected memory. + The kernel will offer these regions to the 'pmem' driver so + they can be used for persistent storage. + + Say Y if unsure. + config HIGHPTE bool "Allocate 3rd-level pagetables from highmem" depends on HIGHMEM diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h index d993e33f5236..960a8a9dc4ab 100644 --- a/arch/x86/include/uapi/asm/e820.h +++ b/arch/x86/include/uapi/asm/e820.h @@ -33,6 +33,16 @@ #define E820_NVS 4 #define E820_UNUSABLE 5 +/* + * This is a non-standardized way to represent ADR or NVDIMM regions that + * persist over a reboot. The kernel will ignore their special capabilities + * unless the CONFIG_X86_PMEM_LEGACY=y option is set. + * + * ( Note that older platforms also used 6 for the same type of memory, + * but newer versions switched to 12 as 6 was assigned differently. Some + * time they will learn... ) + */ +#define E820_PRAM 12 /* * reserved RAM used by kernel itself diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index cdb1b70ddad0..971f18cd9ca0 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -94,6 +94,7 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o obj-$(CONFIG_PARAVIRT_CLOCK) += pvclock.o +obj-$(CONFIG_X86_PMEM_LEGACY) += pmem.o obj-$(CONFIG_PCSPKR_PLATFORM) += pcspeaker.o diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 46201deee923..11cc7d54ec3f 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -149,6 +149,9 @@ static void __init e820_print_type(u32 type) case E820_UNUSABLE: printk(KERN_CONT "unusable"); break; + case E820_PRAM: + printk(KERN_CONT "persistent (type %u)", type); + break; default: printk(KERN_CONT "type %u", type); break; @@ -343,7 +346,7 @@ int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map, * continue building up new bios map based on this * information */ - if (current_type != last_type) { + if (current_type != last_type || current_type == E820_PRAM) { if (last_type != 0) { new_bios[new_bios_entry].size = change_point[chgidx]->addr - last_addr; @@ -688,6 +691,7 @@ void __init e820_mark_nosave_regions(unsigned long limit_pfn) register_nosave_region(pfn, PFN_UP(ei->addr)); pfn = PFN_DOWN(ei->addr + ei->size); + if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) register_nosave_region(PFN_UP(ei->addr), pfn); @@ -748,7 +752,7 @@ u64 __init early_reserve_e820(u64 size, u64 align) /* * Find the highest page frame number we have available */ -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type) +static unsigned long __init e820_end_pfn(unsigned long limit_pfn) { int i; unsigned long last_pfn = 0; @@ -759,7 +763,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type) unsigned long start_pfn; unsigned long end_pfn; - if (ei->type != type) + /* + * Persistent memory is accounted as ram for purposes of + * establishing max_pfn and mem_map. + */ + if (ei->type != E820_RAM && ei->type != E820_PRAM) continue; start_pfn = ei->addr >> PAGE_SHIFT; @@ -784,12 +792,12 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type) } unsigned long __init e820_end_of_ram_pfn(void) { - return e820_end_pfn(MAX_ARCH_PFN, E820_RAM); + return e820_end_pfn(MAX_ARCH_PFN); } unsigned long __init e820_end_of_low_ram_pfn(void) { - return e820_end_pfn(1UL<<(32 - PAGE_SHIFT), E820_RAM); + return e820_end_pfn(1UL << (32-PAGE_SHIFT)); } static void early_panic(char *msg) @@ -866,6 +874,9 @@ static int __init parse_memmap_one(char *p) } else if (*p == '$') { start_at = memparse(p+1, &p); e820_add_region(start_at, mem_size, E820_RESERVED); + } else if (*p == '!') { + start_at = memparse(p+1, &p); + e820_add_region(start_at, mem_size, E820_PRAM); } else e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1); @@ -907,6 +918,7 @@ static inline const char *e820_type_to_string(int e820_type) case E820_ACPI: return "ACPI Tables"; case E820_NVS: return "ACPI Non-volatile Storage"; case E820_UNUSABLE: return "Unusable memory"; + case E820_PRAM: return "Persistent RAM"; default: return "reserved"; } } @@ -940,7 +952,9 @@ void __init e820_reserve_resources(void) * pci device BAR resource and insert them later in * pcibios_resource_survey() */ - if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) { + if (((e820.map[i].type != E820_RESERVED) && + (e820.map[i].type != E820_PRAM)) || + res->start < (1ULL<<20)) { res->flags |= IORESOURCE_BUSY; insert_resource(&iomem_resource, res); } diff --git a/arch/x86/kernel/pmem.c b/arch/x86/kernel/pmem.c new file mode 100644 index 000000000000..3420c874ddc5 --- /dev/null +++ b/arch/x86/kernel/pmem.c @@ -0,0 +1,53 @@ +/* + * Copyright (c) 2015, Christoph Hellwig. + */ +#include +#include +#include +#include +#include +#include + +static __init void register_pmem_device(struct resource *res) +{ + struct platform_device *pdev; + int error; + + pdev = platform_device_alloc("pmem", PLATFORM_DEVID_AUTO); + if (!pdev) + return; + + error = platform_device_add_resources(pdev, res, 1); + if (error) + goto out_put_pdev; + + error = platform_device_add(pdev); + if (error) + goto out_put_pdev; + return; + +out_put_pdev: + dev_warn(&pdev->dev, "failed to add 'pmem' (persistent memory) device!\n"); + platform_device_put(pdev); +} + +static __init int register_pmem_devices(void) +{ + int i; + + for (i = 0; i < e820.nr_map; i++) { + struct e820entry *ei = &e820.map[i]; + + if (ei->type == E820_PRAM) { + struct resource res = { + .flags = IORESOURCE_MEM, + .start = ei->addr, + .end = ei->addr + ei->size - 1, + }; + register_pmem_device(&res); + } + } + + return 0; +} +device_initcall(register_pmem_devices); -- cgit v1.2.3 From f29ba61d0abad6ad0beed5237cd337b5e25adad7 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 27 Mar 2015 16:15:18 +0100 Subject: Documentation/kernel-parameters: Move "eagerfpu" to its right place We're at least trying to be alphabetically sorted. So move "eagerfpu=" in the vicinity of where it belongs at least. Signed-off-by: Borislav Petkov Signed-off-by: Jonathan Corbet --- Documentation/kernel-parameters.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..155d736c1210 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -928,6 +928,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Enable debug messages at boot time. See Documentation/dynamic-debug-howto.txt for details. + eagerfpu= [X86] + on enable eager fpu restore + off disable eager fpu restore + auto selects the default scheme, which automatically + enables eagerfpu restore for xsaveopt. + early_ioremap_debug [KNL] Enable debug messages in early_ioremap support. This is useful for tracking down temporary early mappings @@ -2340,12 +2346,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter, xsave area per process might occupy more memory on xsaves enabled systems. - eagerfpu= [X86] - on enable eager fpu restore - off disable eager fpu restore - auto selects the default scheme, which automatically - enables eagerfpu restore for xsaveopt. - nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or wfi(ARM) instruction doesn't work correctly and not to use it. This is also useful when using JTAG debugger. -- cgit v1.2.3 From fab43ef4c8b825d8cbe05e8f70c9138de1951980 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Fri, 3 Apr 2015 23:23:34 +0100 Subject: DOC: kernel-parameters.txt: Mark `nofpu' for MIPS too The MIPS port has supported this option since forever, long before SH was even in plans. Signed-off-by: Maciej W. Rozycki Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9665/ Signed-off-by: Ralf Baechle --- Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bfcb1a62a7b4..f23384402ee4 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2315,7 +2315,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. noexec32=off: disable non-executable mappings read implies executable mappings - nofpu [SH] Disable hardware FPU at boot time. + nofpu [MIPS,SH] Disable hardware FPU at boot time. nofxsr [BUGS=X86-32] Disables x86 floating point extended register save and restore. The kernel will only save -- cgit v1.2.3 From 195daf665a6299de98a4da3843fed2dd9de19d3a Mon Sep 17 00:00:00 2001 From: Ulrich Obergfell Date: Tue, 14 Apr 2015 15:44:13 -0700 Subject: watchdog: enable the new user interface of the watchdog mechanism With the current user interface of the watchdog mechanism it is only possible to disable or enable both lockup detectors at the same time. This series introduces new kernel parameters and changes the semantics of some existing kernel parameters, so that the hard lockup detector and the soft lockup detector can be disabled or enabled individually. With this series applied, the user interface is as follows. - parameters in /proc/sys/kernel . soft_watchdog This is a new parameter to control and examine the run state of the soft lockup detector. . nmi_watchdog The semantics of this parameter have changed. It can now be used to control and examine the run state of the hard lockup detector. . watchdog This parameter is still available to control the run state of both lockup detectors at the same time. If this parameter is examined, it shows the logical OR of soft_watchdog and nmi_watchdog. . watchdog_thresh The semantics of this parameter are not affected by the patch. - kernel command line parameters . nosoftlockup The semantics of this parameter have changed. It can now be used to disable the soft lockup detector at boot time. . nmi_watchdog=0 or nmi_watchdog=1 Disable or enable the hard lockup detector at boot time. The patch introduces '=1' as a new option. . nowatchdog The semantics of this parameter are not affected by the patch. It is still available to disable both lockup detectors at boot time. Also, remove the proc_dowatchdog() function which is no longer needed. [dzickus@redhat.com: wrote changelog] [dzickus@redhat.com: update documentation for kernel params and sysctl] Signed-off-by: Ulrich Obergfell Signed-off-by: Don Zickus Cc: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-parameters.txt | 6 ++- Documentation/sysctl/kernel.txt | 62 +++++++++++++++++++++++----- include/linux/nmi.h | 2 - kernel/sysctl.c | 35 +++++++++++----- kernel/watchdog.c | 81 ++++++++----------------------------- 5 files changed, 97 insertions(+), 89 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 01aa47d3b6ab..71eecb263250 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2236,8 +2236,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels Format: [panic,][nopanic,][num] - Valid num: 0 + Valid num: 0 or 1 0 - turn nmi_watchdog off + 1 - turn nmi_watchdog on When panic is specified, panic when an NMI watchdog timeout occurs (or 'nopanic' to override the opposite default). @@ -2464,7 +2465,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nousb [USB] Disable the USB subsystem - nowatchdog [KNL] Disable the lockup detector (NMI watchdog). + nowatchdog [KNL] Disable both lockup detectors, i.e. + soft-lockup and NMI watchdog (hard-lockup). nowb [ARM] diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 83ab25660fc9..99d7eb3a1416 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -77,12 +77,14 @@ show up in /proc/sys/kernel: - shmmax [ sysv ipc ] - shmmni - softlockup_all_cpu_backtrace +- soft_watchdog - stop-a [ SPARC only ] - sysrq ==> Documentation/sysrq.txt - sysctl_writes_strict - tainted - threads-max - unknown_nmi_panic +- watchdog - watchdog_thresh - version @@ -417,16 +419,23 @@ successful IPC object allocation. nmi_watchdog: -Enables/Disables the NMI watchdog on x86 systems. When the value is -non-zero the NMI watchdog is enabled and will continuously test all -online cpus to determine whether or not they are still functioning -properly. Currently, passing "nmi_watchdog=" parameter at boot time is -required for this function to work. +This parameter can be used to control the NMI watchdog +(i.e. the hard lockup detector) on x86 systems. -If LAPIC NMI watchdog method is in use (nmi_watchdog=2 kernel -parameter), the NMI watchdog shares registers with oprofile. By -disabling the NMI watchdog, oprofile may have more registers to -utilize. + 0 - disable the hard lockup detector + 1 - enable the hard lockup detector + +The hard lockup detector monitors each CPU for its ability to respond to +timer interrupts. The mechanism utilizes CPU performance counter registers +that are programmed to generate Non-Maskable Interrupts (NMIs) periodically +while a CPU is busy. Hence, the alternative name 'NMI watchdog'. + +The NMI watchdog is disabled by default if the kernel is running as a guest +in a KVM virtual machine. This default can be overridden by adding + + nmi_watchdog=1 + +to the guest kernel command line (see Documentation/kernel-parameters.txt). ============================================================== @@ -816,6 +825,22 @@ NMI. ============================================================== +soft_watchdog + +This parameter can be used to control the soft lockup detector. + + 0 - disable the soft lockup detector + 1 - enable the soft lockup detector + +The soft lockup detector monitors CPUs for threads that are hogging the CPUs +without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads +from running. The mechanism depends on the CPUs ability to respond to timer +interrupts which are needed for the 'watchdog/N' threads to be woken up by +the watchdog timer function, otherwise the NMI watchdog - if enabled - can +detect a hard lockup condition. + +============================================================== + tainted: Non-zero if the kernel has been tainted. Numeric values, which @@ -858,6 +883,25 @@ example. If a system hangs up, try pressing the NMI switch. ============================================================== +watchdog: + +This parameter can be used to disable or enable the soft lockup detector +_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. + + 0 - disable both lockup detectors + 1 - enable both lockup detectors + +The soft lockup detector and the NMI watchdog can also be disabled or +enabled individually, using the soft_watchdog and nmi_watchdog parameters. +If the watchdog parameter is read, for example by executing + + cat /proc/sys/kernel/watchdog + +the output of this command (0 or 1) shows the logical OR of soft_watchdog +and nmi_watchdog. + +============================================================== + watchdog_thresh: This value can be used to control the frequency of hrtimer and NMI diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 5b5450585b8a..0426357297d5 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -82,8 +82,6 @@ extern int proc_soft_watchdog(struct ctl_table *, int , void __user *, size_t *, loff_t *); extern int proc_watchdog_thresh(struct ctl_table *, int , void __user *, size_t *, loff_t *); -extern int proc_dowatchdog(struct ctl_table *, int , - void __user *, size_t *, loff_t *); #endif #ifdef CONFIG_HAVE_ACPI_APEI_NMI diff --git a/kernel/sysctl.c b/kernel/sysctl.c index ce410bb9f2e1..245e7dcc3741 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -846,7 +846,7 @@ static struct ctl_table kern_table[] = { .data = &watchdog_user_enabled, .maxlen = sizeof (int), .mode = 0644, - .proc_handler = proc_dowatchdog, + .proc_handler = proc_watchdog, .extra1 = &zero, .extra2 = &one, }, @@ -855,10 +855,32 @@ static struct ctl_table kern_table[] = { .data = &watchdog_thresh, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dowatchdog, + .proc_handler = proc_watchdog_thresh, .extra1 = &zero, .extra2 = &sixty, }, + { + .procname = "nmi_watchdog", + .data = &nmi_watchdog_enabled, + .maxlen = sizeof (int), + .mode = 0644, + .proc_handler = proc_nmi_watchdog, + .extra1 = &zero, +#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR) + .extra2 = &one, +#else + .extra2 = &zero, +#endif + }, + { + .procname = "soft_watchdog", + .data = &soft_watchdog_enabled, + .maxlen = sizeof (int), + .mode = 0644, + .proc_handler = proc_soft_watchdog, + .extra1 = &zero, + .extra2 = &one, + }, { .procname = "softlockup_panic", .data = &softlockup_panic, @@ -879,15 +901,6 @@ static struct ctl_table kern_table[] = { .extra2 = &one, }, #endif /* CONFIG_SMP */ - { - .procname = "nmi_watchdog", - .data = &watchdog_user_enabled, - .maxlen = sizeof (int), - .mode = 0644, - .proc_handler = proc_dowatchdog, - .extra1 = &zero, - .extra2 = &one, - }, #endif #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) { diff --git a/kernel/watchdog.c b/kernel/watchdog.c index fd2b6dc14486..63d702885686 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -110,15 +110,9 @@ static int __init hardlockup_panic_setup(char *str) else if (!strncmp(str, "nopanic", 7)) hardlockup_panic = 0; else if (!strncmp(str, "0", 1)) - watchdog_user_enabled = 0; - else if (!strncmp(str, "1", 1) || !strncmp(str, "2", 1)) { - /* - * Setting 'nmi_watchdog=1' or 'nmi_watchdog=2' (legacy option) - * has the same effect. - */ - watchdog_user_enabled = 1; - watchdog_enable_hardlockup_detector(true); - } + watchdog_enabled &= ~NMI_WATCHDOG_ENABLED; + else if (!strncmp(str, "1", 1)) + watchdog_enabled |= NMI_WATCHDOG_ENABLED; return 1; } __setup("nmi_watchdog=", hardlockup_panic_setup); @@ -137,19 +131,18 @@ __setup("softlockup_panic=", softlockup_panic_setup); static int __init nowatchdog_setup(char *str) { - watchdog_user_enabled = 0; + watchdog_enabled = 0; return 1; } __setup("nowatchdog", nowatchdog_setup); -/* deprecated */ static int __init nosoftlockup_setup(char *str) { - watchdog_user_enabled = 0; + watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED; return 1; } __setup("nosoftlockup", nosoftlockup_setup); -/* */ + #ifdef CONFIG_SMP static int __init softlockup_all_cpu_backtrace_setup(char *str) { @@ -264,10 +257,11 @@ static int is_softlockup(unsigned long touch_ts) { unsigned long now = get_timestamp(); - /* Warn about unreasonable delays: */ - if (time_after(now, touch_ts + get_softlockup_thresh())) - return now - touch_ts; - + if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) { + /* Warn about unreasonable delays. */ + if (time_after(now, touch_ts + get_softlockup_thresh())) + return now - touch_ts; + } return 0; } @@ -532,6 +526,10 @@ static int watchdog_nmi_enable(unsigned int cpu) struct perf_event_attr *wd_attr; struct perf_event *event = per_cpu(watchdog_ev, cpu); + /* nothing to do if the hard lockup detector is disabled */ + if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED)) + goto out; + /* * Some kernels need to default hard lockup detection to * 'disabled', for example a guest on a hypervisor. @@ -856,59 +854,12 @@ out: mutex_unlock(&watchdog_proc_mutex); return err; } - -/* - * proc handler for /proc/sys/kernel/nmi_watchdog,watchdog_thresh - */ - -int proc_dowatchdog(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - int err, old_thresh, old_enabled; - bool old_hardlockup; - - mutex_lock(&watchdog_proc_mutex); - old_thresh = ACCESS_ONCE(watchdog_thresh); - old_enabled = ACCESS_ONCE(watchdog_user_enabled); - old_hardlockup = watchdog_hardlockup_detector_is_enabled(); - - err = proc_dointvec_minmax(table, write, buffer, lenp, ppos); - if (err || !write) - goto out; - - set_sample_period(); - /* - * Watchdog threads shouldn't be enabled if they are - * disabled. The 'watchdog_running' variable check in - * watchdog_*_all_cpus() function takes care of this. - */ - if (watchdog_user_enabled && watchdog_thresh) { - /* - * Prevent a change in watchdog_thresh accidentally overriding - * the enablement of the hardlockup detector. - */ - if (watchdog_user_enabled != old_enabled) - watchdog_enable_hardlockup_detector(true); - err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh); - } else - watchdog_disable_all_cpus(); - - /* Restore old values on failure */ - if (err) { - watchdog_thresh = old_thresh; - watchdog_user_enabled = old_enabled; - watchdog_enable_hardlockup_detector(old_hardlockup); - } -out: - mutex_unlock(&watchdog_proc_mutex); - return err; -} #endif /* CONFIG_SYSCTL */ void __init lockup_detector_init(void) { set_sample_period(); - if (watchdog_user_enabled) + if (watchdog_enabled) watchdog_enable_all_cpus(false); } -- cgit v1.2.3 From 0ddab1d2ed664c85c95488eef569786a84aedf37 Mon Sep 17 00:00:00 2001 From: Toshi Kani Date: Tue, 14 Apr 2015 15:47:20 -0700 Subject: lib/ioremap.c: add huge I/O map capability interfaces Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when I/O mappings with pud/pmd are enabled on the kernel. ioremap_huge_init() calls arch_ioremap_pud_supported() and arch_ioremap_pmd_supported() to initialize the capabilities at boot-time. A new kernel option "nohugeiomap" is also added, so that user can disable the huge I/O map capabilities when necessary. Signed-off-by: Toshi Kani Cc: "H. Peter Anvin" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Arnd Bergmann Cc: Dave Hansen Cc: Robert Elliott Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-parameters.txt | 2 ++ arch/Kconfig | 3 +++ include/linux/io.h | 8 ++++++++ init/main.c | 2 ++ lib/ioremap.c | 37 +++++++++++++++++++++++++++++++++++++ 5 files changed, 52 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 71eecb263250..b1fa70907ccf 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2323,6 +2323,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. register save and restore. The kernel will only save legacy floating-point registers on task switch. + nohugeiomap [KNL,x86] Disable kernel huge I/O mappings. + noxsave [BUGS=X86] Disables x86 extended register state save and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. diff --git a/arch/Kconfig b/arch/Kconfig index a9c95d36ba70..c88c23f0a1da 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -446,6 +446,9 @@ config HAVE_IRQ_TIME_ACCOUNTING config HAVE_ARCH_TRANSPARENT_HUGEPAGE bool +config HAVE_ARCH_HUGE_VMAP + bool + config HAVE_ARCH_SOFT_DIRTY bool diff --git a/include/linux/io.h b/include/linux/io.h index fa02e55e5a2e..4cc299c598e0 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -38,6 +38,14 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end, } #endif +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +void __init ioremap_huge_init(void); +int arch_ioremap_pud_supported(void); +int arch_ioremap_pmd_supported(void); +#else +static inline void ioremap_huge_init(void) { } +#endif + /* * Managed iomap interface */ diff --git a/init/main.c b/init/main.c index 4a6974e67839..f6dd8fe1f22c 100644 --- a/init/main.c +++ b/init/main.c @@ -80,6 +80,7 @@ #include #include #include +#include #include #include @@ -484,6 +485,7 @@ static void __init mm_init(void) percpu_init_late(); pgtable_init(); vmalloc_init(); + ioremap_huge_init(); } asmlinkage __visible void __init start_kernel(void) diff --git a/lib/ioremap.c b/lib/ioremap.c index 0c9216c48762..2008652c9a1f 100644 --- a/lib/ioremap.c +++ b/lib/ioremap.c @@ -13,6 +13,43 @@ #include #include +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP +int __read_mostly ioremap_pud_capable; +int __read_mostly ioremap_pmd_capable; +int __read_mostly ioremap_huge_disabled; + +static int __init set_nohugeiomap(char *str) +{ + ioremap_huge_disabled = 1; + return 0; +} +early_param("nohugeiomap", set_nohugeiomap); + +void __init ioremap_huge_init(void) +{ + if (!ioremap_huge_disabled) { + if (arch_ioremap_pud_supported()) + ioremap_pud_capable = 1; + if (arch_ioremap_pmd_supported()) + ioremap_pmd_capable = 1; + } +} + +static inline int ioremap_pud_enabled(void) +{ + return ioremap_pud_capable; +} + +static inline int ioremap_pmd_enabled(void) +{ + return ioremap_pmd_capable; +} + +#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */ +static inline int ioremap_pud_enabled(void) { return 0; } +static inline int ioremap_pmd_enabled(void) { return 0; } +#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ + static int ioremap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot) { -- cgit v1.2.3 From e4b0db72be2487bae0e3251c22f82c104f7c1cfd Mon Sep 17 00:00:00 2001 From: Vladimir Murzin Date: Tue, 14 Apr 2015 15:48:43 -0700 Subject: Documentation: update arch list in the 'memtest' entry Since arm64/arm support memtest command line option update the "memtest" entry. Signed-off-by: Vladimir Murzin Cc: "H. Peter Anvin" Cc: Catalin Marinas Cc: Ingo Molnar Cc: Mark Rutland Cc: Russell King Cc: Thomas Gleixner Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b1fa70907ccf..8cc635f4c4f0 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1989,7 +1989,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted. seconds. Use this parameter to check at some other rate. 0 disables periodic checking. - memtest= [KNL,X86] Enable memtest + memtest= [KNL,X86,ARM] Enable memtest Format: default : 0 Specifies the number of memtest passes to be -- cgit v1.2.3