summaryrefslogtreecommitdiff
path: root/Documentation/kernel-parameters.txt
AgeCommit message (Collapse)AuthorFilesLines
2013-04-30Merge branch 'for-linus' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...
2013-04-30Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds1-2/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 debug update from Ingo Molnar: "Two small changes: a documentation update and a constification" * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, early-printk: Update earlyprintk documentation (and kill x86 copy) x86: Constify a few items
2013-04-30Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds1-11/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle are mostly related to preparatory work for the full-dynticks work: - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take advantage of numbered callbacks, do callback accelerations based on numbered callbacks. Posted to LKML at https://lkml.org/lkml/2013/3/18/960 - RCU documentation updates. Posted to LKML at https://lkml.org/lkml/2013/3/18/570 - Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/3/18/594" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rcu: Make rcu_accelerate_cbs() note need for future grace periods rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp() rcu: Rename n_nocb_gp_requests to need_future_gp rcu: Push lock release to rcu_start_gp()'s callers rcu: Repurpose no-CBs event tracing to future-GP events rcu: Rearrange locking in rcu_start_gp() rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks rcu: Accelerate RCU callbacks at grace-period end rcu: Export RCU_FAST_NO_HZ parameters to sysfs rcu: Distinguish "rcuo" kthreads by RCU flavor rcu: Add event tracing for no-CBs CPUs' grace periods rcu: Add event tracing for no-CBs CPUs' callback registration rcu: Introduce proper blocking to no-CBs kthreads GP waits rcu: Provide compile-time control for no-CBs CPUs rcu: Tone down debugging during boot-up and shutdown. rcu: Add softirq-stall indications to stall-warning messages rcu: Documentation update rcu: Make bugginess of code sample more evident rcu: Fix hlist_bl_set_first_rcu() annotation rcu: Delete unused rcu_node "wakemask" field ...
2013-04-30Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-0/+9
Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...
2013-04-30Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds1-0/+8
Pull clock framework update from Michael Turquette: "The common clock framework changes for 3.10 include many fixes for existing platforms, as well as adoption of the framework by new platforms and devices. Some long-needed fixes to the core framework are here as well as new features such as improved initialization of clocks from DT as well as framework reentrancy for nested clock operations." * tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits) clk: add clk_ignore_unused option to keep boot clocks on clk: ux500: fix mismatched types clk: vexpress: Add separate SP810 driver clk: si5351: make clk-si5351 depend on CONFIG_OF clk: export __clk_get_flags for modular clock providers clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate. clk: sunxi: Unify oscillator clock clk: composite: allow fixed rates & fixed dividers clk: composite: rename 'div' references to 'rate' clk: add si5351 i2c common clock driver clk: add device tree fixed-factor-clock binding support clk: Properly handle notifier return values clk: ux500: abx500: Define clock tree for ab850x clk: ux500: Add support for sysctrl clocks clk: mvebu: Fix valid value range checking for cpu_freq_select clk: Fixup locking issues for clk_set_parent clk: Fixup errorhandling for clk_set_parent clk: Restructure code for __clk_reparent clk: sunxi: drop an unnecesary kmalloc clk: sunxi: drop CLK_IGNORE_UNUSED ...
2013-04-30Merge tag 'trace-3.10' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Along with the usual minor fixes and clean ups there are a few major changes with this pull request. 1) Multiple buffers for the ftrace facility This feature has been requested by many people over the last few years. I even heard that Google was about to implement it themselves. I finally had time and cleaned up the code such that you can now create multiple instances of the ftrace buffer and have different events go to different buffers. This way, a low frequency event will not be lost in the noise of a high frequency event. Note, currently only events can go to different buffers, the tracers (ie function, function_graph and the latency tracers) still can only be written to the main buffer. 2) The function tracer triggers have now been extended. The function tracer had two triggers. One to enable tracing when a function is hit, and one to disable tracing. Now you can record a stack trace on a single (or many) function(s), take a snapshot of the buffer (copy it to the snapshot buffer), and you can enable or disable an event to be traced when a function is hit. 3) A perf clock has been added. A "perf" clock can be chosen to be used when tracing. This will cause ftrace to use the same clock as perf uses, and hopefully this will make it easier to interleave the perf and ftrace data for analysis." * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits) tracepoints: Prevent null probe from being added tracing: Compare to 1 instead of zero for is_signed_type() tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT ftrace: Get rid of ftrace_profile_bits tracing: Check return value of tracing_init_dentry() tracing: Get rid of unneeded key calculation in ftrace_hash_move() tracing: Reset ftrace_graph_filter_enabled if count is zero tracing: Fix off-by-one on allocating stat->pages kernel: tracing: Use strlcpy instead of strncpy tracing: Update debugfs README file tracing: Fix ftrace_dump() tracing: Rename trace_event_mutex to trace_event_sem tracing: Fix comment about prefix in arch_syscall_match_sym_name() tracing: Convert trace_destroy_fields() to static tracing: Move find_event_field() into trace_events.c tracing: Use TRACE_MAX_PRINT instead of constant tracing: Use pr_warn_once instead of open coded implementation ring-buffer: Add ring buffer startup selftest tracing: Bring Documentation/trace/ftrace.txt up to date tracing: Add "perf" trace_clock ... Conflicts: kernel/trace/ftrace.c kernel/trace/trace.c
2013-04-28clk: add clk_ignore_unused option to keep boot clocks onOlof Johansson1-0/+8
This is primarily useful when there's a driver that doesn't claim clocks properly, but the bootloader leaves them on. It's not expected to be used in normal cases, but for bringup and debug it's very useful to have the option to not gate unclaimed clocks that are still on. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: fixed up trivial merge issue]
2013-04-21Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds1-3/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kdump fixes from Peter Anvin: "The kexec/kdump people have found several problems with the support for loading over 4 GiB that was introduced in this merge cycle. This is partly due to a number of design problems inherent in the way the various pieces of kdump fit together (it is pretty horrifically manual in many places.) After a *lot* of iterations this is the patchset that was agreed upon, but of course it is now very late in the cycle. However, because it changes both the syntax and semantics of the crashkernel option, it would be desirable to avoid a stable release with the broken interfaces." I'm not happy with the timing, since originally the plan was to release the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead... * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kexec: use Crash kernel for Crash kernel low x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low x86, kdump: Retore crashkernel= to allocate under 896M x86, kdump: Set crashkernel_low automatically
2013-04-17x86, kdump: Change crashkernel_high/low= to crashkernel=,high/lowYinghai Lu1-5/+5
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of crashkernel_hign=X crashkernel_low=Y. As that could be extensible. -v2: according to Vivek, change delimiter to ; -v3: let hign and low only handle simple form and it conforms to description in kernel-parameters.txt still keep crashkernel=X override any crashkernel=X,high crashkernel=Y,low -v4: update get_last_crashkernel returning and add more strict checking in parse_crashkernel_simple() found by HATAYAMA. -v5: Change delimiter back to , according to HPA. also separate parse_suffix from parse_simper according to vivek. so we can avoid @pos in that path. -v6: Tight the checking about crashkernel=X,highblahblah,high found by HTYAYAMA. Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Retore crashkernel= to allocate under 896MYinghai Lu1-2/+11
Vivek found old kexec-tools does not work new kernel anymore. So change back crashkernel= back to old behavoir, and add crashkernel_high= to let user decide if buffer could be above 4G, and also new kexec-tools will be needed. -v2: let crashkernel=X override crashkernel_high= update description about _high will be ignored by crashkernel=X -v3: update description about kernel-parameters.txt according to Vivek. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Set crashkernel_low automaticallyYinghai Lu1-3/+11
Chao said that kdump does does work well on his system on 3.8 without extra parameter, even iommu does not work with kdump. And now have to append crashkernel_low=Y in first kernel to make kdump work. We have now modified crashkernel=X to allocate memory beyong 4G (if available) and do not allocate low range for crashkernel if the user does not specify that with crashkernel_low=Y. This causes regression if iommu is not enabled. Without iommu, swiotlb needs to be setup in first 4G and there is no low memory available to second kernel. Set crashkernel_low automatically if the user does not specify that. For system that does support IOMMU with kdump properly, user could specify crashkernel_low=0 to save that 72M low ram. -v3: add swiotlb_size() according to Konrad. -v4: add comments what 8M is for according to hpa. also update more crashkernel_low= in kernel-parameters.txt -v5: update changelog according to Vivek. -v6: Change description about swiotlb referring according to HATAYAMA. Reported-by: WANG Chao <chaowang@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86,efi: Implement efi_no_storage_paranoia parameterRichard Weinberger1-0/+6
Using this parameter one can disable the storage_size/2 check if he is really sure that the UEFI does sane gc and fulfills the spec. This parameter is useful if a devices uses more than 50% of the storage by default. The Intel DQSW67 desktop board is such a sucker for exmaple. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-11x86, early-printk: Update earlyprintk documentation (and kill x86 copy)Dave Hansen1-2/+14
Documentation/kernel-parameters.txt and Documentation/x86/x86_64/boot-options.txt contain virtually identical text describing earlyprintk. This consolidates the two copies and updates the documentation a bit. No one ever documented the: earlyprintk=serial,0x1008,115200 syntax, nor mentioned that ARM is now a supported earlyprintk arch. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rob Landley <rob@landley.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave@sr71.net> Link: http://lkml.kernel.org/r/20130410210338.E2930E98@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-01workqueue: update sysfs interface to reflect NUMA awareness and a kernel ↵Tejun Heo1-0/+9
param to disable NUMA affinity Unbound workqueues are now NUMA aware. Let's add some control knobs and update sysfs interface accordingly. * Add kernel param workqueue.numa_disable which disables NUMA affinity globally. * Replace sysfs file "pool_id" with "pool_ids" which contain node:pool_id pairs. This change is userland-visible but "pool_id" hasn't seen a release yet, so this is okay. * Add a new sysf files "numa" which can toggle NUMA affinity on individual workqueues. This is implemented as attrs->no_numa whichn is special in that it isn't part of a pool's attributes. It only affects how apply_workqueue_attrs() picks which pools to use. After "pool_ids" change, first_pwq() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-27kernel-parameters: remove outdated 'noresidual' parameterPaul Bolle1-2/+0
The PPC specific kernel parameter 'noresidual' was removed in v2.6.27, though commit 917f0af9e5a9ceecf9e72537fabb501254ba321d ("powerpc: Remove arch/ppc and include/asm-ppc"). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-27doc: put proper reference to CONFIG_MODULE_SIG_ENFORCEPaul Bolle1-1/+1
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-03-26rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacksPaul E. McKenney1-9/+19
Because RCU callbacks are now associated with the number of the grace period that they must wait for, CPUs can now take advance callbacks corresponding to grace periods that ended while a given CPU was in dyntick-idle mode. This eliminates the need to try forcing the RCU state machine while entering idle, thus reducing the CPU intensiveness of RCU_FAST_NO_HZ, which should increase its energy efficiency. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-03-26rcu: Distinguish "rcuo" kthreads by RCU flavorPaul E. McKenney1-2/+5
Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by the CPU number, for example, "rcuo". This is problematic given that there are either two or three RCU flavors, each of which gets a per-CPU kthread with exactly the same name. This commit therefore introduces a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh, 'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have "rcuob/0", "rcuop/0", and "rcuos/0". Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
2013-03-15tracing: Add alloc_snapshot kernel command line parameterSteven Rostedt (Red Hat)1-0/+7
If debugging the kernel, and the developer wants to use tracing_snapshot() in places where tracing_snapshot_alloc() may be difficult (or more likely, the developer is lazy and doesn't want to bother with tracing_snapshot_alloc() at all), then adding alloc_snapshot to the kernel command line parameter will tell ftrace to allocate the snapshot buffer (if configured) when it allocates the main tracing buffer. I also noticed that ring_buffer_expanded and tracing_selftest_disabled had inconsistent use of boolean "true" and "false" with "0" and "1". I cleaned that up too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-04Merge tag 'metag-v3.9-rc1-v4' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag Pull new ImgTec Meta architecture from James Hogan: "This adds core architecture support for Imagination's Meta processor cores, followed by some later miscellaneous arch/metag cleanups and fixes which I kept separate to ease review: - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture - A few fixes all over, particularly for symbol prefixes - A few privilege protection fixes - Several cleanups (setup.c includes, split out a lot of metag_ksyms.c) - Fix some missing exports - Convert hugetlb to use vm_unmapped_area() - Copy device tree to non-init memory - Provide dma_get_sgtable()" * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits) metag: Provide dma_get_sgtable() metag: prom.h: remove declaration of metag_dt_memblock_reserve() metag: copy devicetree to non-init memory metag: cleanup metag_ksyms.c includes metag: move mm/init.c exports out of metag_ksyms.c metag: move usercopy.c exports out of metag_ksyms.c metag: move setup.c exports out of metag_ksyms.c metag: move kick.c exports out of metag_ksyms.c metag: move traps.c exports out of metag_ksyms.c metag: move irq enable out of irqflags.h on SMP genksyms: fix metag symbol prefix on crc symbols metag: hugetlb: convert to vm_unmapped_area() metag: export clear_page and copy_page metag: export metag_code_cache_flush_all metag: protect more non-MMU memory regions metag: make TXPRIVEXT bits explicit metag: kernel/setup.c: sort includes perf: Enable building perf tools for Meta metag: add boot time LNKGET/LNKSET check metag: add __init to metag_cache_probe() ...
2013-03-03metag: Basic documentationJames Hogan1-0/+4
Add basic metag documentation. This includes an outline description of the ABIs (including syscall ABI) and calling conventions, similar to the one in Documentation/frv/. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Rob Landley <rob@landley.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: linux-doc@vger.kernel.org
2013-03-02x86, ACPI, mm: Revert movablemem_map supportYinghai Lu1-36/+0
Tim found: WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80() Hardware name: S2600CP sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. smpboot: Booting Node 1, Processors #1 Modules linked in: Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1 Call Trace: set_cpu_sibling_map+0x279/0x449 start_secondary+0x11d/0x1e5 Don Morris reproduced on a HP z620 workstation, and bisected it to commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is ready") It turns out movable_map has some problems, and it breaks several things 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(&numa_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy. and make fall back path working. 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that.... c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes critical x86 code. It caused x86 guys did not pay attention to find the problem early. Those patches really should be routed via tip/x86/mm. 4. after that commit, following range can not use movable ram: a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed? b. initrd... it will be freed after booting, so it could be on movable... c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G anymore. d. init_mem_mapping: can not put page table high anymore. e. initmem_init: vmemmap can not be high local node anymore. That is not good. If node is hotplugable, the mem related range like page table and vmemmap could be on the that node without problem and should be on that node. We have workaround patch that could fix some problems, but some can not be fixed. So just remove that offending commit and related ones including: f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region().") 01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from SRAT") 27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to the end of node") e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is ready") fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map") 42f47e27e761 ("page_alloc: make movablemem_map have higher priority") 6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes") 34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter") 4d59a75125d5 ("x86: get pg_data_t's memory from other node") Later we should have patches that will make sure kernel put page table and vmemmap on local node ram instead of push them down to node0. Also need to find way to put other kernel used ram to local node ram. Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: Don Morris <don.morris@hp.com> Bisected-by: Don Morris <don.morris@hp.com> Tested-by: Don Morris <don.morris@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 fixes from Peter Anvin: "Additional x86 fixes. Three of these patches are pure documentation, two are pretty trivial; the remaining one fixes boot problems on some non-BIOS machines." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Make sure we can boot in the case the BDA contains pure garbage x86, efi: Mark disable_runtime as __initdata x86, doc: Fix incorrect comment about 64-bit code segment descriptors doc, kernel-parameters: Document 'console=hvc<n>' doc, xen: Mention 'earlyprintk=xen' in the documentation. ACPI: Overriding ACPI tables via initrd only works with an initrd and on X86
2013-02-26Merge tag 'pci-v3.9-changes' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu) - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu) - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu) - Stop caching _PRT and make independent of bus numbers (Yinghai Lu) PCI device hotplug - Clean up cpqphp dead code (Sasha Levin) - Disable ARI unless device and upstream bridge support it (Yijing Wang) - Initialize all hot-added devices (not functions 0-7) (Yijing Wang) Power management - Don't touch ASPM if disabled (Joe Lawrence) - Fix ASPM link state management (Myron Stowe) Miscellaneous - Fix PCI_EXP_FLAGS accessor (Alex Williamson) - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov) - Document hotplug resource and MPS parameters (Yijing Wang) - Add accessor for PCIe capabilities (Myron Stowe) - Drop pciehp suspend/resume messages (Paul Bolle) - Make pci_slot built-in only (not a module) (Jiang Liu) - Remove unused PCI/ACPI bind ops (Jiang Liu) - Removed used pci_root_bus (Bjorn Helgaas)" * tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits) PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS ACPI / PCI: Make pci_slot built-in only, not a module PCI/PM: Clear state_saved during suspend PCI: Use atomic_inc_return() rather than atomic_add_return() PCI: Catch attempts to disable already-disabled devices PCI: Disable Bus Master unconditionally in pci_device_shutdown() PCI: acpiphp: Remove dead code for PCI host bridge hotplug PCI: acpiphp: Create companion ACPI devices before creating PCI devices PCI: Remove unused "rc" in virtfn_add_bus() PCI: pciehp: Drop suspend/resume ENTRY messages PCI/ASPM: Don't touch ASPM if forcibly disabled PCI/ASPM: Deallocate upstream link state even if device is not PCIe PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc PCI: Document hpiosize= and hpmemsize= resource reservation parameters PCI: Use PCI Express Capability accessor PCI: Introduce accessor to retrieve PCIe Capabilities Register PCI: Put pci_dev in device tree as early as possible PCI: Skip attaching driver in device_add() PCI: acpiphp: Keep driver loaded even if no slots found ...
2013-02-26doc, kernel-parameters: Document 'console=hvc<n>'Konrad Rzeszutek Wilk1-0/+2
Both the PowerPC hypervisor and Xen hypervisor can utilize the hvc driver. Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.com Cc: <stable@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-26doc, xen: Mention 'earlyprintk=xen' in the documentation.Konrad Rzeszutek Wilk1-0/+3
The earlyprintk for Xen PV guests utilizes a simple hypercall (console_io) to provide output to Xen emergency console. Note that the Xen hypervisor should be booted with 'loglevel=all' to output said information. Reported-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.com Cc: <stable@vger.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-02-24acpi, memory-hotplug: support getting hotplug info from SRATTang Chen1-5/+24
We now provide an option for users who don't want to specify physical memory address in kernel commandline. /* * For movablemem_map=acpi: * * SRAT: |_____| |_____| |_________| |_________| ...... * node id: 0 1 1 2 * hotpluggable: n y y n * movablemem_map: |_____| |_________| * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. */ So user just specify movablemem_map=acpi, and the kernel will use hotpluggable info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE. If all the memory ranges in SRAT is hotpluggable, then no memory can be used by kernel. But before parsing SRAT, memblock has already reserve some memory ranges for other purposes, such as for kernel image, and so on. We cannot prevent kernel from using these memory. So we need to exclude these ranges even if these memory is hotpluggable. Furthermore, there could be several memory ranges in the single node which the kernel resides in. We may skip one range that have memory reserved by memblock, but if the rest of memory is too small, then the kernel will fail to boot. So, make the whole node which the kernel resides in un-hotpluggable. Then the kernel has enough memory to use. NOTE: Using this way will cause NUMA performance down because the whole node will be set as ZONE_MOVABLE, and kernel cannot use memory on it. If users don't want to lose NUMA performance, just don't use it. [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: use strcmp()] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24page_alloc: add movable_memmap kernel parameterTang Chen1-0/+17
Add functions to parse movablemem_map boot option. Since the option could be specified more then once, all the maps will be stored in the global variable movablemem_map.map array. And also, we keep the array in monotonic increasing order by start_pfn. And merge all overlapped ranges. [akpm@linux-foundation.org: improve comment] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: remove unneeded parens] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Tested-by: Lin Feng <linfeng@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The *really* big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...
2013-02-19Merge branch 'release' of ↵Rafael J. Wysocki1-10/+1
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (35 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E cpuidle: remove vestage definition of cpuidle_state_usage.driver_data x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag x86 idle: remove mwait_idle() and "idle=mwait" cmdline param ... Conflicts: arch/x86/kernel/process.c (with PM / tracing commit 43720bd) drivers/acpi/processor_idle.c (with ACPICA commit 4f84291)
2013-02-17Merge branch 'pm-cpufreq'Rafael J. Wysocki1-0/+5
* pm-cpufreq: cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build
2013-02-16cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.Dirk Brandewie1-0/+5
When intel_pstate is configured into the kernel it will become the preferred scaling driver for processors that it supports. Allow the user to override this by adding: intel_pstate=disable on the kernel command line. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-02-10x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flagLen Brown1-4/+0
Remove 32-bit x86 a cmdline param "no-hlt", and the cpuinfo_x86.hlt_works_ok that it sets. If a user wants to avoid HLT, then "idle=poll" is much more useful, as it avoids invocation of HLT in idle, while "no-hlt" failed to do so. Indeed, hlt_works_ok was consulted in only 3 places. First, in /proc/cpuinfo where "hlt_bug yes" would be printed if and only if the user booted the system with "no-hlt" -- as there was no other code to set that flag. Second, check_hlt() would not invoke halt() if "no-hlt" were on the cmdline. Third, it was consulted in stop_this_cpu(), which is invoked by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() -- all cases where the machine is being shutdown/reset. The flag was not consulted in the more frequently invoked play_dead()/hlt_play_dead() used in processor offline and suspend. Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations indicating that it would be removed in 2012. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
2013-02-10x86 idle: remove mwait_idle() and "idle=mwait" cmdline paramLen Brown1-6/+1
mwait_idle() is a C1-only idle loop intended to be more efficient than HLT, starting on Pentium-4 HT-enabled processors. But mwait_idle() has been replaced by the more general mwait_idle_with_hints(), which handles both C1 and deeper C-states. ACPI processor_idle and intel_idle use only mwait_idle_with_hints(), and no longer use mwait_idle(). Here we simplify the x86 native idle code by removing mwait_idle(), and the "idle=mwait" bootparam used to invoke it. Since Linux 3.0 there has been a boot-time warning when "idle=mwait" was invoked saying it would be removed in 2012. This removal was also noted in the (now removed:-) feature-removal-schedule.txt. After this change, kernels configured with (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware that supports MWAIT will simply use HLT. If MWAIT is desired on those systems, cpuidle and the cpuidle drivers above can be enabled. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
2013-01-31PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etcYijing Wang1-0/+15
Document PCIe bus MPS parameters pcie_bus_tune_off, pcie_bus_safe, pcie_bus_peer2peer, pcie_bus_perf. These parameters were introduced by Jon Mason <jdmason@kudzu.us> at commit 5f39e6705 and commit b03e7495a8. [bhelgaas: mention hot-add for pcie_bus_peer2peer] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-31PCI: Document hpiosize= and hpmemsize= resource reservation parametersYijing Wang1-0/+6
Document PCI hotplug resource reservation parameters hpiosize and hpmemsize. These parameters override default hotplug I/O size (256 bytes) and default mem size (2M). Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-30x86: Add Crash kernel low reservationYinghai Lu1-0/+3
During kdump kernel's booting stage, it need to find low ram for swiotlb buffer when system does not support intel iommu/dmar remapping. kexed-tools is appending memmap=exactmap and range from /proc/iomem with "Crash kernel", and that range is above 4G for 64bit after boot protocol 2.12. We need to add another range in /proc/iomem like "Crash kernel low", so kexec-tools could find that info and append to kdump kernel command line. Try to reserve some under 4G if the normal "Crash kernel" is above 4G. User could specify the size with crashkernel_low=XX[KMG]. -v2: fix warning that is found by Fengguang's test robot. -v3: move out get_mem_size change to another patch, to solve compiling warning that is found by Borislav Petkov <bp@alien8.de> -v4: user must specify crashkernel_low if system does not support intel or amd iommu. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-09rcu: Make rcu_nocb_poll an early_param instead of module_paramPaul Gortmaker1-1/+1
The as-documented rcu_nocb_poll will fail to enable this feature for two reasons. (1) there is an extra "s" in the documented name which is not in the code, and (2) since it uses module_param, it really is expecting a prefix, akin to "rcutree.fanout_leaf" and the prefix isn't documented. However, there are several reasons why we might not want to simply fix the typo and add the prefix: 1) we'd end up with rcutree.rcu_nocb_poll, and rather probably make a change to rcutree.nocb_poll 2) if we did #1, then the prefix wouldn't be consistent with the rcu_nocbs=<cpumap> parameter (i.e. one with, one without prefix) 3) the use of module_param in a header file is less than desired, since it isn't immediately obvious that it will get processed via rcutree.c and get the prefix from that (although use of module_param_named() could clarify that.) 4) the implied export of /sys/module/rcutree/parameters/rcu_nocb_poll data to userspace via module_param() doesn't really buy us anything, as it is read-only and we can tell if it is enabled already without it, since there is a printk at early boot telling us so. In light of all that, just change it from a module_param() to an early_setup() call, and worry about adding it to /sys later on if we decide to allow a dynamic setting of it. Also change the variable to be tagged as read_mostly, since it will only ever be fiddled with at most, once at boot. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-12-21Documentation: kernel-parameters.txt remove capability.disableJosh Boyer1-6/+0
Remove the documentation for capability.disable. The code supporting this parameter was removed with commit 5915eb53861c ("security: remove dummy module") Signed-off-by: Josh Boyer <jwboyer@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Rob Landley <rob@landley.net> Cc: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18Documentation/kernel-parameters.txt: update mem= option's spec according to ↵Wen Congyang1-3/+4
its implementation Current mem= implementation seems buggy because the specification and implementation don't match. The current mem= has been working for many years and it's not buggy - it works as expected. So we should update the specification. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17Merge tag 'balancenuma-v11' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...
2012-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-0/+18
Pull networking changes from David Miller: 1) Allow to dump, monitor, and change the bridge multicast database using netlink. From Cong Wang. 2) RFC 5961 TCP blind data injection attack mitigation, from Eric Dumazet. 3) Networking user namespace support from Eric W. Biederman. 4) tuntap/virtio-net multiqueue support by Jason Wang. 5) Support for checksum offload of encapsulated packets (basically, tunneled traffic can still be checksummed by HW). From Joseph Gasparakis. 6) Allow BPF filter access to VLAN tags, from Eric Dumazet and Daniel Borkmann. 7) Bridge port parameters over netlink and BPDU blocking support from Stephen Hemminger. 8) Improve data access patterns during inet socket demux by rearranging socket layout, from Eric Dumazet. 9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and Jon Maloy. 10) Update TCP socket hash sizing to be more in line with current day realities. The existing heurstics were choosen a decade ago. From Eric Dumazet. 11) Fix races, queue bloat, and excessive wakeups in ATM and associated drivers, from Krzysztof Mazur and David Woodhouse. 12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions in VXLAN driver, from David Stevens. 13) Add "oops_only" mode to netconsole, from Amerigo Wang. 14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also allow DCB netlink to work on namespaces other than the initial namespace. From John Fastabend. 15) Support PTP in the Tigon3 driver, from Matt Carlson. 16) tun/vhost zero copy fixes and improvements, plus turn it on by default, from Michael S. Tsirkin. 17) Support per-association statistics in SCTP, from Michele Baldessari. And many, many, driver updates, cleanups, and improvements. Too numerous to mention individually. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) net/mlx4_en: Add support for destination MAC in steering rules net/mlx4_en: Use generic etherdevice.h functions. net: ethtool: Add destination MAC address to flow steering API bridge: add support of adding and deleting mdb entries bridge: notify mdb changes via netlink ndisc: Unexport ndisc_{build,send}_skb(). uapi: add missing netconf.h to export list pkt_sched: avoid requeues if possible solos-pci: fix double-free of TX skb in DMA mode bnx2: Fix accidental reversions. bna: Driver Version Updated to 3.1.2.1 bna: Firmware update bna: Add RX State bna: Rx Page Based Allocation bna: TX Intr Coalescing Fix bna: Tx and Rx Optimizations bna: Code Cleanup and Enhancements ath9k: check pdata variable before dereferencing it ath5k: RX timestamp is reported at end of frame ath9k_htc: RX timestamp is reported at end of frame ...
2012-12-12Merge branch 'x86-timers-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer update from Ingo Molnar: "This tree includes HPET fixes and also implements a calibration-free, TSC match driven APIC timer interrupt mode: 'TSC deadline mode' supported in SandyBridge and later CPUs." * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: hpet: Fix inverted return value check in arch_setup_hpet_msi() x86: hpet: Fix masking of MSI interrupts x86: apic: Use tsc deadline for oneshot when available
2012-12-12Merge branch 'x86-bsp-hotplug-for-linus' of ↵Linus Torvalds1-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 BSP hotplug changes from Ingo Molnar: "This tree enables CPU#0 (the boot processor) to be onlined/offlined on x86, just like any other CPU. Enabled on Intel CPUs for now. Allowing this required the identification and fixing of latent CPU#0 assumptions (such as CPU#0 initializations, etc.) in the x86 architecture code, plus the identification of barriers to BSP-offlining, such as active PIC interrupts which can only be serviced on the BSP. It's behind a default-off option, and there's a debug option that allows the automatic testing of this feature. The motivation of this feature is to allow and prepare for true CPU-hotplug hardware support: recent changes to MCE support enable us to detect a deteriorating but not yet hard-failing L1/L2 cache on a CPU that could be soft-unplugged - or a failing L3 cache on a multi-socket system. Note that true hardware hot-plug is not yet fully enabled by this, because that requires a special platform wakeup sequence to be sent to the freshly powered up CPU#0. Future patches for this are planned, once such a platform exists. Chicken and egg" * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, topology: Debug CPU0 hotplug x86/i387.c: Initialize thread xstate only on CPU0 only once x86, hotplug: Handle retrigger irq by the first available CPU x86, hotplug: The first online processor saves the MTRR state x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI x86-32, hotplug: Add start_cpu0() entry point to head_32.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback x86, hotplug, suspend: Online CPU0 for suspend or hibernate x86, hotplug: Support functions for CPU0 online/offline x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it x86, Kconfig: Add config switch for CPU0 hotplug doc: Add x86 CPU0 online/offline feature
2012-12-12Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of activity: 211 files changed, 8328 insertions(+), 4116 deletions(-) most of it on the tooling side. Main changes: * ftrace enhancements and fixes from Steve Rostedt. * uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. * UAPI fixes, from David Howels - prepares the arch/x86 UAPI transition * Separate perf tests into multiple objects, one per test, from Jiri Olsa. * Make hardware event translations available in sysfs, from Jiri Olsa. * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps, from Namhyung Kim * Implement ui_progress for GTK, from Namhyung Kim * Add framework for automated perf_event_attr tests, where tools with different command line options will be run from a 'perf test', via python glue, and the perf syscall will be intercepted to verify that the perf_event_attr fields set by the tool are those expected, from Jiri Olsa * Add a 'link' method for hists, so that we can have the leader with buckets for all the entries in all the hists. This new method is now used in the default 'diff' output, making the sum of the 'baseline' column be 100%, eliminating blind spots. * libtraceevent fixes for compiler warnings trying to make perf it build on some distros, like fedora 14, 32-bit, some of the warnings really pointed to real bugs. * Add a browser for 'perf script' and make it available from the report and annotate browsers. It does filtering to find the scripts that handle events found in the perf.data file used. From Feng Tang * perf inject changes to allow showing where a task sleeps, from Andrew Vagin. * Makefile improvements from Namhyung Kim. * Add --pre and --post command hooks in 'stat', from Peter Zijlstra. * Don't stop synthesizing threads when one vanishes, this is for the existing threads when we start a tool like trace. * Use sched:sched_stat_runtime to provide a thread summary, this produces the same output as the 'trace summary' subcommand of tglx's original "trace" tool. * Support interrupted syscalls in 'trace' * Add an event duration column and filter in 'trace'. * There are references to the man pages in some tools, so try to build Documentation when installing, warning the user if that is not possible, from Borislav Petkov. * Give user better message if precise is not supported, from David Ahern. * Try to find cross-built objdump path by using the session environment information in the perf.data file header, from Irina Tirdea, original patch and idea by Namhyung Kim. * Diplays more output on features check for make V=1, so that one can figure out what is happening by looking at gcc output, etc. From Jiri Olsa. * Add on_exit implementation for systems without one, e.g. Android, from Bernhard Rosenkraenzer. * Only process events for vcpus of interest, helps handling large number of events, from David Ahern. * Cross compilation fixes for Android, from Irina Tirdea. * Add documentation on compiling for Android, from Irina Tirdea. * perf diff improvements from Jiri Olsa. * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim. * Add support in 'trace' for tracing workload given by command line, from Namhyung Kim. * ... and much more." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits) uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race perf evsel: Introduce is_group_member method perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing perf ui: Always compile browser setup code perf ui: Add ui_progress__finish() perf ui gtk: Implement ui_progress functions perf ui: Introduce generic ui_progress helper perf ui tui: Move progress.c under ui/tui directory perf tools: Add basic event modifier sanity check perf tools: Omit group members from perf_evlist__disable/enable perf tools: Ensure single disable call per event in record comand perf tools: Fix 'disabled' attribute config for record command perf tools: Fix attributes for '{}' defined event groups perf tools: Use sscanf for parsing /proc/pid/maps perf tools: Add gtk.<command> config option for launching GTK browser perf tools: Fix compile error on NO_NEWT=1 build perf hists: Initialize all of he->stat with zeroes ...
2012-12-11mm: sched: numa: Control enabling and disabling of NUMA balancingMel Gorman1-0/+3
This patch adds Kconfig options and kernel parameters to allow the enabling and disabling of automatic NUMA balancing. The existance of such a switch was and is very important when debugging problems related to transparent hugepages and we should have the same for automatic NUMA placement. Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-11-16rcu: Add callback-free CPUsPaul E. McKenney1-0/+21
RCU callback execution can add significant OS jitter and also can degrade both scheduling latency and, in asymmetric multiprocessors, energy efficiency. This commit therefore adds the ability for selected CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded to kthreads. If the "rcu_nocb_poll" boot parameter is also specified, these kthreads will do polling, removing the need for the offloaded CPUs to do wakeups. At least one CPU must be doing normal callback processing: currently CPU 0 cannot be selected as a no-CBs CPU. In addition, attempts to offline the last normal-CBs CPU will fail. This feature was inspired by Jim Houston's and Joe Korty's JRCU, and this commit includes fixes to problems located by Fengguang Wu's kbuild test robot. [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ] Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-15can: grcan: Add device driver for GRCAN and GRHCAN coresAndreas Larsson1-0/+18
This driver supports GRCAN and CRHCAN CAN controllers available in the GRLIB VHDL IP core library. Signed-off-by: Andreas Larsson <andreas@gaisler.com> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-14doc: Add x86 CPU0 online/offline featureFenghua Yu1-0/+14
If CONFIG_BOOTPARAM_HOTPLUG_CPU0 is turned on, CPU0 is hotpluggable. Otherwise, by default CPU0 is not hotpluggable and kernel parameter cpu0_hotplug enables CPU0 online/offline feature. The documentations point out two known CPU0 dependencies. First, resume from hibernate or suspend always starts from CPU0. So hibernate and suspend are prevented if CPU0 is offline. Another dependency is PIC interrupts always go to CPU0. It's said that some machines may depend on CPU0 to poweroff/reboot. But I haven't seen such dependency on a few tested machines. Please let me know if you see any CPU0 dependencies on your machine. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-2-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-02tracing: Add trace_options kernel command line parameterSteven Rostedt1-0/+16
Add trace_options to the kernel command line parameter to be able to set options at early boot. For example, to enable stack dumps of events, add the following: trace_options=stacktrace This along with the trace_event option, you can get not only traces of the events but also the stack dumps with them. Requested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>