kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2013-04-30	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds	3	-22/+55
	Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...
2013-04-30	Merge branch 'akpm' (incoming from Andrew)	Linus Torvalds	3	-28/+54
	Merge first batch of fixes from Andrew Morton: - A couple of kthread changes - A few minor audit patches - A number of fbdev patches. Florian remains AWOL so I'm picking up some of these. - A few kbuild things - ocfs2 updates - Almost all of the MM queue (And in the meantime, I already have the second big batch from Andrew pending in my mailbox ;^) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits) memcg: take reference before releasing rcu_read_lock mem hotunplug: fix kfree() of bootmem memory mmKconfig: add an option to disable bounce mm, nobootmem: do memset() after memblock_reserve() mm, nobootmem: clean-up of free_low_memory_core_early() fs/buffer.c: remove unnecessary init operation after allocating buffer_head. numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU mm: fix memory_hotplug.c printk format warning mm: swap: mark swap pages writeback before queueing for direct IO swap: redirty page if page write fails on swap file mm, memcg: give exiting processes access to memory reserves thp: fix huge zero page logic for page with pfn == 0 memcg: avoid accessing memcg after releasing reference fs: fix fsync() error reporting memblock: fix missing comment of memblock_insert_region() mm: Remove unused parameter of pages_correctly_reserved() firmware, memmap: fix firmware_map_entry leak mm/vmstat: add note on safety of drain_zonestat mm: thp: add split tail pages to shrink page list in page reclaim mm: allow for outstanding swap writeback accounting ...
2013-04-30	Merge tag 'regmap-v3.10' of ↵	Linus Torvalds	7	-118/+413
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "In user visible terms just a couple of enhancements here, though there was a moderate amount of refactoring required in order to support the register cache sync performance improvements. - Support for block and asynchronous I/O during register cache syncing; this provides a use case dependant performance improvement. - Additional debugfs information on the memory consuption and register set" * tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits) regmap: don't corrupt work buffer in _regmap_raw_write() regmap: cache: Fix format specifier in dev_dbg regmap: cache: Make regcache_sync_block_raw static regmap: cache: Write consecutive registers in a single block write regmap: cache: Split raw and non-raw syncs regmap: cache: Factor out block sync regmap: cache: Factor out reg_present support from rbtree cache regmap: cache: Use raw I/O to sync rbtrees if we can regmap: core: Provide regmap_can_raw_write() operation regmap: cache: Provide a get address of value operation regmap: Cut down on the average # of nodes in the rbtree cache regmap: core: Make raw write available to regcache regmap: core: Warn on invalid operation combinations regmap: irq: Clarify error message when we fail to request primary IRQ regmap: rbtree Expose total memory consumption in the rbtree debugfs entry regmap: debugfs: Add a registers `range' file regmap: debugfs: Simplify calculation of `c->max_reg' regmap: cache: Store caches in native register format where possible regmap: core: Split out in place value parsing regmap: cache: Use regcache_get_value() to check if we updated ...
2013-04-30	numa, cpu hotplug: change links of CPU and node when changing node number by ↵	Yasuaki Ishimatsu	1	-2/+23
	onlining CPU When booting x86 system contains memoryless node, node numbers of CPUs on memoryless node were changed to nearest online node number by init_cpu_to_node() because the node is not online. In my system, node numbers of cpu#30-44 and 75-89 were changed from 2 to 0 as follows: $ numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 node 0 size: 32394 MB node 0 free: 27898 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 node 1 size: 32768 MB node 1 free: 30335 MB If we hot add memory to memoryless node and offine/online all CPUs on the node, node numbers of these CPUs are changed to correct node numbers by srat_detect_node() because the node become online. In this case, node numbers of cpu#30-44 and 75-89 were changed from 0 to 2 in my system as follows: $ numactl --hardware available: 3 nodes (0-2) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 node 0 size: 32394 MB node 0 free: 27218 MB node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 node 1 size: 32768 MB node 1 free: 30014 MB node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 node 2 size: 16384 MB node 2 free: 16384 MB But "cpu to node" and "node to cpu" links were not changed as follows: $ ls /sys/devices/system/cpu/cpu30/\|grep node node0 $ ls /sys/devices/system/node/node0/\|grep cpu30 cpu30 "numactl --hardware" shows that cpu30 belongs to node 2. But sysfs links does not change. This patch changes "cpu to node" and "node to cpu" links when node number changed by onlining CPU. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30	mm: Remove unused parameter of pages_correctly_reserved()	Tang Chen	1	-3/+2
	nr_pages is not used in pages_correctly_reserved(). So remove it. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30	mm, hotplug: avoid compiling memory hotremove functions when disabled	David Rientjes	1	-21/+23
	__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC pseries will return -EOPNOTSUPP if unsupported. Adding an #ifdef causes several other functions it depends on to also become unnecessary, which saves in .text when disabled (it's disabled in most defconfigs besides powerpc, including x86). remove_memory_block() becomes static since it is not referenced outside of drivers/base/memory.c. Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled and disabled. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30	drivers/base/node.c: switch to register_hotmemory_notifier()	Andrew Morton	1	-2/+6
	Squishes a warning which my change to hotplug_memory_notifier() added. I want to keep that warning, because it is punishment for failnig to check the hotplug_memory_notifier() return value. Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-16	Merge remote-tracking branch 'regmap/topic/range' into regmap-next	Mark Brown	2	-7/+88

2013-04-16	Merge remote-tracking branch 'regmap/topic/irq' into regmap-next	Mark Brown	1	-1/+2

2013-04-16	Merge remote-tracking branch 'regmap/topic/cache' into regmap-next	Mark Brown	5	-110/+315

2013-04-16	Merge remote-tracking branch 'regmap/topic/async' into regmap-next	Mark Brown	1	-0/+8

2013-04-16	regmap: don't corrupt work buffer in _regmap_raw_write()	Stephen Warren	1	-2/+1
	_regmap_raw_write() contains code to call regcache_write() to write values to the cache. That code calls memcpy() to copy the value data to the start of the work_buf. However, at least when _regmap_raw_write() is called from _regmap_bus_raw_write(), the value data is in the work_buf, and this memcpy() operation may over-write part of that value data, depending on the value of reg_bytes + pad_bytes. At least when using reg_bytes==1 and pad_bytes==0, corruption of the value data does occur. To solve this, remove the memcpy() operation, and modify the subsequent .parse_val() call to parse the original value buffer directly. At least in the case of 8-bit register address and 16-bit values, and writes of single registers at a time, this memcpy-then-parse combination used to cancel each-other out; for a work-buffer containing xx 89 03, the memcpy changed it to 89 03 03, and the parse_val changed it back to 89 89 03, thus leaving the value uncorrupted. This appears completely accidental though. Since commit 8a819ff "regmap: core: Split out in place value parsing", .parse_val only returns the parsed value, and does not modify the buffer, and hence does not (accidentally) undo the corruption caused by memcpy(). This caused bogus values to get written to HW, thus preventing e.g. audio playback on systems with a WM8903 CODEC. This patch fixes that. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-16	Merge tag 'v3.9-rc7' into regmap-cache	Mark Brown	7	-105/+166
	Linux 3.9-rc7
2013-04-15	Merge 3.9-rc7 into driver-core-next	Greg Kroah-Hartman	3	-16/+52
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-11	PM / Runtime: Idle devices asynchronously after probe\|release	Ulf Hansson	1	-3/+3
	Putting devices into idle\|suspend in a synchronous manner means we are waiting for each device to become idle\|suspended before the probe\|release is fully done. This patch switch to use the asynchronous runtime PM API:s instead and thus improves the parallelism since we can move on and handle the next device in queue in an earlier phase. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Alan Stern <stern@rowland.harvard.edu> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-11	driver core: handle user namespaces properly with the uid/gid devtmpfs change	Greg Kroah-Hartman	2	-16/+16
	Now that devtmpfs is caring about uid/gid, we need to use the correct internal types so users who have USER_NS enabled will have things work properly for them. Thanks to Eric for pointing this out, and the patch review. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-10	driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS	Ming Lei	1	-2/+2
	If CONFIG_UIDGID_STRICT_TYPE_CHECKS is enalbed, the below compile failure will be triggered: drivers/base/devtmpfs.c: In function 'handle_create': drivers/base/devtmpfs.c:214:19: error: incompatible types when assigning to type 'kuid_t' from type 'uid_t' drivers/base/devtmpfs.c:215:19: error: incompatible types when assigning to type 'kgid_t' from type 'gid_t' make[2]: *** [drivers/base/devtmpfs.o] Error 1 This patch fixes the compile failure. Signed-off-by: Ming Lei <ming.lei@canonical.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-10	devtmpfs: add base.h include	Greg Kroah-Hartman	1	-0/+1
	This fixes a sparse warning, and is a good idea given that the devtmpfs_init() prototype is in this file. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-09	regmap: Back out work buffer fix	Mark Brown	1	-1/+2
	This reverts commit bc8ce4 (regmap: don't corrupt work buffer in _regmap_raw_write()) since it turns out that it can cause issues when taken in isolation from the other changes in -next that lead to its discovery. On the basis that nobody noticed the problems for quite some time without that subsequent work let's drop it from v3.9. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-08	driver core: add uid and gid to devtmpfs	Kay Sievers	2	-14/+30
	Some drivers want to tell userspace what uid and gid should be used for their device nodes, so allow that information to percolate through the driver core to userspace in order to make this happen. This means that some systems (i.e. Android and friends) will not need to even run a udev-like daemon for their device node manager and can just rely in devtmpfs fully, reducing their footprint even more. Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05	regmap: cache: Fix format specifier in dev_dbg	Stratos Karafotis	1	-1/+1
	Fix format specifier in dev_dbg and suppress the following warning drivers/base/regmap/regcache.c: In function ‘regcache_sync_block_raw_flush’: drivers/base/regmap/regcache.c:593:2: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘size_t’ [-Wformat] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-05	regmap: cache: Make regcache_sync_block_raw static	Sachin Kamat	1	-1/+1
	regcache_sync_block_raw is used only in this file. Hence make it static. Silences the following warning: drivers/base/regmap/regcache.c:608:5: warning: symbol 'regcache_sync_block_raw' was not declared. Should it be static? Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-04-05	Merge tag 'pm+acpi-3.9-rc6' of ↵	Linus Torvalds	1	-13/+47
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: - Revert of a recent cpuidle change that caused Nehalem machines to hang on boot from Alex Shi. - USB power management fix addressing a crash in the port device object's release routine from Rafael J Wysocki. - Device PM QoS fix for a potential deadlock related to sysfs interface from Rafael J Wysocki. - Fix for a cpufreq crash when the /cpus Device Tree node is missing from Paolo Pisati. - Fix for a build issue on ia64 related to the Boot Graphics Resource Table (BGRT) from Tony Luck. - Two fixes for ACPI handles being set incorrectly for device objects that don't correspond to any ACPI namespace nodes in the I2C and SPI subsystems from Rafael J Wysocki. - Fix for compiler warnings related to CONFIG_PM_DEVFREQ being unset from Rajagopal Venkat. - Fix for a symbol definition typo in cpufreq_governor.h from Borislav Petkov. * tag 'pm+acpi-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / BGRT: Don't let users configure BGRT on non X86 systems cpuidle / ACPI: recover percpu ACPI processor cstate ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices() cpufreq: Correct header guards typo ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices() cpufreq: check OF node /cpus presence before dereferencing it PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset PM / QoS: Avoid possible deadlock related to sysfs access USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()
2013-04-03	sysfs: fix crash_notes_size build warning	Arnd Bergmann	1	-1/+1
	commit eca4549f57 "sysfs: Add crash_notes_size to export percpu note size" adds a printk that outputs a size_t value as %lu when it should be %zu, resulting in this warning. drivers/base/cpu.c: In function 'show_crash_notes_size': drivers/base/cpu.c:142:2: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=] Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-02	Merge tag 'v3.9-rc5' into wq/for-3.10	Tejun Heo	5	-102/+127
	Writeback conversion to workqueue will be based on top of wq/for-3.10 branch to take advantage of custom attrs and NUMA support for unbound workqueues. Mainline currently contains two commits which result in non-trivial merge conflicts with wq/for-3.10 and because block/for-3.10/core is based on v3.9-rc3 which contains one of the conflicting commits, we need a pre-merge-window merge anyway. Let's pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer from workqueue merge conflicts. The two conflicts and their resolutions: * e68035fb65 ("workqueue: convert to idr_alloc()") in mainline changes worker_pool_assign_id() to use idr_alloc() instead of the old idr interface. worker_pool_assign_id() goes through multiple locking changes in wq/for-3.10 causing the following conflict. static int worker_pool_assign_id(struct worker_pool pool) { int ret; <<<<<<< HEAD lockdep_assert_held(&wq_pool_mutex); do { if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL)) return -ENOMEM; ret = idr_get_new(&worker_pool_idr, pool, &pool->id); } while (ret == -EAGAIN); ======= mutex_lock(&worker_pool_idr_mutex); ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL); if (ret >= 0) pool->id = ret; mutex_unlock(&worker_pool_idr_mutex); >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 return ret < 0 ? ret : 0; } We want locking from the former and idr_alloc() usage from the latter, which can be combined to the following. static int worker_pool_assign_id(struct worker_pool pool) { int ret; lockdep_assert_held(&wq_pool_mutex); ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL); if (ret >= 0) { pool->id = ret; return 0; } return ret; } * eb2834285c ("workqueue: fix possible pool stall bug in wq_unbind_fn()") updated wq_unbind_fn() such that it has single larger for_each_std_worker_pool() loop instead of two separate loops with a schedule() call inbetween. wq/for-3.10 renamed pool->assoc_mutex to pool->manager_mutex causing the following conflict (earlier function body and comments omitted for brevity). static void wq_unbind_fn(struct work_struct work) { ... spin_unlock_irq(&pool->lock); <<<<<<< HEAD mutex_unlock(&pool->manager_mutex); } ======= mutex_unlock(&pool->assoc_mutex); >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 schedule(); <<<<<<< HEAD for_each_cpu_worker_pool(pool, cpu) ======= >>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89 atomic_set(&pool->nr_running, 0); spin_lock_irq(&pool->lock); wake_up_worker(pool); spin_unlock_irq(&pool->lock); } } The resolution is mostly trivial. We want the control flow of the latter with the rename of the former. static void wq_unbind_fn(struct work_struct work) { ... spin_unlock_irq(&pool->lock); mutex_unlock(&pool->manager_mutex); schedule(); atomic_set(&pool->nr_running, 0); spin_lock_irq(&pool->lock); wake_up_worker(pool); spin_unlock_irq(&pool->lock); } } Signed-off-by: Tejun Heo <tj@kernel.org>
2013-04-02	PM / QoS: Avoid possible deadlock related to sysfs access	Rafael J. Wysocki	1	-13/+47
	Commit b81ea1b (PM / QoS: Fix concurrency issues and memory leaks in device PM QoS) put calls to pm_qos_sysfs_add_latency(), pm_qos_sysfs_add_flags(), pm_qos_sysfs_remove_latency(), and pm_qos_sysfs_remove_flags() under dev_pm_qos_mtx, which was a mistake, because it may lead to deadlocks in some situations. For example, if pm_qos_remote_wakeup_store() is run in parallel with dev_pm_qos_constraints_destroy(), they may deadlock in the following way: ====================================================== [ INFO: possible circular locking dependency detected ] 3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319 Tainted: G W ------------------------------------------------------- trinity-child6/12371 is trying to acquire lock: (s_active#54){++++.+}, at: [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60 but task is already holding lock: (dev_pm_qos_mtx){+.+.+.}, at: [<ffffffff81f07cc3>] dev_pm_qos_constraints_destroy+0x23/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (dev_pm_qos_mtx){+.+.+.}: [<ffffffff811811da>] lock_acquire+0x1aa/0x240 [<ffffffff83dab809>] __mutex_lock_common+0x59/0x5e0 [<ffffffff83dabebf>] mutex_lock_nested+0x3f/0x50 [<ffffffff81f07f2f>] dev_pm_qos_update_flags+0x3f/0xc0 [<ffffffff81f05f4f>] pm_qos_remote_wakeup_store+0x3f/0x70 [<ffffffff81efbb43>] dev_attr_store+0x13/0x20 [<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150 [<ffffffff8127f2c1>] __kernel_write+0x81/0x150 [<ffffffff812afc2d>] write_pipe_buf+0x4d/0x80 [<ffffffff812af57c>] splice_from_pipe_feed+0x7c/0x120 [<ffffffff812afa25>] __splice_from_pipe+0x45/0x80 [<ffffffff812b14fc>] splice_from_pipe+0x4c/0x70 [<ffffffff812b1538>] default_file_splice_write+0x18/0x30 [<ffffffff812afae3>] do_splice_from+0x83/0xb0 [<ffffffff812afb2e>] direct_splice_actor+0x1e/0x20 [<ffffffff812b0277>] splice_direct_to_actor+0xe7/0x200 [<ffffffff812b15bc>] do_splice_direct+0x4c/0x70 [<ffffffff8127eda9>] do_sendfile+0x169/0x300 [<ffffffff8127ff94>] SyS_sendfile64+0x64/0xb0 [<ffffffff83db7d18>] tracesys+0xe1/0xe6 -> #0 (s_active#54){++++.+}: [<ffffffff811800cf>] __lock_acquire+0x15bf/0x1e50 [<ffffffff811811da>] lock_acquire+0x1aa/0x240 [<ffffffff81300aa2>] sysfs_deactivate+0x122/0x1a0 [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60 [<ffffffff812ff77f>] sysfs_hash_and_remove+0x7f/0xb0 [<ffffffff813035a1>] sysfs_unmerge_group+0x51/0x70 [<ffffffff81f068f4>] pm_qos_sysfs_remove_flags+0x14/0x20 [<ffffffff81f07490>] __dev_pm_qos_hide_flags+0x30/0x70 [<ffffffff81f07cd5>] dev_pm_qos_constraints_destroy+0x35/0x250 [<ffffffff81f06931>] dpm_sysfs_remove+0x11/0x50 [<ffffffff81efcf6f>] device_del+0x3f/0x1b0 [<ffffffff81efd128>] device_unregister+0x48/0x60 [<ffffffff82d4083c>] usb_hub_remove_port_device+0x1c/0x20 [<ffffffff82d2a9cd>] hub_disconnect+0xdd/0x160 [<ffffffff82d36ab7>] usb_unbind_interface+0x67/0x170 [<ffffffff81f001a7>] __device_release_driver+0x87/0xe0 [<ffffffff81f00559>] device_release_driver+0x29/0x40 [<ffffffff81effc58>] bus_remove_device+0x148/0x160 [<ffffffff81efd07f>] device_del+0x14f/0x1b0 [<ffffffff82d344f9>] usb_disable_device+0xf9/0x280 [<ffffffff82d34ff8>] usb_set_configuration+0x268/0x840 [<ffffffff82d3a7fc>] usb_remove_store+0x4c/0x80 [<ffffffff81efbb43>] dev_attr_store+0x13/0x20 [<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150 [<ffffffff8127f71d>] do_loop_readv_writev+0x4d/0x90 [<ffffffff8127f999>] do_readv_writev+0xf9/0x1e0 [<ffffffff8127faba>] vfs_writev+0x3a/0x60 [<ffffffff8127fc60>] SyS_writev+0x50/0xd0 [<ffffffff83db7d18>] tracesys+0xe1/0xe6 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(dev_pm_qos_mtx); lock(s_active#54); lock(dev_pm_qos_mtx); lock(s_active#54); * DEADLOCK * To avoid that, remove the calls to functions mentioned above from under dev_pm_qos_mtx and introduce a separate lock to prevent races between functions that add or remove device PM QoS sysfs attributes from happening. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01	Merge remote-tracking branch 'regmap/fix/async' into tmp	Mark Brown	1	-0/+2

2013-04-01	Merge remote-tracking branch 'regmap/fix/core' into tmp	Mark Brown	1	-2/+2

2013-03-30	regmap: cache: Write consecutive registers in a single block write	Mark Brown	1	-17/+47
	When syncing blocks of data using raw writes combine the writes into a single block write, saving us bus overhead for setup, addressing and teardown. Currently the block write is done unconditionally as it is expected that hardware which has a register format which can support raw writes will support auto incrementing writes, this decision may need to be revised in future. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
2013-03-30	regmap: cache: Split raw and non-raw syncs	Mark Brown	1	-11/+53
	For code clarity after implementing block writes split out the raw and non-raw I/O sync implementations. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
2013-03-30	regmap: cache: Factor out block sync	Mark Brown	3	-42/+51
	The idea of holding blocks of registers in device format is shared between at least rbtree and lzo cache formats so split out the loop that does the sync from the rbtree code so optimisations on it can be reused. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
2013-03-30	regmap: cache: Factor out reg_present support from rbtree cache	Mark Brown	3	-58/+54
	The idea of maintaining a bitmap of present registers is something that can usefully be used by other cache types that maintain blocks of cached registers so move the code out of the rbtree cache and into the generic regcache code. Refactor the interface slightly as we go to wrap the set bit and enlarge bitmap operations (since we never do one without the other) and make it more robust for reads of uncached registers by bounds checking before we look at the bitmap. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
2013-03-29	sysfs: Add crash_notes_size to export percpu note size	Zhang Yanfei	1	-0/+14
	For percpu notes, we are exporting only address and not size. So the userspace tool kexec-tools is putting an upper limit of 1024 and putting the value in p_memsz and p_filesz fields. So the patch add the new sysfile crash_notes_size to export the exact percpu note size and let the kexec-tools parse it intead of using 1024. The idea came from Vivek Goyal. And a later patch will be sent to kexec-tools to let it parse the size. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29	driver core: platform.c: fix checkpatch errors and warnings	Fabio Porcedda	1	-10/+8
	Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29	driver core: warn that platform_driver_probe can not use deferred probing	Fabio Porcedda	1	-1/+5
	Add documentation that platform_driver_probe() is incompatible with deferred probing. Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-27	regmap: async: Add missing return	Mark Brown	1	-0/+2
	Let's only write once... Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-27	regmap: cache: Use raw I/O to sync rbtrees if we can	Mark Brown	1	-1/+18
	This will bring no meaningful benefit by itself, it is done as a separate commit to aid bisection if there are problems with the following commits adding support for coalescing adjacent writes. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-27	regmap: core: Provide regmap_can_raw_write() operation	Mark Brown	1	-3/+12
	Mainly useful internally but exported since this is a public API that's being checked for. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-27	regmap: cache: Provide a get address of value operation	Mark Brown	2	-2/+9
	Provide a helper to do the size based index into a block of registers and use it when reading a value. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-27	regmap: Cut down on the average # of nodes in the rbtree cache	Dimitris Papastamos	1	-1/+69
	This patch aims to bring down the average number of nodes in the rbtree cache and increase the average number of registers per node. This should improve general lookup and traversal times. This is achieved by setting the minimum size of a block within the rbnode to the size of the rbnode itself. This will essentially cache possibly non-existent registers so to combat this scenario, we keep a separate bitmap in memory which keeps track of which register exists. The memory overhead of this change is likely in the order of ~5-10%, possibly less depending on the register file layout. On my test system with a bitmap of ~4300 bits and a relatively sparse register layout, the memory requirements for the entire cache did not increase (the cutting down of nodes which was about 50% of the original number compensated the situation). A second patch that can be built on top of this can look at the ratio `sizeof(*rbnode) / map->cache_word_size' in order to suitably adjust the block length of each block. Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-26	regmap: core: Make raw write available to regcache	Mark Brown	2	-2/+5
	This allows the cache to sync values directly to the device when stored in native format and also allows asynchronous I/O. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-26	regmap: core: Warn on invalid operation combinations	Mark Brown	1	-5/+5
	Don't grind to a screaming halt, just generate a warning. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-21	regmap: don't corrupt work buffer in _regmap_raw_write()	Stephen Warren	1	-2/+1
	_regmap_raw_write() contains code to call regcache_write() to write values to the cache. That code calls memcpy() to copy the value data to the start of the work_buf. However, at least when _regmap_raw_write() is called from _regmap_bus_raw_write(), the value data is in the work_buf, and this memcpy() operation may over-write part of that value data, depending on the value of reg_bytes + pad_bytes. At least when using reg_bytes==1 and pad_bytes==0, corruption of the value data does occur. To solve this, remove the memcpy() operation, and modify the subsequent .parse_val() call to parse the original value buffer directly. At least in the case of 8-bit register address and 16-bit values, and writes of single registers at a time, this memcpy-then-parse combination used to cancel each-other out; for a work-buffer containing xx 89 03, the memcpy changed it to 89 03 03, and the parse_val changed it back to 89 89 03, thus leaving the value uncorrupted. This appears completely accidental though. Since commit 8a819ff "regmap: core: Split out in place value parsing", .parse_val only returns the parsed value, and does not modify the buffer, and hence does not (accidentally) undo the corruption caused by memcpy(). This caused bogus values to get written to HW, thus preventing e.g. audio playback on systems with a WM8903 CODEC. This patch fixes that. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-19	regmap: irq: Clarify error message when we fail to request primary IRQ	Mark Brown	1	-1/+2
	Display the name for the chip rather than just the primary IRQ so it is clearer what exactly has failed. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-15	base: core: WARN() about bogus permissions on device attributes	Felipe Balbi	1	-1/+8
	Whenever a struct device_attribute is registered with mismatched permissions - read permission without a show routine or write permission without store routine - we will issue a big warning so we catch those early enough. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-13	regmap: cache Fix regcache-rbtree sync	Lars-Peter Clausen	1	-1/+1
	The last register block, which falls into the specified range, is not handled correctly. The formula which calculates the number of register which should be synced is inverse (and off by one). E.g. if all registers in that block should be synced only one is synced, and if only one should be synced all (but one) are synced. To calculate the number of registers that need to be synced we need to subtract the number of the first register in the block from the max register number and add one. This patch updates the code accordingly. The issue was introduced in commit ac8d91c ("regmap: Supply ranges to the sync operations"). Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@vger.kernel.org
2013-03-13	device: separate all subsys mutexes	Michal Hocko	1	-4/+4
	ca22e56d (driver-core: implement 'sysdev' functionality for regular devices and buses) has introduced bus_register macro with a static key to distinguish different subsys mutex classes. This however doesn't work for different subsys which use a common registering function. One example is subsys_system_register (and mce_device and cpu_device). In the end this leads to the following lockdep splat: [ 207.271924] ====================================================== [ 207.271932] [ INFO: possible circular locking dependency detected ] [ 207.271942] 3.9.0-rc1-0.7-default+ #34 Not tainted [ 207.271948] ------------------------------------------------------- [ 207.271957] bash/10493 is trying to acquire lock: [ 207.271963] (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.271987] [ 207.271987] but task is already holding lock: [ 207.271995] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.272012] [ 207.272012] which lock already depends on the new lock. [ 207.272012] [ 207.272023] [ 207.272023] the existing dependency chain (in reverse order) is: [ 207.272033] [ 207.272033] -> #4 (cpu_hotplug.lock){+.+.+.}: [ 207.272044] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272056] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272069] [<ffffffff81046ba9>] get_online_cpus+0x29/0x40 [ 207.272082] [<ffffffff81185210>] drain_all_stock+0x30/0x150 [ 207.272094] [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0 [ 207.272104] [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0 [ 207.272114] [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60 [ 207.272125] [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30 [ 207.272135] [<ffffffff81150531>] do_wp_page+0x231/0x830 [ 207.272147] [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0 [ 207.272157] [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0 [ 207.272166] [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0 [ 207.272178] [<ffffffff814b2578>] page_fault+0x28/0x30 [ 207.272189] [ 207.272189] -> #3 (&mm->mmap_sem){++++++}: [ 207.272199] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272208] [<ffffffff8114c5ad>] might_fault+0x6d/0x90 [ 207.272218] [<ffffffff811a11e3>] filldir64+0xb3/0x120 [ 207.272229] [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3] [ 207.272248] [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3] [ 207.272263] [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0 [ 207.272273] [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110 [ 207.272284] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272296] [ 207.272296] -> #2 (&type->i_mutex_dir_key#3){+.+.+.}: [ 207.272309] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272319] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272329] [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0 [ 207.272339] [<ffffffff8119e7fa>] path_openat+0xba/0x470 [ 207.272349] [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0 [ 207.272358] [<ffffffff8118d81c>] file_open_name+0xdc/0x110 [ 207.272369] [<ffffffff8118d885>] filp_open+0x35/0x40 [ 207.272378] [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20 [ 207.272389] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272399] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272416] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272431] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272444] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272457] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272472] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272485] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272499] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272511] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272523] [ 207.272523] -> #1 (umhelper_sem){++++.+}: [ 207.272537] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272548] [<ffffffff814ae9c4>] down_read+0x34/0x50 [ 207.272559] [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100 [ 207.272575] [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20 [ 207.272587] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272599] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272613] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272627] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272641] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272654] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272666] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272678] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272690] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272702] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272715] [ 207.272715] -> #0 (subsys mutex){+.+.+.}: [ 207.272729] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.272740] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272751] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272763] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.272775] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.272786] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.272798] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.272812] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.272824] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.272839] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.272851] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.272862] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.272874] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.272886] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.272900] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.272911] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.272923] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272936] [ 207.272936] other info that might help us debug this: [ 207.272936] [ 207.272952] Chain exists of: [ 207.272952] subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock [ 207.272952] [ 207.272973] Possible unsafe locking scenario: [ 207.272973] [ 207.272984] CPU0 CPU1 [ 207.272992] ---- ---- [ 207.273000] lock(cpu_hotplug.lock); [ 207.273009] lock(&mm->mmap_sem); [ 207.273020] lock(cpu_hotplug.lock); [ 207.273031] lock(subsys mutex); [ 207.273040] [ 207.273040] * DEADLOCK * [ 207.273040] [ 207.273055] 5 locks held by bash/10493: [ 207.273062] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150 [ 207.273080] #1: (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150 [ 207.273099] #2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20 [ 207.273121] #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60 [ 207.273140] #4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.273158] [ 207.273158] stack backtrace: [ 207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ #34 [ 207.273180] Call Trace: [ 207.273192] [<ffffffff810ab373>] print_circular_bug+0x223/0x310 [ 207.273204] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.273216] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273227] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.273239] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273251] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.273263] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273274] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273286] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.273298] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.273309] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.273321] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.273332] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.273344] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.273356] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.273368] [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20 [ 207.273380] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.273391] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.273402] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.273413] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.273425] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.273436] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.273447] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b Which reports a false possitive deadlock because it sees: 1) load_module -> subsys_interface_register -> mc_deveice_add () -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex 2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove() -> device_unregister -> bus_remove_device -> subsys mutex 3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock but 1) takes cpu_subsys subsys () but 2) takes mce_device subsys (**) so the deadlock is not possible AFAICS. The fix is quite simple. We can pull the key inside bus_type structure because they are defined per device so the pointer will be unique as well. bus_register doesn't need to be a macro anymore so change it to the inline. We could get rid of __bus_register as there is no other caller but maybe somebody will want to use a different key so keep it around for now. Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-13	regmap: rbtree Expose total memory consumption in the rbtree debugfs entry	Dimitris Papastamos	1	-2/+7
	Provide a feel of how much overhead the rbtree cache adds to the game. [Slightly reworded output in debugfs -- broonie] Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2013-03-12	driver/base: implement subsys_virtual_register()	Tejun Heo	3	-22/+55
	Kay tells me the most appropriate place to expose workqueues to userland would be /sys/devices/virtual/workqueues/WQ_NAME which is symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking a way to do that outside of driver core as virtual_device_parent() isn't exported and there's no inteface to conveniently create a virtual subsystem. This patch implements subsys_virtual_register() by factoring out subsys_register() from subsys_system_register() and using it with virtual_device_parent() as the origin directory. It's identical to subsys_system_register() other than the origin directory but we aren't gonna restrict the device names which should be used under it. This will be used to expose workqueue attributes to userland. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org>
2013-03-12	regmap: Initialize `map->debugfs' before regcache	Dimitris Papastamos	1	-2/+2
	In the rbtree code we are exposing statistics relating to the number of nodes/registers of the rbtree cache for each of the devices. Ensure that `map->debugfs' has been initialized before we attempt to initialize the debugfs entry for the rbtree cache. Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@vger.kernel.org