Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull workqueue updates from Tejun Heo:
"A lot of activities on workqueue side this time. The changes achieve
the followings.
- WQ_UNBOUND workqueues - the workqueues which are per-cpu - are
updated to be able to interface with multiple backend worker pools.
This involved a lot of churning but the end result seems actually
neater as unbound workqueues are now a lot closer to per-cpu ones.
- The ability to interface with multiple backend worker pools are
used to implement unbound workqueues with custom attributes.
Currently the supported attributes are the nice level and CPU
affinity. It may be expanded to include cgroup association in
future. The attributes can be specified either by calling
apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
the workqueue in question is exported through sysfs.
The backend worker pools are keyed by the actual attributes and
shared by any workqueues which share the same attributes. When
attributes of a workqueue are changed, the workqueue binds to the
worker pool with the specified attributes while leaving the work
items which are already executing in its previous worker pools
alone.
This allows converting custom worker pool implementations which
want worker attribute tuning to use workqueues. The writeback pool
is already converted in block tree and there are a couple others
are likely to follow including btrfs io workers.
- WQ_UNBOUND's ability to bind to multiple worker pools is also used
to make it NUMA-aware. Because there's no association between work
item issuer and the specific worker assigned to execute it, before
this change, using unbound workqueue led to unnecessary cross-node
bouncing and it couldn't be helped by autonuma as it requires tasks
to have implicit node affinity and workers are assigned randomly.
After these changes, an unbound workqueue now binds to multiple
NUMA-affine worker pools so that queued work items are executed in
the same node. This is turned on by default but can be disabled
system-wide or for individual workqueues.
Crypto was requesting NUMA affinity as encrypting data across
different nodes can contribute noticeable overhead and doing it
per-cpu was too limiting for certain cases and IO throughput could
be bottlenecked by one CPU being fully occupied while others have
idle cycles.
While the new features required a lot of changes including
restructuring locking, it didn't complicate the execution paths much.
The unbound workqueue handling is now closer to per-cpu ones and the
new features are implemented by simply associating a workqueue with
different sets of backend worker pools without changing queue,
execution or flush paths.
As such, even though the amount of change is very high, I feel
relatively safe in that it isn't likely to cause subtle issues with
basic correctness of work item execution and handling. If something
is wrong, it's likely to show up as being associated with worker pools
with the wrong attributes or OOPS while workqueue attributes are being
changed or during CPU hotplug.
While this creates more backend worker pools, it doesn't add too many
more workers unless, of course, there are many workqueues with unique
combinations of attributes. Assuming everything else is the same,
NUMA awareness costs an extra worker pool per NUMA node with online
CPUs.
There are also a couple things which are being routed outside the
workqueue tree.
- block tree pulled in workqueue for-3.10 so that writeback worker
pool can be converted to unbound workqueue with sysfs control
exposed. This simplifies the code, makes writeback workers
NUMA-aware and allows tuning nice level and CPU affinity via sysfs.
- The conversion to workqueue means that there's no 1:1 association
between a specific worker, which makes writeback folks unhappy as
they want to be able to tell which filesystem caused a problem from
backtrace on systems with many filesystems mounted. This is
resolved by allowing work items to set debug info string which is
printed when the task is dumped. As this change involves unifying
implementations of dump_stack() and friends in arch codes, it's
being routed through Andrew's -mm tree."
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
workqueue: use kmem_cache_free() instead of kfree()
workqueue: avoid false negative WARN_ON() in destroy_workqueue()
workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
workqueue: implement NUMA affinity for unbound workqueues
workqueue: introduce put_pwq_unlocked()
workqueue: introduce numa_pwq_tbl_install()
workqueue: use NUMA-aware allocation for pool_workqueues
workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
workqueue: map an unbound workqueues to multiple per-node pool_workqueues
workqueue: move hot fields of workqueue_struct to the end
workqueue: make workqueue->name[] fixed len
workqueue: add workqueue->unbound_attrs
workqueue: determine NUMA node of workers accourding to the allowed cpumask
workqueue: drop 'H' from kworker names of unbound worker pools
workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
workqueue: fix memory leak in apply_workqueue_attrs()
workqueue: fix unbound workqueue attrs hashing / comparison
workqueue: fix race condition in unbound workqueue free path
workqueue: remove pwq_lock which is no longer used
...
|
|
Merge first batch of fixes from Andrew Morton:
- A couple of kthread changes
- A few minor audit patches
- A number of fbdev patches. Florian remains AWOL so I'm picking up
some of these.
- A few kbuild things
- ocfs2 updates
- Almost all of the MM queue
(And in the meantime, I already have the second big batch from Andrew
pending in my mailbox ;^)
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits)
memcg: take reference before releasing rcu_read_lock
mem hotunplug: fix kfree() of bootmem memory
mmKconfig: add an option to disable bounce
mm, nobootmem: do memset() after memblock_reserve()
mm, nobootmem: clean-up of free_low_memory_core_early()
fs/buffer.c: remove unnecessary init operation after allocating buffer_head.
numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU
mm: fix memory_hotplug.c printk format warning
mm: swap: mark swap pages writeback before queueing for direct IO
swap: redirty page if page write fails on swap file
mm, memcg: give exiting processes access to memory reserves
thp: fix huge zero page logic for page with pfn == 0
memcg: avoid accessing memcg after releasing reference
fs: fix fsync() error reporting
memblock: fix missing comment of memblock_insert_region()
mm: Remove unused parameter of pages_correctly_reserved()
firmware, memmap: fix firmware_map_entry leak
mm/vmstat: add note on safety of drain_zonestat
mm: thp: add split tail pages to shrink page list in page reclaim
mm: allow for outstanding swap writeback accounting
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"In user visible terms just a couple of enhancements here, though there
was a moderate amount of refactoring required in order to support the
register cache sync performance improvements.
- Support for block and asynchronous I/O during register cache
syncing; this provides a use case dependant performance
improvement.
- Additional debugfs information on the memory consuption and
register set"
* tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits)
regmap: don't corrupt work buffer in _regmap_raw_write()
regmap: cache: Fix format specifier in dev_dbg
regmap: cache: Make regcache_sync_block_raw static
regmap: cache: Write consecutive registers in a single block write
regmap: cache: Split raw and non-raw syncs
regmap: cache: Factor out block sync
regmap: cache: Factor out reg_present support from rbtree cache
regmap: cache: Use raw I/O to sync rbtrees if we can
regmap: core: Provide regmap_can_raw_write() operation
regmap: cache: Provide a get address of value operation
regmap: Cut down on the average # of nodes in the rbtree cache
regmap: core: Make raw write available to regcache
regmap: core: Warn on invalid operation combinations
regmap: irq: Clarify error message when we fail to request primary IRQ
regmap: rbtree Expose total memory consumption in the rbtree debugfs entry
regmap: debugfs: Add a registers `range' file
regmap: debugfs: Simplify calculation of `c->max_reg'
regmap: cache: Store caches in native register format where possible
regmap: core: Split out in place value parsing
regmap: cache: Use regcache_get_value() to check if we updated
...
|
|
onlining CPU
When booting x86 system contains memoryless node, node numbers of CPUs
on memoryless node were changed to nearest online node number by
init_cpu_to_node() because the node is not online.
In my system, node numbers of cpu#30-44 and 75-89 were changed from 2 to
0 as follows:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82
83 84 85 86 87 88 89
node 0 size: 32394 MB
node 0 free: 27898 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30335 MB
If we hot add memory to memoryless node and offine/online all CPUs on
the node, node numbers of these CPUs are changed to correct node numbers
by srat_detect_node() because the node become online.
In this case, node numbers of cpu#30-44 and 75-89 were changed from 0 to
2 in my system as follows:
$ numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55
56 57 58 59
node 0 size: 32394 MB
node 0 free: 27218 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30014 MB
node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81
82 83 84 85 86 87 88 89
node 2 size: 16384 MB
node 2 free: 16384 MB
But "cpu to node" and "node to cpu" links were not changed as follows:
$ ls /sys/devices/system/cpu/cpu30/|grep node
node0
$ ls /sys/devices/system/node/node0/|grep cpu30
cpu30
"numactl --hardware" shows that cpu30 belongs to node 2. But sysfs
links does not change.
This patch changes "cpu to node" and "node to cpu" links when node
number changed by onlining CPU.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
nr_pages is not used in pages_correctly_reserved().
So remove it.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC
pseries will return -EOPNOTSUPP if unsupported.
Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86). remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.
Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Squishes a warning which my change to hotplug_memory_notifier() added.
I want to keep that warning, because it is punishment for failnig to check
the hotplug_memory_notifier() return value.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
_regmap_raw_write() contains code to call regcache_write() to write
values to the cache. That code calls memcpy() to copy the value data to
the start of the work_buf. However, at least when _regmap_raw_write() is
called from _regmap_bus_raw_write(), the value data is in the work_buf,
and this memcpy() operation may over-write part of that value data,
depending on the value of reg_bytes + pad_bytes. At least when using
reg_bytes==1 and pad_bytes==0, corruption of the value data does occur.
To solve this, remove the memcpy() operation, and modify the subsequent
.parse_val() call to parse the original value buffer directly.
At least in the case of 8-bit register address and 16-bit values, and
writes of single registers at a time, this memcpy-then-parse combination
used to cancel each-other out; for a work-buffer containing xx 89 03,
the memcpy changed it to 89 03 03, and the parse_val changed it back to
89 89 03, thus leaving the value uncorrupted. This appears completely
accidental though. Since commit 8a819ff "regmap: core: Split out in
place value parsing", .parse_val only returns the parsed value, and does
not modify the buffer, and hence does not (accidentally) undo the
corruption caused by memcpy(). This caused bogus values to get written
to HW, thus preventing e.g. audio playback on systems with a WM8903
CODEC. This patch fixes that.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Linux 3.9-rc7
|
|
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Putting devices into idle|suspend in a synchronous manner means we are
waiting for each device to become idle|suspended before the probe|release
is fully done.
This patch switch to use the asynchronous runtime PM API:s instead and
thus improves the parallelism since we can move on and handle the next
device in queue in an earlier phase.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now that devtmpfs is caring about uid/gid, we need to use the correct
internal types so users who have USER_NS enabled will have things work
properly for them.
Thanks to Eric for pointing this out, and the patch review.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If CONFIG_UIDGID_STRICT_TYPE_CHECKS is enalbed, the below compile
failure will be triggered:
drivers/base/devtmpfs.c: In function 'handle_create':
drivers/base/devtmpfs.c:214:19: error: incompatible types when assigning to type 'kuid_t' from type 'uid_t'
drivers/base/devtmpfs.c:215:19: error: incompatible types when assigning to type 'kgid_t' from type 'gid_t'
make[2]: *** [drivers/base/devtmpfs.o] Error 1
This patch fixes the compile failure.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This fixes a sparse warning, and is a good idea given that the
devtmpfs_init() prototype is in this file.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit bc8ce4 (regmap: don't corrupt work buffer in
_regmap_raw_write()) since it turns out that it can cause issues when
taken in isolation from the other changes in -next that lead to its
discovery. On the basis that nobody noticed the problems for quite some
time without that subsequent work let's drop it from v3.9.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Some drivers want to tell userspace what uid and gid should be used for
their device nodes, so allow that information to percolate through the
driver core to userspace in order to make this happen. This means that
some systems (i.e. Android and friends) will not need to even run a
udev-like daemon for their device node manager and can just rely in
devtmpfs fully, reducing their footprint even more.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix format specifier in dev_dbg and suppress the following warning
drivers/base/regmap/regcache.c: In function
‘regcache_sync_block_raw_flush’:
drivers/base/regmap/regcache.c:593:2: warning: format ‘%d’ expects
argument of type ‘int’, but argument 4 has type ‘size_t’ [-Wformat]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
regcache_sync_block_raw is used only in this file. Hence make it static.
Silences the following warning:
drivers/base/regmap/regcache.c:608:5: warning:
symbol 'regcache_sync_block_raw' was not declared. Should it be static?
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
- Revert of a recent cpuidle change that caused Nehalem machines to
hang on boot from Alex Shi.
- USB power management fix addressing a crash in the port device
object's release routine from Rafael J Wysocki.
- Device PM QoS fix for a potential deadlock related to sysfs interface
from Rafael J Wysocki.
- Fix for a cpufreq crash when the /cpus Device Tree node is missing
from Paolo Pisati.
- Fix for a build issue on ia64 related to the Boot Graphics Resource
Table (BGRT) from Tony Luck.
- Two fixes for ACPI handles being set incorrectly for device objects
that don't correspond to any ACPI namespace nodes in the I2C and SPI
subsystems from Rafael J Wysocki.
- Fix for compiler warnings related to CONFIG_PM_DEVFREQ being unset
from Rajagopal Venkat.
- Fix for a symbol definition typo in cpufreq_governor.h from Borislav
Petkov.
* tag 'pm+acpi-3.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / BGRT: Don't let users configure BGRT on non X86 systems
cpuidle / ACPI: recover percpu ACPI processor cstate
ACPI / I2C: Use parent's ACPI_HANDLE() in acpi_i2c_register_devices()
cpufreq: Correct header guards typo
ACPI / SPI: Use parent's ACPI_HANDLE() in acpi_register_spi_devices()
cpufreq: check OF node /cpus presence before dereferencing it
PM / devfreq: Fix compiler warnings for CONFIG_PM_DEVFREQ unset
PM / QoS: Avoid possible deadlock related to sysfs access
USB / PM: Don't try to hide PM QoS flags from usb_port_device_release()
|
|
commit eca4549f57 "sysfs: Add crash_notes_size to export percpu
note size" adds a printk that outputs a size_t value as %lu
when it should be %zu, resulting in this warning.
drivers/base/cpu.c: In function 'show_crash_notes_size':
drivers/base/cpu.c:142:2: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=]
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Writeback conversion to workqueue will be based on top of wq/for-3.10
branch to take advantage of custom attrs and NUMA support for unbound
workqueues. Mainline currently contains two commits which result in
non-trivial merge conflicts with wq/for-3.10 and because
block/for-3.10/core is based on v3.9-rc3 which contains one of the
conflicting commits, we need a pre-merge-window merge anyway. Let's
pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer
from workqueue merge conflicts.
The two conflicts and their resolutions:
* e68035fb65 ("workqueue: convert to idr_alloc()") in mainline changes
worker_pool_assign_id() to use idr_alloc() instead of the old idr
interface. worker_pool_assign_id() goes through multiple locking
changes in wq/for-3.10 causing the following conflict.
static int worker_pool_assign_id(struct worker_pool *pool)
{
int ret;
<<<<<<< HEAD
lockdep_assert_held(&wq_pool_mutex);
do {
if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
return -ENOMEM;
ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
} while (ret == -EAGAIN);
=======
mutex_lock(&worker_pool_idr_mutex);
ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
if (ret >= 0)
pool->id = ret;
mutex_unlock(&worker_pool_idr_mutex);
>>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89
return ret < 0 ? ret : 0;
}
We want locking from the former and idr_alloc() usage from the
latter, which can be combined to the following.
static int worker_pool_assign_id(struct worker_pool *pool)
{
int ret;
lockdep_assert_held(&wq_pool_mutex);
ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
if (ret >= 0) {
pool->id = ret;
return 0;
}
return ret;
}
* eb2834285c ("workqueue: fix possible pool stall bug in
wq_unbind_fn()") updated wq_unbind_fn() such that it has single
larger for_each_std_worker_pool() loop instead of two separate loops
with a schedule() call inbetween. wq/for-3.10 renamed
pool->assoc_mutex to pool->manager_mutex causing the following
conflict (earlier function body and comments omitted for brevity).
static void wq_unbind_fn(struct work_struct *work)
{
...
spin_unlock_irq(&pool->lock);
<<<<<<< HEAD
mutex_unlock(&pool->manager_mutex);
}
=======
mutex_unlock(&pool->assoc_mutex);
>>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89
schedule();
<<<<<<< HEAD
for_each_cpu_worker_pool(pool, cpu)
=======
>>>>>>> c67bf5361e7e66a0ff1f4caf95f89347d55dfb89
atomic_set(&pool->nr_running, 0);
spin_lock_irq(&pool->lock);
wake_up_worker(pool);
spin_unlock_irq(&pool->lock);
}
}
The resolution is mostly trivial. We want the control flow of the
latter with the rename of the former.
static void wq_unbind_fn(struct work_struct *work)
{
...
spin_unlock_irq(&pool->lock);
mutex_unlock(&pool->manager_mutex);
schedule();
atomic_set(&pool->nr_running, 0);
spin_lock_irq(&pool->lock);
wake_up_worker(pool);
spin_unlock_irq(&pool->lock);
}
}
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Commit b81ea1b (PM / QoS: Fix concurrency issues and memory leaks in
device PM QoS) put calls to pm_qos_sysfs_add_latency(),
pm_qos_sysfs_add_flags(), pm_qos_sysfs_remove_latency(), and
pm_qos_sysfs_remove_flags() under dev_pm_qos_mtx, which was a
mistake, because it may lead to deadlocks in some situations.
For example, if pm_qos_remote_wakeup_store() is run in parallel
with dev_pm_qos_constraints_destroy(), they may deadlock in the
following way:
======================================================
[ INFO: possible circular locking dependency detected ]
3.9.0-rc4-next-20130328-sasha-00014-g91a3267 #319 Tainted: G W
-------------------------------------------------------
trinity-child6/12371 is trying to acquire lock:
(s_active#54){++++.+}, at: [<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60
but task is already holding lock:
(dev_pm_qos_mtx){+.+.+.}, at: [<ffffffff81f07cc3>] dev_pm_qos_constraints_destroy+0x23/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (dev_pm_qos_mtx){+.+.+.}:
[<ffffffff811811da>] lock_acquire+0x1aa/0x240
[<ffffffff83dab809>] __mutex_lock_common+0x59/0x5e0
[<ffffffff83dabebf>] mutex_lock_nested+0x3f/0x50
[<ffffffff81f07f2f>] dev_pm_qos_update_flags+0x3f/0xc0
[<ffffffff81f05f4f>] pm_qos_remote_wakeup_store+0x3f/0x70
[<ffffffff81efbb43>] dev_attr_store+0x13/0x20
[<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
[<ffffffff8127f2c1>] __kernel_write+0x81/0x150
[<ffffffff812afc2d>] write_pipe_buf+0x4d/0x80
[<ffffffff812af57c>] splice_from_pipe_feed+0x7c/0x120
[<ffffffff812afa25>] __splice_from_pipe+0x45/0x80
[<ffffffff812b14fc>] splice_from_pipe+0x4c/0x70
[<ffffffff812b1538>] default_file_splice_write+0x18/0x30
[<ffffffff812afae3>] do_splice_from+0x83/0xb0
[<ffffffff812afb2e>] direct_splice_actor+0x1e/0x20
[<ffffffff812b0277>] splice_direct_to_actor+0xe7/0x200
[<ffffffff812b15bc>] do_splice_direct+0x4c/0x70
[<ffffffff8127eda9>] do_sendfile+0x169/0x300
[<ffffffff8127ff94>] SyS_sendfile64+0x64/0xb0
[<ffffffff83db7d18>] tracesys+0xe1/0xe6
-> #0 (s_active#54){++++.+}:
[<ffffffff811800cf>] __lock_acquire+0x15bf/0x1e50
[<ffffffff811811da>] lock_acquire+0x1aa/0x240
[<ffffffff81300aa2>] sysfs_deactivate+0x122/0x1a0
[<ffffffff81301631>] sysfs_addrm_finish+0x31/0x60
[<ffffffff812ff77f>] sysfs_hash_and_remove+0x7f/0xb0
[<ffffffff813035a1>] sysfs_unmerge_group+0x51/0x70
[<ffffffff81f068f4>] pm_qos_sysfs_remove_flags+0x14/0x20
[<ffffffff81f07490>] __dev_pm_qos_hide_flags+0x30/0x70
[<ffffffff81f07cd5>] dev_pm_qos_constraints_destroy+0x35/0x250
[<ffffffff81f06931>] dpm_sysfs_remove+0x11/0x50
[<ffffffff81efcf6f>] device_del+0x3f/0x1b0
[<ffffffff81efd128>] device_unregister+0x48/0x60
[<ffffffff82d4083c>] usb_hub_remove_port_device+0x1c/0x20
[<ffffffff82d2a9cd>] hub_disconnect+0xdd/0x160
[<ffffffff82d36ab7>] usb_unbind_interface+0x67/0x170
[<ffffffff81f001a7>] __device_release_driver+0x87/0xe0
[<ffffffff81f00559>] device_release_driver+0x29/0x40
[<ffffffff81effc58>] bus_remove_device+0x148/0x160
[<ffffffff81efd07f>] device_del+0x14f/0x1b0
[<ffffffff82d344f9>] usb_disable_device+0xf9/0x280
[<ffffffff82d34ff8>] usb_set_configuration+0x268/0x840
[<ffffffff82d3a7fc>] usb_remove_store+0x4c/0x80
[<ffffffff81efbb43>] dev_attr_store+0x13/0x20
[<ffffffff812ffdaa>] sysfs_write_file+0xfa/0x150
[<ffffffff8127f71d>] do_loop_readv_writev+0x4d/0x90
[<ffffffff8127f999>] do_readv_writev+0xf9/0x1e0
[<ffffffff8127faba>] vfs_writev+0x3a/0x60
[<ffffffff8127fc60>] SyS_writev+0x50/0xd0
[<ffffffff83db7d18>] tracesys+0xe1/0xe6
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(dev_pm_qos_mtx);
lock(s_active#54);
lock(dev_pm_qos_mtx);
lock(s_active#54);
*** DEADLOCK ***
To avoid that, remove the calls to functions mentioned above from
under dev_pm_qos_mtx and introduce a separate lock to prevent races
between functions that add or remove device PM QoS sysfs attributes
from happening.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
|
|
|
|
When syncing blocks of data using raw writes combine the writes into a
single block write, saving us bus overhead for setup, addressing and
teardown.
Currently the block write is done unconditionally as it is expected that
hardware which has a register format which can support raw writes will
support auto incrementing writes, this decision may need to be revised in
future.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
|
|
For code clarity after implementing block writes split out the raw and
non-raw I/O sync implementations.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
|
|
The idea of holding blocks of registers in device format is shared between
at least rbtree and lzo cache formats so split out the loop that does the
sync from the rbtree code so optimisations on it can be reused.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
|
|
The idea of maintaining a bitmap of present registers is something that
can usefully be used by other cache types that maintain blocks of cached
registers so move the code out of the rbtree cache and into the generic
regcache code.
Refactor the interface slightly as we go to wrap the set bit and enlarge
bitmap operations (since we never do one without the other) and make it
more robust for reads of uncached registers by bounds checking before we
look at the bitmap.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Reviewed-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
|
|
For percpu notes, we are exporting only address and not size. So
the userspace tool kexec-tools is putting an upper limit of 1024
and putting the value in p_memsz and p_filesz fields. So the patch
add the new sysfile crash_notes_size to export the exact percpu
note size and let the kexec-tools parse it intead of using 1024.
The idea came from Vivek Goyal. And a later patch will be sent to
kexec-tools to let it parse the size.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add documentation that platform_driver_probe() is incompatible with
deferred probing.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Let's only write once...
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
This will bring no meaningful benefit by itself, it is done as a separate
commit to aid bisection if there are problems with the following commits
adding support for coalescing adjacent writes.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Mainly useful internally but exported since this is a public API that's
being checked for.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Provide a helper to do the size based index into a block of registers and
use it when reading a value.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
This patch aims to bring down the average number of nodes
in the rbtree cache and increase the average number of registers
per node. This should improve general lookup and traversal times.
This is achieved by setting the minimum size of a block within the
rbnode to the size of the rbnode itself. This will essentially
cache possibly non-existent registers so to combat this scenario,
we keep a separate bitmap in memory which keeps track of which register
exists. The memory overhead of this change is likely in the order of
~5-10%, possibly less depending on the register file layout. On my test
system with a bitmap of ~4300 bits and a relatively sparse register
layout, the memory requirements for the entire cache did not increase
(the cutting down of nodes which was about 50% of the original number
compensated the situation).
A second patch that can be built on top of this can look at the
ratio `sizeof(*rbnode) / map->cache_word_size' in order to suitably
adjust the block length of each block.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
This allows the cache to sync values directly to the device when stored
in native format and also allows asynchronous I/O.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Don't grind to a screaming halt, just generate a warning.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
_regmap_raw_write() contains code to call regcache_write() to write
values to the cache. That code calls memcpy() to copy the value data to
the start of the work_buf. However, at least when _regmap_raw_write() is
called from _regmap_bus_raw_write(), the value data is in the work_buf,
and this memcpy() operation may over-write part of that value data,
depending on the value of reg_bytes + pad_bytes. At least when using
reg_bytes==1 and pad_bytes==0, corruption of the value data does occur.
To solve this, remove the memcpy() operation, and modify the subsequent
.parse_val() call to parse the original value buffer directly.
At least in the case of 8-bit register address and 16-bit values, and
writes of single registers at a time, this memcpy-then-parse combination
used to cancel each-other out; for a work-buffer containing xx 89 03,
the memcpy changed it to 89 03 03, and the parse_val changed it back to
89 89 03, thus leaving the value uncorrupted. This appears completely
accidental though. Since commit 8a819ff "regmap: core: Split out in
place value parsing", .parse_val only returns the parsed value, and does
not modify the buffer, and hence does not (accidentally) undo the
corruption caused by memcpy(). This caused bogus values to get written
to HW, thus preventing e.g. audio playback on systems with a WM8903
CODEC. This patch fixes that.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Display the name for the chip rather than just the primary IRQ so it is
clearer what exactly has failed.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Whenever a struct device_attribute is registered
with mismatched permissions - read permission without
a show routine or write permission without store
routine - we will issue a big warning so we catch
those early enough.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The last register block, which falls into the specified range, is not handled
correctly. The formula which calculates the number of register which should be
synced is inverse (and off by one). E.g. if all registers in that block should
be synced only one is synced, and if only one should be synced all (but one) are
synced. To calculate the number of registers that need to be synced we need to
subtract the number of the first register in the block from the max register
number and add one. This patch updates the code accordingly.
The issue was introduced in commit ac8d91c ("regmap: Supply ranges to the sync
operations").
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@vger.kernel.org
|
|
ca22e56d (driver-core: implement 'sysdev' functionality for regular
devices and buses) has introduced bus_register macro with a static
key to distinguish different subsys mutex classes.
This however doesn't work for different subsys which use a common
registering function. One example is subsys_system_register (and
mce_device and cpu_device).
In the end this leads to the following lockdep splat:
[ 207.271924] ======================================================
[ 207.271932] [ INFO: possible circular locking dependency detected ]
[ 207.271942] 3.9.0-rc1-0.7-default+ #34 Not tainted
[ 207.271948] -------------------------------------------------------
[ 207.271957] bash/10493 is trying to acquire lock:
[ 207.271963] (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[ 207.271987]
[ 207.271987] but task is already holding lock:
[ 207.271995] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[ 207.272012]
[ 207.272012] which lock already depends on the new lock.
[ 207.272012]
[ 207.272023]
[ 207.272023] the existing dependency chain (in reverse order) is:
[ 207.272033]
[ 207.272033] -> #4 (cpu_hotplug.lock){+.+.+.}:
[ 207.272044] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.272056] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[ 207.272069] [<ffffffff81046ba9>] get_online_cpus+0x29/0x40
[ 207.272082] [<ffffffff81185210>] drain_all_stock+0x30/0x150
[ 207.272094] [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0
[ 207.272104] [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0
[ 207.272114] [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60
[ 207.272125] [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30
[ 207.272135] [<ffffffff81150531>] do_wp_page+0x231/0x830
[ 207.272147] [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0
[ 207.272157] [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0
[ 207.272166] [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0
[ 207.272178] [<ffffffff814b2578>] page_fault+0x28/0x30
[ 207.272189]
[ 207.272189] -> #3 (&mm->mmap_sem){++++++}:
[ 207.272199] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.272208] [<ffffffff8114c5ad>] might_fault+0x6d/0x90
[ 207.272218] [<ffffffff811a11e3>] filldir64+0xb3/0x120
[ 207.272229] [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3]
[ 207.272248] [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3]
[ 207.272263] [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0
[ 207.272273] [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110
[ 207.272284] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[ 207.272296]
[ 207.272296] -> #2 (&type->i_mutex_dir_key#3){+.+.+.}:
[ 207.272309] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.272319] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[ 207.272329] [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0
[ 207.272339] [<ffffffff8119e7fa>] path_openat+0xba/0x470
[ 207.272349] [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0
[ 207.272358] [<ffffffff8118d81c>] file_open_name+0xdc/0x110
[ 207.272369] [<ffffffff8118d885>] filp_open+0x35/0x40
[ 207.272378] [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20
[ 207.272389] [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[ 207.272399] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[ 207.272416] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[ 207.272431] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[ 207.272444] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[ 207.272457] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[ 207.272472] [<ffffffff81000202>] do_one_initcall+0x42/0x180
[ 207.272485] [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[ 207.272499] [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[ 207.272511] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[ 207.272523]
[ 207.272523] -> #1 (umhelper_sem){++++.+}:
[ 207.272537] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.272548] [<ffffffff814ae9c4>] down_read+0x34/0x50
[ 207.272559] [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100
[ 207.272575] [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20
[ 207.272587] [<ffffffff8135cdd6>] request_firmware+0x16/0x20
[ 207.272599] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode]
[ 207.272613] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode]
[ 207.272627] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode]
[ 207.272641] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100
[ 207.272654] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4
[ 207.272666] [<ffffffff81000202>] do_one_initcall+0x42/0x180
[ 207.272678] [<ffffffff810bbeff>] load_module+0x19df/0x1b70
[ 207.272690] [<ffffffff810bc376>] sys_init_module+0xe6/0x130
[ 207.272702] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[ 207.272715]
[ 207.272715] -> #0 (subsys mutex){+.+.+.}:
[ 207.272729] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[ 207.272740] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.272751] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[ 207.272763] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[ 207.272775] [<ffffffff81349114>] device_del+0x134/0x1f0
[ 207.272786] [<ffffffff813491f2>] device_unregister+0x22/0x60
[ 207.272798] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[ 207.272812] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[ 207.272824] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[ 207.272839] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[ 207.272851] [<ffffffff81499130>] cpu_down+0x40/0x60
[ 207.272862] [<ffffffff8149cc55>] store_online+0x75/0xe0
[ 207.272874] [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[ 207.272886] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[ 207.272900] [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[ 207.272911] [<ffffffff8118e924>] sys_write+0x64/0xa0
[ 207.272923] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
[ 207.272936]
[ 207.272936] other info that might help us debug this:
[ 207.272936]
[ 207.272952] Chain exists of:
[ 207.272952] subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock
[ 207.272952]
[ 207.272973] Possible unsafe locking scenario:
[ 207.272973]
[ 207.272984] CPU0 CPU1
[ 207.272992] ---- ----
[ 207.273000] lock(cpu_hotplug.lock);
[ 207.273009] lock(&mm->mmap_sem);
[ 207.273020] lock(cpu_hotplug.lock);
[ 207.273031] lock(subsys mutex);
[ 207.273040]
[ 207.273040] *** DEADLOCK ***
[ 207.273040]
[ 207.273055] 5 locks held by bash/10493:
[ 207.273062] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150
[ 207.273080] #1: (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150
[ 207.273099] #2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20
[ 207.273121] #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60
[ 207.273140] #4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60
[ 207.273158]
[ 207.273158] stack backtrace:
[ 207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ #34
[ 207.273180] Call Trace:
[ 207.273192] [<ffffffff810ab373>] print_circular_bug+0x223/0x310
[ 207.273204] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0
[ 207.273216] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[ 207.273227] [<ffffffff810ae329>] lock_acquire+0xe9/0x120
[ 207.273239] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[ 207.273251] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360
[ 207.273263] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0
[ 207.273274] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0
[ 207.273286] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0
[ 207.273298] [<ffffffff81349114>] device_del+0x134/0x1f0
[ 207.273309] [<ffffffff813491f2>] device_unregister+0x22/0x60
[ 207.273321] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad
[ 207.273332] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130
[ 207.273344] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10
[ 207.273356] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350
[ 207.273368] [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20
[ 207.273380] [<ffffffff81499130>] cpu_down+0x40/0x60
[ 207.273391] [<ffffffff8149cc55>] store_online+0x75/0xe0
[ 207.273402] [<ffffffff813474a0>] dev_attr_store+0x20/0x30
[ 207.273413] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150
[ 207.273425] [<ffffffff8118e10b>] vfs_write+0xcb/0x130
[ 207.273436] [<ffffffff8118e924>] sys_write+0x64/0xa0
[ 207.273447] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b
Which reports a false possitive deadlock because it sees:
1) load_module -> subsys_interface_register -> mc_deveice_add (*) -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex
2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove(**) -> device_unregister -> bus_remove_device -> subsys mutex
3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock
but
1) takes cpu_subsys subsys (*) but 2) takes mce_device subsys (**) so
the deadlock is not possible AFAICS.
The fix is quite simple. We can pull the key inside bus_type structure
because they are defined per device so the pointer will be unique as
well. bus_register doesn't need to be a macro anymore so change it
to the inline. We could get rid of __bus_register as there is no other
caller but maybe somebody will want to use a different key so keep it
around for now.
Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Provide a feel of how much overhead the rbtree cache adds to
the game.
[Slightly reworded output in debugfs -- broonie]
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Kay tells me the most appropriate place to expose workqueues to
userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
a way to do that outside of driver core as virtual_device_parent()
isn't exported and there's no inteface to conveniently create a
virtual subsystem.
This patch implements subsys_virtual_register() by factoring out
subsys_register() from subsys_system_register() and using it with
virtual_device_parent() as the origin directory. It's identical to
subsys_system_register() other than the origin directory but we aren't
gonna restrict the device names which should be used under it.
This will be used to expose workqueue attributes to userland.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
|
|
In the rbtree code we are exposing statistics relating to the
number of nodes/registers of the rbtree cache for each of the
devices. Ensure that `map->debugfs' has been initialized before
we attempt to initialize the debugfs entry for the rbtree cache.
Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@vger.kernel.org
|