summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2011-03-10SUNRPC: Close a race in __rpc_wait_for_completion_task()Trond Myklebust1-0/+1
Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10hrtimer: Update hrtimer->state documentationThomas Gleixner1-5/+11
We changed some of the state bits and combinations thereof over time, but never updated the documentation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-10tracing: Remove lock_depth from event entrySteven Rostedt1-1/+0
The lock_depth field in the event headers was added as a temporary data point for help in removing the BKL. Now that the BKL is pretty much been removed, we can remove this field. This in turn changes the header from 12 bytes to 8 bytes, removing the 4 byte buffer that gcc would insert if the first field in the data load was 8 bytes in size. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10Merge branch 'for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
2011-03-10sysctl: the include of rcupdate.h is only needed in the kernelStephen Rothwell1-1/+1
Fixes this build-check error: include/linux/sysctl.h:28: included file 'linux/rcupdate.h' is not exported Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-10net: don't allow CAP_NET_ADMIN to load non-netdev kernel modulesVasiliy Kulikov1-0/+3
Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't allow anybody load any module not related to networking. This patch restricts an ability of autoloading modules to netdev modules with explicit aliases. This fixes CVE-2011-1019. Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior of loading netdev modules by name (without any prefix) for processes with CAP_SYS_MODULE to maintain the compatibility with network scripts that use autoloading netdev modules by aliases like "eth0", "wlan0". Currently there are only three users of the feature in the upstream kernel: ipip, ip_gre and sit. root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) -- root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: fffffff800001000 CapEff: fffffff800001000 CapBnd: fffffff800001000 root@albatros:~# modprobe xfs FATAL: Error inserting xfs (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted root@albatros:~# lsmod | grep xfs root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod | grep xfs root@albatros:~# lsmod | grep sit root@albatros:~# ifconfig sit sit: error fetching interface information: Device not found root@albatros:~# lsmod | grep sit root@albatros:~# ifconfig sit0 sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 root@albatros:~# lsmod | grep sit sit 10457 0 tunnel4 2957 1 sit For CAP_SYS_MODULE module loading is still relaxed: root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: ffffffffffffffff CapEff: ffffffffffffffff CapBnd: ffffffffffffffff root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod | grep xfs xfs 745319 0 Reference: https://lkml.org/lkml/2011/2/24/203 Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-10Merge branch 'for-linus' of ↵Linus Torvalds1-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: nd->inode is not set on the second attempt in path_walk() unfuck proc_sysctl ->d_compare() minimal fix for do_filp_open() race
2011-03-09RTC: Cleanup rtc_class_ops->update_irq_enable()John Stultz1-1/+0
Now that the generic code handles UIE mode irqs via periodic alarm interrupts, no one calls the rtc_class_ops->update_irq_enable() method anymore. This patch removes the driver hooks and implementations of update_irq_enable if no one else is calling it. CC: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-03-09RTC: Cleanup rtc_class_ops->irq_set_freq()John Stultz1-2/+0
With the generic rtc code now emulating PIE mode irqs via an hrtimer, no one calls the rtc_class_ops->irq_set_freq call. This patch removes the hook and deletes the driver functions if no one else calls them. CC: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-03-09RTC: Cleanup rtc_class_ops->irq_set_stateJohn Stultz1-1/+0
With PIE mode interrupts now emulated in generic code via an hrtimer, no one calls rtc_class_ops->irq_set_state(), so this patch removes it along with driver implementations. CC: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> CC: rtc-linux@googlegroups.com Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-03-09RTC: Initialize kernel state from RTCJohn Stultz1-0/+1
Mark Brown pointed out a corner case: that RTC alarms should be allowed to be persistent across reboots if the hardware supported it. The rework of the generic layer to virtualize the RTC alarm virtualized much of the alarm handling, and removed the code used to read the alarm time from the hardware. Mark noted if we want the alarm to be persistent across reboots, we need to re-read the alarm value into the virtualized generic layer at boot up, so that the generic layer properly exposes that value. This patch restores much of the earlier removed rtc_read_alarm code and wires it in so that we set the kernel's alarm value to what we find in the hardware at boot time. NOTE: Not all hardware supports persistent RTC alarm state across system reset. rtc-cmos for example will keep the alarm time, but disables the AIE mode irq. Applications should not expect the RTC alarm to be valid after a system reset. We will preserve what we can, to represent the hardware state at boot, but its not guarenteed. Further, in the future, with multiplexed RTC alarms, the soonest alarm to fire may not be the one set via the /dev/rt ioctls. So an application may set the alarm with RTC_ALM_SET, but after a reset find that RTC_ALM_READ returns an earlier time. Again, we preserve what we can, but applications should not expect the RTC alarm state to persist across a system reset. Big thanks to Mark for pointing out the issue! Thanks also to Marcelo for helping think through the solution. CC: Mark Brown <broonie@opensource.wolfsonmicro.com> CC: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: rtc-linux@googlegroups.com Reported-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-03-09tracing: Add an 'overwrite' trace_option.David Sharp1-0/+2
Add an "overwrite" trace_option for ftrace to control whether the buffer should be overwritten on overflow or not. The default remains to overwrite old events when the buffer is full. This patch adds the option to instead discard newest events when the buffer is full. This is useful to get a snapshot of traces just after enabling traces. Dropping the current event is also a simpler code path. Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-08Merge commit 'v2.6.38-rc8' into perf/coreIngo Molnar12-27/+41
Merge reason: Merge latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-08debugobjects: Add hint for better object identificationStanislaw Gruszka1-1/+4
In complex subsystems like mac80211 structures can contain several timers and work structs, so identifying a specific instance from the call trace and object type output of debugobjects can be hard. Allow the subsystems which support debugobjects to provide a hint function. This function returns a pointer to a kernel address (preferrably the objects callback function) which is printed along with the debugobjects type. Add hint methods for timer_list, work_struct and hrtimer. [ tglx: Massaged changelog, made it compile ] Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <20110307085809.GA9334@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-08unfuck proc_sysctl ->d_compare()Al Viro1-4/+10
a) struct inode is not going to be freed under ->d_compare(); however, the thing PROC_I(inode)->sysctl points to just might. Fortunately, it's enough to make freeing that sucker delayed, provided that we don't step on its ->unregistering, clear the pointer to it in PROC_I(inode) before dropping the reference and check if it's NULL in ->d_compare(). b) I'm not sure that we *can* walk into NULL inode here (we recheck dentry->seq between verifying that it's still hashed / fetching dentry->d_inode and passing it to ->d_compare() and there's no negative hashed dentries in /proc/sys/*), but if we can walk into that, we really should not have ->d_compare() return 0 on it! Said that, I really suspect that this check can be simply killed. Nick? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-08Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into nextJames Morris3-16/+24
2011-03-08KEYS: Add an iovec version of KEYCTL_INSTANTIATEDavid Howells1-0/+1
Add a keyctl op (KEYCTL_INSTANTIATE_IOV) that is like KEYCTL_INSTANTIATE, but takes an iovec array and concatenates the data in-kernel into one buffer. Since the KEYCTL_INSTANTIATE copies the data anyway, this isn't too much of a problem. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-08KEYS: Add a new keyctl op to reject a key with a specified error codeDavid Howells3-1/+12
Add a new keyctl op to reject a key with a specified error code. This works much the same as negating a key, and so keyctl_negate_key() is made a special case of keyctl_reject_key(). The difference is that keyctl_negate_key() selects ENOKEY as the error to be reported. Typically the key would be rejected with EKEYEXPIRED, EKEYREVOKED or EKEYREJECTED, but this is not mandatory. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-08KEYS: Add a key type op to permit the key description to be vettedDavid Howells1-0/+3
Add a key type operation to permit the key type to vet the description of a new key that key_alloc() is about to allocate. The operation may reject the description if it wishes with an error of its choosing. If it does this, the key will not be allocated. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-08KEYS: Add an RCU payload dereference macroDavid Howells1-0/+4
Add an RCU payload dereference macro as this seems to be a common piece of code amongst key types that use RCU referenced payloads. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2011-03-08Merge branch 'master'; commit 'v2.6.38-rc7' into nextJames Morris231-1637/+3724
2011-03-05Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: no .snap inside of snapped namespace libceph: fix msgr standby handling libceph: fix msgr keepalive flag libceph: fix msgr backoff libceph: retry after authorization failure libceph: fix handling of short returns from get_user_pages ceph: do not clear I_COMPLETE from d_release ceph: do not set I_COMPLETE Revert "ceph: keep reference to parent inode on ceph_dentry"
2011-03-05mm: add alloc_page_vma_node()Andi Kleen1-0/+2
Add a alloc_page_vma_node that allows passing the "local" node in. Used in a followon patch. Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-05mm: change alloc_pages_vma to pass down the policy node for local policyAndi Kleen1-4/+5
Currently alloc_pages_vma() always uses the local node as policy node for the LOCAL policy. Pass this node down as an argument instead. No behaviour change from this patch, but will be needed for followons. Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04libceph: fix msgr keepalive flagSage Weil1-1/+0
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04libceph: fix msgr backoffSage Weil1-0/+1
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One pieces of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04Mark ptrace_{traceme,attach,detach} staticLinus Torvalds1-3/+0
They are only used inside kernel/ptrace.c, and have been for a long time. We don't want to go back to the bad-old-days when architectures did things on their own, so make them static and private. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04perf: Add support for supplementary event registersAndi Kleen1-2/+11
Change logs against Andi's original version: - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra) - Fixed a major event scheduling issue. There cannot be a ref++ on an event that has already done ref++ once and without calling put_constraint() in between. (Stephane Eranian) - Use thread_cpumask for percore allocation. (Lin Ming) - Use MSR names in the extra reg lists. (Lin Ming) - Remove redundant "c = NULL" in intel_percore_constraints - Fix comment of perf_event_attr::config1 Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the perf_event_attr::config1 field. - Adds support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04Merge branch 'sched/urgent' into sched/coreIngo Molnar9-19/+33
Merge reason: Add fixes before applying dependent patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04Merge branch 'perf/urgent' into perf/coreIngo Molnar6-10/+31
Merge reason: Pick up updates before queueing up dependent patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-04Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2-5/+1
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: kill loop_mutex blktrace: Remove blk_fill_rwbs_rq. block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() block: add @force_kblockd to __blk_run_queue() block: fix kernel-doc format for blkdev_issue_zeroout blk-throttle: Do not use kblockd workqueue for throtl work
2011-03-04LSM: Pass -o remount options to the LSMEric Paris1-0/+13
The VFS mount code passes the mount options to the LSM. The LSM will remove options it understands from the data and the VFS will then pass the remaining options onto the underlying filesystem. This is how options like the SELinux context= work. The problem comes in that -o remount never calls into LSM code. So if you include an LSM specific option it will get passed to the filesystem and will cause the remount to fail. An example of where this is a problem is the 'seclabel' option. The SELinux LSM hook will print this word in /proc/mounts if the filesystem is being labeled using xattrs. If you pass this word on mount it will be silently stripped and ignored. But if you pass this word on remount the LSM never gets called and it will be passed to the FS. The FS doesn't know what seclabel means and thus should fail the mount. For example an ext3 fs mounted over loop # mount -o loop /tmp/fs /mnt/tmp # cat /proc/mounts | grep /mnt/tmp /dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0 # mount -o remount /mnt/tmp mount: /mnt/tmp not mounted already, or bad option # dmesg EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value This patch passes the remount mount options to an new LSM hook. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: James Morris <jmorris@namei.org>
2011-03-03blktrace: Remove blk_fill_rwbs_rq.Tao Ma1-1/+0
If we enable trace events to trace block actions, We use blk_fill_rwbs_rq to analyze the corresponding actions in request's cmd_flags, but we only choose the minor 2 bits from it, so most of other flags(e.g, REQ_SYNC) are missing. For example, with a sync write we get: write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + = 8 [write_test] Since now we have integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and blk_fill_rwbs_rq isn't needed any more. With this patch, after a sync write we get: write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 += 8 [write_test] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-03Merge branch '/tip/perf/filter' of ↵Frederic Weisbecker1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git into perf/core
2011-03-02block: add @force_kblockd to __blk_run_queue()Tejun Heo1-1/+1
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02mfd: Don't suspend WM8994 if the CODEC is not suspendedMark Brown1-0/+1
ASoC supports keeping the audio subsysetm active over suspend in order to support use cases such as audio passthrough from a cellular modem with the main CPU suspended. Ensure that we don't power down the CODEC when this is happening by checking to see if VMID is up and skipping suspend and resume when it is. If the CODEC has suspended then it'll turn VMID off before the core suspend() gets called. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-02libata: remove ATA_FLAG_LPMSergei Shtylyov1-1/+0
Commit 6b7ae9545ad9875a289f4191c0216b473e313cb9 (libata: reimplement link power management) removed the check of ATA_FLAG_LPM but neglected to remove the flag itself. Do it now... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: remove ATA_FLAG_NO_LEGACYSergei Shtylyov1-1/+0
All checks of ATA_FLAG_NO_LEGACY have been removed by the commits c791c30670ea61f19eec390124128bf278e854fe ([libata] minor PCI IDE probe fixes and cleanups) and f0d36efdc624beb3d9e29b9ab9e9537bf0f25d5b (libata: update libata core layer to use devres), so I think it's time to finally get rid of this flag... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: remove ATA_FLAG_MMIOSergei Shtylyov1-1/+0
Commit 0d5ff566779f894ca9937231a181eb31e4adff0e (libata: convert to iomap) removed all checks of ATA_FLAG_MMIO but neglected to remove the flag itself. Do it now, at last... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: remove ATA_FLAG_{SRST|SATA_RESET}Sergei Shtylyov1-2/+0
These flags are marked as obsolete and the checks for them have been removed by commit 294440887b32c58d220fb54b73b7a58079b78f20 (libata-sff: kill unused ata_bus_reset()), so I think it's time to finally get rid of them... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02libata: separate error handler into usable componentsJames Bottomley1-0/+2
Right at the moment, the libata error handler is incredibly monolithic. This makes it impossible to use from composite drivers like libsas and ipr which have to handle error themselves in the first instance. The essence of the change is to split the monolithic error handler into two components: one which handles a queue of ata commands for processing and the other which handles the back end of readying a port. This allows the upper error handler fine grained control in calling libsas functions (and making sure they only get called for ATA commands whose lower errors have been fixed up). Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-01blk-throttle: Do not use kblockd workqueue for throtl workVivek Goyal1-3/+0
o Dominik Klein reported a system hang issue while doing some blkio throttling testing. https://lkml.org/lkml/2011/2/24/173 o Some tracing revealed that CFQ was not dispatching any more jobs as queue unplug was not happening. And queue unplug was not happening because unplug work was not being called as there was one throttling work on same cpu which as not finished yet. And throttling work had not finished as it was tyring to dispatch a bio to CFQ but all the request descriptors were consume to it was put to sleep. o So basically it is a cyclic dependecny between CFQ unplug work and throtl dispatch work. Tejun suggested that use separate workqueue for such cases. o This patch uses a separate workqueue for throttle related work and does not rely on kblockd workqueue anymore. Cc: stable@kernel.org Reported-by: Dominik Klein <dk@in-telegence.net> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-01ACPI: Fix build for CONFIG_NET unsetRafael J. Wysocki1-0/+8
Several ACPI drivers fail to build if CONFIG_NET is unset, because they refer to things depending on CONFIG_THERMAL that in turn depends on CONFIG_NET. However, CONFIG_THERMAL doesn't really need to depend on CONFIG_NET, because the only part of it requiring CONFIG_NET is the netlink interface in thermal_sys.c. Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET and remove the dependency of CONFIG_THERMAL on CONFIG_NET from drivers/thermal/Kconfig. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Len Brown <lenb@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Luming Yu <luming.yu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28percpu: Generic support for this_cpu_cmpxchg_double()Christoph Lameter1-0/+128
Introduce this_cpu_cmpxchg_double(). this_cpu_cmpxchg_double() allows the comparison between two consecutive words and replaces them if there is a match. bool this_cpu_cmpxchg_double(pcp1, pcp2, old_word1, old_word2, new_word1, new_word2) this_cpu_cmpxchg_double does not return the old value (difficult since there are two words) but a boolean indicating if the operation was successful. The first percpu variable must be double word aligned! -tj: Updated to return bool instead of int, converted size check to BUILD_BUG_ON() instead of VM_BUG_ON() and other cosmetic changes. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2011-02-26genirq: Provide forced interrupt threadingThomas Gleixner1-0/+7
Add a commandline parameter "threadirqs" which forces all interrupts except those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to allow retrieving better debug data from crashing interrupt handlers. If "threadirqs" is not enabled on the kernel command line, then there is no impact in the interrupt hotpath. Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after marking the interrupts which cant be threaded IRQF_NO_THREAD. All interrupts which have IRQF_TIMER set are implict marked IRQF_NO_THREAD. Also all PER_CPU interrupts are excluded. Forced threading hard interrupts also forces all soft interrupt handling into thread context. When enabled it might slow down things a bit, but for debugging problems in interrupt code it's a reasonable penalty as it does not immediately crash and burn the machine when an interrupt handler is buggy. Some test results on a Core2Duo machine: Cache cold run of: # time git grep irq_desc non-threaded threaded real 1m18.741s 1m19.061s user 0m1.874s 0m1.757s sys 0m5.843s 0m5.427s # iperf -c server non-threaded [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 933 Mbits/sec threaded [ 3] 0.0-10.0 sec 1.09 GBytes 939 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec [ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20110223234956.772668648@linutronix.de>
2011-02-26Merge branch 'pm-fixes' of ↵Linus Torvalds2-11/+16
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset
2011-02-26rapidio: fix sysfs config attribute to access 16MB of maint spaceAlexandre Bounine1-1/+3
Fixes sysfs config attribute to allow access to entire 16MB maintenance space of RapidIO devices. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-25genirq: Add IRQF_NO_THREADThomas Gleixner1-1/+3
Some low level interrupts cannot be threaded even when we force thread all interrupt handlers. Add a flag to annotate such interrupts. Add all timer interrupts to this category by default. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20110223234956.578893460@linutronix.de>
2011-02-25genirq: Prepare the handling of shared oneshot interruptsThomas Gleixner2-0/+4
For level type interrupts we need to track how many threads are on flight to avoid useless interrupt storms when not all thread handlers have finished yet. Keep track of the woken threads and only unmask when there are no more threads in flight. Yes, I'm lazy and using a bitfield. But not only because I'm lazy, the main reason is that it's way simpler than using a refcount. A refcount based solution would need to keep track of various things like crashing the irq thread, spurious interrupts coming in, disables/enables, free_irq() and some more. The bitfield keeps the tracking simple and makes things just work. It's also nicely confined to the thread code pathes and does not require additional checks all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20110223234956.388095876@linutronix.de>
2011-02-25Merge branch 'for-linus' of git://neil.brown.name/mdLinus Torvalds1-1/+1
* 'for-linus' of git://neil.brown.name/md: md: Fix - again - partition detection when array becomes active Fix over-zealous flush_disk when changing device size. md: avoid spinlock problem in blk_throtl_exit md: correctly handle probe of an 'mdp' device. md: don't set_capacity before array is active. md: Fix raid1->raid0 takeover