summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2008-12-29sparseirq: move __weak symbols into separate compilation unitYinghai Lu2-9/+20
GCC has a bug with __weak alias functions: if the functions are in the same compilation unit as their call site, GCC can decide to inline them - and thus rob the linker of the opportunity to override the weak alias with the real thing. So move all the IRQ handling related __weak symbols to kernel/irq/chip.c. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29sparseirq: work around __weak alias bugIngo Molnar2-5/+9
Impact: fix boot crash if the kernel is built with certain GCC versions GCC has a bug with __weak alias functions: if the functions are in the same compilation unit as their call site, GCC can decide to inline them - and thus rob the linker of the opportunity to override the weak alias with the real thing. This can lead to the boot crash reported by Kamalesh Babulal: ACPI: Core revision 20080926 Setting APIC routing to flat BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: [<ffffffff8021f9a8>] add_pin_to_irq_cpu+0x14/0x74 PGD 0 Oops: 0000 [#1] SMP [...] So move the arch_init_chip_data() function from handle.c to manage.c. Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27sparseirq: fix hang with !SPARSE_IRQYinghai Lu1-0/+15
Impact: fix hang Suresh report his two sockets system only works with SPARSE_IRQ enable it turns out we miss the setting desc->irq so provide early_irq_init() even !SPARSE_IRQ to set desc->irq Reported-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27sparseirq: set lock_class for legacy irq when sparse_irq is selectedYinghai Lu1-0/+1
Impact: add lockdep annotation to legacy IRQ descs Warnings resulting out of this were not seen in practice, but it's prudent to initialize the legacy descriptors to the lock class as well, symmetric to how we do it with other descriptors. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27sparseirq: work around compiler optimizing away __weak functionsYinghai Lu1-3/+4
Impact: fix panic on null pointer with sparseirq Some GCC versions seem to inline the weak global function, when that function is empty. Work it around, by making the functions return a (dummy) integer. Signed-off-by: Yinghai <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27sparseirq: fix desc->lock initIngo Molnar2-0/+3
Impact: cleanup init_one_irq_desc() does not initialize the desc->lock properly - you cannot init a lock by memcpying some other lock on it. This happens to work right now (because irq_desc_init is never in use), but it's a dangerous construct nevertheless, so fix it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-27sparseirq: do not printk when migrating IRQ descriptorsIngo Molnar1-5/+1
Impact: reduce printk noise There were a couple of leftover KERN_DEBUG debugging printks, remove them. Also clarify an error message. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26sparseirq: remove duplicated arch_early_irq_init()Yinghai Lu1-4/+0
Impact: clean up We already have a weak copy of this function in init/main.c Signed-off-by: Yinghai <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26irq: simplify for_each_irq_desc() usageKOSAKI Motohiro3-23/+0
Impact: cleanup all for_each_irq_desc() usage point have !desc check. then its check can move into for_each_irq_desc() macro. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26proc: remove ifdef CONFIG_SPARSE_IRQ from stat.cKOSAKI Motohiro1-1/+1
Impact: cleanup irq_desc can be NULL when CONFIG_SPARSE_IRQ=y only. therefore, NULL checking can move into kstat_irqs_cpu() of SPARSE_IRQ version. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: "Yinghai Lu" <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26irq: for_each_irq_desc() move to irqnr.hKOSAKI Motohiro1-2/+11
Impact: cleanup before CONFIG_SPARSE_IRQ age, for_each_irq_desc() sat in irqnr.h and could be called from generic code. CONFIG_SPARSE_IRQ breaks this assumption, but SPARSE_IRQ version for_each_irq_desc() also can move into irqnr.h easily. Also, this patch unifies CONFIG_SPARSE_IRQ and !CONFIG_SPARSE_IRQ for_each_irq_desc(). Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26hrtimer: remove #include <linux/irq.h>KOSAKI Motohiro1-1/+0
Impact: cleanup <linux/irq.h> can be removed and should be, because: - hrtimer doesn't use any irq feature. - <linux/irq.h> shouldn't be include from generic code. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-25Merge branches 'irq/sparseirq', 'irq/genirq' and 'irq/urgent'; commit ↵Ingo Molnar33-113/+551
'v2.6.28' into irq/core
2008-12-24cgroups: avoid accessing uninitialized data in failure pathLi Zefan1-2/+3
If cgroup_get_rootdir() failed, free_cg_links() will be called in the failure path, but tmp_cg_links hasn't been initialized at that time. I introduced this bug in the 2.6.27 merge window. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-24cgroups: suppress bogus warning messagesSharyathi Nagesh1-3/+0
Remove spurious warning messages that are thrown onto the console during cgroup operations. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Sharyathi Nagesh <sharyathi@in.ibm.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-21Null pointer deref with hrtimer_try_to_cancel()Thomas Gleixner1-0/+6
Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW commit 2d42244ae71d6c7b0884b5664cf2eda30fb2ae68 (clocksource: introduce CLOCK_MONOTONIC_RAW) introduced a new clockid, which is only available to read out the raw not NTP adjusted system time. The above commit did not prevent that a posix timer can be created with that clockid. The timer_create() syscall succeeds and initializes the timer to a non existing hrtimer base. When the timer is deleted either by timer_delete() or by the exit() cleanup the kernel crashes. Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the posix clock function to no_timer_create which returns an error code. Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-20sparseirq: fix numa_migrate_irq_desc dependency and commentsYinghai Lu1-8/+3
Impact: reduce kconfig variable scope and clean up Bartlomiej pointed out that the config dependencies and comments are not right. update it depend to NUMA, and fix some comments Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18locking, irq: enclose irq_desc_lock_class in CONFIG_LOCKDEPKOSAKI Motohiro1-5/+0
Impact: simplify code commit "08678b0: generic: sparse irqs: use irq_desc() [...]" introduced the irq_desc_lock_class variable. But it is used only if CONFIG_SPARSE_IRQ=Y or CONFIG_TRACE_IRQFLAGS=Y. Otherwise, following warnings happen: CC kernel/irq/handle.o kernel/irq/handle.c:26: warning: 'irq_desc_lock_class' defined but not used Actually, current early_init_irq_lock_class has a bit strange and messy ifdef. In addition, it is not valueable. 1. this function is protected by !CONFIG_SPARSE_IRQ, but that is not necessary. if CONFIG_SPARSE_IRQ=Y, desc of all irq number are initialized by NULL at first - then this function calling is safe. 2. this function protected by CONFIG_TRACE_IRQFLAGS too. but it is not necessary either, because lockdep_set_class() doesn't have bad side effect even if CONFIG_TRACE_IRQFLAGS=n. This patch bloat kernel size a bit on CONFIG_TRACE_IRQFLAGS=n and CONFIG_SPARSE_IRQ=Y - but that's ok. early_init_irq_lock_class() is not a fastpatch at all. To avoid messy ifdefs is more important than a few bytes diet. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17x86, sparseirq: move irq_desc according to smp_affinity, v7Yinghai Lu5-7/+153
Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes if CONFIG_NUMA_MIGRATE_IRQ_DESC is set: - make irq_desc to go with affinity aka irq_desc moving etc - call move_irq_desc in irq_complete_move() - legacy irq_desc is not moved, because they are allocated via static array for logical apic mode, need to add move_desc_in_progress_in_same_domain, otherwise it will not be moved ==> also could need two phases to get irq_desc moved. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16cgroups: fix a race between rmdir and remountPaul Menage1-1/+1
When a cgroup is removed, it's unlinked from its parent's children list, but not actually freed until the last dentry on it is released (at which point cgrp->root->number_of_cgroups is decremented). Currently rebind_subsystems checks for the top cgroup's child list being empty in order to rebind subsystems into or out of a hierarchy - this can result in the set of subsystems bound to a hierarchy being removed-but-not-freed cgroup. The simplest fix for this is to forbid remounts that change the set of subsystems on a hierarchy that has removed-but-not-freed cgroups. This bug can be reproduced via: mkdir /mnt/cg mount -t cgroup -o ns,freezer cgroup /mnt/cg mkdir /mnt/cg/foo sleep 1h < /mnt/cg/foo & rmdir /mnt/cg/foo mount -t cgroup -o remount,ns,devices,freezer cgroup /mnt/cg kill $! Though the above will cause oops in -mm only but not mainline, but the bug can cause memory leak in mainline (and even oops) Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-15Revert "sched_clock: prevent scd->clock from moving backwards"Linus Torvalds1-3/+3
This reverts commit 5b7dba4ff834259a5623e03a565748704a8fe449, which caused a regression in hibernate, reported by and bisected by Fabio Comolli. This revert fixes http://bugzilla.kernel.org/show_bug.cgi?id=12155 http://bugzilla.kernel.org/show_bug.cgi?id=12149 Bisected-by: Fabio Comolli <fabio.comolli@gmail.com> Requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-11Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: CPU remove deadlock fix
2008-12-11fix mapping_writably_mapped()Hugh Dickins1-6/+9
Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped test in 2.6.7! Bad bad bug, good good find. The i_mmap_writable count must be incremented for VM_SHARED (just as i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock) when dup_mmap() copies the vma for fork: it has its own more optimal version of __vma_link_file(), and I missed this out. So the count was later going down to 0 (dangerous) when one end unmapped, then wrapping negative (inefficient) when the other end unmapped. The only impact on x86 would have been that setting a mandatory lock on a file which has at some time been opened O_RDWR and mapped MAP_SHARED (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN when it should succeed, or succeed when it should fail. But those architectures which rely on flush_dcache_page() to flush userspace modifications back into the page before the kernel reads it, may in some cases have skipped the flush after such a fork - though any repetitive test will soon wrap the count negative, in which case it will flush_dcache_page() unnecessarily. Fix would be a two-liner, but mapping variable added, and comment moved. Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10KSYM_SYMBOL_LEN fixesHugh Dickins1-1/+1
Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10relayfs: fix infinite loop with splice()Tom Zanussi1-5/+2
Running kmemtraced, which uses splice() on relayfs, causes a hard lock on x86-64 SMP. As described by Tom Zanussi: It looks like you hit the same problem as described here: commit 8191ecd1d14c6914c660dfa007154860a7908857 splice: fix infinite loop in generic_file_splice_read() relay uses the same loop but it never got noticed or fixed. Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Tested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-09sched: CPU remove deadlock fixBrian King1-0/+2
Impact: fix possible deadlock in CPU hot-remove path This patch fixes a possible deadlock scenario in the CPU remove path. migration_call grabs rq->lock, then wakes up everything on rq->migration_queue with the lock held. Then one of the tasks on the migration queue ends up calling tg_shares_up which then also tries to acquire the same rq->lock. [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0 [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4 [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0 [c000000058eab640] c00000000007ada4 .complete+0x54/0x80 [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8 [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0 [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4 [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108 [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8 [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50 [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114 [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40 Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-09[PATCH] fix broken timestamps in AVC generated by kernel threadsAl Viro2-4/+5
Timestamp in audit_context is valid only if ->in_syscall is set. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09[patch 1/1] audit: remove excess kernel-docRandy Dunlap1-2/+0
Delete excess kernel-doc notation in kernel/auditsc.c: Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry' Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09[PATCH] return records for fork() both to child and parentAl Viro2-0/+18
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-09[PATCH] Audit: make audit=0 actually turn off auditEric Paris1-7/+21
Currently audit=0 on the kernel command line does absolutely nothing. Audit always loads and always uses its resources such as creating the kernel netlink socket. This patch causes audit=0 to actually disable audit. Audit will use no resources and starting the userspace auditd daemon will not cause the kernel audit system to activate. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-08x86: use NR_IRQS_LEGACYYinghai Lu1-3/+3
Impact: cleanup Introduce NR_IRQS_LEGACY instead of hard coded number. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08sparse irq_desc[] array: core kernel and x86 changesYinghai Lu5-8/+202
Impact: new feature Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with NR_CPUS set to large values. The goal is to be able to scale up to much larger NR_IRQS value without impacting the (important) common case. To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of irq_desc pointers. When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc, this also makes the IRQ descriptors NUMA-local (to the site that calls request_irq()). This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now uses desc->chip_data for x86 to store irq_cfg. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: [PATCH] fix bogus argument of blkdev_put() in pktcdvd [PATCH 2/2] documnt FMODE_ constants [PATCH 1/2] kill FMODE_NDELAY_NOW [PATCH] clean up blkdev_get a little bit [PATCH] Fix block dev compat ioctl handling [PATCH] kill obsolete temporary comment in swsusp_close()
2008-12-05Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2-1/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: time: catch xtime_nsec underflows and fix them posix-cpu-timers: fix clock_gettime with CLOCK_PROCESS_CPUTIME_ID
2008-12-05Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0 documentation: local_ops fix on_each_cpu
2008-12-04[PATCH] kill obsolete temporary comment in swsusp_close()Al Viro1-1/+1
it had been put there to mark the call of blkdev_put() that needed proper argument propagated to it; later patch in the same series had done just that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04time: catch xtime_nsec underflows and fix themjohn stultz1-0/+22
Impact: fix time warp bug Alex Shi, along with Yanmin Zhang have been noticing occasional time inconsistencies recently. Through their great diagnosis, they found that the xtime_nsec value used in update_wall_time was occasionally going negative. After looking through the code for awhile, I realized we have the possibility for an underflow when three conditions are met in update_wall_time(): 1) We have accumulated a second's worth of nanoseconds, so we incremented xtime.tv_sec and appropriately decrement xtime_nsec. (This doesn't cause xtime_nsec to go negative, but it can cause it to be small). 2) The remaining offset value is large, but just slightly less then cycle_interval. 3) clocksource_adjust() is speeding up the clock, causing a corrective amount (compensating for the increase in the multiplier being multiplied against the unaccumulated offset value) to be subtracted from xtime_nsec. This can cause xtime_nsec to underflow. Unfortunately, since we notify the NTP subsystem via second_overflow() whenever we accumulate a full second, and this effects the error accumulation that has already occured, we cannot simply revert the accumulated second from xtime nor move the second accumulation to after the clocksource_adjust call without a change in behavior. This leaves us with (at least) two options: 1) Simply return from clocksource_adjust() without making a change if we notice the adjustment would cause xtime_nsec to go negative. This would work, but I'm concerned that if a large adjustment was needed (due to the error being large), it may be possible to get stuck with an ever increasing error that becomes too large to correct (since it may always force xtime_nsec negative). This may just be paranoia on my part. 2) Catch xtime_nsec if it is negative, then add back the amount its negative to both xtime_nsec and the error. This second method is consistent with how we've handled earlier rounding issues, and also has the benefit that the error being added is always in the oposite direction also always equal or smaller then the correction being applied. So the risk of a corner case where things get out of control is lessened. This patch fixes bug 11970, as tested by Yanmin Zhang http://bugzilla.kernel.org/show_bug.cgi?id=11970 Reported-by: alex.shi@intel.com Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03check_hung_task(): unsigned sysctl_hung_task_warnings cannot be less than 0Roel Kluin1-1/+1
Impact: fix warnings-limit cutoff check for debug feature unsigned sysctl_hung_task_warnings cannot be less than 0 Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02genirq: record IRQ_LEVEL in irq_desc[]David Brownell2-6/+10
Impact: fix __irq_set_trigger() for IRQ_LEVEL When recording the irq trigger type, let's also make sure that IRQ_LEVEL gets set correctly. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02taint: add missing commentArjan van de Ven1-0/+1
The description for 'D' was missing in the comment... (causing me a minute of WTF followed by looking at more of the code) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-02epoll: introduce resource usage limitsDavide Libenzi1-0/+10
It has been thought that the per-user file descriptors limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user to request a pretty large amount of kernel memory, well within the its maximum number of fds. To solve such problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/ and inside there, there are two configuration points: max_user_instances = Maximum number of devices - per user max_user_watches = Maximum number of "watched" fds - per user The current default for "max_user_watches" limits the memory used by epoll to store "watches", to 1/32 of the amount of the low RAM. As example, a 256MB 32bit machine, will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128, that should be enough too. This also changes the userspace, because a new error code can now come out from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok. [akpm@linux-foundation.org: use get_current_user()] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <stable@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reported-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: prevent divide by zero error in cpu_avg_load_per_task, update sched, cpusets: fix warning in kernel/cpuset.c sched: prevent divide by zero error in cpu_avg_load_per_task
2008-12-01Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds4-27/+56
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq.h: fix missing/extra kernel-doc genirq: __irq_set_trigger: change pr_warning to pr_debug irq: fix typo x86: apic honour irq affinity which was set in early boot genirq: fix the affinity setting in setup_irq genirq: keep affinities set from userspace across free/request_irq()
2008-12-01Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: consistent alignement for lockdep info
2008-12-01Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds3-19/+23
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: prevent recursion tracing, doc: update mmiotrace documentation x86, mmiotrace: fix buffer overrun detection function tracing: fix wrong position computing of stack_trace
2008-11-30remove __ARCH_WANT_COMPAT_SYS_PTRACEChristoph Hellwig1-2/+2
All architectures now use the generic compat_sys_ptrace, as should every new architecture that needs 32bit compat (if we'll ever get another). Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also kill a comment about __ARCH_SYS_PTRACE that was added after __ARCH_SYS_PTRACE was already gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30cpuinit fixes in kernel/*Al Viro2-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-29sched: prevent divide by zero error in cpu_avg_load_per_task, updateIngo Molnar1-1/+1
Regarding the bug addressed in: 4cd4262: sched: prevent divide by zero error in cpu_avg_load_per_task Linus points out that the fix is not complete: > There's nothing that keeps gcc from deciding not to reload > rq->nr_running. > > Of course, in _practice_, I don't think gcc ever will (if it decides > that it will spill, gcc is likely going to decide that it will > literally spill the local variable to the stack rather than decide to > reload off the pointer), but it's a valid compiler optimization, and > it even has a name (rematerialization). > > So I suspect that your patch does fix the bug, but it still leaves the > fairly unlikely _potential_ for it to re-appear at some point. > > We have ACCESS_ONCE() as a macro to guarantee that the compiler > doesn't rematerialize a pointer access. That also would clarify > the fact that we access something unsafe outside a lock. So make sure our nr_running value is immutable and cannot change after we check it for nonzero. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29sched, cpusets: fix warning in kernel/cpuset.cIngo Molnar1-1/+1
this warning: kernel/cpuset.c: In function ‘generate_sched_domains’: kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function triggers because GCC does not recognize that ndoms stays uninitialized only if doms is NULL - but that flow is covered at the end of generate_sched_domains(). Help out GCC by initializing this variable to 0. (that's prudent anyway) Also, this function needs a splitup and code flow simplification: with 160 lines length it's clearly too long. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27sched: prevent divide by zero error in cpu_avg_load_per_taskSteven Rostedt1-2/+3
Impact: fix divide by zero crash in scheduler rebalance irq While testing the branch profiler, I hit this crash: divide error: 0000 [#1] PREEMPT SMP [...] RIP: 0010:[<ffffffff8024a008>] [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f [...] Call Trace: <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21 [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6 [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232 [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e [<ffffffff8021047a>] do_softirq+0x94/0x1cd [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20 The code for cpu_avg_load_per_task has: if (rq->nr_running) rq->avg_load_per_task = rq->load.weight / rq->nr_running; The runqueue lock is not held here, and there is nothing that prevents the rq->nr_running from going to zero after it passes the if condition. The branch profiler simply made the race window bigger. This patch saves off the rq->nr_running to a local variable and uses that for both the condition and the division. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>