Age | Commit message (Collapse) | Author | Files | Lines |
|
Since lazy MMU batching mode still allows interrupts to enter, it is
possible for interrupt handlers to try to use kmap_atomic, which fails when
lazy mode is active, since the PTE update to highmem will be delayed. The
best workaround is to issue an explicit flush in kmap_atomic_functions
case; this is the only way nested PTE updates can happen in the interrupt
handler.
Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix.
This patch gets reverted again when we start 2.6.22 and the bug gets fixed
differently.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10
it turns out that on resume we mistakenly re-enable interrupts too
early. Do the timer retrigger only on the current CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Soeren Sonnenburg <kernel@nn7.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Just a one-byter for an ia64 thinko/typo - already fixed for i386 and x86_64.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revert all this. It can cause device-mapper to receive a different major from
earlier kernels and it turns out that the Amanda backup program (via GNU tar,
apparently) checks major numbers on files when performing incremental backups.
Which is a bit broken of Amanda (or tar), but this feature isn't important
enough to justify the churn.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A device can be removed from an md array via e.g.
echo remove > /sys/block/md3/md/dev-sde/state
This will try to remove the 'dev-sde' subtree which will deadlock
since
commit e7b0d26a86943370c04d6833c6edba2a72a6e240
With this patch we run the kobject_del via schedule_work so as to
avoid the deadlock.
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
patch 4/4:
Limit ATAPI DMA to R/W commands only for TORiSAN DRD-N216 DVD-ROM drives
(http://bugzilla.kernel.org/show_bug.cgi?id=6710)
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
|
patch 3/4:
The TORiSAN drive locks up when max sector == 256.
Limit max sector to 128 for the TORiSAN DRD-N216 drives.
(http://bugzilla.kernel.org/show_bug.cgi?id=6710)
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
|
patch 1/4:
Reorder HSM_ST_FIRST, such that the task state transition is easier decoded with human eyes.
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
|
Add unsigned to unused bit field in a.out.h to make sparse happy.
[ I took care of the sparc64 side as well -DaveM ]
Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86: Don't probe for DDC on VBE1.2
[PATCH] x86-64: Increase NMI watchdog probing timeout
[PATCH] x86-64: Let oprofile reserve MSR on all CPUs
[PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
|
|
Fix the regression resulting from the recent change of suspend code
ordering that causes systems based on Intel x86 CPUs using the microcode
driver to hang during the resume.
The problem occurs since the microcode driver uses request_firmware() in
its CPU hotplug notifier, which is called after tasks has been frozen and
hangs. It can be fixed by telling the microcode driver to use the
microcode stored in memory during the resume instead of trying to load it
from disk.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Adrian Bunk <bunk@stusta.de>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maxim <maximlevitsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
built-in drivers had broken sysfs links that caused bootup hangs for
certain driver unregistry sequences.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently we have a confused udelay implementation.
* __const_udelay does not accept usecs but xloops in i386 and x86_64
* our implementation requires usecs as arg
* it gets a xloops count when called by asm/arch/delay.h
Bugs related to this (extremely long shutdown times) where reported by some
x86_64 users, especially using Device Mapper.
To hit this bug, a compile-time constant time parameter must be passed -
that's why UML seems to work most times. Fix this with a simple udelay
implementation.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
AMD dual core laptops with C1E do not run the APIC timer correctly
when they go idle. Previously the code assumed this only happened
on C2 or deeper. But not all of these systems report support C2.
Use a AMD supplied snippet to detect C1E being enabled and then disable
local apic timer use.
This supercedes an earlier workaround using DMI detection of specific systems.
Thanks to Mark Langsdorf for the detection snippet.
Signed-off-by: Andi Kleen <ak@suse.de>
|
|
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 4298/1: fix memory barriers for DMA coherent and SMP platforms
[ARM] 4295/2: Fix error-handling in pxaficp_ir.c (version 2)
[ARM] Fix __NR_kexec_load
[ARM] Export dma_channel_active()
[ARM] 4296/1: ixp4xx: compile fix
[ARM] 4289/1: AT91: SAM9260 NAND flash timing
|
|
This patch:
- Switches mb/rmb/wmb back to being full-blown DMBs on ARM SMP systems,
since mb/rmb/wmb are required to order Normal memory accesses as well.
- Enables the use of DMB and ISB on XSC3 (which is an ARMv5TE ISA core
but conforms to the ARMv6 memory ordering model and supports the
various ARMv6 barriers.)
- Makes DMA coherent platforms (only ixp23xx at the moment) map
mb/rmb/wmb to dmb(), as on DMA coherent platforms, DMA consistent
mappings are done as Normal mappings, which are weakly ordered.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
It's __NR_kexec_load, not __NR_sys_kexec_load
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
There was a typo in commit 7632fc8f809a97f9d82ce125e8e3e579390ce2e5,
preventing it from working - 32bit binaries crashed hopelessly before
the below fix and work perfectly now.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix compilation fail for ixp4xx platforms for the case when CONFIG_IXP4XX_INDIRECT_PCI is set. That is due to the check_signature() is appeared in include/linux/io.h.
Signed-off-by: Vladimir Barinov <vbarinov@ru.mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
local_irq_restore -> raw_local_irq_restore -> irq_restore_epilog ->
smtc_ipi_replay -> smtc_ipi_dq -> spin_unlock_irqrestore ->
_spin_unlock_irqrestore -> local_irq_restore
The recursion does abort when there is no more IPI queued for a CPU, so
this isn't usually fatal which is why we got away with this for so long
until this was discovered by code inspection.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[IFB]: Fix crash on input device removal
[BNX2]: Fix link interrupt problem.
|
|
The definition of struct ucc_slow puts the guemr register immediately after the
utpt register, when it should be at offset 0x90. This patch adds the missing
0x52-byte padding.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().
Fix by storing the interface index instead and do a lookup where neccessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
_PAGE_PROTNONE conflicts with the lowest bit of pgoff. This causes all sorts
of weirdness when nonlinear mappings are used.
Took me a good half day to track this down.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[VIDEO]: Fix section mismatch in cg3.c
[SPARC]: sparc64 gcc-4.2.0 20070317 -Werror failure
[VIDEO] ffb: Fix two DAC handling bugs.
[SPARC32]: Fix SMP build regression
[DRM]: Delete sparc64 FFB driver code that never gets built.
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Trivial fix for hp6xx build.
sh: Fixup __cmpxchg() compile breakage with gcc4.
sh: Kill bogus GCC4 symbol exports.
|
|
The IRQ3 define was removed when asm-sh/irq.h was cleaned up,
this updates the hp6xx header to use the IRQ number directly.
Signed-off-by: Kristoffer Ericson <kristoffer_e1@hotmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
As reported by Manuel:
When I build linux with GCC-4.x and enable
CONFIG_CC_OPTIMIZE_FOR_SIZE linking fails with this error:
LD .tmp_vmlinux1
kernel/built-in.o: In function '__cmpxchg_called_with_bad_pointer'
make[1]: *** [.tmp_vmlinux1] Error 1
make: *** [_all] Error 2
This ended up being an inlining problem, fixed by explicitly
including linux/compiler.h and grabbing the definitions from there.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes
|
|
Johannes Berg discovered that kernel space was leaking to
userspace on 64 bit platform. He made a first patch to fix that. This
is an improved version of his patch.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (5472): Isl6421: don't reference freed memory
V4L/DVB (5441): Saa7146: Fix allocation of clipping memory
V4L/DVB (5421): Fix suspend/resume in msp3400 and tuner
V4L/DVB (5415): Msp_attach must return 0 if no msp3400 was found.
V4L/DVB (5408): Fix SECAM handling on saa7115
V4L/DVB (5400): Core: fix several locking related problems
V4L/DVB (5390): Radio: Fix error in Kbuild file
V4L/DVB (5332): Ir_rc5_timer_end decoder lockup fix
|
|
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
Export __splice_from_pipe()
2/2 splice: dont readpage
1/2 splice: dont steal
make elv_register() output atomic
block: blk_max_pfn is somtimes wrong
|
|
When CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) claims success, but did not actually
clone a new IPC namespace.
Fix this to return -EINVAL so the caller knows his request was denied.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During a static link, ld has started putting a .note section in the
.uml.setup.init section. This has the result that the UML setups begin
with 32 bytes of garbage and UML crashes immediately on boot.
This patch creates a specific .note section for ld to drop this stuff
into.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When CONFIG_UTS_NS=n, clone(CLONE_NEWUTS) quietly refuses. So correctly does
not unshare a new uts namespace, but also does not return -EINVAL.
Fix this to return -EINVAL so the caller knows his request was denied.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Its now used.. because we added the new definitions so enabled all the
goodies on i386
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
UML/x86_64 needs the same packing of struct epoll_event as x86_64.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Olaf Hering pointed out that SAA7146_CLIPPING_MEM would become
very large for PAGE_SIZE > 4K.
In fact, the number of clipping windows is limited to 16,
and calculate_clipping_registers_rect() does not use more
than 256 bytes. SAA7146_CLIPPING_MEM adjusted accordingly.
Thanks-to: Olaf Hering <olaf@aepfle.de>
Acked-by: Michael Hunold <hunold@linuxtv.org>
Signed-off-by: Oliver Endriss <o.endriss@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
|
|
Compiling 2.6.21-rc5 with gcc-4.2.0 20070317 (prerelease)
for sparc64 fails as follows:
gcc -Wp,-MD,arch/sparc64/kernel/.time.o.d -nostdinc -isystem /home/mikpe/pkgs/linux-sparc64/gcc-4.2.0/lib/gcc/sparc64-unknown-linux-gnu/4.2.0/include -D__KERNEL__ -Iinclude -include include/linux/autoconf.h -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -m64 -pipe -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wno-sign-compare -Wa,--undeclared-regs -fomit-frame-pointer -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -Werror -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(time)" -D"KBUILD_MODNAME=KBUILD_STR(time)" -c -o arch/sparc64/kernel/time.o arch/sparc64/kernel/time.c
cc1: warnings being treated as errors
arch/sparc64/kernel/time.c: In function 'kick_start_clock':
arch/sparc64/kernel/time.c:559: warning: overflow in implicit constant conversion
make[1]: *** [arch/sparc64/kernel/time.o] Error 1
make: *** [arch/sparc64/kernel] Error 2
gcc gets unhappy when the MSTK_SET macro's u8 __val variable
is updated with &= ~0xff (MSTK_YEAR_MASK). Making the constant
unsigned fixes the problem.
[ I fixed up the sparc32 side as well -DaveM ]
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ocfs2 wants to implement it's own splice write actor so that it can better
manage cluster / page locks. This lets us re-use the rest of splice write
while only providing our own code where it's actually important.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[SUNGEM]: Fix MAC address setting when interface is up.
[IPV4] fib_trie: Document locking.
[NET]: Correct accept(2) recovery after sock_attach_fd()
[PPP]: Don't leak an sk_buff on interface destruction.
[NET_SCHED]: Fix ingress locking
[NET_SCHED]: cls_basic: fix NULL pointer dereference
[DCCP]: make dccp_write_xmit_timer() static again
[TG3]: Update version and reldate.
[TG3]: Exit irq handler during chip reset.
[TG3]: Eliminate the unused TG3_FLAG_SPLIT_MODE flag.
[IPV6]: Fix routing round-robin locking.
[DECNet] fib: Fix out of bound access of dn_fib_props[]
[IPv4] fib: Fix out of bound access of fib_props[]
[NET] AX.25 Kconfig and docs updates and fixes
[NET]: Fix neighbour destructor handling.
[NET]: Fix fib_rules compatibility breakage
[SCTP]: Update SCTP Maintainers entry
[NET]: remove unused header file: drivers/net/wan/lmc/lmc_media.h
|
|
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] zcrypt: Fix ap_poll_requests counter in lost requests error path.
[S390] zcrypt: Fix possible dead lock in AP bus module.
[S390] cio: Device status validity.
[S390] kprobes: Align probe address.
[S390] Fix TCP/UDP pseudo header checksum computation.
[S390] dasd: Work around gcc bug.
|
|
Change prototypes for __chk_user_ptr and __chk_io_ptr to take const
void* instead of void*, so that code can pass "const void *" to them.
(Right now sparse does not warn about passing const void* to void*
functions, but that is a separate bug that I believe Josh is working on,
and once sparse does check this, the changed prototypes will be
necessary.)
Signed-off-by: Russ Cox <rsc@swtch.com>
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Christopher Li <sparse@chrisli.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
IDE error recovery is using IDLE IMMEDIATE if the drive is busy or has DRQ set.
This violates the ATA spec (can only send IDLEÂ IMMEDIATE when drive is not
busy) and really hoses up some drives (modern drives will not be able to
recover using this error handling). The correct thing to do is issue a SRST
followed by a SET FEATURES command. This is what Western Digital recommends
for error recovery and what Western Digital says Windows does.  It also does
not violate the ATA spec as far as I can tell.
Bart:
* port the patch over the current tree
* undo the recalibration code removal
* send SET FEATURES command after checking for good drive status
* don't check whether the current request is of REQ_TYPE_ATA_{CMD,TASK}
type because we need to send SET FEATURES before handling any requests
* some pre-ATA4 drives require INITIALIZE DEVICE PARAMETERS command before
other commands (except IDENTIFY) so send SET FEATURES only if there are
no pending drive->special requests
* update comments and patch description
* any bugs introduced by this patch are mine and not Suleiman's :-)
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
|
|
git commit f994aae1bd8e4813d59a2ed64d17585fe42d03fc changed the
function declaration of csum_tcpudp_nofold. Argument types were
changed from unsigned long to __be32 (unsigned int). Therefore we
lost the implicit type conversion that zeroed the upper half of the
registers that are used to pass parameters. Since the inline assembly
relied on this we ended up adding random values and wrong checksums
were created.
Showed only up on machines with more than 4GB since gcc produced code
where the registers that are used to pass 'saddr' and 'daddr' previously
contained addresses before calling this function.
Fix this by using 32 bit arithmetics and convert code to C, since gcc
produces better code than these hand-optimized versions.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.
Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf. The round robin code executes during
lookup and thus with the rwlock taken as a reader. A small local
spinlock tries to provide protection but this does not work at all
for two reasons:
1) The round-robin list manipulation, as coded, goes like this (with
read lock held):
walk routes finding head and tail
spin_lock();
rotate list using head and tail
spin_unlock();
While one thread is rotating the list, another thread can
end up with stale values of head and tail and then proceed
to corrupt the list when it gets the lock. This ends up causing
the OOPS in fib6_add() later onthat many people have been hitting.
2) All the other code paths that run with the rwlock held as
a reader do not expect the list to change on them, they
expect it to remain completely fixed while they hold the
lock in that way.
So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.
Reimplement using a per-fib6_node round-robin pointer. This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself. We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.
The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code when in.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.
The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.
I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Based upon a patch from Patrick McHardy.
The fib_rules netlink attribute policy introduced in 2.6.19 broke
userspace compatibilty. When specifying a rule with "from all"
or "to all", iproute adds a zero byte long netlink attribute,
but the policy requires all addresses to have a size equal to
sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a
validation error.
Check attribute length of FRA_SRC/FRA_DST in the generic framework
by letting the family specific rules implementation provide the
length of an address. Report an error if address length is non
zero but no address attribute is provided. Fix actual bug by
checking address length for non-zero instead of relying on
availability of attribute.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|