summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2005-06-22[PATCH] add OOM debugJanet Morgan2-3/+5
This patch provides more debug info when the system is OOM. It displays memory stats (basically sysrq-m info) from __alloc_pages() when page allocation fails and during OOM kill. Thanks to Dave Jones for coming up with the idea. Signed-off-by: Janet Morgan <janetmor@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] __read_page_state(): pass unsigned long instead of unsignedBenjamin LaHaise2-2/+2
By making the offset argument of __read_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] __mod_page_state(): pass unsigned long instead of unsignedBenjamin LaHaise2-2/+2
By making the offset argument of __mod_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] vm: try_to_free_pages unused argumentDarren Hart4-5/+4
try_to_free_pages accepts a third argument, order, but hasn't used it since before 2.6.0. The following patch removes the argument and updates all the calls to try_to_free_pages. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] mm: remove PG_highmemBadari Pulavarty7-22/+15
Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll generate a little more code, but we don't use PageHigheMem() in many places. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] mmap topdown fix for large stack limit, large allocationChris Wright1-0/+4
The topdown changes in 2.6.12-rc1 can cause large allocations with large stack limit to fail, despite there being space available. The mmap_base-len is only valid when len >= mmap_base. However, nothing in topdown allocator checks this. It's only (now) caught at higher level, which will cause allocation to simply fail. The following change restores the fallback to bottom-up path, which will allow large allocations with large stack limit to potentially still succeed. Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] Avoiding mmap fragmentationWolfgang Wander14-30/+147
Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] node local per-cpu-pagesChristoph Lameter6-38/+195
This patch modifies the way pagesets in struct zone are managed. Each zone has a per-cpu array of pagesets. So any particular CPU has some memory in each zone structure which belongs to itself. Even if that CPU is not local to that zone. So the patch relocates the pagesets for each cpu to the node that is nearest to the cpu instead of allocating the pagesets in the (possibly remote) target zone. This means that the operations to manage pages on remote zone can be done with information available locally. We play a macro trick so that non-NUMA pmachines avoid the additional pointer chase on the page allocator fastpath. AIM7 benchmark on a 32 CPU SGI Altix w/o patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005 100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005 200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005 300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005 400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005 500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005 600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005 700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005 800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005 900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005 1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005 with slab API changes and pageset patch: Tasks jobs/min jti jobs/min/task real cpu 1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005 100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005 200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005 300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005 400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005 500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005 600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005 700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005 800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005 900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005 1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Shai Fultheim <Shai@Scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] Hugepage consolidationDavid Gibson19-850/+300
A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch attempts to consolidate a lot of the code across the arch's, putting the combined version in mm/hugetlb.c. There are a couple of uglyish hacks in order to covert all the hugepage archs, but the result is a very large reduction in the total amount of code. It also means things like hugepage lazy allocation could be implemented in one place, instead of six. Tested, at least a little, on ppc64, i386 and x86_64. Notes: - this patch changes the meaning of set_huge_pte() to be more analagous to set_pte() - does SH4 need s special huge_ptep_get_and_clear()?? Acked-by: William Lee Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] VM: rate limit early reclaimMartin Hicks3-0/+13
When early zone reclaim is turned on the LRU is scanned more frequently when a zone is low on memory. This limits when the zone reclaim can be called by skipping the scan if another thread (either via kswapd or sync reclaim) is already reclaiming from the zone. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] VM: add __GFP_NORECLAIMMartin Hicks3-3/+6
When using the early zone reclaim, it was noticed that allocating new pages that should be spread across the whole system caused eviction of local pages. This adds a new GFP flag to prevent early reclaim from happening during certain allocation attempts. The example that is implemented here is for page cache pages. We want page cache pages to be spread across the whole system, and we don't want page cache pages to evict other pages to get local memory. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] VM: early zone reclaimMartin Hicks9-8/+104
This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.c Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] VM: add may_swap flag to scan_controlMartin Hicks1-1/+6
Here's the next round of these patches. These are totally different in an attempt to meet the "simpler" request after the last patches. For reference the earlier threads are: http://marc.theaimsgroup.com/?l=linux-kernel&m=110839604924587&w=2 http://marc.theaimsgroup.com/?l=linux-mm&m=111461480721249&w=2 This set of patches replaces my other vm- patches that are currently in -mm. So they're against 2.6.12-rc5-mm1 about half way through the -mm patchset. As I said already this patch is a lot simpler. The reclaim is turned on or off on a per-zone basis using a syscall. I haven't tested the x86 syscall, so it might be wrong. It uses the existing reclaim/pageout code with the small addition of a may_swap flag to scan_control (patch 1/4). I also added __GFP_NORECLAIM (patch 3/4) so that certain allocation types can be flagged to never cause reclaim. This was a deficiency that was in all of my earlier patch sets. Previously, doing a big buffered read would fill one zone with page cache and then start to reclaim from that same zone, leaving the other zones untouched. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.c This patch: This adds an extra switch to the scan_control struct. It simply lets the reclaim code know if its allowed to swap pages out. This was required for a simple per-zone reclaimer. Without this addition pages would be swapped out as soon as a zone ran out of memory and the early reclaim kicked in. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] mm: add /proc/zoneinfoNikita Danilov2-2/+125
Add /proc/zoneinfo file to display information about memory zones. Useful to analyze VM behaviour. Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] madvise: merge the mapsPrasanna Meda1-29/+51
This attempts to merge back the split maps. This code is mostly copied from Chrisw's mlock merging from post 2.6.11 trees. The only difference is in munmapped_error handling. Also passed prev to willneed/dontneed, eventhogh they do not handle it now, since I felt it will be cleaner, instead of handling prev in madvise_vma in some cases and in subfunction in some cases. Signed-off-by: Prasanna Meda <pmeda@akamai.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] madvise: do not split the mapsPrasanna Meda1-11/+16
This attempts to avoid splittings when it is not needed, that is when vm_flags are same as new flags. The idea is from the <2.6.11 mlock_fixup and others. This will provide base for the next madvise merging patch. Signed-off-by: Prasanna Meda <pmeda@akamai.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] vmscan: notice slab shrinkingakpm@osdl.org1-5/+14
Fix a problem identified by Andrea Arcangeli <andrea@suse.de> kswapd will set a zone into all_unreclaimable state if it sees that we're not successfully reclaiming LRU pages. But that fails to notice that we're successfully reclaiming slab obects, so we can set all_unreclaimable too soon. So change shrink_slab() to return a success indication if it actually reclaimed some objects, and don't assume that the zone is all_unreclaimable if that is true. This means that we won't enter all_unreclaimable state if we are successfully freeing slab objects but we're not yet actually freeing slab pages, due to internal fragmentation. (hm, this has a shortcoming. We could be successfully freeing ZONE_NORMAL slab objects while being really oom on ZONE_DMA. If that happens then kswapd might burn a lot of CPU. But given that there might be some slab objects in ZONE_DMA, perhaps that is appropriate.) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] smp_processor_id() cleanupIngo Molnar37-125/+119
This patch implements a number of smp_processor_id() cleanup ideas that Arjan van de Ven and I came up with. The previous __smp_processor_id/_smp_processor_id/smp_processor_id API spaghetti was hard to follow both on the implementational and on the usage side. Some of the complexity arose from picking wrong names, some of the complexity comes from the fact that not all architectures defined __smp_processor_id. In the new code, there are two externally visible symbols: - smp_processor_id(): debug variant. - raw_smp_processor_id(): nondebug variant. Replaces all existing uses of _smp_processor_id() and __smp_processor_id(). Defined by every SMP architecture in include/asm-*/smp.h. There is one new internal symbol, dependent on DEBUG_PREEMPT: - debug_smp_processor_id(): internal debug variant, mapped to smp_processor_id(). Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new lib/smp_processor_id.c file. All related comments got updated and/or clarified. I have build/boot tested the following 8 .config combinations on x86: {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT} I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other architectures are untested, but should work just fine.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processesSuresh Siddha7-30/+25
Appended patch will setup compatibility mode TASK_SIZE properly. This will fix atleast three known bugs that can be encountered while running compatibility mode apps. a) A malicious 32bit app can have an elf section at 0xffffe000. During exec of this app, we will have a memory leak as insert_vm_struct() is not checking for return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously. b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32bits, ultimately causing corruption because of wrap-around and resulting in SEGFAULT, instead of returning ENOMEM. c) 32bit app doing this below mmap will now fail. mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] coverity: idr_get_new_above_int() overrun fixZaur Kambarov1-1/+1
This patch fixes overrun of array pa: 92 struct idr_layer *pa[MAX_LEVEL]; in 98 l = idp->layers; 99 pa[l--] = NULL; by passing idp->layers, set in 202 idp->layers = layers; to function sub_alloc in 203 v = sub_alloc(idp, ptr, &id); Signed-off-by: Zaur Kambarov <zkambarov@coverity.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] coverity: ipmi: avoid overrun of ipmi_interfaces[]Zaur Kambarov1-1/+1
Fix overrun of static array "ipmi_interfaces" of size 4 at position 4 with index variable "if_num". Definitions involved: 297 #define MAX_IPMI_INTERFACES 4 298 static ipmi_smi_t ipmi_interfaces[MAX_IPMI_INTERFACES]; Signed-off-by: Zaur Kambarov <zkambarov@coverity.com> Cc: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] megaraid build fixbobl1-1/+1
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] arm: irqs_disabled() type fixAndrew Morton1-1/+1
kernel/sched.c: In function `__might_sleep': kernel/sched.c:5461: warning: int format, long unsigned int arg (arg 3) We expect irqs_disabled() to return an int (poor man's bool). Acked-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2-2/+38
2005-06-22[SPARC64]: Add prefetch support.David S. Miller1-0/+34
The implementation is optimal for UltraSPARC-III and later. It will work, however suboptimally, on UltraSPARC-II and be treated as a NOP on UltraSPARC-I. It is not worth code patching this thing as the highest cost is the code space, and code patching cannot eliminate that. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds53-676/+2823
2005-06-22[PATCH] devfs: remove devfs from Kconfig preventing it from being builtGreg KH1-50/+0
Here's a much smaller patch to simply disable devfs from the build. If this goes well, and there are no complaints for a few weeks, I'll resend my big "devfs-die-die-die" series of patches that rip the whole thing out of the kernel tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[SPARC64]: Fix cmsg length checks in Solaris emulation layer.David S. Miller1-2/+4
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22Merge 'for-linus' branch of ↵Linus Torvalds22-201/+100
rsync://rsync.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
2005-06-22[IPV4]: Fix fib_trie.c's args to fib_dump_info().David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Fix ip6t_LOG sit tunnel loggingPatrick McHardy1-35/+19
Sit tunnel logging is currently broken: MAC=01:23:45:67:89:ab->01:23:45:47:89:ac TUNNEL=123.123. 0.123-> 12.123. 6.123 Apart from the broken IP address, MAC addresses are printed differently for sit tunnels than for everything else. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Drop conntrack reference in ip_call_ra_chain()/ip_mr_input()Patrick McHardy2-0/+2
Drop reference before handing the packets to raw_rcv() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Check TCP checksum in ipt_REJECTPatrick McHardy1-1/+12
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Avoid unncessary checksum validation in UDP connection trackingKeir Fraser1-0/+1
Signed-off-by: Keir Fraser <Keir.Fraser@xl.cam.ac.uk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Missing owner-field initialization in ip6table_rawPatrick McHardy1-2/+4
I missed this one when fixing up iptable_raw. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: expectation timeouts are compulsoryPhil Oester1-9/+5
Since expectation timeouts were made compulsory [1], there is no need to check for them in ip_conntrack_expect_insert. [1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018143.html Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Restore netfilter assumptions in IPv6 multicastPatrick McHardy1-21/+26
Netfilter assumes that skb->data == skb->nh.ipv6h Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Kill nf_debugPatrick McHardy13-239/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETFILTER]: Kill lockhelp.hPatrick McHardy23-302/+160
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[IPV6]: multicast join and miscDavid L Stevens2-2/+24
Here is a simplified version of the patch to fix a bug in IPv6 multicasting. It: 1) adds existence check & EADDRINUSE error for regular joins 2) adds an exception for EADDRINUSE in the source-specific multicast join (where a prior join is ok) 3) adds a missing/needed read_lock on sock_mc_list; would've raced with destroying the socket on interface down without 4) adds a "leave group" in the (INCLUDE, empty) source filter case. This frees unneeded socket buffer memory, but also prevents an inappropriate interaction among the 8 socket options that mess with this. Some would fail as if in the group when you aren't really. Item #4 had a locking bug in the last version of this patch; rather than removing the idev->lock read lock only, I've simplified it to remove all lock state in the path and treat it as a direct "leave group" call for the (INCLUDE,empty) case it covers. Tested on an MP machine. :-) Much thanks to HoerdtMickael <hoerdt@clarinet.u-strasbg.fr> who reported the original bug. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[IPV6]: V6 route events reported with wrong netlink PID and seq numberJamal Hadi Salim7-63/+74
Essentially netlink at the moment always reports a pid and sequence of 0 always for v6 route activities. To understand the repurcassions of this look at: http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html While fixing this, i took the liberty to resolve the outstanding issue of IPV6 routes inserted via ioctls to have the correct pids as well. This patch tries to behave as close as possible to the v4 routes i.e maintains whatever PID the socket issuing the command owns as opposed to the process. That made the patch a little bulky. I have tested against both netlink derived utility to add/del routes as well as ioctl derived one. The Quagga folks have tested against quagga. This fixes the problem and so far hasnt been detected to introduce any new issues. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[IPV4]: Add LC-Trie FIB lookup algorithm.Robert Olsson4-1/+2495
Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETLINK]: netlink_callback structure needs 5 args not 4Alexey Kuznetsov1-1/+1
net/ipv4/tcp_diag.c uses up to ->args[4] Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds224-2361/+2987
2005-06-21Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds40-318/+839
2005-06-21[PATCH] PCI: fix show_modalias() function due to attribute changeGreg Kroah-Hartman1-1/+1
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-21[PATCH] USB: fix show_modalias() function due to attribute changeGreg Kroah-Hartman1-1/+1
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-21[PATCH] SYSFS: fix PAGE_SIZE checkJon Smirl1-1/+1
Without this change I can't set an attribute exactly PAGE_SIZE in length. There is no need for zero termination because the interface uses lengths. From: Jon Smirl <jonsmirl@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-21[PATCH] Driver core: Don't "lose" devices on suspend on failureBenjamin Herrenschmidt1-1/+12
I think we need this patch or we might "lose" devices to the dpm_irq_off list if a failure occurs during the suspend process. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-21[PATCH] sysfs-iattr: set inode attributesManeesh Soni3-8/+35
o Following patch sets the attributes for newly allocated inodes for sysfs objects. If the object has non-default attributes, inode attributes are set as saved in sysfs_dirent->s_iattr, pointer to struct iattr. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>