<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/inet_frag.h, branch v3.14.42</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.14.42</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.14.42'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2013-10-23T21:01:41+00:00</updated>
<entry>
<title>inet: remove old fragmentation hash initializing</title>
<updated>2013-10-23T21:01:41+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-10-23T09:06:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7088ad74e6e710d0c80ea2cead9500f47a2a5d58'/>
<id>urn:sha1:7088ad74e6e710d0c80ea2cead9500f47a2a5d58</id>
<content type='text'>
All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: frag, fix race conditions in LRU list maintenance</title>
<updated>2013-05-06T15:06:51+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2013-05-05T04:56:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b56141ab34e2c3e2d7960cea12c20c99530c0c76'/>
<id>urn:sha1:b56141ab34e2c3e2d7960cea12c20c99530c0c76</id>
<content type='text'>
This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4bf92c6d2510fe5c4dc51852746f206
("net: frag, move LRU list maintenance outside of rwlock")

One cpu already added new fragment queue into hash but not into LRU.
Other cpu found it in hash and tries to move it to the end of LRU.
This leads to NULL pointer dereference inside of list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): move can happens after deletion.

This patch initializes LRU list head before adding fragment into hash and
inet_frag_lru_move() doesn't touches it if it's empty.

I saw this kernel oops two times in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [&lt;ffffffff812ede89&gt;] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[&lt;ffffffff812ede89&gt;]  [&lt;ffffffff812ede89&gt;] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [&lt;ffffffff8155dcda&gt;] ip_defrag+0x8fa/0xd10
[119482.252921]  [&lt;ffffffff815a8013&gt;] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [&lt;ffffffff8154485b&gt;] nf_iterate+0x8b/0xa0
[119482.262658]  [&lt;ffffffff8155c7f0&gt;] ? inet_del_offload+0x40/0x40
[119482.267527]  [&lt;ffffffff815448e4&gt;] nf_hook_slow+0x74/0x130
[119482.272412]  [&lt;ffffffff8155c7f0&gt;] ? inet_del_offload+0x40/0x40
[119482.277302]  [&lt;ffffffff8155d068&gt;] ip_rcv+0x268/0x320
[119482.282147]  [&lt;ffffffff81519992&gt;] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [&lt;ffffffff81519b78&gt;] __netif_receive_skb+0x18/0x60
[119482.291826]  [&lt;ffffffff8151a650&gt;] process_backlog+0xa0/0x160
[119482.296648]  [&lt;ffffffff81519f29&gt;] net_rx_action+0x139/0x220
[119482.301403]  [&lt;ffffffff81053707&gt;] __do_softirq+0xe7/0x220
[119482.306103]  [&lt;ffffffff81053868&gt;] run_ksoftirqd+0x28/0x40
[119482.310809]  [&lt;ffffffff81074f5f&gt;] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [&lt;ffffffff81074e60&gt;] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [&lt;ffffffff8106d870&gt;] kthread+0xc0/0xd0
[119482.324858]  [&lt;ffffffff8106d7b0&gt;] ? insert_kthread_work+0x40/0x40
[119482.329460]  [&lt;ffffffff816c32dc&gt;] ret_from_fork+0x7c/0xb0
[119482.334057]  [&lt;ffffffff8106d7b0&gt;] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a &lt;4c&gt; 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [&lt;ffffffff812ede89&gt;] __list_del_entry+0x29/0xd0
[119482.348675]  RSP &lt;ffff880216d5db58&gt;
[119482.353493] CR2: 0000000000000000

Oops happened on this path:
ip_defrag() -&gt; ip_frag_queue() -&gt; inet_frag_lru_move() -&gt; list_move_tail() -&gt; __list_del_entry()

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: increase frag hash size</title>
<updated>2013-04-29T17:33:06+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-04-25T09:52:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a4c4009f4f54dabaaea1bb2b2c3c8930e93cd409'/>
<id>urn:sha1:a4c4009f4f54dabaaea1bb2b2c3c8930e93cd409</id>
<content type='text'>
Increase fragmentation hash bucket size to 1024 from old 64 elems.

After we increased the frag mem limits commit c2a93660 (net: increase
fragment memory usage limits) the hash size of 64 elements is simply
too small.  Also considering the mem limit is per netns and the hash
table is shared for all netns.

For the embedded people, note that this increase will change the hash
table/array from using approx 1 Kbytes to 16 Kbytes.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: frag queue per hash bucket locking</title>
<updated>2013-04-04T21:37:05+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-04-03T23:38:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=19952cc4f8f572493293a8caed27c4be89c5fc9d'/>
<id>urn:sha1:19952cc4f8f572493293a8caed27c4be89c5fc9d</id>
<content type='text'>
This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: use the frag lru_lock to protect netns_frags.nqueues update</title>
<updated>2013-03-27T16:48:33+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-03-27T05:55:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b5ab0def4f6e42e8b8097c3b11d2e8d96baafec'/>
<id>urn:sha1:1b5ab0def4f6e42e8b8097c3b11d2e8d96baafec</id>
<content type='text'>
Move the protection of netns_frags.nqueues updates under the LRU_lock,
instead of the write lock.  As they are located on the same cacheline,
and this is also needed when transitioning to use per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: generalize ipv4-only RFC3168 5.3 ecn fragmentation handling for future use by ipv6</title>
<updated>2013-03-24T21:16:30+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-03-22T08:24:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=be991971d53e0f5b6d13e3940192054216590072'/>
<id>urn:sha1:be991971d53e0f5b6d13e3940192054216590072</id>
<content type='text'>
This patch just moves some code arround to make the ip4_frag_ecn_table
and IPFRAG_ECN_* constants accessible from the other reassembly engines. I
also renamed ip4_frag_ecn_table to ip_frag_ecn_table.

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jesper Dangaard Brouer &lt;jbrouer@redhat.com&gt;
Cc: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: limit length of fragment queue hash table bucket lists</title>
<updated>2013-03-19T14:28:36+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2013-03-15T11:32:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5a3da1fe9561828d0ca7eca664b16ec2b9bf0055'/>
<id>urn:sha1:5a3da1fe9561828d0ca7eca664b16ec2b9bf0055</id>
<content type='text'>
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jesper Dangaard Brouer &lt;jbrouer@redhat.com&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fix possible deadlock in sum_frag_mem_limit</title>
<updated>2013-02-22T20:10:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-02-22T07:43:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4cfb04854d053e4d6391d7f84495f48082342362'/>
<id>urn:sha1:4cfb04854d053e4d6391d7f84495f48082342362</id>
<content type='text'>
Dave Jones reported a lockdep splat occurring in IP defrag code.

commit 6d7b857d541ecd1d (net: use lib/percpu_counter API for
fragmentation mem accounting) added a possible deadlock.

Because percpu_counter_sum_positive() needs to acquire
a lock that can be used from softirq, we need to disable BH
in sum_frag_mem_limit()

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: frag, move LRU list maintenance outside of rwlock</title>
<updated>2013-01-29T18:36:24+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-01-28T23:45:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ef0eb0db4bf92c6d2510fe5c4dc51852746f206'/>
<id>urn:sha1:3ef0eb0db4bf92c6d2510fe5c4dc51852746f206</id>
<content type='text'>
Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock.  However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.

Original-idea-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: use lib/percpu_counter API for fragmentation mem accounting</title>
<updated>2013-01-29T18:36:24+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2013-01-28T23:45:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6d7b857d541ecd1d9bd997c97242d4ef94b19de2'/>
<id>urn:sha1:6d7b857d541ecd1d9bd997c97242d4ef94b19de2</id>
<content type='text'>
Replace the per network namespace shared atomic "mem" accounting
variable, in the fragmentation code, with a lib/percpu_counter.

Getting percpu_counter to scale to the fragmentation code usage
requires some tweaks.

At first view, percpu_counter looks superfast, but it does not
scale on multi-CPU/NUMA machines, because the default batch size
is too small, for frag code usage.  Thus, I have adjusted the
batch size by using __percpu_counter_add() directly, instead of
percpu_counter_sub() and percpu_counter_add().

The batch size is increased to 130.000, based on the largest 64K
fragment memory usage.  This does introduce some imprecise
memory accounting, but its does not need to be strict for this
use-case.

It is also essential, that the percpu_counter, does not
share cacheline with other writers, to make this scale.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
