<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/inet_frag.h, branch v4.19.181</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.181</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.181'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-04-27T07:36:33+00:00</updated>
<entry>
<title>net: IP defrag: encapsulate rbtree defrag code into callable functions</title>
<updated>2019-04-27T07:36:33+00:00</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2019-04-23T17:25:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=702ddf862d9d74e670849df659b8706fd6878205'/>
<id>urn:sha1:702ddf862d9d74e670849df659b8706fd6878205</id>
<content type='text'>
[ Upstream commit c23f35d19db3b36ffb9e04b08f1d91565d15f84f ]

This is a refactoring patch: without changing runtime behavior,
it moves rbtree-related code from IPv4-specific files/functions
into .h/.c defrag files shared with IPv6 defragmentation code.

v2: make handling of overlapping packets match upstream.

Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ip: add helpers to process in-order fragments faster.</title>
<updated>2018-08-12T00:54:18+00:00</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-08-11T20:27:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=353c9cb360874e737fb000545f783df756c06f9a'/>
<id>urn:sha1:353c9cb360874e737fb000545f783df756c06f9a</id>
<content type='text'>
This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent
  to the previous tail fragment, are added to this tail run without
  triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
  it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, as is inserted into the rb-tree
  at the end as the head of the new run.

skb-&gt;cb is used to store additional information
needed here (suggested by Eric Dumazet).

Reported-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ip: use rb trees for IP frag queue.</title>
<updated>2018-08-06T00:16:46+00:00</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-08-02T23:34:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa0f527358bd900ef92f925878ed6bfbd51305cc'/>
<id>urn:sha1:fa0f527358bd900ef92f925878ed6bfbd51305cc</id>
<content type='text'>
Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H &lt;host&gt; -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Reported-by: Juha-Matti Tilli &lt;juha-matti.tilli@iki.fi&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>rhashtable: split rhashtable.h</title>
<updated>2018-06-22T04:43:27+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2018-06-18T02:52:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0eb71a9da5796851fa87ddc1a534066c0fe54055'/>
<id>urn:sha1:0eb71a9da5796851fa87ddc1a534066c0fe54055</id>
<content type='text'>
Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small changes can required a large recompilation.
This makes development painful.

This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code.  rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.

Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: reorganize struct netns_frags</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c2615cf5a761b32bf74e85bddc223dfff3d9b9f0'/>
<id>urn:sha1:c2615cf5a761b32bf74e85bddc223dfff3d9b9f0</id>
<content type='text'>
Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce
false sharing noticed in inet_frag_kill()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: break the 2GB limit for frags storage</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e67f106f619dcfaf6f4e2039599bdb69848c714'/>
<id>urn:sha1:3e67f106f619dcfaf6f4e2039599bdb69848c714</id>
<content type='text'>
Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonnably well under pressure.

Current memory tracking is using one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :

if (... || frag_mem_limit(nf) &gt; nf-&gt;high_thresh)

Tested:

$ echo 16000000000 &gt;/proc/sys/net/ipv4/ipfrag_high_thresh

&lt;frag DDOS&gt;

$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds                    3317150            0.0
IpReasmFails                    3317112            0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: remove inet_frag_maybe_warn_overflow()</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2d44ed22e607f9a285b049de2263e3840673a260'/>
<id>urn:sha1:2d44ed22e607f9a285b049de2263e3840673a260</id>
<content type='text'>
This function is obsolete, after rhashtable addition to inet defrag.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: get rif of inet_frag_evicting()</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=399d1404be660d355192ff4df5ccc3f4159ec1e4'/>
<id>urn:sha1:399d1404be660d355192ff4df5ccc3f4159ec1e4</id>
<content type='text'>
This refactors ip_expire() since one indentation level is removed.

Note: in the future, we should try hard to avoid the skb_clone()
since this is a serious performance cost.
Under DDOS, the ICMP message wont be sent because of rate limits.

Fact that ip6_expire_frag_queue() does not use skb_clone() is
disturbing too. Presumably IPv6 should have the same
issue than the one we fixed in commit ec4fbd64751d
("inet: frag: release spinlock before calling icmp_send()")

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: remove some helpers</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6befe4a78b1553edb6eed3a78b4bcd9748526672'/>
<id>urn:sha1:6befe4a78b1553edb6eed3a78b4bcd9748526672</id>
<content type='text'>
Remove sum_frag_mem_limit(), ip_frag_mem() &amp; ip6_frag_mem()

Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405ab ("inet: frag: don't account number
of fragment queues")

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frags: use rhashtables for reassembly units</title>
<updated>2018-04-01T03:25:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-03-31T19:58:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=648700f76b03b7e8149d13cc2bdb3355035258a9'/>
<id>urn:sha1:648700f76b03b7e8149d13cc2bdb3355035258a9</id>
<content type='text'>
Some applications still rely on IP fragmentation, and to be fair linux
reassembly unit is not working under any serious load.

It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)

A work queue is supposed to garbage collect items when host is under memory
pressure, and doing a hash rebuild, changing seed used in hash computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.

Then there is the problem of sharing this hash table for all netns.

It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.

Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.

Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.

After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)

$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608

A followup patch will change the limits for 64bit arches.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Alexander Aring &lt;alex.aring@gmail.com&gt;
Cc: Stefan Schmidt &lt;stefan@osg.samsung.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
