<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/sch_generic.h, branch v4.12.9</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.12.9</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.12.9'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2017-04-21T17:58:15+00:00</updated>
<entry>
<title>net_sched: move the empty tp check from -&gt;destroy() to -&gt;delete()</title>
<updated>2017-04-21T17:58:15+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-04-19T21:21:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=763dbf6328e41de7a55851baf5ee49e367552531'/>
<id>urn:sha1:763dbf6328e41de7a55851baf5ee49e367552531</id>
<content type='text'>
We could have a race condition where in -&gt;classify() path we
dereference tp-&gt;root and meanwhile a parallel -&gt;destroy() makes it
a NULL. Daniel cured this bug in commit d936377414fa
("net, sched: respect rcu grace period on cls destruction").

This happens when -&gt;destroy() is called for deleting a filter to
check if we are the last one in tp, this tp is still linked and
visible at that time. The root cause of this problem is the semantic
of -&gt;destroy(), it does two things (for non-force case):

1) check if tp is empty
2) if tp is empty we could really destroy it

and its caller, if cares, needs to check its return value to see if it
is really destroyed. Therefore we can't unlink tp unless we know it is
empty.

As suggested by Daniel, we could actually move the test logic to -&gt;delete()
so that we can safely unlink tp after -&gt;delete() tells us the last one is
just deleted and before -&gt;destroy().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Cc: Roi Dayan &lt;roid@mellanox.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: make default fifo qdiscs appear in the dump</title>
<updated>2017-03-13T05:53:02+00:00</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2017-03-08T15:03:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49b499718fa1b0d639663cfd03085b9bfd23cdc8'/>
<id>urn:sha1:49b499718fa1b0d639663cfd03085b9bfd23cdc8</id>
<content type='text'>
The original reason [1] for having hidden qdiscs (potential scalability
issues in qdisc_match_from_root() with single linked list in case of large
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too
intrusive / unnecessary change of default behavior towards userspace.
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
applications to request complete qdisc hierarchy dump, including the
ones that have always been implicit/invisible.

Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
about singletons would require quite some surgery with very little gain
(seeing no qdisc or seeing noop qdisc in the dump is probably setting
the same user expectation).

[1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com
[2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.net

Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api</title>
<updated>2017-02-10T16:38:08+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@mellanox.com</email>
</author>
<published>2017-02-09T13:38:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cf1facda2f61bc3e9ffd985b6d624dec6ad3f279'/>
<id>urn:sha1:cf1facda2f61bc3e9ffd985b6d624dec6ad3f279</id>
<content type='text'>
Creation is done in this file, move destruction to be at the same place.

Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sched: rename tcf_destroy to tcf_destroy_proto</title>
<updated>2017-02-10T16:38:08+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@mellanox.com</email>
</author>
<published>2017-02-09T13:38:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=79112c26f14c38ddbac3b2739469e373ef424fe6'/>
<id>urn:sha1:79112c26f14c38ddbac3b2739469e373ef424fe6</id>
<content type='text'>
This function destroys TC filter protocol, not TC filter. So name it
accordingly.

Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tc: convert tc_from to tc_from_ingress and tc_redirected</title>
<updated>2017-01-09T01:58:52+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-01-07T22:06:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bc31c905e946b5c55df5d2938335e78ffb3157ca'/>
<id>urn:sha1:bc31c905e946b5c55df5d2938335e78ffb3157ca</id>
<content type='text'>
The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb-&gt;tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb-&gt;tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted to reach ifb_xmit. These
must be dropped. Set tc_redirected on redirection and drop all packets
that do not have this bit set.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tc: convert tc_at to tc_at_ingress</title>
<updated>2017-01-09T01:58:52+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-01-07T22:06:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8dc07fdbf2054f157e8333f940a1ad728916c786'/>
<id>urn:sha1:8dc07fdbf2054f157e8333f940a1ad728916c786</id>
<content type='text'>
Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tc: convert tc_verd to integer bitfields</title>
<updated>2017-01-09T01:58:52+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-01-07T22:06:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a5135bcfba7345031df45e02cd150a45add47cf8'/>
<id>urn:sha1:a5135bcfba7345031df45e02cd150a45add47cf8</id>
<content type='text'>
Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tc: extract skip classify bit from tc_verd</title>
<updated>2017-01-09T01:58:52+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-01-07T22:06:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7246e122aaa99ebbb8ad7da80f35a20577bd8af'/>
<id>urn:sha1:e7246e122aaa99ebbb8ad7da80f35a20577bd8af</id>
<content type='text'>
Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: gen_estimator: complete rewrite of rate estimators</title>
<updated>2016-12-05T20:21:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-12-04T17:48:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c0d32fde5bdf1184bc274f864c09799278a1114'/>
<id>urn:sha1:1c0d32fde5bdf1184bc274f864c09799278a1114</id>
<content type='text'>
1) Old code was hard to maintain, due to complex lock chains.
   (We probably will be able to remove some kfree_rcu() in callers)

2) Using a single timer to update all estimators does not scale.

3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity
   is not supposed to work well)

In this rewrite :

- I removed the RB tree that had to be scanned in
  gen_estimator_active(). qdisc dumps should be much faster.

- Each estimator has its own timer.

- Estimations are maintained in net_rate_estimator structure,
  instead of dirtying the qdisc. Minor, but part of the simplification.

- Reading the estimator uses RCU and a seqcount to provide proper
  support for 32bit kernels.

- We reduce memory need when estimators are not used, since
  we store a pointer, instead of the bytes/packets counters.

- xt_rateest_mt() no longer has to grab a spinlock.
  (In the future, xt_rateest_tg() could be switched to per cpu counters)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sched: add and use qdisc_skb_head helpers</title>
<updated>2016-09-19T05:47:18+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-09-17T22:57:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48da34b7a74201f15315cb1fc40bb9a7bd2b4940'/>
<id>urn:sha1:48da34b7a74201f15315cb1fc40bb9a7bd2b4940</id>
<content type='text'>
This change replaces sk_buff_head struct in Qdiscs with new qdisc_skb_head.

Its similar to the skb_buff_head api, but does not use skb-&gt;prev pointers.

Qdiscs will commonly enqueue at the tail of a list and dequeue at head.
While skb_buff_head works fine for this, enqueue/dequeue needs to also
adjust the prev pointer of next element.

The -&gt;prev pointer is not required for qdiscs so we can just leave
it undefined and avoid one cacheline write access for en/dequeue.

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
