<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/sch_generic.h, branch v4.18.18</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.18.18</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.18.18'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-05-29T13:49:15+00:00</updated>
<entry>
<title>net: sched: add qstats.qlen to qlen</title>
<updated>2018-05-29T13:49:15+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2018-05-26T04:53:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6172abc1e2ea12743f368875576952ddf4ebe88c'/>
<id>urn:sha1:6172abc1e2ea12743f368875576952ddf4ebe88c</id>
<content type='text'>
AFAICT struct gnet_stats_queue.qlen is not used in Qdiscs.
It may, however, be useful for offloads to report HW queue
length there.  Add that value to the result of qdisc_qlen_sum().

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: shrink struct Qdisc</title>
<updated>2018-05-29T03:13:39+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-05-25T14:28:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e9be0e993d95adbe5efe0e0f03b2a3e71f5bb2b6'/>
<id>urn:sha1:e9be0e993d95adbe5efe0e0f03b2a3e71f5bb2b6</id>
<content type='text'>
The struct Qdisc has a lot of holes, especially after commit
a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback"),
which as a side effect, moved the fields just after 'busylock'
on a new cacheline.

Since both 'padded' and 'refcnt' are not updated frequently, and
there is a hole before 'gso_skb', we can move such fields there,
saving a cacheline without any performance side effect.

Before this commit:

pahole -C Qdisc net/sche/sch_generic.o
	# ...
        /* size: 384, cachelines: 6, members: 25 */
        /* sum members: 236, holes: 3, sum holes: 92 */
        /* padding: 56 */

After this commit:
pahole -C Qdisc net/sche/sch_generic.o
	# ...
	/* size: 320, cachelines: 5, members: 25 */
	/* sum members: 236, holes: 2, sum holes: 28 */
	/* padding: 56 */

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sched: replace __QDISC_STATE_RUNNING bit with a spin lock</title>
<updated>2018-05-17T16:46:54+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-05-15T14:24:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=96009c7d500efdd5534e83b2e3eb2c58d4b137ae'/>
<id>urn:sha1:96009c7d500efdd5534e83b2e3eb2c58d4b137ae</id>
<content type='text'>
So that we can use lockdep on it.
The newly introduced sequence lock has the same scope of busylock,
so it shares the same lockdep annotation, but it's only used for
NOLOCK qdiscs.

With this changeset we acquire such lock in the control path around
flushing operation (qdisc reset), to allow more NOLOCK qdisc perf
improvement in the next patch.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers</title>
<updated>2018-05-16T16:26:25+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2018-05-15T08:50:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=32f7b44d0f5661044fcfa84e9ad402ed9d759107'/>
<id>urn:sha1:32f7b44d0f5661044fcfa84e9ad402ed9d759107</id>
<content type='text'>
Currently NOLOCK qdiscs pay a measurable overhead to atomically
manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per
packet in the uncontended scenario with packet rate below the
line rate: on packed dequeue and on the next, failing dequeue attempt.

This changeset moves the bit manipulation into the qdisc_run_{begin,end}
helpers, so that the bit is now flipped only once per packet, with
measurable performance improvement in the uncontended scenario.

This also allows simplifying the qdisc teardown code path - since
qdisc_is_running() is now effective for each qdisc type - and avoid a
possible race between qdisc_run() and dev_deactivate_many(), as now
the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy
dequeuing packets.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2018-04-01T23:49:34+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-04-01T23:49:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c0b458a9463bd6be165374a8e9e3235800ee132e'/>
<id>urn:sha1:c0b458a9463bd6be165374a8e9e3235800ee132e</id>
<content type='text'>
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --&gt;
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params-&gt;log_rq_size is renamed to be
   params-&gt;log_rq_mtu_frames.

3) In 'net-next' params-&gt;hard_mtu is added.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched, fix OOO packets with pfifo_fast</title>
<updated>2018-03-26T16:36:23+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-03-25T05:25:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eb82a994479245a79647d302f9b4eb8e7c9d7ca6'/>
<id>urn:sha1:eb82a994479245a79647d302f9b4eb8e7c9d7ca6</id>
<content type='text'>
After the qdisc lock was dropped in pfifo_fast we allow multiple
enqueue threads and dequeue threads to run in parallel. On the
enqueue side the skb bit ooo_okay is used to ensure all related
skbs are enqueued in-order. On the dequeue side though there is
no similar logic. What we observe is with fewer queues than CPUs
it is possible to re-order packets when two instances of
__qdisc_run() are running in parallel. Each thread will dequeue
a skb and then whichever thread calls the ndo op first will
be sent on the wire. This doesn't typically happen because
qdisc_run() is usually triggered by the same core that did the
enqueue. However, drivers will trigger __netif_schedule()
when queues are transitioning from stopped to awake using the
netif_tx_wake_* APIs. When this happens netif_schedule() calls
qdisc_run() on the same CPU that did the netif_tx_wake_* which
is usually done in the interrupt completion context. This CPU
is selected with the irq affinity which is unrelated to the
enqueue operations.

To resolve this we add a RUNNING bit to the qdisc to ensure
only a single dequeue per qdisc is running. Enqueue and dequeue
operations can still run in parallel and also on multi queue
NICs we can still have a dequeue in-flight per qdisc, which
is typically per CPU.

Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Reported-by: Jakob Unterwurzacher &lt;jakob.unterwurzacher@theobroma-systems.com&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2018-03-23T15:31:58+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-03-23T15:24:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03fe2debbb2771fb90881e4ce8109b09cf772a5c'/>
<id>urn:sha1:03fe2debbb2771fb90881e4ce8109b09cf772a5c</id>
<content type='text'>
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sch_netem: fix skb leak in netem_enqueue()</title>
<updated>2018-03-07T16:18:14+00:00</updated>
<author>
<name>Alexey Kodanev</name>
<email>alexey.kodanev@oracle.com</email>
</author>
<published>2018-03-05T17:52:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=35d889d10b649fda66121891ec05eca88150059d'/>
<id>urn:sha1:35d889d10b649fda66121891ec05eca88150059d</id>
<content type='text'>
When we exceed current packets limit and we have more than one
segment in the list returned by skb_gso_segment(), netem drops
only the first one, skipping the rest, hence kmemleak reports:

unreferenced object 0xffff880b5d23b600 (size 1024):
  comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
  hex dump (first 32 bytes):
    00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;00000000d8a19b9d&gt;] __alloc_skb+0xc9/0x520
    [&lt;000000001709b32f&gt;] skb_segment+0x8c8/0x3710
    [&lt;00000000c7b9bb88&gt;] tcp_gso_segment+0x331/0x1830
    [&lt;00000000c921cba1&gt;] inet_gso_segment+0x476/0x1370
    [&lt;000000008b762dd4&gt;] skb_mac_gso_segment+0x1f9/0x510
    [&lt;000000002182660a&gt;] __skb_gso_segment+0x1dd/0x620
    [&lt;00000000412651b9&gt;] netem_enqueue+0x1536/0x2590 [sch_netem]
    [&lt;0000000005d3b2a9&gt;] __dev_queue_xmit+0x1167/0x2120
    [&lt;00000000fc5f7327&gt;] ip_finish_output2+0x998/0xf00
    [&lt;00000000d309e9d3&gt;] ip_output+0x1aa/0x2c0
    [&lt;000000007ecbd3a4&gt;] tcp_transmit_skb+0x18db/0x3670
    [&lt;0000000042d2a45f&gt;] tcp_write_xmit+0x4d4/0x58c0
    [&lt;0000000056a44199&gt;] tcp_tasklet_func+0x3d9/0x540
    [&lt;0000000013d06d02&gt;] tasklet_action+0x1ca/0x250
    [&lt;00000000fcde0b8b&gt;] __do_softirq+0x1b4/0x5a3
    [&lt;00000000e7ed027c&gt;] irq_exit+0x1e2/0x210

Fix it by adding the rest of the segments, if any, to skb 'to_free'
list. Add new __qdisc_drop_all() and qdisc_drop_all() functions
because they can be useful in the future if we need to drop segmented
GSO packets in other places.

Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
Signed-off-by: Alexey Kodanev &lt;alexey.kodanev@oracle.com&gt;
Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Fix spelling mistake "greater then" -&gt; "greater than"</title>
<updated>2018-03-01T18:39:39+00:00</updated>
<author>
<name>Gal Pressman</name>
<email>galp@mellanox.com</email>
</author>
<published>2018-02-28T13:59:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a053b1a30dcb4e39569bcce2f4357509260db75'/>
<id>urn:sha1:3a053b1a30dcb4e39569bcce2f4357509260db75</id>
<content type='text'>
Fix trivial spelling mistake "greater then" -&gt; "greater than".

Signed-off-by: Gal Pressman &lt;galp@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: plug in qdisc ops change_tx_queue_len</title>
<updated>2018-01-29T17:42:15+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2018-01-26T02:26:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48bfd55e7e4149a304e89c1999436cf52d094a27'/>
<id>urn:sha1:48bfd55e7e4149a304e89c1999436cf52d094a27</id>
<content type='text'>
Introduce a new qdisc ops -&gt;change_tx_queue_len() so that
each qdisc could decide how to implement this if it wants.
Previously we simply read dev-&gt;tx_queue_len, after pfifo_fast
switches to skb array, we need this API to resize the skb array
when we change dev-&gt;tx_queue_len.

To avoid handling race conditions with TX BH, we need to
deactivate all TX queues before change the value and bring them
back after we are done, this also makes implementation easier.

Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
