<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/sock.h, branch v3.12.43</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.43</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.43'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2015-04-09T12:13:51+00:00</updated>
<entry>
<title>net: fix sparse warning in sk_dst_set()</title>
<updated>2015-04-09T12:13:51+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-07-02T09:39:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1f2251ec984761b34036316a43a5eb2dc6e9df90'/>
<id>urn:sha1:1f2251ec984761b34036316a43a5eb2dc6e9df90</id>
<content type='text'>
commit 5925a0555bdaf0b396a84318cbc21ba085f6c0d3 upstream.

sk_dst_cache has __rcu annotation, so we need a cast to avoid
following sparse error :

include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces)
include/net/sock.h:1774:19:    expected struct dst_entry [noderef] &lt;asn:4&gt;*__ret
include/net/sock.h:1774:19:    got struct dst_entry *dst

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
Fixes: 7f502361531e ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()</title>
<updated>2014-10-17T07:43:12+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2014-08-14T16:40:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e2aadd34bfee8db9253431c3c8e5457c130f608'/>
<id>urn:sha1:8e2aadd34bfee8db9253431c3c8e5457c130f608</id>
<content type='text'>
[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ]

Make sure we use the correct address-family-specific function for
handling MTU reductions from within tcp_release_cb().

Previously AF_INET6 sockets were incorrectly always using the IPv6
code path when sometimes they were handling IPv4 traffic and thus had
an IPv4 dst.

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Diagnosed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications")
Reviewed-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix</title>
<updated>2014-07-29T14:56:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-30T08:26:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8728b0c1932dada25f2d1ccfa1669b5b41fe7937'/>
<id>urn:sha1:8728b0c1932dada25f2d1ccfa1669b5b41fe7937</id>
<content type='text'>
[ Upstream commit 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f ]

We have two different ways to handle changes to sk-&gt;sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() &amp; __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb296b ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Fixes: 9cb3a50c5f63e ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>ipv4: fix dst race in sk_dst_get()</title>
<updated>2014-07-29T14:56:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-24T17:05:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=591b1e1bb40152e22cee757f493046a0ca946bf8'/>
<id>urn:sha1:591b1e1bb40152e22cee757f493046a0ca946bf8</id>
<content type='text'>
[ Upstream commit f88649721268999bdff09777847080a52004f691 ]

When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dormando &lt;dormando@rydia.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>net: Add variants of capable for use on on sockets</title>
<updated>2014-06-23T08:27:56+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2014-04-23T21:26:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c91777bd4951c1f1ad59842fe4d208f94a97dfe'/>
<id>urn:sha1:8c91777bd4951c1f1ad59842fe4d208f94a97dfe</id>
<content type='text'>
[ Upstream commit a3b299da869d6e78cf42ae0b1b41797bcb8c5e4b ]

sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>tcp: tcp_release_cb() should release socket ownership</title>
<updated>2014-04-18T09:07:01+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2014-03-10T16:50:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e52e9e5c8c315f3514c8b9b85ecaf0e96c1dd00'/>
<id>urn:sha1:3e52e9e5c8c315f3514c8b9b85ecaf0e96c1dd00</id>
<content type='text'>
[ Upstream commit c3f9b01849ef3bc69024990092b9f42e20df7797 ]

Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) &lt;-- arch_spin_lock
-001 |tcp_v4_rcv(skb = 0x8BD527A0) &lt;-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson &lt;lars.persson@axis.com&gt;
Tested-by: Lars Persson &lt;lars.persson@axis.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>net: fix unsafe set_memory_rw from softirq</title>
<updated>2013-10-07T19:16:45+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2013-10-04T07:14:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d45ed4a4e33ae103053c0a53d280014e7101bb5c'/>
<id>urn:sha1:d45ed4a4e33ae103053c0a53d280014e7101bb5c</id>
<content type='text'>
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&amp;(&amp;vb-&gt;lock)-&gt;rlock);
[   56.799593]   &lt;Interrupt&gt;
[   56.805889]     lock(&amp;(&amp;vb-&gt;lock)-&gt;rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [&lt;ffffffff8118f44c&gt;] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [&lt;ffffffff8175a145&gt;] dump_stack+0x55/0x76
[   56.936067]  [&lt;ffffffff81755b14&gt;] print_usage_bug+0x1f7/0x208
[   56.942445]  [&lt;ffffffff8101178f&gt;] ? save_stack_trace+0x2f/0x50
[   56.948932]  [&lt;ffffffff810cc0a0&gt;] ? check_usage_backwards+0x150/0x150
[   56.955470]  [&lt;ffffffff810ccb52&gt;] mark_lock+0x282/0x2c0
[   56.961945]  [&lt;ffffffff810ccfed&gt;] __lock_acquire+0x45d/0x1d50
[   56.968474]  [&lt;ffffffff810cce6e&gt;] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [&lt;ffffffff81393bf5&gt;] ? cpumask_next_and+0x55/0x90
[   56.981942]  [&lt;ffffffff810cef72&gt;] lock_acquire+0x92/0x1d0
[   56.988745]  [&lt;ffffffff8118f52a&gt;] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [&lt;ffffffff817628f1&gt;] _raw_spin_lock+0x41/0x50
[   57.002493]  [&lt;ffffffff8118f52a&gt;] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [&lt;ffffffff8118f52a&gt;] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [&lt;ffffffff8118f44c&gt;] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [&lt;ffffffff810436b0&gt;] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [&lt;ffffffff810cfb8d&gt;] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [&lt;ffffffff811a8330&gt;] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [&lt;ffffffff811b59c3&gt;] ? free_object_rcu+0x93/0xa0
[   57.051720]  [&lt;ffffffff81043d9f&gt;] set_memory_rw+0x2f/0x40
[   57.058727]  [&lt;ffffffff8104e17c&gt;] bpf_jit_free+0x2c/0x40
[   57.065577]  [&lt;ffffffff81642cba&gt;] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [&lt;ffffffff811108e2&gt;] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [&lt;ffffffff81057f17&gt;] __do_softirq+0xf7/0x3f0
[   57.085373]  [&lt;ffffffff81058245&gt;] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: Use RCU apis to access sk_user_data.</title>
<updated>2013-09-30T18:22:59+00:00</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2013-09-24T17:25:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=559835ea7292e2f09304d81eda16f4209433245e'/>
<id>urn:sha1:559835ea7292e2f09304d81eda16f4209433245e</id>
<content type='text'>
Use of RCU api makes vxlan code easier to understand.  It also
fixes bug due to missing ACCESS_ONCE() on sk_user_data dereference.
In rare case without ACCESS_ONCE() compiler might omit vs on
sk_user_data dereference.
Compiler can use vs as alias for sk-&gt;sk_user_data, resulting in
multiple sk_user_data dereference in rcu read context which
could change.

CC: Jesse Gross &lt;jesse@nicira.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: TSO packets automatic sizing</title>
<updated>2013-08-29T19:50:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-27T12:46:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=95bd09eb27507691520d39ee1044d6ad831c1168'/>
<id>urn:sha1:95bd09eb27507691520d39ee1044d6ad831c1168</id>
<content type='text'>
After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses an heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp-&gt;xmit_size_goal_segs

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Van Jacobson &lt;vanj@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: attempt high order allocations in sock_alloc_send_pskb()</title>
<updated>2013-08-10T08:16:44+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-08T21:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=28d6427109d13b0f447cba5761f88d3548e83605'/>
<id>urn:sha1:28d6427109d13b0f447cba5761f88d3548e83605</id>
<content type='text'>
Adding paged frags skbs to af_unix sockets introduced a performance
regression on large sends because of additional page allocations, even
if each skb could carry at least 100% more payload than before.

We can instruct sock_alloc_send_pskb() to attempt high order
allocations.

Most of the time, it does a single page allocation instead of 8.

I added an additional parameter to sock_alloc_send_pskb() to
let other users to opt-in for this new feature on followup patches.

Tested:

Before patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    46861.15

After patch :

$ netperf -t STREAM_STREAM
STREAM STREAM TEST
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 2304  212992  212992    10.00    57981.11

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
