<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/sock.h, branch v4.17.18</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.17.18</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.17.18'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-04-06T04:36:24+00:00</updated>
<entry>
<title>slab: make usercopy region 32-bit</title>
<updated>2018-04-06T04:36:24+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-04-05T23:21:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7bbdb81ee3de73f2381ceec1bbee831f4c913b5c'/>
<id>urn:sha1:7bbdb81ee3de73f2381ceec1bbee831f4c913b5c</id>
<content type='text'>
If kmem case sizes are 32-bit, then usecopy region should be too.

Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Hooks for sys_connect</title>
<updated>2018-03-31T00:15:54+00:00</updated>
<author>
<name>Andrey Ignatov</name>
<email>rdna@fb.com</email>
</author>
<published>2018-03-30T22:08:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d74bad4e74ee373787a9ae24197c17b7cdc428d5'/>
<id>urn:sha1:d74bad4e74ee373787a9ae24197c17b7cdc428d5</id>
<content type='text'>
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk-&gt;sk_prot-&gt;connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk-&gt;sk_prot-&gt;connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: sockmap redirect ingress support</title>
<updated>2018-03-29T22:09:43+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-03-28T19:49:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8934ce2fd08171e8605f7fada91ee7619fe17ab8'/>
<id>urn:sha1:8934ce2fd08171e8605f7fada91ee7619fe17ab8</id>
<content type='text'>
Add support for the BPF_F_INGRESS flag in sk_msg redirect helper.
To do this add a scatterlist ring for receiving socks to check
before calling into regular recvmsg call path. Additionally, because
the poll wakeup logic only checked the skb recv queue we need to
add a hook in TCP stack (similar to write side) so that we have
a way to wake up polling socks when a scatterlist is redirected
to that sock.

After this all that is needed is for the redirect helper to
push the scatterlist into the psock receive queue.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2018-03-23T15:31:58+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-03-23T15:24:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03fe2debbb2771fb90881e4ce8109b09cf772a5c'/>
<id>urn:sha1:03fe2debbb2771fb90881e4ce8109b09cf772a5c</id>
<content type='text'>
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: generalize sk_alloc_sg to work with scatterlist rings</title>
<updated>2018-03-19T20:14:38+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-03-18T19:57:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c05dbf04b2882c3c0bc43fe7668c720210877f3'/>
<id>urn:sha1:8c05dbf04b2882c3c0bc43fe7668c720210877f3</id>
<content type='text'>
The current implementation of sk_alloc_sg expects scatterlist to always
start at entry 0 and complete at entry MAX_SKB_FRAGS.

Future patches will want to support starting at arbitrary offset into
scatterlist so add an additional sg_start parameters and then default
to the current values in TLS code paths.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>sock: make static tls function alloc_sg generic sock helper</title>
<updated>2018-03-19T20:14:38+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-03-18T19:56:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c3682f0be97a5f57c6c8b40fa154dfc77efb461'/>
<id>urn:sha1:2c3682f0be97a5f57c6c8b40fa154dfc77efb461</id>
<content type='text'>
The TLS ULP module builds scatterlists from a sock using
page_frag_refill(). This is going to be useful for other ULPs
so move it into sock file for more general use.

In the process remove useless goto at end of while loop.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>sock_diag: request _diag module only when the family or proto has been registered</title>
<updated>2018-03-12T15:03:42+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2018-03-10T10:57:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf2ae2e4bf9360e07c0cdfa166bcdc0afd92f4ce'/>
<id>urn:sha1:bf2ae2e4bf9360e07c0cdfa166bcdc0afd92f4ce</id>
<content type='text'>
Now when using 'ss' in iproute, kernel would try to load all _diag
modules, which also causes corresponding family and proto modules
to be loaded as well due to module dependencies.

Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
would be loaded.

For example:

  $ lsmod|grep sctp
  $ ss
  $ lsmod|grep sctp
  sctp_diag              16384  0
  sctp                  323584  5 sctp_diag
  inet_diag              24576  4 raw_diag,tcp_diag,sctp_diag,udp_diag
  libcrc32c              16384  3 nf_conntrack,nf_nat,sctp

As these family and proto modules are loaded unintentionally, it
could cause some problems, like:

- Some debug tools use 'ss' to collect the socket info, which loads all
  those diag and family and protocol modules. It's noisy for identifying
  issues.

- Users usually expect to drop sctp init packet silently when they
  have no sense of sctp protocol instead of sending abort back.

- It wastes resources (especially with multiple netns), and SCTP module
  can't be unloaded once it's loaded.

...

In short, it's really inappropriate to have these family and proto
modules loaded unexpectedly when just doing debugging with inet_diag.

This patch is to introduce sock_load_diag_module() where it loads
the _diag module only when it's corresponding family or proto has
been already registered.

Note that we can't just load _diag module without the family or
proto loaded, as some symbols used in _diag module are from the
family or proto module.

v1-&gt;v2:
  - move inet proto check to inet_diag to avoid a compiling err.
v2-&gt;v3:
  - define sock_load_diag_module in sock.c and export one symbol
    only.
  - improve the changelog.

Reported-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Acked-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Acked-by: Phil Sutter &lt;phil@nwl.cc&gt;
Acked-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove sk_check_csum_caps()</title>
<updated>2018-02-21T19:24:14+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-02-19T19:56:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dead7cdb0daec58490891e59f4fae0c5c76fa5f3'/>
<id>urn:sha1:dead7cdb0daec58490891e59f4fae0c5c76fa5f3</id>
<content type='text'>
Since TCP relies on GSO, we do not need this helper anymore.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: switch to GSO being always on</title>
<updated>2018-02-21T19:24:13+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-02-19T19:56:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a'/>
<id>urn:sha1:0a6b2a1dc2a2105f178255fe495eb914b09cb37a</id>
<content type='text'>
Oleksandr Natalenko reported performance issues with BBR without FQ
packet scheduler that were root caused to lack of SG and GSO/TSO on
his configuration.

In this mode, TCP internal pacing has to setup a high resolution timer
for each MSS sent.

We could implement in TCP a strategy similar to the one adopted
in commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts")
or decide to finally switch TCP stack to a GSO only mode.

This has many benefits :

1) Most TCP developments are done with TSO in mind.
2) Less high-resolution timers needs to be armed for TCP-pacing
3) GSO can benefit of xmit_more hint
4) Receiver GRO is more effective (as if TSO was used for real on sender)
   -&gt; Lower ACK traffic
5) Write queues have less overhead (one skb holds about 64KB of payload)
6) SACK coalescing just works.
7) rtx rb-tree contains less packets, SACK is cheaper.

This patch implements the minimum patch, but we can remove some legacy
code as follow ups.

Tested:

On 40Gbit link, one netperf -t TCP_STREAM

BBR+fq:
sg on:  26 Gbits/sec
sg off: 15.7 Gbits/sec   (was 2.3 Gbit before patch)

BBR+pfifo_fast:
sg on:  24.2 Gbits/sec
sg off: 14.9 Gbits/sec  (was 0.66 Gbit before patch !!! )

BBR+fq_codel:
sg on:  24.4 Gbits/sec
sg off: 15 Gbits/sec  (was 0.66 Gbit before patch !!! )

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: make getname() functions return length rather than use int* parameter</title>
<updated>2018-02-12T19:15:04+00:00</updated>
<author>
<name>Denys Vlasenko</name>
<email>dvlasenk@redhat.com</email>
</author>
<published>2018-02-12T19:00:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b2c45d479d0fb8647c9e83359df69162b5fbe5f'/>
<id>urn:sha1:9b2c45d479d0fb8647c9e83359df69162b5fbe5f</id>
<content type='text'>
Changes since v1:
Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always &gt;= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err &lt; 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &amp;buflen and subsequently
not used in any way.

Userspace API is not changed.

    text    data     bss      dec     hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko &lt;dvlasenk@redhat.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
