path: root/net
Age | Commit message | Author | Files | Lines
2019-05-27 | ipv4: remove redundant assignment to n | Colin Ian King | 1 | -1/+0
The pointer n is being assigned a value however this value is never read in the code block and the end of the code block continues to the next loop iteration. Clean up the code by removing the redundant assignment. Fixes: 1bff1a0c9bbda ("ipv4: Add function to send route updates") Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | inet: frags: rework rhashtable dismantle | Eric Dumazet | 1 | -12/+37
syzbot found an interesting use-after-free [1] happening while IPv4 fragment rhashtable was destroyed at netns dismantle.

While no insertions can possibly happen at the time a dismantling netns is destroying this rhashtable, timers can still fire and attempt to remove elements from this rhashtable.

This is forbidden, since rhashtable_free_and_destroy() has no synchronization against concurrent inserts and deletes.

Add a new fqdir->dead flag so that timers do not attempt a rhashtable_remove_fast() operation.

We also have to respect an RCU grace period before starting the rhashtable_free_and_destroy() from process context, thus we use rcu_work infrastructure.

This is a refinement of a prior rough attempt to fix this bug:
https://marc.info/?l=linux-netdev&m=153845936820900&w=2

Since the rhashtable cleanup is now deferred to a work queue, netns dismantles should be slightly faster.

[1]
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:194 [inline]
BUG: KASAN: use-after-free in rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
Read of size 8 at addr ffff8880a6497b70 by task kworker/0:0/5

CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rht_deferred_worker
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 __read_once_size include/linux/compiler.h:194 [inline]
 rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
 rht_deferred_worker+0x111/0x2030 lib/rhashtable.c:411
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 32687:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc_node mm/slab.c:3620 [inline]
 __kmalloc_node+0x4e/0x70 mm/slab.c:3627
 kmalloc_node include/linux/slab.h:590 [inline]
 kvmalloc_node+0x68/0x100 mm/util.c:431
 kvmalloc include/linux/mm.h:637 [inline]
 kvzalloc include/linux/mm.h:645 [inline]
 bucket_table_alloc+0x90/0x480 lib/rhashtable.c:178
 rhashtable_init+0x3f4/0x7b0 lib/rhashtable.c:1057
 inet_frags_init_net include/net/inet_frag.h:109 [inline]
 ipv4_frags_init_net+0x182/0x410 net/ipv4/ip_fragment.c:683
 ops_init+0xb3/0x410 net/core/net_namespace.c:130
 setup_net+0x2d3/0x740 net/core/net_namespace.c:316
 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
 ksys_unshare+0x440/0x980 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 kvfree+0x61/0x70 mm/util.c:460
 bucket_table_free+0x69/0x150 lib/rhashtable.c:108
 rhashtable_free_and_destroy+0x165/0x8b0 lib/rhashtable.c:1155
 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:152
 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:695
 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154
 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff8880a6497b40 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 48 bytes inside of 1024-byte region [ffff8880a6497b40, ffff8880a6497f40)
The buggy address belongs to the page:
page:ffffea0002992580 refcount:1 mapcount:0 mapping:ffff8880aa400ac0 index:0xffff8880a64964c0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0002916e88 ffffea000218fe08 ffff8880aa400ac0
raw: ffff8880a64964c0 ffff8880a6496040 0000000100000005 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a6497a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880a6497b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                             ^
 ffff8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
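The "dead flag" idea from the commit message above can be illustrated with a small userspace sketch. This is not the kernel code: `struct frag_dir`, `frag_expire()` and `frag_dir_exit()` are hypothetical names, the counter stands in for the rhashtable, and the RCU grace period and work queue are only mentioned in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fragment directory. */
struct frag_dir {
	atomic_bool dead;    /* set once teardown has begun */
	int         nqueues; /* stand-in for the hash table contents */
};

/*
 * Timer path: try to remove one queue from the table.
 * Returns false, touching nothing, if teardown has started.
 */
static bool frag_expire(struct frag_dir *fd)
{
	if (atomic_load(&fd->dead))
		return false; /* teardown owns the table now */
	if (fd->nqueues > 0)
		fd->nqueues--;
	return true;
}

/*
 * Teardown path: mark the directory dead first. Kernel code would
 * then wait for an RCU grace period and free the table from a work
 * queue; this sketch stops after setting the flag.
 */
static void frag_dir_exit(struct frag_dir *fd)
{
	atomic_store(&fd->dead, true);
}
```

Calling `frag_expire()` after `frag_dir_exit()` leaves the counter unchanged, which mirrors the rule that timers must not remove elements once dismantle has begun.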
2019-05-27 | net: dynamically allocate fqdir structures | Eric Dumazet | 7 | -56/+60
The following patch will add an RCU grace period before fqdir rhashtable destruction, so we need to dynamically allocate fqdir structures to avoid forcing expensive synchronize_rcu() calls in the netns dismantle path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | net: add a net pointer to struct fqdir | Eric Dumazet | 4 | -23/+13
fqdir will soon be dynamically allocated. We need to reach the struct net pointer from fqdir, so add it, and replace the various container_of() constructs by direct access to the new field. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
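To see why the back-pointer is needed, here is a userspace sketch contrasting the two lookups. The struct layouts and the helper names (`fqdir_sketch`, `net_from_embedded`, `net_from_field`, `fqdir_sketch_alloc`) are illustrative assumptions, not the kernel definitions. `container_of()` only works while the structure is embedded in its parent; once it is allocated separately, an explicit pointer is the only way back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct net;

struct fqdir_sketch {
	int         high_thresh;
	struct net *net; /* back-pointer: valid however this was allocated */
};

struct net {
	int                 id;
	struct fqdir_sketch embedded; /* old layout: embedded in struct net */
};

/* Old style: only correct when f is the member embedded in a struct net. */
static struct net *net_from_embedded(struct fqdir_sketch *f)
{
	return container_of(f, struct net, embedded);
}

/* New style: follow the stored pointer. */
static struct net *net_from_field(struct fqdir_sketch *f)
{
	return f->net;
}

/* Allocate a standalone fqdir and record its owner. */
static struct fqdir_sketch *fqdir_sketch_alloc(struct net *net)
{
	struct fqdir_sketch *f = calloc(1, sizeof(*f));

	if (f)
		f->net = net;
	return f;
}
```

For a heap-allocated `fqdir_sketch`, `net_from_embedded()` would compute a bogus address, which is the reason the sketch routes that case through `net_from_field()`.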
2019-05-27 | net: rename inet_frags_init_net() to fdir_init() | Eric Dumazet | 4 | -8/+4
And pass an extra parameter, since we will soon dynamically allocate fqdir structures. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | ieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table | Eric Dumazet | 1 | -11/+6
(struct net *)->ieee802154_lowpan.fqdir will soon be a pointer, so make sure lowpan_frags_ns_ctl_table[] does not reference init_net. lowpan_frags_ns_sysctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table | Eric Dumazet | 1 | -12/+7
(struct net *)->nf_frag.fqdir will soon be a pointer, so make sure nf_ct_frag6_sysctl_table[] does not reference init_net. nf_ct_frag6_sysctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[] | Eric Dumazet | 1 | -10/+5
(struct net *)->ipv6.fqdir will soon be a pointer, so make sure ip6_frags_ns_ctl_table[] does not reference init_net. ip6_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[] | Eric Dumazet | 1 | -12/+6
(struct net *)->ipv4.fqdir will soon be a pointer, so make sure ip4_frags_ns_ctl_table[] does not reference init_net. ip4_frags_ns_ctl_register() can perform the needed initialization for all netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | net: rename struct fqdir fields | Eric Dumazet | 6 | -88/+88
Rename the @frags fields from structs netns_ipv4, netns_ipv6, netns_nf_frag and netns_ieee802154_lowpan to @fqdir Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | net: rename inet_frags_exit_net() to fqdir_exit() | Eric Dumazet | 5 | -10/+10
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27 | inet: rename netns_frags to fqdir | Eric Dumazet | 5 | -42/+42
1) struct netns_frags is renamed to struct fqdir. This structure is really holding many frag queues in a hash table.
2) (struct inet_frag_queue)->net field is renamed to fqdir, since net is generally associated to a 'struct net' pointer in networking stack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-25 | flow_offload: use struct_size() in kzalloc() | Gustavo A. R. Silva | 1 | -2/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:

    struct foo {
        int stuff;
        struct boo entry[];
    };

    instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:

    instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
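As a rough userspace analogue of what a helper like struct_size() buys you, the sketch below computes "header plus count elements" with an explicit overflow check and saturates to SIZE_MAX on overflow. `flex_size()` and `foo_alloc()` are hypothetical helpers written for this illustration; they are not the kernel macro, which also derives the sizes from the pointer and member names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int        stuff;
	struct boo entry[]; /* flexible array member */
};

/*
 * Hypothetical helper: header + elem * count, or SIZE_MAX if the
 * arithmetic would overflow size_t.
 */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
	if (elem != 0 && count > (SIZE_MAX - header) / elem)
		return SIZE_MAX;
	return header + elem * count;
}

/* Allocate a zeroed struct foo with room for count entries. */
static struct foo *foo_alloc(size_t count)
{
	size_t bytes = flex_size(sizeof(struct foo), sizeof(struct boo), count);

	if (bytes == SIZE_MAX)
		return NULL; /* refuse instead of under-allocating */
	return calloc(1, bytes);
}
```

The open-coded form `sizeof(struct foo) + count * sizeof(struct boo)` silently wraps for a huge `count` and yields a too-small buffer; the checked form fails the allocation instead.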
2019-05-24 | ipv6: Refactor ip6_route_del for cached routes | David Ahern | 1 | -15/+21
Move the removal of cached routes to a helper, ip6_del_cached_rt, that can be invoked per nexthop. Rename the existing ip6_del_cached_rt to __ip6_del_cached_rt since it is called by ip6_del_cached_rt. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24 | ipv6: Make fib6_nh optional at the end of fib6_info | David Ahern | 4 | -81/+85
Move fib6_nh to the end of fib6_info and make it an array of size 0. Pass a flag to fib6_info_alloc indicating if the allocation needs to add space for a fib6_nh. The current code path always has a fib6_nh allocated with a fib6_info; with nexthop objects they will be separate. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24 | ipv6: Move exception bucket to fib6_nh | David Ahern | 2 | -68/+123
Similar to the pcpu routes, exceptions are really per nexthop, so move rt6i_exception_bucket from fib6_info to fib6_nh.

To avoid additional increases to the size of fib6_nh for a 1-bit flag, use the lowest bit in the allocated memory pointer for the flushed flag. Add helpers for retrieving the bucket pointer to mask off the flag.

The cleanup of the exception bucket is moved to fib6_nh_release.

fib6_nh_flush_exceptions can now be called from 2 contexts:
1. deleting a fib entry
2. deleting a fib6_nh

For 1., fib6_nh_flush_exceptions is called for a specific fib6_info that is getting deleted. All exceptions in the cache using the entry are deleted. For 2., the fib6_nh itself is getting destroyed, so fib6_nh_flush_exceptions is called for a NULL fib6_info, which means flush all entries.

The pmtu.sh selftest exercises the affected code paths - from creating exceptions to cleaning them up on device delete. All tests pass without any rcu locking or memleak warnings.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
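The low-bit trick described above can be sketched in plain C. The names here (`BUCKET_FLUSHED`, `struct nh_sketch`, and the four helpers) are made up for illustration and are not the kernel's. The sketch relies on the assumption that allocated memory is at least 2-byte aligned, so bit 0 of a valid pointer is always zero and is free to carry a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag stored in bit 0 of the bucket pointer. */
#define BUCKET_FLUSHED ((uintptr_t)1)

struct bucket {
	int depth;
};

struct nh_sketch {
	uintptr_t bucket_bits; /* bucket pointer with the flag folded in */
};

/* Store a new bucket pointer, preserving the current flag bit. */
static void nh_set_bucket(struct nh_sketch *nh, struct bucket *b)
{
	assert(((uintptr_t)b & BUCKET_FLUSHED) == 0); /* needs alignment */
	nh->bucket_bits = (uintptr_t)b | (nh->bucket_bits & BUCKET_FLUSHED);
}

/* Retrieve the pointer with the flag masked off. */
static struct bucket *nh_get_bucket(const struct nh_sketch *nh)
{
	return (struct bucket *)(nh->bucket_bits & ~BUCKET_FLUSHED);
}

static void nh_mark_flushed(struct nh_sketch *nh)
{
	nh->bucket_bits |= BUCKET_FLUSHED;
}

static bool nh_is_flushed(const struct nh_sketch *nh)
{
	return (nh->bucket_bits & BUCKET_FLUSHED) != 0;
}
```

Every reader has to go through the masking accessor; dereferencing the raw word would fault or read the wrong address once the flag is set, which is why the commit adds helpers rather than exposing the field.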
2019-05-24 | ipv6: Refactor exception functions | David Ahern | 1 | -48/+86
Before moving exception bucket from fib6_info to fib6_nh, refactor rt6_flush_exceptions, rt6_remove_exception_rt, rt6_mtu_change_route, and rt6_update_exception_stamp_rt. In each case, move the primary logic into a new helper that starts with fib6_nh_. The latter 3 functions still take a fib6_info; this will be changed to fib6_nh in the next patch. In the case of rt6_mtu_change_route, move the fib6_metric_locked out as a standalone check - no need to call the new function if the fib entry has the mtu locked. Also, add fib6_info to rt6_mtu_change_arg as a way of passing the fib entry to the new helper. No functional change intended. The goal here is to make the next patch easier to review by moving existing lookup logic for each to new helpers. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24 | ipv6: Refactor fib6_drop_pcpu_from | David Ahern | 1 | -10/+25
Move the existing pcpu walk in fib6_drop_pcpu_from to a new helper, __fib6_drop_pcpu_from, that can be invoked per fib6_nh with a reference to the from entries that need to be evicted. If the passed in 'from' is non-NULL then only entries associated with that fib6_info are removed (e.g., case where fib entry is deleted); if the 'from' is NULL, all entries are flushed (e.g., fib6_nh is deleted). For fib6_info entries with builtin fib6_nh (i.e., current code) there is no change in behavior. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24 | ipv6: Move pcpu cached routes to fib6_nh | David Ahern | 3 | -33/+36
rt6_info are specific instances of a fib entry and are tied to a device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib entries have separate fib6_info for each nexthop in a multipath route, so the location of the pcpu cache in the fib6_info struct worked. However, with nexthop objects a fib6_info can point to a set of nexthops (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu cache needs to be moved to the fib6_nh struct so the cached entries are local to the nexthop specification used to create the rt6_info. Initialization and free of the pcpu entries moved to fib6_nh_init and fib6_nh_release. Change in location only, from fib6_info down to fib6_nh; no other functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | devlink: add warning in case driver does not set port type | Jiri Pirko | 1 | -0/+38
Guard against drivers that leave the port type unset for a long period of time. Drivers should always set the port type, so WARN if that happens. Note that it is perfectly fine for the type to be temporarily unset during initialization and port type change. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | hv_sock: perf: loop in send() to maximize bandwidth | Sunil Muthuswamy | 1 | -14/+31
Currently, the hv_sock send() iterates once over the buffer, puts data into the VMBUS channel and returns. It doesn't maximize on the case when there is a simultaneous reader draining data from the channel. In such a case, the send() can maximize the bandwidth (and consequently minimize the cpu cycles) by iterating until the channel is found to be full.

Perf data:
Total Data Transfer: 10GB/iteration
Single threaded reader/writer, Linux hvsocket writer with Windows hvsocket reader
Packet size: 64KB
CPU sys time was captured using the 'time' command for the writer to send 10GB of data.
'Send Buffer Loop' is with the patch applied.
The values below are over 10 iterations.

|--------------------------------------------------------|
|        | Current               | Send Buffer Loop      |
|--------------------------------------------------------|
|        | Throughput | CPU sys  | Throughput | CPU sys  |
|        | (MB/s)     | time (s) | (MB/s)     | time (s) |
|--------------------------------------------------------|
| Min    | 407        | 7.048    | 401        | 5.958    |
| Max    | 455        | 7.563    | 542        | 6.993    |
| Avg    | 440        | 7.411    | 451        | 6.639    |
| Median | 446        | 7.417    | 447        | 6.761    |
|--------------------------------------------------------|

Observation:
1. The avg throughput doesn't really change much with this change for this scenario. This is most probably because the bottleneck on throughput is somewhere else.
2. The average system (or kernel) cpu time goes down by 10%+ with this change, for the same amount of data transfer.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
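The difference between "write once and return" and "loop until the channel is full" can be shown with a toy in-memory channel. Everything here is an assumption made for illustration: `struct channel`, the 4096-byte capacity, the 1024-byte per-write cap, and the function names are not taken from the hv_sock code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHAN_CAP 4096 /* toy channel capacity, in bytes */
#define MAX_PKT  1024 /* toy per-write limit, in bytes */

struct channel {
	unsigned char buf[CHAN_CAP];
	size_t        used;
};

/* Write at most one packet; returns bytes accepted (0 if full). */
static size_t chan_put(struct channel *c, const unsigned char *data, size_t len)
{
	size_t space = CHAN_CAP - c->used;
	size_t n = len < space ? len : space;

	if (n > MAX_PKT)
		n = MAX_PKT;
	memcpy(c->buf + c->used, data, n);
	c->used += n;
	return n;
}

/* Single pass: one write, then return to the caller. */
static size_t send_once(struct channel *c, const unsigned char *data, size_t len)
{
	return chan_put(c, data, len);
}

/* Looping variant: keep writing until the data or the space runs out. */
static size_t send_loop(struct channel *c, const unsigned char *data, size_t len)
{
	size_t sent = 0;

	while (sent < len) {
		size_t n = chan_put(c, data + sent, len - sent);

		if (n == 0)
			break; /* channel full */
		sent += n;
	}
	return sent;
}
```

With a 6000-byte payload and an empty channel, the single-pass call accepts one packet while the looping call fills the channel, so the caller makes fewer trips for the same data.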
2019-05-23 | hv_sock: perf: Allow the socket buffer size options to influence the actual socket buffers | Sunil Muthuswamy | 1 | -10/+40
Currently, the hv_sock buffer size is static and can't scale to the bandwidth requirements of the application. This change allows the applications to influence the socket buffer sizes using the SO_SNDBUF and the SO_RCVBUF socket options.

Few interesting points to note:
1. Since the VMBUS does not allow a resize operation of the ring size, the socket buffer size option should be set prior to establishing the connection for it to take effect.
2. Setting the socket option comes with the cost of that much memory being reserved/allocated by the kernel, for the lifetime of the connection.

Perf data:
Total Data Transfer: 1GB
Single threaded reader/writer
Results below are summarized over 10 iterations.

Linux hvsocket writer + Windows hvsocket reader.
Throughput in MB/s (min/max/avg/median), by packet size (columns) and SO_SNDBUF size (rows):
|--------------------------------------------------------------------------------------------------------|
| SO_SNDBUF size | 128B                | 1KB                 | 4KB                 | 64KB                |
|--------------------------------------------------------------------------------------------------------|
| Default        | 109/118/114/116     | 636/774/701/700     | 435/507/480/476     | 410/491/462/470     |
| 16KB           | 110/116/112/111     | 575/705/662/671     | 749/900/854/869     | 592/824/692/676     |
| 32KB           | 108/120/115/115     | 703/823/767/772     | 718/878/850/866     | 1593/2124/2000/2085 |
| 64KB           | 108/119/114/114     | 592/732/683/688     | 805/934/903/911     | 1784/1943/1862/1843 |
|--------------------------------------------------------------------------------------------------------|

Windows hvsocket writer + Linux hvsocket reader.
Throughput in MB/s (min/max/avg/median), by packet size (columns) and SO_RCVBUF size (rows):
|--------------------------------------------------------------------------------------------------------|
| SO_RCVBUF size | 128B                | 1KB                 | 4KB                 | 64KB                |
|--------------------------------------------------------------------------------------------------------|
| Default        | 69/82/75/73         | 313/343/333/336     | 418/477/446/445     | 659/701/676/678     |
| 16KB           | 69/83/76/77         | 350/401/375/382     | 506/548/517/516     | 602/624/615/615     |
| 32KB           | 62/83/73/73         | 471/529/496/494     | 830/1046/935/939    | 944/1180/1070/1100  |
| 64KB           | 64/70/68/69         | 467/533/501/497     | 1260/1590/1430/1431 | 1605/1819/1670/1660 |
|--------------------------------------------------------------------------------------------------------|

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
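From the application side, point 1 above means the buffer options have to be set right after socket() and before connect(). The sketch below shows that ordering with the standard sockets API. `set_buf_sizes()` and `get_sndbuf()` are hypothetical helper names, and the helper works on any socket descriptor; an hv_sock application would pass in its own socket, which this sketch does not create.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: request send/receive buffer sizes on a socket
 * that has not been connected yet. Returns 0 on success, -1 on error.
 */
static int set_buf_sizes(int fd, int sndbuf, int rcvbuf)
{
	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
		return -1;
	return 0;
}

/* Read back the send buffer size the kernel actually recorded. */
static int get_sndbuf(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
		return -1;
	return val;
}
```

A caller would run `set_buf_sizes(fd, 32 * 1024, 32 * 1024)` immediately after `socket()` and only then call `connect()`. The value read back may differ from the request, since the kernel is free to adjust or cap it.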
2019-05-23 | neighbor: Add tracepoint to __neigh_create | David Ahern | 1 | -0/+2
Add tracepoint to __neigh_create to enable debugging of new entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | net: Set strict_start_type for routes and rules | David Ahern | 2 | -0/+2
New userspace on an older kernel can send unknown and unsupported attributes resulting in an incomplete config which is almost always wrong for routing (few exceptions are passthrough settings like the protocol that installed the route). Set strict_start_type in the policies for IPv4 and IPv6 routes and rules to detect new, unsupported attributes and fail the route add. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv4: Rename and export nh_update_mtu | David Ahern | 1 | -2/+2
Rename nh_update_mtu to fib_nhc_update_mtu and export for use by the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv4: export fib_info_update_nh_saddr | David Ahern | 1 | -6/+5
Add scope as input argument versus relying on fib_info reference in fib_nh, and export fib_info_update_nh_saddr. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv4: export fib_flush | David Ahern | 1 | -1/+1
As nexthops are deleted, fib entries referencing them are marked dead. Export fib_flush so those entries can be removed in a timely manner. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv4: export fib_check_nh | David Ahern | 1 | -6/+6
Change fib_check_nh to take net, table and scope as input arguments over struct fib_config and export for use by nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv4: Add function to send route updates | David Ahern | 1 | -0/+72
Add fib_info_notify_update to walk the fib and send RTM_NEWROUTE notifications with NLM_F_REPLACE set for entries linked to a fib_info that have nh_updated flag set. This helper will be used by the nexthop code to notify userspace of routes that are impacted when a nexthop config is updated via replace. The new function and its helper are similar to how fib_flush and fib_table_flush work for address delete and link down events. This notification is needed for legacy apps that do not understand the new nexthop object. Apps that are nexthop aware can use the RTA_NH_ID attribute in the route notification to just ignore it. In the future this should be wrapped in a sysctl to allow OS'es that are fully updated to avoid the notification storm. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv6: export function to send route updates | David Ahern | 3 | -4/+37
Add fib6_rt_update to send RTM_NEWROUTE with NLM_F_REPLACE set. This helper will be used by the nexthop code to notify userspace of routes that are impacted when a nexthop config is updated via replace. This notification is needed for legacy apps that do not understand the new nexthop object. Apps that are nexthop aware can use the RTA_NH_ID attribute in the route notification to just ignore it. In the future this should be wrapped in a sysctl to allow OS'es that are fully updated to avoid the notification storm. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv6: Add hook to bump sernum for a route to stubs | David Ahern | 2 | -0/+9
Add hook to ipv6 stub to bump the sernum up to the root node for a route. This is needed by the nexthop code when a nexthop config changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-23 | ipv6: Add delete route hook to stubs | David Ahern | 2 | -0/+7
Add ip6_del_rt to the IPv6 stub. The hook is needed by the nexthop code to remove entries linked to a nexthop that is getting deleted. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22 | net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics. | Felipe Gasper | 1 | -0/+12
This adds the ability for Netlink to report a socket's UID along with the other UNIX diagnostic information that is already available. This will allow diagnostic tools greater insight into which users control which socket.

To test this, do the following as a non-root user:

    unshare -U -r bash
    nc -l -U user.socket.$$ &

.. and verify from within that same session that Netlink UNIX socket diagnostics report the socket's UID as 0. Also verify that Netlink UNIX socket diagnostics report the socket's UID as the user's UID from an unprivileged process in a different session. Verify the same from a root process.

Signed-off-by: Felipe Gasper <felipe@felipegasper.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 6 | -12/+32
Pull networking fixes from David Miller:

 1) Clear up some recent tipc regressions because of registration ordering. Fix from Junwei Hu.
 2) tipc's TLV_SET() can read past the end of the supplied buffer during the copy. From Chris Packham.
 3) ptp example program doesn't match the kernel, from Richard Cochran.
 4) Outgoing message type fix in qrtr, from Bjorn Andersson.
 5) Flow control regression in stmmac, from Tan Tee Min.
 6) Fix inband autonegotiation in phylink, from Russell King.
 7) Fix sk_bound_dev_if handling in rawv6_bind(), from Mike Manning.
 8) Fix usbnet crash after disconnect, from Kloetzke Jan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  usbnet: fix kernel crash after disconnect
  selftests: fib_rule_tests: use pre-defined DEV_ADDR
  net-next: net: Fix typos in ip-sysctl.txt
  ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
  net: phylink: ensure inband AN works correctly
  usbnet: ipheth: fix racing condition
  net: stmmac: dma channel control register need to be init first
  net: stmmac: fix ethtool flow control not able to get/set
  net: qrtr: Fix message type of outgoing packets
  networking: : fix typos in code comments
  ptp: Fix example program to match kernel.
  fddi: fix typos in code comments
  selftests: fib_rule_tests: enable forwarding before ipv4 from/iif test
  selftests: fib_rule_tests: fix local IPv4 address typo
  tipc: Avoid copying bytes beyond the supplied data
  2/2] net: xilinx_emaclite: use readx_poll_timeout() in mdio wait function
  1/2] net: axienet: use readx_poll_timeout() in mdio wait function
  vlan: Mark expected switch fall-through
  macvlan: Mark expected switch fall-through
  net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
  ...
2019-05-21 | ipv6: Consider sk_bound_dev_if when binding a raw socket to an address | Mike Manning | 1 | -0/+2
IPv6 does not consider if the socket is bound to a device when binding to an address. The result is that a socket can be bound to eth0 and then bound to the address of eth1. If the device is a VRF, the result is that a socket can only be bound to an address in the default VRF. Resolve by considering the device if sk_bound_dev_if is set. Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-21 | Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds | 300 | -634/+300
Pull SPDX update from Greg KH:

 "Here is a series of patches that add SPDX tags to different kernel files, based on two different things:

   - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time.

   - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these.

  These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all of whose names are on the patches as reviewers.

  The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest.

  There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up.

  These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches"

* tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3
  ...
2019-05-21 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 | Thomas Gleixner | 4 | -36/+4
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 50 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154042.917228456@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 | treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 | Thomas Gleixner | 41 | -567/+41
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details [based] [from] [clk] [highbank] [c] you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 355 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11Thomas Gleixner1-5/+1
Based on 1 normalized pattern(s): this program is free software you can distribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154041.622608495@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3Thomas Gleixner3-12/+3
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 or later as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 9 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154040.848507137@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1Thomas Gleixner1-14/+1
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option [no]_[pad]_[ctrl] any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 176 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154040.652910950@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner112-0/+112
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner89-0/+89
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENSE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner49-0/+49
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21net: qrtr: Fix message type of outgoing packetsBjorn Andersson1-2/+2
QRTR packets have a message type in the header, which is repeated in the control header. For control packets we therefore copy the type from the beginning of the outgoing payload and use that as message type. For non-control messages an endianness fix introduced in v5.2-rc1 caused the type to be 0, rather than QRTR_TYPE_DATA, causing all messages to be dropped by the receiver. Fix this by converting and using qrtr_type, which will remain QRTR_TYPE_DATA for non-control messages. Fixes: 8f5e24514cbd ("net: qrtr: use protocol endiannes variable") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
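The type selection described above can be sketched in plain userspace C. This is only an illustration, not the kernel code: demo_outgoing_type() and DEMO_TYPE_DATA are hypothetical names, the value 1 is assumed to mirror QRTR_TYPE_DATA, and the type field is assumed to be a little-endian 32-bit word at the start of the payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors QRTR_TYPE_DATA; the name is made up for this sketch. */
#define DEMO_TYPE_DATA 1u

/*
 * Hypothetical helper: choose the message type for an outgoing packet.
 * Non-control packets keep the default data type. Control packets take the
 * type from the first four payload bytes, decoded as little-endian so the
 * result is the same on any host byte order.
 */
uint32_t demo_outgoing_type(const uint8_t *payload, size_t len, int is_control)
{
	uint32_t type = DEMO_TYPE_DATA;

	if (is_control && len >= 4) {
		type = (uint32_t)payload[0] |
		       (uint32_t)payload[1] << 8 |
		       (uint32_t)payload[2] << 16 |
		       (uint32_t)payload[3] << 24;
	}
	return type;
}
```

Starting from the data type and overwriting it only for control packets is what keeps non-control messages from ending up with type 0.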
2019-05-20vlan: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: net/8021q/vlan_dev.c: In function ‘vlan_dev_ioctl’: net/8021q/vlan_dev.c:374:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (!net_eq(dev_net(dev), &init_net)) ^ net/8021q/vlan_dev.c:376:2: note: here case SIOCGMIIPHY: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
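A minimal sketch of what such a marking looks like, assuming a made-up command enum rather than the real vlan_dev_ioctl() code: the comment placed directly before the next case label is what tells GCC at -Wimplicit-fallthrough=3 that the fall-through is intended. The names demo_cmd and demo_handle_cmd() are hypothetical.

```c
/* Hypothetical commands, for illustration only. */
enum demo_cmd { DEMO_CMD_CHECKED, DEMO_CMD_QUERY, DEMO_CMD_OTHER };

/*
 * DEMO_CMD_CHECKED needs a privilege check first and then shares the
 * DEMO_CMD_QUERY handling, so it deliberately falls through.
 */
int demo_handle_cmd(enum demo_cmd cmd, int privileged)
{
	switch (cmd) {
	case DEMO_CMD_CHECKED:
		if (!privileged)
			return -1;
		/* fall through */
	case DEMO_CMD_QUERY:
		return 1;
	default:
		return 0;
	}
}
```

Without the comment the code behaves the same; only the compiler warning changes.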
2019-05-20tipc: fix modprobe tipc failed after switch order of device registrationJunwei Hu3-10/+27
Error message printed: modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol. This happens when running modprobe tipc after commit 7e27e8d6130c ("tipc: switch order of device registration to fix a crash"). Because sock_create_kern(net, AF_TIPC, ...) is called by tipc_topsrv_create_listener() during the initialization done in tipc_init_net(), tipc_socket_init() must be executed before that. Meanwhile, tipc_net_id needs to be initialized by the time sock_create() is called, and tipc_socket_init() does not need to be called for each namespace. I add a variable tipc_topsrv_net_ops, split the register_pernet_subsys() of tipc into two parts, and separate tipc_socket_init() from the initialization of pernet params. By the way, I fixed a resource rollback error when tipc_bcast_init() fails in tipc_init_net(). Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash") Signed-off-by: Junwei Hu <hujunwei4@huawei.com> Reported-by: Wang Wang <wangwang2@huawei.com> Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com Reviewed-by: Kang Zhou <zhoukang7@huawei.com> Reviewed-by: Suanming Mou <mousuanming@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds27-129/+214
Pull networking fixes from David Miller: 1) Use after free in __dev_map_entry_free(), from Eric Dumazet. 2) Fix TCP retransmission timestamps on passive Fast Open, from Yuchung Cheng. 3) Orphan NFC, we'll take the patches directly into my tree. From Johannes Berg. 4) We can't recycle cloned TCP skbs, from Eric Dumazet. 5) Some flow dissector bpf test fixes, from Stanislav Fomichev. 6) Fix RCU marking and warnings in rhashtable, from Herbert Xu. 7) Fix some potential fib6 leaks, from Eric Dumazet. 8) Fix a _decode_session4 uninitialized memory read bug fix that got lost in a merge. From Florian Westphal. 9) Fix ipv6 source address routing wrt. exception route entries, from Wei Wang. 10) The netdev_xmit_more() conversion was not done %100 properly in mlx5 driver, fix from Tariq Toukan. 11) Clean up botched merge on netfilter kselftest, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (74 commits) of_net: fix of_get_mac_address retval if compiled without CONFIG_OF net: fix kernel-doc warnings for socket.c net: Treat sock->sk_drops as an unsigned int when printing kselftests: netfilter: fix leftover net/net-next merge conflict mlxsw: core: Prevent reading unsupported slave address from SFP EEPROM mlxsw: core: Prevent QSFP module initialization for old hardware vsock/virtio: Initialize core virtio vsock before registering the driver net/mlx5e: Fix possible modify header actions memory leak net/mlx5e: Fix no rewrite fields with the same match net/mlx5e: Additional check for flow destination comparison net/mlx5e: Add missing ethtool driver info for representors net/mlx5e: Fix number of vports for ingress ACL configuration net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled net/mlx5e: Fix wrong xmit_more application net/mlx5: Fix peer pf disable hca command net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index net/mlx5: Add meaningful return codes to status_to_err function net/mlx5: Imply MLXFW in mlx5_core Revert "tipc: fix modprobe tipc failed after switch order of device registration" vsock/virtio: free packets during the socket release ...
2019-05-19net: fix kernel-doc warnings for socket.cRandy Dunlap1-17/+17
Fix kernel-doc warnings by moving the kernel-doc notation to be immediately above the functions that it describes. Fixes these warnings for sock_sendmsg() and sock_recvmsg(): ../net/socket.c:658: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:658: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'sock' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'msg' description in 'INDIRECT_CALLABLE_DECLARE' ../net/socket.c:889: warning: Excess function parameter 'flags' description in 'INDIRECT_CALLABLE_DECLARE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
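As a rough illustration of the placement rule, the sketch below keeps a declaration macro above the kernel-doc comment so that the comment sits immediately above the function it documents. DEMO_DECLARE() and demo_send() are hypothetical stand-ins and not the socket.c code; the macro only imitates the role INDIRECT_CALLABLE_DECLARE plays in the warnings quoted above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a declaration macro; it just emits a prototype. */
#define DEMO_DECLARE(proto) proto

/* Declarations go here, above the kernel-doc block, not between it and the function. */
DEMO_DECLARE(size_t demo_send(const char *msg, size_t len));

/**
 * demo_send - pretend to send a message
 * @msg: buffer to send
 * @len: number of bytes in @msg
 *
 * Return: number of bytes accepted, or 0 if @msg is NULL.
 */
size_t demo_send(const char *msg, size_t len)
{
	return msg ? len : 0;
}
```

If the macro line sat between the comment and the function, a documentation tool that attaches the comment to the next declaration would presumably report the parameters as excess, much like the warnings listed in the commit.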
2019-05-19net: Treat sock->sk_drops as an unsigned int when printingPatrick Talbert6-6/+6
Currently, procfs socket stats format sk_drops as a signed int (%d). For large values this will cause a negative number to be printed. We know the drop count can never be negative, so change the format specifier to %u. Signed-off-by: Patrick Talbert <ptalbert@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
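The effect of the format specifier can be reproduced in userspace with a small sketch. demo_format_drops() is a hypothetical helper, not the procfs code, and the example assumes a 32-bit int on a two's-complement target, where converting an out-of-range value to int wraps to a negative number (the conversion is implementation-defined in C).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a drop counter either the old way (%d) or the
 * new way (%u). Returns the number of characters snprintf() would write.
 */
int demo_format_drops(char *buf, size_t size, unsigned int drops, int as_unsigned)
{
	assert(buf != NULL && size > 0);

	if (as_unsigned)
		return snprintf(buf, size, "%u", drops);

	/* Implementation-defined conversion; wraps negative on common targets. */
	return snprintf(buf, size, "%d", (int)drops);
}
```

With a counter above INT_MAX, such as 3000000000, the %u path prints the value as-is while the %d path prints a negative number, which is the symptom the commit describes.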