summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-06-15net/packet: introduce packet_rcv_try_clear_pressure() helperEric Dumazet1-4/+9
There are two places where we want to clear the pressure if possible, add a helper to make it more obvious. Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Willem de Bruijn <willemb@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: remove locking from packet_rcv_has_room()Eric Dumazet1-11/+9
__packet_rcv_has_room() can now be run without lock being held. po->pressure is only a non persistent hint, we can mark all read/write accesses with READ_ONCE()/WRITE_ONCE() to document the fact that the field could be written without any synchronization. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: implement shortcut in tpacket_rcv()Eric Dumazet1-0/+6
tpacket_rcv() can be hit under DDOS quite hard, since it will always grab a socket spinlock, to eventually find there is no room for an additional packet. Using tcpdump [1] on a busy host can lead to catastrophic consequences, because of all cpus spinning on a contended spinlock. This replicates a similar strategy used in packet_rcv() [1] Also some applications mistakenly use af_packet socket bound to ETH_P_ALL only to send packets. Receive queue is never drained and immediately full. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: make tp_drops atomicEric Dumazet2-9/+12
Under DDOS, we want to be able to increment tp_drops without touching the spinlock. This will help readers to drain the receive queue slightly faster :/ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: constify __packet_rcv_has_room()Eric Dumazet1-5/+8
Goal is use the helper without lock being held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: constify prb_lookup_block() and __tpacket_v3_has_room()Eric Dumazet1-7/+7
Goal is to be able to use __tpacket_v3_has_room() without holding a lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: constify packet_lookup_frame() and __tpacket_has_room()Eric Dumazet1-7/+7
Goal is to be able to use __tpacket_has_room() without holding a lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/packet: constify __packet_get_status() argumentEric Dumazet1-1/+1
struct packet_sock is only read. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net: phy: Add more 1000BaseX support detectionRobert Hancock1-0/+2
Commit "net: phy: Add detection of 1000BaseX link mode support" added support for not filtering out 1000BaseX mode from the PHY's supported modes in genphy_config_init, but we have to make a similar change in genphy_read_abilities in order to actually detect it as a supported mode in the first place. Add this in. Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net: ethernet: ti: cpsw_ethtool: simplify slave loopsIvan Khoronzhuk1-19/+21
Only for consistency reasons, do it like in main cpsw.c module and use ndev reference but not by means of slave. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net: ethernet: ti: cpsw: use cpsw as drv dataIvan Khoronzhuk1-9/+7
No need to set ndev for drvdata when mainly cpsw reference is needed, so correct this legacy decision. Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15Merge tag 'for-linus-20190614' of git://git.kernel.dk/linux-blockLinus Torvalds13-259/+114
Pull block fixes from Jens Axboe: - Remove references to old schedulers for the scheduler switching and blkio controller documentation (Andreas) - Kill duplicate check for report zone for null_blk (Chaitanya) - Two bcache fixes (Coly) - Ensure that mq-deadline is selected if zoned block device is enabled, as we need that to support them (Damien) - Fix io_uring memory leak (Eric) - ps3vram fallout from LBDAF removal (Geert) - Redundant blk-mq debugfs debugfs_create return check cleanup (Greg) - Extend NOPLM quirk for ST1000LM024 drives (Hans) - Remove error path warning that can now trigger after the queue removal/addition fixes (Ming) * tag 'for-linus-20190614' of git://git.kernel.dk/linux-block: block/ps3vram: Use %llu to format sector_t after LBDAF removal libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached bcache: fix stack corruption by PRECEDING_KEY() blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests blkio-controller.txt: Remove references to CFQ block/switching-sched.txt: Update to blk-mq schedulers null_blk: remove duplicate check for report zone blk-mq: no need to check return value of debugfs_create functions io_uring: fix memory leak of UNIX domain socket inode block: force select mq-deadline for zoned block devices
2019-06-15Merge branch 'i2c/for-current' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "I2C has two simple but wanted driver fixes for you" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: pca-platform: Fix GPIO lookup code i2c: acorn: fix i2c warning
2019-06-15bpf, x64: fix stack layout of JITed bpf codeAlexei Starovoitov1-53/+21
Since commit 177366bf7ceb the %rbp stopped pointing to %rbp of the previous stack frame. That broke frame pointer based stack unwinding. This commit is a partial revert of it. Note that the location of tail_call_cnt is fixed, since the verifier enforces MAX_BPF_STACK stack size for programs with tail calls. Fixes: 177366bf7ceb ("bpf: change x86 JITed program stack layout") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-06-15Smack: Restore the smackfsdef mount option and add missing prefixesCasey Schaufler1-5/+7
The 5.1 mount system rework changed the smackfsdef mount option to smackfsdefault. This fixes the regression by making smackfsdef treated the same way as smackfsdefault. Also fix the smack_param_specs[] to have "smack" prefixes on all the names. This isn't visible to a user unless they either: (a) Try to mount a filesystem that's converted to the internal mount API and that implements the ->parse_monolithic() context operation - and only then if they call security_fs_context_parse_param() rather than security_sb_eat_lsm_opts(). There are no examples of this upstream yet, but nfs will probably want to do this for nfs2 or nfs3. (b) Use fsconfig() to configure the filesystem - in which case security_fs_context_parse_param() will be called. This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix on the options, but smack_fs_context_parse_param() does not. Fixes: c3300aaf95fb ("smack: get rid of match_token()") Fixes: 2febd254adc4 ("smack: Implement filesystem context security hooks") Cc: stable@vger.kernel.org Reported-by: Jose Bollo <jose.bollo@iot.bzh> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-15bpf: Fix build error without CONFIG_INETYueHaibing1-7/+24
If CONFIG_INET is not set, building fails: kernel/bpf/verifier.o: In function `check_mem_access': verifier.c: undefined reference to `bpf_xdp_sock_is_valid_access' kernel/bpf/verifier.o: In function `convert_ctx_accesses': verifier.c: undefined reference to `bpf_xdp_sock_convert_ctx_access' Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: fada7fdc83c0 ("bpf: Allow bpf_map_lookup_elem() on an xskmap") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15selftests/bpf: convert socket_cookie test to sk storageStanislav Fomichev2-32/+38
This lets us test that both BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS can access underlying bpf_sock. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf/tools: sync bpf.hStanislav Fomichev1-0/+2
Add sk to struct bpf_sock_addr and struct bpf_sock_ops. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog typeStanislav Fomichev2-0/+27
And let it use bpf_sk_storage_{get,delete} helpers to access socket storage. Kernel context (struct bpf_sock_ops_kern) already has sk member, so I just expose it to the BPF hooks. I use PTR_TO_SOCKET_OR_NULL and return NULL in !is_fullsock case. I also export bpf_tcp_sock to make it possible to access tcp socket stats. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: export bpf_sock for BPF_PROG_TYPE_CGROUP_SOCK_ADDR prog typeStanislav Fomichev2-0/+17
And let it use bpf_sk_storage_{get,delete} helpers to access socket storage. Kernel context (struct bpf_sock_addr_kern) already has sk member, so I just expose it to the BPF hooks. Using PTR_TO_SOCKET instead of PTR_TO_SOCK_COMMON should be safe because the hook is called on bind/connect. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: Add test for SO_REUSEPORT_DETACH_BPFMartin KaFai Lau1-0/+54
This patch adds a test for the new sockopt SO_REUSEPORT_DETACH_BPF. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: Sync asm-generic/socket.h to tools/Martin KaFai Lau1-0/+147
SO_DETACH_REUSEPORT_BPF is needed for the test in the next patch. It is defined in the socket.h. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf: net: Add SO_DETACH_REUSEPORT_BPFMartin KaFai Lau8-0/+40
There is SO_ATTACH_REUSEPORT_[CE]BPF but there is no DETACH. This patch adds SO_DETACH_REUSEPORT_BPF sockopt. The same sockopt can be used to undo both SO_ATTACH_REUSEPORT_[CE]BPF. reseport_detach_prog() is added and it is mostly a mirror of the existing reuseport_attach_prog(). The differences are, it does not call reuseport_alloc() and returns -ENOENT when there is no old prog. Cc: Craig Gallek <kraig@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15libbpf: fix check for presence of associated BTF for map creationAndrii Nakryiko1-4/+5
Kernel internally checks that either key or value type ID is specified, before using btf_fd. Do the same in libbpf's map creation code for determining when to retry map creation w/o BTF. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: fba01a0689a9 ("libbpf: use negative fd to specify missing BTF") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15selftests/bpf: signedness bug in enable_all_controllers()Dan Carpenter1-1/+1
The "len" variable needs to be signed for the error handling to work properly. Fixes: 596092ef8bea ("selftests/bpf: enable all available cgroup v2 controllers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15samples/bpf: fix include path in MakefilePrashant Bhole1-1/+1
Recent commit included libbpf.h in selftests/bpf/bpf_util.h. Since some samples use bpf_util.h and samples/bpf/Makefile doesn't have libbpf.h path included, build was failing. Let's add the path in samples/bpf/Makefile. Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf, devmap: Add missing RCU read lock on flushToshiaki Makita1-0/+4
.ndo_xdp_xmit() assumes it is called under RCU. For example virtio_net uses RCU to detect it has setup the resources for tx. The assumption accidentally broke when introducing bulk queue in devmap. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf, devmap: Add missing bulk queue freeToshiaki Makita1-0/+1
dev_map_free() forgot to free bulk queue when freeing its entries. Fixes: 5d053f9da431 ("bpf: devmap prepare xdp frames for bulking") Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15bpf, devmap: Fix premature entry free on destroying mapToshiaki Makita1-2/+2
dev_map_free() waits for flush_needed bitmap to be empty in order to ensure all flush operations have completed before freeing its entries. However the corresponding clear_bit() was called before using the entries, so the entries could be used after free. All access to the entries needs to be done before clearing the bit. It seems commit a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") accidentally changed the clear_bit() and memory access order. Note that the problem happens only in __dev_map_flush(), not in dev_map_flush_old(). dev_map_flush_old() is called only after nulling out the corresponding netdev_map entry, so dev_map_free() never frees the entry thus no such race happens there. Fixes: a5e2da6e9787 ("bpf: netdev is never null in __dev_map_flush") Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-15Merge branch 'net-mlx5-use-indirect-call-wrappers'David S. Miller1-6/+19
Paolo Abeni says: ==================== net/mlx5: use indirect call wrappers The mlx5_core driver uses several indirect calls in fast-path, some of them are invoked on each ingress packet, even for the XDP-only traffic. This series leverage the indirect call wrappers infrastructure the avoid the expansive RETPOLINE overhead for 2 indirect calls in fast-path. Each call is addressed on a different patch, plus we need to introduce a couple of additional helpers to cope with the higher number of possible direct-call alternatives. v2 -> v3: - do not add more INDIRECT_CALL_* macros - use only the direct calls always available regardless of the mlx5 build options in the last patch v1 -> v2: - update the direct call list and use a macro to define it, as per Saeed suggestion. An intermediated additional macro is needed to allow arg list expansion - patch 2/3 is unchanged, as the generated code looks better this way than with possible alternative (dropping BP hits) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/mlx5e: use indirect calls wrapper for the rx packet handlerPaolo Abeni1-1/+2
We can avoid another indirect call per packet wrapping the rx handler call with the proper helper. To ensure that even the last listed direct call experience measurable gain, despite the additional conditionals we must traverse before reaching it, I tested reversing the order of the listed options, with performance differences below noise level. Together with the previous indirect call patch, this gives ~6% performance improvement in raw UDP tput. v2 -> v3: - use only the direct calls always available regardless of the mlx5 build options - drop the direct call list macro, to keep the code as simple as possible for future rework v1 -> v2: - update the direct call list and use a macro to define it, as per Saeed suggestion. An intermediated additional macro is needed to allow arg list expansion Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15net/mlx5e: use indirect calls wrapper for skb allocationPaolo Abeni1-5/+17
We can avoid an indirect call per packet wrapping the skb creation with the appropriate helper. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper()Wei Li1-2/+5
The mapper may be NULL when called from register_ftrace_function_probe() with probe->data == NULL. This issue can be reproduced as follow (it may be covered by compiler optimization sometime): / # cat /sys/kernel/debug/tracing/set_ftrace_filter #### all functions enabled #### / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter [ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 206.952402] Mem abort info: [ 206.952819] ESR = 0x96000006 [ 206.955326] Exception class = DABT (current EL), IL = 32 bits [ 206.955844] SET = 0, FnV = 0 [ 206.956272] EA = 0, S1PTW = 0 [ 206.956652] Data abort info: [ 206.957320] ISV = 0, ISS = 0x00000006 [ 206.959271] CM = 0, WnR = 0 [ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000 [ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000 [ 206.964953] Internal error: Oops: 96000006 [#1] SMP [ 206.971122] Dumping ftrace buffer: [ 206.973677] (ftrace buffer empty) [ 206.975258] Modules linked in: [ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____)) [ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17 [ 206.978955] Hardware name: linux,dummy-virt (DT) [ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO) [ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118 [ 206.980874] lr : ftrace_count_free+0x68/0x80 [ 206.982539] sp : ffff0000182f3ab0 [ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700 [ 206.983632] x27: ffff000013054b40 x26: 0000000000000001 [ 206.984000] x25: ffff00001385f000 x24: 0000000000000000 [ 206.984394] x23: ffff000013453000 x22: ffff000013054000 [ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28 [ 206.986575] x19: ffff000013872c30 x18: 0000000000000000 [ 206.987111] x17: 0000000000000000 x16: 0000000000000000 [ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000 [ 206.987850] x13: 000000000017430e x12: 0000000000000580 [ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc [ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550 [ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000 [ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001 [ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28 [ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000 [ 206.992557] Call trace: [ 206.993101] free_ftrace_func_mapper+0x2c/0x118 [ 206.994827] ftrace_count_free+0x68/0x80 [ 206.995238] release_probe+0xfc/0x1d0 [ 206.995555] register_ftrace_function_probe+0x4a8/0x868 [ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180 [ 206.996330] ftrace_dump_callback+0x50/0x70 [ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8 [ 206.997157] ftrace_filter_write+0x44/0x60 [ 206.998971] __vfs_write+0x64/0xf0 [ 206.999285] vfs_write+0x14c/0x2f0 [ 206.999591] ksys_write+0xbc/0x1b0 [ 206.999888] __arm64_sys_write+0x3c/0x58 [ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0 [ 207.000607] el0_svc_handler+0x144/0x1c8 [ 207.000916] el0_svc+0x8/0xc [ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303) [ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]--- [ 207.010126] Kernel panic - not syncing: Fatal exception [ 207.011322] SMP: stopping secondary CPUs [ 207.013956] Dumping ftrace buffer: [ 207.014595] (ftrace buffer empty) [ 207.015632] Kernel Offset: disabled [ 207.017187] CPU features: 0x002,20006008 [ 207.017985] Memory Limit: none [ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-15module: Fix livepatch/ftrace module text permissions raceJosh Poimboeuf2-1/+15
It's possible for livepatch and ftrace to be toggling a module's text permissions at the same time, resulting in the following panic: BUG: unable to handle page fault for address: ffffffffc005b1d9 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:apply_relocate_add+0xbe/0x14c Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08 RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246 RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060 RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000 RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0 R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018 R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74 FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: klp_init_object_loaded+0x10f/0x219 ? preempt_latency_start+0x21/0x57 klp_enable_patch+0x662/0x809 ? virt_to_head_page+0x3a/0x3c ? kfree+0x8c/0x126 patch_init+0x2ed/0x1000 [livepatch_test02] ? 0xffffffffc0060000 do_one_initcall+0x9f/0x1c5 ? kmem_cache_alloc_trace+0xc4/0xd4 ? do_init_module+0x27/0x210 do_init_module+0x5f/0x210 load_module+0x1c41/0x2290 ? fsnotify_path+0x3b/0x42 ? strstarts+0x2b/0x2b ? kernel_read+0x58/0x65 __do_sys_finit_module+0x9f/0xc3 ? __do_sys_finit_module+0x9f/0xc3 __x64_sys_finit_module+0x1a/0x1c do_syscall_64+0x52/0x61 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The above panic occurs when loading two modules at the same time with ftrace enabled, where at least one of the modules is a livepatch module: CPU0 CPU1 klp_enable_patch() klp_init_object_loaded() module_disable_ro() ftrace_module_enable() ftrace_arch_code_modify_post_process() set_all_modules_text_ro() klp_write_object_relocations() apply_relocate_add() *patches read-only code* - BOOM A similar race exists when toggling ftrace while loading a livepatch module. Fix it by ensuring that the livepatch and ftrace code patching operations -- and their respective permissions changes -- are protected by the text_mutex. Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com Reported-by: Johannes Erdfelt <johannes@erdfelt.com> Fixes: 444d13ff10fb ("modules: add ro_after_init support") Acked-by: Jessica Yu <jeyu@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-15tracing/uprobe: Fix obsolete comment on trace_uprobe_create()Eiichi Tsukata1-2/+0
Commit 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for uprobe events") cleaned up the usage of trace_uprobe_create(), and the function has been no longer used for removing uprobe/uretprobe. Link: http://lkml.kernel.org/r/20190614074026.8045-2-devel@etsukata.com Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14tracing/uprobe: Fix NULL pointer dereference in trace_uprobe_create()Eiichi Tsukata1-3/+10
Just like the case of commit 8b05a3a7503c ("tracing/kprobes: Fix NULL pointer dereference in trace_kprobe_create()"), writing an incorrectly formatted string to uprobe_events can trigger NULL pointer dereference. Reporeducer: # echo r > /sys/kernel/debug/tracing/uprobe_events dmesg: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000079d12067 P4D 8000000079d12067 PUD 7b7ab067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1903 Comm: bash Not tainted 5.2.0-rc3+ #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 RIP: 0010:strchr+0x0/0x30 Code: c0 eb 0d 84 c9 74 18 48 83 c0 01 48 39 d0 74 0f 0f b6 0c 07 3a 0c 06 74 ea 19 c0 83 c8 01 c3 31 c0 c3 0f 1f 84 00 00 00 00 00 <0f> b6 07 89 f2 40 38 f0 75 0e eb 13 0f b6 47 01 48 83 c RSP: 0018:ffffb55fc0403d10 EFLAGS: 00010293 RAX: ffff993ffb793400 RBX: 0000000000000000 RCX: ffffffffa4852625 RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000000 RBP: ffffb55fc0403dd0 R08: ffff993ffb793400 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff993ff9cc1668 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f30c5147700(0000) GS:ffff993ffda00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007b628000 CR4: 00000000000006f0 Call Trace: trace_uprobe_create+0xe6/0xb10 ? __kmalloc_track_caller+0xe6/0x1c0 ? __kmalloc+0xf0/0x1d0 ? trace_uprobe_create+0xb10/0xb10 create_or_delete_trace_uprobe+0x35/0x90 ? trace_uprobe_create+0xb10/0xb10 trace_run_command+0x9c/0xb0 trace_parse_run_command+0xf9/0x1eb ? probes_open+0x80/0x80 __vfs_write+0x43/0x90 vfs_write+0x14a/0x2a0 ksys_write+0xa2/0x170 do_syscall_64+0x7f/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe Link: http://lkml.kernel.org/r/20190614074026.8045-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: 0597c49c69d5 ("tracing/uprobes: Use dyn_event framework for uprobe events") Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14tracing: Make two symbols staticYueHaibing1-2/+2
Fix sparse warnings: kernel/trace/trace.c:6927:24: warning: symbol 'get_tracing_log_err' was not declared. Should it be static? kernel/trace/trace.c:8196:15: warning: symbol 'trace_instance_dir' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20190614153210.24424-1-yuehaibing@huawei.com Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14tracing: avoid build warning with HAVE_NOP_MCOUNTVasily Gorbik1-3/+2
Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it) and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for testing CC_USING_* defines) to avoid conditional compilation and fix the following gcc 9 warning on s390: kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined but not used [-Wunused-function] Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours Fixes: 2f4df0017baed ("tracing: Add -mcount-nop option support") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14tracing: Fix out-of-range read in trace_stack_print()Eiichi Tsukata1-1/+1
Puts range check before dereferencing the pointer. Reproducer: # echo stacktrace > trace_options # echo 1 > events/enable # cat trace > /dev/null KASAN report: ================================================================== BUG: KASAN: use-after-free in trace_stack_print+0x26b/0x2c0 Read of size 8 at addr ffff888069d20000 by task cat/1953 CPU: 0 PID: 1953 Comm: cat Not tainted 5.2.0-rc3+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 Call Trace: dump_stack+0x8a/0xce print_address_description+0x60/0x224 ? trace_stack_print+0x26b/0x2c0 ? trace_stack_print+0x26b/0x2c0 __kasan_report.cold+0x1a/0x3e ? trace_stack_print+0x26b/0x2c0 kasan_report+0xe/0x20 trace_stack_print+0x26b/0x2c0 print_trace_line+0x6ea/0x14d0 ? tracing_buffers_read+0x700/0x700 ? trace_find_next_entry_inc+0x158/0x1d0 s_show+0xea/0x310 seq_read+0xaa7/0x10e0 ? seq_escape+0x230/0x230 __vfs_read+0x7c/0x100 vfs_read+0x16c/0x3a0 ksys_read+0x121/0x240 ? kernel_write+0x110/0x110 ? perf_trace_sys_enter+0x8a0/0x8a0 ? syscall_slow_exit_work+0xa9/0x410 do_syscall_64+0xb7/0x390 ? prepare_exit_to_usermode+0x165/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f867681f910 Code: b6 fe ff ff 48 8d 3d 0f be 08 00 48 83 ec 08 e8 06 db 01 00 66 0f 1f 44 00 00 83 3d f9 2d 2c 00 00 75 10 b8 00 00 00 00 04 RSP: 002b:00007ffdabf23488 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f867681f910 RDX: 0000000000020000 RSI: 00007f8676cde000 RDI: 0000000000000003 RBP: 00007f8676cde000 R08: ffffffffffffffff R09: 0000000000000000 R10: 0000000000000871 R11: 0000000000000246 R12: 00007f8676cde000 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000ec0 Allocated by task 1214: save_stack+0x1b/0x80 __kasan_kmalloc.constprop.0+0xc2/0xd0 kmem_cache_alloc+0xaf/0x1a0 getname_flags+0xd2/0x5b0 do_sys_open+0x277/0x5a0 do_syscall_64+0xb7/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 1214: save_stack+0x1b/0x80 __kasan_slab_free+0x12c/0x170 kmem_cache_free+0x8a/0x1c0 putname+0xe1/0x120 do_sys_open+0x2c5/0x5a0 do_syscall_64+0xb7/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888069d20000 which belongs to the cache names_cache of size 4096 The buggy address is located 0 bytes inside of 4096-byte region [ffff888069d20000, ffff888069d21000) The buggy address belongs to the page: page:ffffea0001a74800 refcount:1 mapcount:0 mapping:ffff88806ccd1380 index:0x0 compound_mapcount: 0 flags: 0x100000000010200(slab|head) raw: 0100000000010200 dead000000000100 dead000000000200 ffff88806ccd1380 raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888069d1ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888069d1ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888069d20000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888069d20080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888069d20100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Link: http://lkml.kernel.org/r/20190610040016.5598-1-devel@etsukata.com Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-06-14i40e: mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/ethernet/intel/i40e/i40e_xsk.c: In function ‘i40e_run_xdp_zc’: drivers/net/ethernet/intel/i40e/i40e_xsk.c:217:3: warning: this statement may fall through [-Wimplicit-fallthrough=] bpf_warn_invalid_xdp_action(act); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/intel/i40e/i40e_xsk.c:218:2: note: here case XDP_ABORTED: ^~~~ Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Missing response checks in driver when starting/stopping FW LLDPAleksandr Loktionov1-21/+24
Driver updated pf->flags before calling i40e_aq_start_lldp(). This patch moved down updating pf->flags down so flags will be updated only in case of successful i40e_aq_start_lldp() call. Also was introduced is_reset_needed local flag to avoid unnecessary h/w reset in case 40e_aq_start_lldp() didn't change lldp state. Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: remove duplicate stat calculation for tx_errorsJacob Keller1-3/+0
The tx_errors statistic was being calculated twice in i40e_update_eth_stats. This appears to be as of commit 201db2898f2c ("i40e: add missing VSI statistics", 2014-03-25). Remove the extra i40e_stat_update32 call for GLV_TEPC. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Check if the BAR size is large enough before writing to registersAdam Ludkiewicz1-1/+11
This patch fixes the problem with a kernel panic occurring when trying to bind the i40e driver to a non-i40e port. The problem is fixed by checking if the BAR size in the device is large enough by reading the highest register. Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Missing response checks in driver when starting/stopping FW LLDPPiotr Marczak1-2/+25
Driver did not check response on LLDP flag change and always returned SUCCESS. This patch now checks for an error and returns an error code and has additional information in the log. Signed-off-by: Piotr Marczak <piotr.marczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: add input validation for virtchnl handlersSergey Nemov1-43/+31
Change some data to unsigned int instead of integer when we compare. Check LUT values in VIRTCHNL_OP_CONFIG_RSS_LUT handler. Also enhance error/warning messages to print the real values of I40E_MAX_VF_QUEUES, I40E_MAX_VF_VSI and I40E_DEFAULT_QUEUES_PER_VF instead of plain text. Refactor code to comply with 'check first then assign' policy. Remove duplicate checks for VIRTCHNL_OP_CONFIG_RSS_KEY and VIRTCHNL_OP_CONFIG_RSS_LUT opcodes in i40e_vc_process_vf_msg(). We have the very same checks inside the handlers already. Signed-off-by: Sergey Nemov <sergey.nemov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Improve AQ log granularityDoug Dziggel2-23/+25
This patch makes it possible to log only AQ descriptors, without the entire AQ message buffers being dumped too. It should greatly reduce kernel log size in cases where a full AQ dump is not needed. Selection is made by setting flags in hw->debug_mask. Additionally, some debug messages that preceded an AQ dump have been moved to I40E_DEBUG_AQ_COMMAND class, which seems more appropriate. Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Add bounds check for ch[] arrayPiotr Kwapulinski1-1/+10
Add bounds check for ch[] array. Use ARRAY_SIZE() to ensure that idx is within the range. Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: Use signed variableMitch Williams1-1/+1
The counter variable in i40e_clean_tx_irq starts out negative and climbs to 0. So it should not be defined as a u16. This was working by accident due to the fact the u16 overflows and underflows predictably. Replace the u16 with int, which is signed and can handle the negativity. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: add constraints for accessing veb arrayPiotr Kwapulinski1-5/+7
Add veb array access boundary checks. Ensure veb array index is smaller than I40E_MAX_VEB. Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-14i40e: let untrusted VF to create up to 16 VLANsPiotr Kwapulinski1-1/+1
This patch lets untrusted VF to create up to 16 VLANs. It was implemented by increasing I40E_VC_MAX_VLAN_PER_VF up to 16. Without this patch untrusted VF could create only up to 8 VLANs. Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>