summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-04-16sch_red: fix off-by-one checks in red_check_params()Eric Dumazet1-2/+2
[ Upstream commit 3a87571f0ffc51ba3bf3ecdb6032861d0154b164 ] This fixes following syzbot report: UBSAN: shift-out-of-bounds in ./include/net/red.h:237:23 shift exponent 32 is too large for 32-bit type 'unsigned int' CPU: 1 PID: 8418 Comm: syz-executor170 Not tainted 5.12.0-rc4-next-20210324-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_set_parms include/net/red.h:237 [inline] choke_change.cold+0x3c/0xc8 net/sched/sch_choke.c:414 qdisc_create+0x475/0x12f0 net/sched/sch_api.c:1247 tc_modify_qdisc+0x4c8/0x1a50 net/sched/sch_api.c:1663 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x43f039 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdfa725168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000400488 RCX: 000000000043f039 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000004 RBP: 0000000000403020 R08: 0000000000400488 R09: 0000000000400488 R10: 0000000000400488 R11: 0000000000000246 R12: 00000000004030b0 R13: 0000000000000000 R14: 00000000004ac018 R15: 0000000000400488 Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30net: sched: validate stab valuesEric Dumazet1-1/+9
commit e323d865b36134e8c5c82c834df89109a5c60dab upstream. iproute2 package is well behaved, but malicious user space can provide illegal shift values and trigger UBSAN reports. Add stab parameter to red_check_params() to validate user input. syzbot reported: UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18 shift exponent 111 is too large for 64-bit type 'long unsigned int' CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 red_calc_qavg_from_idle_time include/net/red.h:312 [inline] red_calc_qavg include/net/red.h:353 [inline] choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221 __dev_xmit_skb net/core/dev.c:3837 [inline] __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150 neigh_hh_output include/net/neighbour.h:499 [inline] neigh_output include/net/neighbour.h:508 [inline] ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117 __ip6_finish_output net/ipv6/ip6_output.c:182 [inline] __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161 ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192 NF_HOOK_COND include/linux/netfilter.h:290 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215 dst_output include/net/dst.h:448 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320 inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135 dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138 dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535 dccp_finish_passive_close net/dccp/proto.c:123 [inline] dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118 dccp_terminate_connection net/dccp/proto.c:958 [inline] dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release+0xcd/0x280 net/socket.c:599 sock_close+0x18/0x20 net/socket.c:1258 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30can: dev: Move device back to init netns on owning netns deleteMartin Willi1-0/+2
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream. When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-30idr: add ida_is_emptyMatthew Wilcox1-0/+5
[ Upstream commit 99c494077e2d4282a17120a772eecc00ec3004cc ] Two of the USB Gadgets were poking around in the internals of struct ida in order to determine if it is empty. Add the appropriate abstraction. Link: http://lkml.kernel.org/r/1480369871-5271-63-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30ACPI: scan: Use unique number for instance_noAndy Shevchenko1-0/+1
[ Upstream commit eb50aaf960e3bedfef79063411ffd670da94b84b ] The decrementation of acpi_device_bus_id->instance_no in acpi_device_del() is incorrect, because it may cause a duplicate instance number to be allocated next time a device with the same acpi_device_bus_id is added. Replace above mentioned approach by using IDA framework. While at it, define the instance range to be [0, 4096). Fixes: e49bd2dd5a50 ("ACPI: use PNPID:instance_no as bus_id of ACPI device") Fixes: ca9dc8d42b30 ("ACPI / scan: Fix acpi_bus_id_list bookkeeping") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: 4.10+ <stable@vger.kernel.org> # 4.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30macvlan: macvlan_count_rx() needs to be aware of preemptionEric Dumazet1-1/+2
[ Upstream commit dd4fa1dae9f4847cc1fd78ca468ad69e16e5db3e ] macvlan_count_rx() can be called from process context, it is thus necessary to disable preemption before calling u64_stats_update_begin() syzbot was able to spot this on 32bit arch: WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline] WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269 Modules linked in: Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: ARM-Versatile Express Workqueue: events macvlan_process_broadcast Backtrace: [<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252) r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4 [<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline]) [<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120) [<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231) r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0 [<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605) r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4 r7:0000010f [<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628) r7:81363f70 r6:0000010f r5:83018e50 r4:00000000 [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline]) [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269) r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001 [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline]) [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline]) [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291) r5:89802300 r4:8a927740 [<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317) r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000 r4:89802840 [<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275) r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80 r4:8a86d43c [<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421) r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40 r4:898dac80 [<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292) r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40 r4:8a97bd00 [<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158) Exception stack(0x8a9c7fb0 to 0x8a9c7ff8) Fixes: 412ca1550cbe ("macvlan: Move broadcasts into a work queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30u64_stats,lockdep: Fix u64_stats_init() vs lockdepPeter Zijlstra1-3/+4
[ Upstream commit d5b0e0677bfd5efd17c5bbb00156931f0d41cb85 ] Jakub reported that: static struct net_device *rtl8139_init_board(struct pci_dev *pdev) { ... u64_stats_init(&tp->rx_stats.syncp); u64_stats_init(&tp->tx_stats.syncp); ... } results in lockdep getting confused between the RX and TX stats lock. This is because u64_stats_init() is an inline calling seqcount_init(), which is a macro using a static variable to generate a lockdep class. By wrapping that in an inline, we negate the effect of the macro and fold the static key variable, hence the confusion. Fix by also making u64_stats_init() a macro for the case where it matters, leaving the other case an inline for argument validation etc. Reported-by: Jakub Kicinski <kuba@kernel.org> Debugged-by: "Ahmed S. Darwish" <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Erhard F." <erhard_f@mailbox.org> Link: https://lkml.kernel.org/r/YEXicy6+9MksdLZh@hirez.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()Oleg Nesterov1-0/+13
commit 5abbe51a526253b9f003e9a0a195638dc882d660 upstream. Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT. Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper. Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-24USB: replace hardcode maximum usb string length by definitionMacpaul Lin1-0/+3
commit 81c7462883b0cc0a4eeef0687f80ad5b5baee5f6 upstream. Replace hardcoded maximum USB string length (126 bytes) by definition "USB_MAX_STRING_LEN". Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17can: skb: can_skb_set_owner(): fix ref counting if socket was closed before ↵Oleksij Rempel1-2/+6
setting skb ownership commit e940e0895a82c6fbaa259f2615eb52b57ee91a7e upstream. There are two ref count variables controlling the free()ing of a socket: - struct sock::sk_refcnt - which is changed by sock_hold()/sock_put() - struct sock::sk_wmem_alloc - which accounts the memory allocated by the skbs in the send path. In case there are still TX skbs on the fly and the socket() is closed, the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack clones an "echo" skb, calls sock_hold() on the original socket and references it. This produces the following back trace: | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134 | refcount_t: addition on 0; use-after-free. | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E) | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G E 5.11.0-04577-gf8ff6603c617 #203 | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) | Backtrace: | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220 | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8) | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90 | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2 | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540 | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50) | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c) | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600 | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00 | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600 | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400 | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534) To fix this problem, only set skb ownership to sockets which have still a ref count > 0. Fixes: 0ae89beb283a ("can: add destructor for self generated skbs") Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Andre Naujoks <nautsch2@gmail.com> Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17uapi: nfnetlink_cthelper.h: fix userspace compilation errorDmitry V. Levin1-1/+1
commit c33cb0020ee6dd96cc9976d6085a7d8422f6dbed upstream. Apparently, <linux/netfilter/nfnetlink_cthelper.h> and <linux/netfilter/nfnetlink_acct.h> could not be included into the same compilation unit because of a cut-and-paste typo in the former header. Fixes: 12f7a505331e6 ("netfilter: add user-space connection tracking helper infrastructure") Cc: <stable@vger.kernel.org> # v3.6 Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-11misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eepromAswath Govindraju1-0/+2
[ Upstream commit f6f1f8e6e3eea25f539105d48166e91f0ab46dd1 ] A dummy zero bit is sent preceding the data during a read transfer by the Microchip 93LC46B eeprom (section 2.7 of[1]). This results in right shift of data during a read. In order to ignore this bit a quirk can be added to send an extra zero bit after the read address. Add a quirk to ignore the zero bit sent before data by adding a zero bit after the read address. [1] - https://www.mouser.com/datasheet/2/268/20001749K-277859.pdf Signed-off-by: Aswath Govindraju <a-govindraju@ti.com> Link: https://lore.kernel.org/r/20210105105817.17644-3-a-govindraju@ti.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07zsmalloc: account the number of compacted pages correctlyRokudo Yan1-1/+1
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream. There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs outputJoe Perches1-0/+16
commit 2efc459d06f1630001e3984854848a5647086232 upstream. Output defects can exist in sysfs content using sprintf and snprintf. sprintf does not know the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content and it's possible to overrun the PAGE_SIZE buffer length. Add a generic sysfs_emit function that knows that the size of the temporary buffer and ensures that no overrun is done. Add a generic sysfs_emit_at function that can be used in multiple call situations that also ensures that no overrun is done. Validate the output buffer argument to be page aligned. Validate the offset len argument to be within the PAGE_SIZE buf. Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/884235202216d464d61ee975f7465332c86f76b2.1600285923.git.joe@perches.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sendingJason A. Donenfeld3-8/+26
commit ee576c47db60432c37e54b1e2b43a8ca6d3a8dca upstream. The icmp{,v6}_send functions make all sorts of use of skb->cb, casting it with IPCB or IP6CB, assuming the skb to have come directly from the inet layer. But when the packet comes from the ndo layer, especially when forwarded, there's no telling what might be in skb->cb at that point. As a result, the icmp sending code risks reading bogus memory contents, which can result in nasty stack overflows such as this one reported by a user: panic+0x108/0x2ea __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read from it. The optlen parameter there is of particular note, as it can induce writes beyond bounds. There are quite a few ways that can happen in __ip_options_echo. For example: // sptr/skb are attacker-controlled skb bytes sptr = skb_network_header(skb); // dptr/dopt points to stack memory allocated by __icmp_send dptr = dopt->__data; // sopt is the corrupt skb->cb in question if (sopt->rr) { optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data // this now writes potentially attacker-controlled data, over // flowing the stack: memcpy(dptr, sptr+sopt->rr, optlen); } In the icmpv6_send case, the story is similar, but not as dire, as only IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is worse than the iif case, but it is passed to ipv6_find_tlv, which does a bit of bounds checking on the value. This is easy to simulate by doing a `memset(skb->cb, 0x41, sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by good fortune and the rarity of icmp sending from that context that we've avoided reports like this until now. For example, in KASAN: BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 Write of size 38 at addr ffff888006f1f80e by task ping/89 CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 Call Trace: dump_stack+0x9a/0xcc print_address_description.constprop.0+0x1a/0x160 __kasan_report.cold+0x20/0x38 kasan_report+0x32/0x40 check_memory_region+0x145/0x1a0 memcpy+0x39/0x60 __ip_options_echo+0xa0e/0x12b0 __icmp_send+0x744/0x1700 Actually, out of the 4 drivers that do this, only gtp zeroed the cb for the v4 case, while the rest did not. So this commit actually removes the gtp-specific zeroing, while putting the code where it belongs in the shared infrastructure of icmp{,v6}_ndo_send. This commit fixes the issue by passing an empty IPCB or IP6CB along to the functions that actually do the work. For the icmp_send, this was already trivial, thanks to __icmp_send providing the plumbing function. For icmpv6_send, this required a tiny bit of refactoring to make it behave like the v4 case, after which it was straight forward. Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") Reported-by: SinYu <liuxyon@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03ipv6: silence compilation warning for non-IPV6 buildsLeon Romanovsky1-1/+1
commit 1faba27f11c8da244e793546a1b35a9b1da8208e upstream. The W=1 compilation of allmodconfig generates the following warning: net/ipv6/icmp.c:448:6: warning: no previous prototype for 'icmp6_send' [-Wmissing-prototypes] 448 | void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, | ^~~~~~~~~~ Fix it by providing function declaration for builds with ipv6 as a module. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03ipv6: icmp6: avoid indirect call for icmpv6_send()Eric Dumazet1-1/+21
commit cc7a21b6fbd945f8d8f61422ccd27203c1fafeb7 upstream. If IPv6 is builtin, we do not need an expensive indirect call to reach icmp6_send(). v2: put inline keyword before the type to avoid sparse warnings. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld1-6/+10
commit a8e41f6033a0c5633d55d6e35993c9e2005d872f upstream. The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-03icmp: introduce helper for nat'd source address in network device contextJason A. Donenfeld2-0/+12
commit 0b41713b606694257b90d61ba7e2712d8457648b upstream. This introduces a helper function to be called only by network drivers that wraps calls to icmp[v6]_send in a conntrack transformation, in case NAT has been used. We don't want to pollute the non-driver path, though, so we introduce this as a helper to be called by places that actually make use of this, as suggested by Florian. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23Xen/gntdev: correct error checking in gntdev_map_grant_pages()Jan Beulich1-0/+1
commit ebee0eab08594b2bd5db716288a4f1ae5936e9bc upstream. Failure of the kernel part of the mapping operation should also be indicated as an error to the caller, or else it may assume the respective kernel VA is okay to access. Furthermore gnttab_map_refs() failing still requires recording successfully mapped handles, so they can be unmapped subsequently. This in turn requires there to be a way to tell full hypercall failure from partial success - preset map_op status fields such that they won't "happen" to look as if the operation succeeded. Also again use GNTST_okay instead of implying its value (zero). This is part of XSA-361. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23net: watchdog: hold device global xmit lock during tx disableEdwin Peer1-0/+2
commit 3aa6bce9af0e25b735c9c1263739a5639a336ae8 upstream. Prevent netif_tx_disable() running concurrently with dev_watchdog() by taking the device global xmit lock. Otherwise, the recommended: netif_carrier_off(dev); netif_tx_disable(dev); driver shutdown sequence can happen after the watchdog has already checked carrier, resulting in possible false alarms. This is because netif_tx_lock() only sets the frozen bit without maintaining the locks on the individual queues. Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o2-16/+23
[ Upstream commit 68f23b89067fdf187763e75a56087550624fdbee ] Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23include/trace/events/writeback.h: fix -Wstringop-truncation warningsQian Cai1-18/+20
[ Upstream commit d1a445d3b86c9341ce7a0954c23be0edb5c9bec5 ] There are many of those warnings. In file included from ./arch/powerpc/include/asm/paca.h:15, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./include/asm-generic/preempt.h:5, from ./arch/powerpc/include/generated/asm/preempt.h:1, from ./include/linux/preempt.h:78, from ./include/linux/spinlock.h:51, from fs/fs-writeback.c:19: In function 'strncpy', inlined from 'perf_trace_writeback_page_template' at ./include/trace/events/writeback.h:56:1: ./include/linux/string.h:260:9: warning: '__builtin_strncpy' specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by using the new strscpy_pad() which was introduced in "lib/string: Add strscpy_pad() function" and will always be NUL-terminated instead of strncpy(). Also, change strlcpy() to use strscpy_pad() in this file for consistency. Link: http://lkml.kernel.org/r/1564075099-27750-1-git-send-email-cai@lca.pw Fixes: 455b2864686d ("writeback: Initial tracing support") Fixes: 028c2dd184c0 ("writeback: Add tracing to balance_dirty_pages") Fixes: e84d0a4f8e39 ("writeback: trace event writeback_queue_io") Fixes: b48c104d2211 ("writeback: trace event bdi_dirty_ratelimit") Fixes: cc1676d917f3 ("writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()") Fixes: 9fb0a7da0c52 ("writeback: add more tracepoints") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nitin Gote <nitin.r.gote@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Stephen Kitt <steve@sk2.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23lib/string: Add strscpy_pad() functionTobin C. Harding1-0/+4
[ Upstream commit 458a3bf82df4fe1f951d0f52b1e0c1e9d5a88a3b ] We have a function to copy strings safely and we have a function to copy strings and zero the tail of the destination (if source string is shorter than destination buffer) but we do not have a function to do both at once. This means developers must write this themselves if they desire this functionality. This is a chore, and also leaves us open to off by one errors unnecessarily. Add a function that calls strscpy() then memset()s the tail to zero if the source string is shorter than the destination buffer. Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: Shuah Khan <shuah@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23SUNRPC: Move simple_get_bytes and simple_get_netobj into private headerDave Wysochanski1-2/+1
[ Upstream commit ba6dfce47c4d002d96cd02a304132fca76981172 ] Remove duplicated helper functions to parse opaque XDR objects and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23fgraph: Initialize tracing_graph_pause at task creationSteven Rostedt (VMware)1-1/+3
commit 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 upstream. On some archs, the idle task can call into cpu_suspend(). The cpu_suspend() will disable or pause function graph tracing, as there's some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter, that when set to something other than zero, the function graph tracer will not modify the return address. The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended in these architectures. CPU 1 CPU 2 ----- ----- do_idle() cpu_suspend() pause_graph_tracing() task_struct->tracing_graph_pause++ (0 -> 1) start_graph_tracing() for_each_online_cpu(cpu) { ftrace_graph_init_idle_task(cpu) task-struct->tracing_graph_pause = 0 (1 -> 0) unpause_graph_tracing() task_struct->tracing_graph_pause-- (0 -> -1) The above should have gone from 1 to zero, and enabled function graph tracing again. But instead, it is set to -1, which keeps it disabled. There's no reason that the field tracing_graph_pause on the task_struct can not be initialized at boot up. Cc: stable@vger.kernel.org Fixes: 380c4b1411ccd ("tracing/function-graph-tracer: append the tracing_graph_flag") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339 Reported-by: pierre.gondois@arm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()Johannes Weiner1-5/+28
commit 739f79fc9db1b38f96b5a5109b247a650fbebf6d upstream. Jaegeuk and Brad report a NULL pointer crash when writeback ending tries to update the memcg stats: BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0 IP: test_clear_page_writeback+0x12e/0x2c0 [...] RIP: 0010:test_clear_page_writeback+0x12e/0x2c0 Call Trace: <IRQ> end_page_writeback+0x47/0x70 f2fs_write_end_io+0x76/0x180 [f2fs] bio_endio+0x9f/0x120 blk_update_request+0xa8/0x2f0 scsi_end_request+0x39/0x1d0 scsi_io_completion+0x211/0x690 scsi_finish_command+0xd9/0x120 scsi_softirq_done+0x127/0x150 __blk_mq_complete_request_remote+0x13/0x20 flush_smp_call_function_queue+0x56/0x110 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x27/0x40 call_function_single_interrupt+0x89/0x90 RIP: 0010:native_safe_halt+0x6/0x10 (gdb) l *(test_clear_page_writeback+0x12e) 0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619). 614 mod_node_page_state(page_pgdat(page), idx, val); 615 if (mem_cgroup_disabled() || !page->mem_cgroup) 616 return; 617 mod_memcg_state(page->mem_cgroup, idx, val); 618 pn = page->mem_cgroup->nodeinfo[page_to_nid(page)]; 619 this_cpu_add(pn->lruvec_stat->count[idx], val); 620 } 621 622 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, 623 gfp_t gfp_mask, The issue is that writeback doesn't hold a page reference and the page might get freed after PG_writeback is cleared (and the mapping is unlocked) in test_clear_page_writeback(). The stat functions looking up the page's node or zone are safe, as those attributes are static across allocation and free cycles. But page->mem_cgroup is not, and it will get cleared if we race with truncation or migration. It appears this race window has been around for a while, but less likely to trigger when the memcg stats were updated first thing after PG_writeback is cleared. Recent changes reshuffled this code to update the global node stats before the memcg ones, though, stretching the race window out to an extent where people can reproduce the problem. Update test_clear_page_writeback() to look up and pin page->mem_cgroup before clearing PG_writeback, then not use that pointer afterward. It is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()") but leaves the pageref-holding callsites that aren't affected alone. Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Jaegeuk Kim <jaegeuk@kernel.org> Tested-by: Jaegeuk Kim <jaegeuk@kernel.org> Reported-by: Bradley Bolen <bradleybolen@gmail.com> Tested-by: Brad Bolen <bradleybolen@gmail.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [guptap@codeaurora.org: Resolved merge conflicts] Signed-off-by: Prakash Gupta <guptap@codeaurora.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB pageMuchun Song1-0/+3
commit 585fc0d2871c9318c949fbf45b1f081edd489e96 upstream. If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10elfcore: fix building with clangArnd Bergmann1-0/+22
commit 6e7b64b9dd6d96537d816ea07ec26b7dedd397b9 upstream. kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount: Cannot find symbol for section 2: .text. kernel/elfcore.o: failed Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration. Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not builtBartosz Golaszewski1-0/+2
commit 9def3b1a07c41e21c68a0eb353e3e569fdd1d2b1 upstream. Since commit c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") dmar.c needs struct iommu_device to be selected. We can drop this dependency by not dereferencing struct iommu_device if IOMMU_API is not selected and by reusing the information stored in iommu->drhd->ignored instead. This fixes the following build error when IOMMU_API is not selected: drivers/iommu/dmar.c: In function ‘free_iommu’: drivers/iommu/dmar.c:1139:41: error: ‘struct iommu_device’ has no member named ‘ops’ 1139 | if (intel_iommu_enabled && iommu->iommu.ops) { ^ Fixes: c40aaaac1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported address widths") Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20201013073055.11262-1-brgl@bgdev.pl Signed-off-by: Joerg Roedel <jroedel@suse.de> [ - context change due to moving drivers/iommu/dmar.c to drivers/iommu/intel/dmar.c - set the drhr in the iommu like in upstream commit b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu") ] Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04futex: Add mutex around futex exitThomas Gleixner2-0/+2
commit 3f186d974826847a07bc7964d79ec4eded475ad9 upstream. The mutex will be used in subsequent changes to replace the busy looping of a waiter when the futex owner is currently executing the exit cleanup to prevent a potential live lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.845798895@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04futex: Mark the begin of futex exit explicitlyThomas Gleixner1-28/+3
commit 18f694385c4fd77a09851fd301236746ca83f3cb upstream. Instead of relying on PF_EXITING use an explicit state for the futex exit and set it in the futex exit function. This moves the smp barrier and the lock/unlock serialization into the futex code. As with the DEAD state this is restricted to the exit path as exec continues to use the same task struct. This allows to simplify that logic in a next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.539409004@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04futex: Split futex_mm_release() for exit/execThomas Gleixner1-2/+4
commit 150d71584b12809144b8145b817e83b81158ae5f upstream. To allow separate handling of the futex exit state in the futex exit code for exit and exec, split futex_mm_release() into two functions and invoke them from the corresponding exit/exec_mm_release() callsites. Preparatory only, no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.332094221@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04exit/exec: Seperate mm_release()Thomas Gleixner1-2/+4
commit 4610ba7ad877fafc0a25a30c6c82015304120426 upstream. mm_release() contains the futex exit handling. mm_release() is called from do_exit()->exit_mm() and from exec()->exec_mm(). In the exit_mm() case PF_EXITING and the futex state is updated. In the exec_mm() case these states are not touched. As the futex exit code needs further protections against exit races, this needs to be split into two functions. Preparatory only, no functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.240518241@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04futex: Replace PF_EXITPIDONE with a stateThomas Gleixner2-1/+35
commit 3d4775df0a89240f671861c6ab6e8d59af8e9e41 upstream. The futex exit handling relies on PF_ flags. That's suboptimal as it requires a smp_mb() and an ugly lock/unlock of the exiting tasks pi_lock in the middle of do_exit() to enforce the observability of PF_EXITING in the futex code. Add a futex_state member to task_struct and convert the PF_EXITPIDONE logic over to the new state. The PF_EXITING dependency will be cleaned up in a later step. This prepares for handling various futex exit issues later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.149449274@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04futex: Move futex exit handling into futex codeThomas Gleixner2-9/+17
commit ba31c1a48538992316cc71ce94fa9cd3e7b427c0 upstream. The futex exit handling is #ifdeffed into mm_release() which is not pretty to begin with. But upcoming changes to address futex exit races need to add more functionality to this exit code. Split it out into a function, move it into futex code and make the various futex exit functions static. Preparatory only and no functional change. Folded build fix from Borislav. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191106224556.049705556@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04y2038: futex: Move compat implementation into futex.cArnd Bergmann1-8/+0
commit 04e7712f4460585e5eed5b853fd8b82a9943958f upstream. We are going to share the compat_sys_futex() handler between 64-bit architectures and 32-bit architectures that need to deal with both 32-bit and 64-bit time_t, and this is easier if both entry points are in the same file. In fact, most other system call handlers do the same thing these days, so let's follow the trend here and merge all of futex_compat.c into futex.c. In the process, a few minor changes have to be done to make sure everything still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become local symbol, and the compat version of the fetch_robust_entry() function gets renamed to compat_fetch_robust_entry() to avoid a symbol clash. This is intended as a purely cosmetic patch, no behavior should change. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Lee: Back-ported to satisfy a build dependency] Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-30compiler.h: Raise minimum version of GCC to 5.1 for arm64Will Deacon1-0/+6
commit dca5244d2f5b94f1809f0c02a549edf41ccd5493 upstream. GCC versions >= 4.9 and < 5.1 have been shown to emit memory references beyond the stack pointer, resulting in memory corruption if an interrupt is taken after the stack pointer has been adjusted but before the reference has been executed. This leads to subtle, infrequent data corruption such as the EXT4 problems reported by Russell King at the link below. Life is too short for buggy compilers, so raise the minimum GCC version required by arm64 to 5.1. Reported-by: Russell King <linux@armlinux.org.uk> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Florian Weimer <fweimer@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20210105154726.GD1551@shell.armlinux.org.uk Link: https://lore.kernel.org/r/20210112224832.10980-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [will: backport to 4.4.y/4.9.y/4.14.y; add __clang__ check] Link: https://lore.kernel.org/r/CA+G9fYuzE9WMSB7uGjV4gTzK510SHEdJb_UXQCzsQ5MqA=h9SA@mail.gmail.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23Input: uinput - avoid FF flush when destroying deviceDmitry Torokhov1-0/+1
commit e8b95728f724797f958912fd9b765a695595d3a6 upstream. Normally, when input device supporting force feedback effects is being destroyed, we try to "flush" currently playing effects, so that the physical device does not continue vibrating (or executing other effects). Unfortunately this does not work well for uinput as flushing of the effects deadlocks with the destroy action: - if device is being destroyed because the file descriptor is being closed, then there is noone to even service FF requests; - if device is being destroyed because userspace sent UI_DEV_DESTROY, while theoretically it could be possible to service FF requests, userspace is unlikely to do so (they'd need to make sure FF handling happens on a separate thread) even if kernel solves the issue with FF ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex. To avoid lockups like the one below, let's install a custom input device flush handler, and avoid trying to flush force feedback effects when we destroying the device, and instead rely on uinput to shut off the device properly. NMI watchdog: Watchdog detected hard LOCKUP on cpu 3 ... <<EOE>> [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40 [<ffffffff810e633d>] complete+0x1d/0x50 [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput] [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput] [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput] [<ffffffff815d91ad>] erase_effect+0xad/0xf0 [<ffffffff815d929d>] flush_effects+0x4d/0x90 [<ffffffff815d4cc0>] input_flush_device+0x40/0x60 [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0 [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60 [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150 [<ffffffff815d75f7>] input_unregister_device+0x47/0x70 [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput] [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput] [<ffffffff811231ab>] ? do_futex+0x12b/0xad0 [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput] [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480 [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60 [<ffffffff812414a9>] SyS_ioctl+0x79/0x90 [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71 Reported-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com> Reported-by: Clément VUCHENER <clement.vuchener@gmail.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741 Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23ACPI: scan: add stub acpi_create_platform_device() for !CONFIG_ACPIShawn Guo1-0/+7
[ Upstream commit ee61cfd955a64a58ed35cbcfc54068fcbd486945 ] It adds a stub acpi_create_platform_device() for !CONFIG_ACPI build, so that caller doesn't have to deal with !CONFIG_ACPI build issue. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17vmlinux.lds.h: Add PGO and AutoFDO input sectionsNick Desaulniers1-1/+4
commit eff8728fe69880d3f7983bec3fb6cea4c306261f upstream. Basically, consider .text.{hot|unlikely|unknown}.* part of .text, too. When compiling with profiling information (collected via PGO instrumentations or AutoFDO sampling), Clang will separate code into .text.hot, .text.unlikely, or .text.unknown sections based on profiling information. After D79600 (clang-11), these sections will have a trailing `.` suffix, ie. .text.hot., .text.unlikely., .text.unknown.. When using -ffunction-sections together with profiling infomation, either explicitly (FGKASLR) or implicitly (LTO), code may be placed in sections following the convention: .text.hot.<foo>, .text.unlikely.<bar>, .text.unknown.<baz> where <foo>, <bar>, and <baz> are functions. (This produces one section per function; we generally try to merge these all back via linker script so that we don't have 50k sections). For the above cases, we need to teach our linker scripts that such sections might exist and that we'd explicitly like them grouped together, otherwise we can wind up with code outside of the _stext/_etext boundaries that might not be mapped properly for some architectures, resulting in boot failures. If the linker script is not told about possible input sections, then where the section is placed as output is a heuristic-laiden mess that's non-portable between linkers (ie. BFD and LLD), and has resulted in many hard to debug bugs. Kees Cook is working on cleaning this up by adding --orphan-handling=warn linker flag used in ARCH=powerpc to additional architectures. In the case of linker scripts, borrowing from the Zen of Python: explicit is better than implicit. Also, ld.bfd's internal linker script considers .text.hot AND .text.hot.* to be part of .text, as well as .text.unlikely and .text.unlikely.*. I didn't see support for .text.unknown.*, and didn't see Clang producing such code in our kernel builds, but I see code in LLVM that can produce such section names if profiling information is missing. That may point to a larger issue with generating or collecting profiles, but I would much rather be safe and explicit than have to debug yet another issue related to orphan section placement. Reported-by: Jian Cai <jiancai@google.com> Suggested-by: Fāng-ruì Sòng <maskray@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Luis Lozano <llozano@google.com> Tested-by: Manoj Gupta <manojgupta@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=add44f8d5c5c05e08b11e033127a744d61c26aee Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=1de778ed23ce7492c523d5850c6c6dbb34152655 Link: https://reviews.llvm.org/D79600 Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1084760 Link: https://lore.kernel.org/r/20200821194310.3089815-7-keescook@chromium.org Debugged-by: Luis Lozano <llozano@google.com> [nc: Fix small conflict around lack of NOINSTR_TEXT and .text..refcount] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17target: add XCOPY target/segment desc sense codesDavid Disseldorp1-0/+4
[ Upstream commit e864212078ded276bdb272b2e0ee6a979357ca8a ] As defined in http://www.t10.org/lists/asc-num.htm. To be used during validation of XCOPY target and segment descriptor lists. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-12net: sched: prevent invalid Scell_log shift countRandy Dunlap1-1/+3
[ Upstream commit bd1248f1ddbc48b0c30565fce897a3b6423313b8 ] Check Scell_log shift size in red_check_params() and modify all callers of red_check_params() to pass Scell_log. This prevents a shift out-of-bounds as detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22 shift exponent 72 is too large for 32-bit type 'int' Fixes: 8afa10cbe281 ("net_sched: red: Avoid illegal values") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com Cc: Nogah Frankel <nogahf@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09xen/xenbus: Count pending messages for each watchSeongJae Park1-0/+2
commit 3dc86ca6b4c8cfcba9da7996189d1b5a358a94fc upstream. This commit adds a counter of pending messages for each watch in the struct. It is used to skip unnecessary pending messages lookup in 'unregister_xenbus_watch()'. It could also be used in 'will_handle' callback. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()SeongJae Park1-1/+5
commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream. Some code does not directly make 'xenbus_watch' object and call 'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This commit adds support of 'will_handle' callback in the 'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09xen/xenbus: Allow watches discard events before queueingSeongJae Park1-0/+7
commit fed1755b118147721f2c87b37b9d66e62c39b668 upstream. If handling logics of watch events are slower than the events enqueue logic and the events can be created from the guests, the guests could trigger memory pressure by intensively inducing the events, because it will create a huge number of pending events that exhausting the memory. Fortunately, some watch events could be ignored, depending on its handler callback. For example, if the callback has interest in only one single path, the watch wouldn't want multiple pending events. Or, some watches could ignore events to same path. To let such watches to volutarily help avoiding the memory pressure situation, this commit introduces new watch callback, 'will_handle'. If it is not NULL, it will be called for each new event just before enqueuing it. Then, if the callback returns false, the event will be discarded. No watch is using the callback for now, though. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09kdev_t: always inline major/minor helper functionsJosh Poimboeuf1-11/+11
commit aa8c7db494d0a83ecae583aa193f1134ef25d506 upstream. Silly GCC doesn't always inline these trivial functions. Fixes the following warning: arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled Link: https://lkml.kernel.org/r/984353b44a4484d86ba9f73884b7306232e25e30.1608737428.git.jpoimboe@redhat.com Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> [build-tested] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09of: fix linker-section match-table corruptionJohan Hovold1-0/+1
commit 5812b32e01c6d86ba7a84110702b46d8a8531fe9 upstream. Specify type alignment when declaring linker-section match-table entries to prevent gcc from increasing alignment and corrupting the various tables with padding (e.g. timers, irqchips, clocks, reserved memory). This is specifically needed on x86 where gcc (typically) aligns larger objects like struct of_device_id with static extent on 32-byte boundaries which at best prevents matching on anything but the first entry. Specifying alignment when declaring variables suppresses this optimisation. Here's a 64-bit example where all entries are corrupt as 16 bytes of padding has been inserted before the first entry: ffffffff8266b4b0 D __clk_of_table ffffffff8266b4c0 d __of_table_fixed_factor_clk ffffffff8266b5a0 d __of_table_fixed_clk ffffffff8266b680 d __clk_of_table_sentinel And here's a 32-bit example where the 8-byte-aligned table happens to be placed on a 32-byte boundary so that all but the first entry are corrupt due to the 28 bytes of padding inserted between entries: 812b3ec0 D __irqchip_of_table 812b3ec0 d __of_table_irqchip1 812b3fa0 d __of_table_irqchip2 812b4080 d __of_table_irqchip3 812b4160 d irqchip_of_match_end Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte alignment), and on arm using gcc-7.2. Note that there are no in-tree users of these tables on x86 currently (even if they are included in the image). Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations") Fixes: f6e916b82022 ("irqchip: add basic infrastructure") Cc: stable <stable@vger.kernel.org> # 3.9 Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org [ johan: adjust context to 5.4 ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-09uapi: move constants from <linux/kernel.h> to <linux/const.h>Petr Vorel8-14/+12
commit a85cbe6159ffc973e5702f70a3bd5185f8f3c38d upstream. and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>. The reason is to avoid indirect <linux/sysinfo.h> include when using some network headers: <linux/netlink.h> or others -> <linux/kernel.h> -> <linux/sysinfo.h>. This indirect include causes on MUSL redefinition of struct sysinfo when included both <sys/sysinfo.h> and some of UAPI headers: In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5, from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5, from ../include/tst_netlink.h:14, from tst_crypto.c:13: x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo' struct sysinfo { ^~~~~~~ In file included from ../include/tst_safe_macros.h:15, from ../include/tst_test.h:93, from tst_crypto.c:11: x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here Link: https://lkml.kernel.org/r/20201015190013.8901-1-petr.vorel@gmail.com Signed-off-by: Petr Vorel <petr.vorel@gmail.com> Suggested-by: Rich Felker <dalias@aerifal.cx> Acked-by: Rich Felker <dalias@libc.org> Cc: Peter Korsgaard <peter@korsgaard.com> Cc: Baruch Siach <baruch@tkos.co.il> Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-29seq_buf: Avoid type mismatch for seq_buf_initArnd Bergmann2-3/+3
[ Upstream commit d9a9280a0d0ae51dc1d4142138b99242b7ec8ac6 ] Building with W=2 prints a number of warnings for one function that has a pointer type mismatch: linux/seq_buf.h: In function 'seq_buf_init': linux/seq_buf.h:35:12: warning: pointer targets in assignment from 'unsigned char *' to 'char *' differ in signedness [-Wpointer-sign] Change the type in the function prototype according to the type in the structure. Link: https://lkml.kernel.org/r/20201026161108.3707783-1-arnd@kernel.org Fixes: 9a7777935c34 ("tracing: Convert seq_buf fields to be like seq_file fields") Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>