summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2022-03-18serial: 8250: fix XOFF/XON sending when DMA is usedIlpo Järvinen1-0/+2
When 8250 UART is using DMA, x_char (XON/XOFF) is never sent to the wire. After this change, x_char is injected correctly. Create uart_xchar_out() helper for sending the x_char out and accounting related to it. It seems that almost every driver does these same steps with x_char. Except for 8250, however, almost all currently lack .serial_out so they cannot immediately take advantage of this new helper. The downside of this patch is that it might reintroduce the problems some devices faced with mixed DMA/non-DMA transfer which caused revert f967fc8f165f (Revert "serial: 8250_dma: don't bother DMA with small transfers"). However, the impact should be limited to cases with XON/XOFF (that didn't work with DMA capable devices to begin with so this problem is not very likely to cause a major issue, if any at all). Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine") Reported-by: Gilles Buloz <gilles.buloz@kontron.com> Tested-by: Gilles Buloz <gilles.buloz@kontron.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20220314091432.4288-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-03serial: make uart_console_write->putchar()'s character an unsigned charJiri Slaby1-1/+1
Currently, uart_console_write->putchar's second parameter (the character) is of type int. It makes little sense, provided uart_console_write() accepts the input string as "const char *s" and passes its content -- the characters -- to putchar(). So switch the character's type to unsigned char. We don't use char as that is signed on some platforms. That would cause troubles for drivers which (implicitly) cast the char to u16 when writing to the device. Sign extension would happen in that case and the value written would be completely different to the provided char. DZ is an example of such a driver -- on MIPS, it uses u16 for dz_out in dz_console_putchar(). Note we do the char -> uchar conversion implicitly in uart_console_write(). Provided we do not change size of the data type, sign extension does not happen there, so the problem is void. This makes the types consistent and unified with the rest of the uart layer, which uses unsigned char in most places already. One exception is xmit_buf, but that is going to be converted later. Cc: Paul Cercueil <paul@crapouillou.net> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: Russell King <linux@armlinux.org.uk> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Ludovic Desroches <ludovic.desroches@microchip.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Alexander Shiyan <shc_work@mail.ru> Cc: Baruch Siach <baruch@tkos.co.il> Cc: "Maciej W. Rozycki" <macro@orcam.me.uk> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Karol Gugala <kgugala@antmicro.com> Cc: Mateusz Holenko <mholenko@antmicro.com> Cc: Vladimir Zapolskiy <vz@mleia.com> Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Taichi Sugaya <sugaya.taichi@socionext.com> Cc: Takao Orito <orito.takao@socionext.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: "Andreas Färber" <afaerber@suse.de> Cc: Manivannan Sadhasivam <mani@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andy Gross <agross@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Cc: Orson Zhai <orsonzhai@gmail.com> Cc: Baolin Wang <baolin.wang7@gmail.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Patrice Chotard <patrice.chotard@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Korsgaard <peter@korsgaard.com> Cc: Michal Simek <michal.simek@xilinx.com> Acked-by: Richard Genoud <richard.genoud@gmail.com> [atmel_serial] Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Neil Armstrong <narmstrong@baylibre.com> # meson_serial Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220303080831.21783-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28tty: serial: define UART_LCR_WLEN() macroJiri Slaby1-0/+2
Define a generic UART_LCR_WLEN() macro with a size argument. It can be used to encode byte size into an LCR value. Therefore we can use it to simplify the drivers using tty_get_char_size() in the next patches. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20220224095558.30929-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28Merge 5.17-rc6 into tty-nextGreg Kroah-Hartman11-22/+38
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-27Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-0/+8
Pull dma-mapping fix from Christoph Hellwig: - fix a swiotlb info leak (Halil Pasic) * tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-26Merge tag 'trace-v5.17-rc4' of ↵Linus Torvalds1-10/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - rtla (Real-Time Linux Analysis tool): - fix typo in man page - Update API -e to -E before it is released - Error message fix and memory leak fix - Partially uninline trace event soft disable to shrink text - Fix function graph start up test - Have triggers affect the trace instance they are in and not top level - Have osnoise sleep in the units it says it uses - Remove unused ftrace stub function - Remove event probe redundant info from event in the buffer - Fix group ownership setting in tracefs - Ensure trace buffer is minimum size to prevent crashes * tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla/osnoise: Fix error message when failing to enable trace instance rtla/osnoise: Free params at the exit rtla/hist: Make -E the short version of --entries tracing: Fix selftest config check for function graph start up test tracefs: Set the group ownership in apply_options() not parse_options() tracing/osnoise: Make osnoise_main to sleep for microseconds ftrace: Remove unused ftrace_startup_enable() stub tracing: Ensure trace buffer is at least 4096 bytes large tracing: Uninline trace_trigger_soft_disabled() partly eprobes: Remove redundant event type information tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance rtla: Fix systme -> system typo on man page
2022-02-25Merge tag 'pm-5.17-rc6' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Fix the throttle IRQ handling during cpufreq initialization on Qualcomm platforms (Bjorn Andersson)" * tag 'pm-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: qcom-hw: Delay enabling throttle_irq cpufreq: Reintroduce ready() callback
2022-02-25Merge tag 'char-misc-5.17-rc6' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a few small driver fixes for 5.17-rc6 for reported issues. The majority of these are IIO fixes for small things, and the other two are a mvmem and mtd core conflict fix. All of these have been in linux-next with no reported issues" * tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot iio: Fix error handling for PM iio: addac: ad74413r: correct comparator gpio getters mask usage iio: addac: ad74413r: use ngpio size when iterating over mask iio: addac: ad74413r: Do not reference negative array offsets iio: adc: men_z188_adc: Fix a resource leak in an error handling path iio: frequency: admv1013: remove the always true condition iio: accel: fxls8962af: add padding to regmap for SPI iio:imu:adis16480: fix buffering for devices with no burst mode iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits iio: adc: tsc2046: fix memory corruption by preventing array overflow
2022-02-25tracing: Uninline trace_trigger_soft_disabled() partlyChristophe Leroy1-10/+12
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline keyword is not honored and trace_trigger_soft_disabled() appears approx 50 times in vmlinux. Adding -Winline to the build, the following message appears: ./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline] That function is rather big for an inlined function: c003df60 <trace_trigger_soft_disabled>: c003df60: 94 21 ff f0 stwu r1,-16(r1) c003df64: 7c 08 02 a6 mflr r0 c003df68: 90 01 00 14 stw r0,20(r1) c003df6c: bf c1 00 08 stmw r30,8(r1) c003df70: 83 e3 00 24 lwz r31,36(r3) c003df74: 73 e9 01 00 andi. r9,r31,256 c003df78: 41 82 00 10 beq c003df88 <trace_trigger_soft_disabled+0x28> c003df7c: 38 60 00 00 li r3,0 c003df80: 39 61 00 10 addi r11,r1,16 c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x> c003df88: 73 e9 00 80 andi. r9,r31,128 c003df8c: 7c 7e 1b 78 mr r30,r3 c003df90: 41 a2 00 14 beq c003dfa4 <trace_trigger_soft_disabled+0x44> c003df94: 38 c0 00 00 li r6,0 c003df98: 38 a0 00 00 li r5,0 c003df9c: 38 80 00 00 li r4,0 c003dfa0: 48 05 c5 f1 bl c009a590 <event_triggers_call> c003dfa4: 73 e9 00 40 andi. r9,r31,64 c003dfa8: 40 82 00 28 bne c003dfd0 <trace_trigger_soft_disabled+0x70> c003dfac: 73 ff 02 00 andi. r31,r31,512 c003dfb0: 41 82 ff cc beq c003df7c <trace_trigger_soft_disabled+0x1c> c003dfb4: 80 01 00 14 lwz r0,20(r1) c003dfb8: 83 e1 00 0c lwz r31,12(r1) c003dfbc: 7f c3 f3 78 mr r3,r30 c003dfc0: 83 c1 00 08 lwz r30,8(r1) c003dfc4: 7c 08 03 a6 mtlr r0 c003dfc8: 38 21 00 10 addi r1,r1,16 c003dfcc: 48 05 6f 6c b c0094f38 <trace_event_ignore_this_pid> c003dfd0: 38 60 00 01 li r3,1 c003dfd4: 4b ff ff ac b c003df80 <trace_trigger_soft_disabled+0x20> However it is located in a hot path so inlining it is important. But forcing inlining of the entire function by using __always_inline leads to increasing the text size by approx 20 kbytes. Instead, split the fonction in two parts, one part with the likely fast path, flagged __always_inline, and a second part out of line. With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE vmlinux text increases by only 1,4 kbytes, which is partly compensated by a decrease of vmlinux data by 7 kbytes. On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this change reduces vmlinux text by more than 30 kbytes. Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25math64: New DIV_U64_ROUND_CLOSEST helperPali Rohár1-0/+13
Provide DIV_U64_ROUND_CLOSEST helper which uses div_u64 to perform division rounded to the closest integer using unsigned 64bit dividend and unsigned 32bit divisor. Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Link: https://lore.kernel.org/r/20220219152818.4319-2-kabel@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-24Merge tag 'net-5.17-rc6' of ↵Linus Torvalds1-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - regressions: - bpf: fix crash due to out of bounds access into reg2btf_ids - mvpp2: always set port pcs ops, avoid null-deref - eth: marvell: fix driver load from initrd - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC" Current release - new code bugs: - mptcp: fix race in overlapping signal events Previous releases - regressions: - xen-netback: revert hotplug-status changes causing devices to not be configured - dsa: - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held - fix panic when removing unoffloaded port from bridge - dsa: microchip: fix bridging with more than two member ports Previous releases - always broken: - bpf: - fix crash due to incorrect copy_map_value when both spin lock and timer are present in a single value - fix a bpf_timer initialization issue with clang - do not try bpf_msg_push_data with len 0 - add schedule points in batch ops - nf_tables: - unregister flowtable hooks on netns exit - correct flow offload action array size - fix a couple of memory leaks - vsock: don't check owner in vhost_vsock_stop() while releasing - gso: do not skip outer ip header in case of ipip and net_failover - smc: use a mutex for locking "struct smc_pnettable" - openvswitch: fix setting ipv6 fields causing hw csum failure - mptcp: fix race in incoming ADD_ADDR option processing - sysfs: add check for netdevice being present to speed_show - sched: act_ct: fix flow table lookup after ct clear or switching zones - eth: intel: fixes for SR-IOV forwarding offloads - eth: broadcom: fixes for selftests and error recovery - eth: mellanox: flow steering and SR-IOV forwarding fixes Misc: - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends not report freed skbs as drops - force inlining of checksum functions in net/checksum.h" * tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: mv643xx_eth: process retval from of_get_mac_address ping: remove pr_err from ping_lookup Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC" openvswitch: Fix setting ipv6 fields causing hw csum failure ipv6: prevent a possible race condition with lifetimes net/smc: Use a mutex for locking "struct smc_pnettable" bnx2x: fix driver load from initrd Revert "xen-netback: Check for hotplug-status existence before watching" Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" net/mlx5e: Fix VF min/max rate parameters interchange mistake net/mlx5e: Add missing increment of count net/mlx5e: MPLSoUDP decap, fix check for unsupported matches net/mlx5e: Fix MPLSoUDP encap to use MPLS action information net/mlx5e: Add feature check for set fec counters net/mlx5e: TC, Skip redundant ct clear actions net/mlx5e: TC, Reject rules with forward and drop actions net/mlx5e: TC, Reject rules with drop and modify hdr action net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets net/mlx5e: Fix wrong return value on ioctl EEPROM query failure net/mlx5: Fix possible deadlock on rule deletion ...
2022-02-24Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block fixes from Jens Axboe: - NVMe pull request: - send H2CData PDUs based on MAXH2CDATA (Varun Prakash) - fix passthrough to namespaces with unsupported features (Christoph Hellwig) - Clear iocb->private at poll completion (Stefano) * tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block: nvme-tcp: send H2CData PDUs based on MAXH2CDATA nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info nvme: don't return an error from nvme_configure_metadata block: clear iocb->private in blkdev_bio_end_io_async()
2022-02-24Merge branch 'cpufreq/arm/fixes' of ↵Rafael J. Wysocki1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq fixes for 5.18-rc6 from Viresh Kumar: "This fixes issues related to throttle IRQ for Qcom SoCs." * 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom-hw: Delay enabling throttle_irq cpufreq: Reintroduce ready() callback
2022-02-23Merge tag 'slab-for-5.17-rc6' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fixes from Vlastimil Babka: - Build fix (workaround) for clang. - Fix a /proc/kcore based slabinfo script broken by struct slab changes in 5.17-rc1. * tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: tools/cgroup/slabinfo: update to work with struct slab slab: remove __alloc_size attribute from __kmalloc_track_caller
2022-02-23nvme-tcp: send H2CData PDUs based on MAXH2CDATAVarun Prakash1-0/+1
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3) Maximum Host to Controller Data length (MAXH2CDATA): Specifies the maximum number of PDU-Data bytes per H2CData PDU in bytes. This value is a multiple of dwords and should be no less than 4,096. Current code sets H2CData PDU data_length to r2t_length, it does not check MAXH2CDATA value. Fix this by setting H2CData PDU data_length to min(req->h2cdata_left, queue->maxh2cdata). Also validate MAXH2CDATA value returned by target in ICResp PDU, if it is not a multiple of dword or if it is less than 4096 return -EINVAL from nvme_tcp_init_connection(). Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-21nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios propertyChristophe Kerello1-1/+3
Wp-gpios property can be used on NVMEM nodes and the same property can be also used on MTD NAND nodes. In case of the wp-gpios property is defined at NAND level node, the GPIO management is done at NAND driver level. Write protect is disabled when the driver is probed or resumed and is enabled when the driver is released or suspended. When no partitions are defined in the NAND DT node, then the NAND DT node will be passed to NVMEM framework. If wp-gpios property is defined in this node, the GPIO resource is taken twice and the NAND controller driver fails to probe. It would be possible to set config->wp_gpio at MTD level before calling nvmem_register function but NVMEM framework will toggle this GPIO on each write when this GPIO should only be controlled at NAND level driver to ensure that the Write Protect has not been enabled. A way to fix this conflict is to add a new boolean flag in nvmem_config named ignore_wp. In case ignore_wp is set, the GPIO resource will be managed by the provider. Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin") Cc: stable@vger.kernel.org Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20220220151432.16605-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21slab: remove __alloc_size attribute from __kmalloc_track_callerGreg Kroah-Hartman1-2/+1
Commit c37495d6254c ("slab: add __alloc_size attributes for better bounds checking") added __alloc_size attributes to a bunch of kmalloc function prototypes. Unfortunately the change to __kmalloc_track_caller seems to cause clang to generate broken code and the first time this is called when booting, the box will crash. While the compiler problems are being reworked and attempted to be solved [1], let's just drop the attribute to solve the issue now. Once it is resolved it can be added back. [1] https://github.com/ClangBuiltLinux/linux/issues/1599 Fixes: c37495d6254c ("slab: add __alloc_size attributes for better bounds checking") Cc: stable <stable@vger.kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Nathan Chancellor <nathan@kernel.org> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: llvm@lists.linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220218131358.3032912-1-gregkh@linuxfoundation.org
2022-02-20Merge tag 'sched_urgent_for_v5.17_rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Fix task exposure order when forking tasks" * tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix yet more sched_fork() races
2022-02-19sched: Fix yet more sched_fork() racesPeter Zijlstra1-2/+2
Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") fixed a fork race vs cgroup, it opened up a race vs syscalls by not placing the task on the runqueue before it gets exposed through the pidhash. Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is trying to fix a single instance of this, instead fix the whole class of issues, effectively reverting this commit. Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Zhang Qiao <zhangqiao22@huawei.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
2022-02-18Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+2
Pull block fixes from Jens Axboe: - Surprise removal fix (Christoph) - Ensure that pages are zeroed before submitted for userspace IO (Haimin) - Fix blk-wbt accounting issue with BFQ (Laibin) - Use bsize for discard granularity in loop (Ming) - Fix missing zone handling in blk_complete_request() (Pankaj) * tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block: block/wbt: fix negative inflight counter when remove scsi device block: fix surprise removal for drivers calling blk_set_queue_dying block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern block: loop:use kstatfs.f_bsize of backing file to set discard granularity block: Add handling for zone append command in blk_complete_request
2022-02-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-5/+4
Alexei Starovoitov says: ==================== pull-request: bpf 2022-02-17 We've added 8 non-merge commits during the last 7 day(s) which contain a total of 8 files changed, 119 insertions(+), 15 deletions(-). The main changes are: 1) Add schedule points in map batch ops, from Eric. 2) Fix bpf_msg_push_data with len 0, from Felix. 3) Fix crash due to incorrect copy_map_value, from Kumar. 4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar. 5) Fix a bpf_timer initialization issue with clang, from Yonghong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add schedule points in batch ops bpf: Fix crash due to out of bounds access into reg2btf_ids. selftests: bpf: Check bpf_msg_push_data return value bpf: Fix a bpf_timer initialization issue bpf: Emit bpf_timer in vmlinux BTF selftests/bpf: Add test for bpf_timer overwriting crash bpf: Fix crash due to incorrect copy_map_value bpf: Do not try bpf_msg_push_data with len 0 ==================== Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17Merge tag 'net-5.17-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. Current release - regressions: - dsa: lantiq_gswip: fix use after free in gswip_remove() - smc: avoid overwriting the copies of clcsock callback functions Current release - new code bugs: - iwlwifi: - fix use-after-free when no FW is present - mei: fix the pskb_may_pull check in ipv4 - mei: retry mapping the shared area - mvm: don't feed the hardware RFKILL into iwlmei Previous releases - regressions: - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() - tipc: fix wrong publisher node address in link publications - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW assertion - bgmac: make idm and nicpm resource optional again - atl1c: fix tx timeout after link flap Previous releases - always broken: - vsock: remove vsock from connected table when connect is interrupted by a signal - ping: change destination interface checks to match raw sockets - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing semantics (and null-deref) after SO_RESERVE_MEM was added - ipv6: make exclusive flowlabel checks per-netns - bonding: force carrier update when releasing slave - sched: limit TC_ACT_REPEAT loops - bridge: multicast: notify switchdev driver whenever MC processing gets disabled because of max entries reached - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found - iwlwifi: fix locking when "HW not ready" - phy: mediatek: remove PHY mode check on MT7531 - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN - dsa: lan9303: - fix polarity of reset during probe - fix accelerated VLAN handling" * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) bonding: force carrier update when releasing slave nfp: flower: netdev offload check for ip6gretap ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt ipv4: fix data races in fib_alias_hw_flags_set net: dsa: lan9303: add VLAN IDs to master device net: dsa: lan9303: handle hwaccel VLAN tags vsock: remove vsock from connected table when connect is interrupted by a signal Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" ping: fix the dif and sdif check in ping_lookup net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 net: sched: limit TC_ACT_REPEAT loops tipc: fix wrong notification node addresses net: dsa: lantiq_gswip: fix use after free in gswip_remove() ipv6: per-netns exclusive flowlabel checks net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled CDC-NCM: avoid overflow in sanity checking mctp: fix use after free net: mscc: ocelot: fix use-after-free in ocelot_vlan_del() bonding: fix data-races around agg_select_timer dpaa2-eth: Initialize mutex used in one step timestamping path ...
2022-02-17block: fix surprise removal for drivers calling blk_set_queue_dyingChristoph Hellwig1-1/+2
Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9eb803 that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-15Merge tag 'hyperv-fixes-signed-20220215' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Rework use of DMA_BIT_MASK in vmbus to work around a clang bug (Michael Kelley) - Fix NUMA topology (Long Li) - Fix a memory leak in vmbus (Miaoqian Lin) - One minor clean-up patch (Cai Huoqing) * tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: utils: Make use of the helper macro LIST_HEAD() Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64) Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
2022-02-14net_sched: add __rcu annotation to netdev->qdiscEric Dumazet1-1/+1
syzbot found a data-race [1] which lead me to add __rcu annotations to netdev->qdisc, and proper accessors to get LOCKDEP support. [1] BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1: attach_default_qdiscs net/sched/sch_generic.c:1167 [inline] dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150 __rtnl_newlink net/core/rtnetlink.c:3489 [inline] rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0: qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14swiotlb: fix info leak with DMA_FROM_DEVICEHalil Pasic1-0/+8
The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204. A short description of what happens follows: 1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device. 2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO. 3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if the we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guest like s390 Secure Execution, or AMD SEV). 4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer. 5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails. One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved). Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-14Merge 5.17-rc4 into tty-nextGreg Kroah-Hartman20-46/+169
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-13Merge tag 'objtool_urgent_for_v5.17_rc4' of ↵Linus Torvalds1-16/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: "Fix a case where objtool would mistakenly warn about instructions being unreachable" * tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-2/+5
Merge misc fixes from Andrew Morton: "5 patches. Subsystems affected by this patch series: binfmt, procfs, and mm (vmscan, memcg, and kfence)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kfence: make test case compatible with run time set sample interval mm: memcg: synchronize objcg lists with a dedicated spinlock mm: vmscan: remove deadlock due to throttling failing to make progress fs/proc: task_mmu.c: don't read mapcount for migration entry fs/binfmt_elf: fix PT_LOAD p_align values for loaders
2022-02-12kfence: make test case compatible with run time set sample intervalPeng Liu1-0/+2
The parameter kfence_sample_interval can be set via boot parameter and late shell command, which is convenient for automated tests and KFENCE parameter optimization. However, KFENCE test case just uses compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test case not run as users desired. Export kfence_sample_interval, so that KFENCE test case can use run-time-set sample interval. Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com Signed-off-by: Peng Liu <liupeng256@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian Knig <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-12mm: memcg: synchronize objcg lists with a dedicated spinlockRoman Gushchin1-2/+3
Alexander reported a circular lock dependency revealed by the mmap1 ltp test: LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1)) WARNING: possible circular locking dependency detected 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted ------------------------------------------------------ mmap1/202299 is trying to acquire lock: 00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0 but task is already holding lock: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sighand->siglock){-.-.}-{2:2}: __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 __lock_task_sighand+0x90/0x190 cgroup_freeze_task+0x2e/0x90 cgroup_migrate_execute+0x11c/0x608 cgroup_update_dfl_csses+0x246/0x270 cgroup_subtree_control_write+0x238/0x518 kernfs_fop_write_iter+0x13e/0x1e0 new_sync_write+0x100/0x190 vfs_write+0x22c/0x2d8 ksys_write+0x6c/0xf8 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #0 (css_set_lock){..-.}-{2:2}: check_prev_add+0xe0/0xed8 validate_chain+0x736/0xb20 __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 obj_cgroup_release+0x4a/0xe0 percpu_ref_put_many.constprop.0+0x150/0x168 drain_obj_stock+0x94/0xe8 refill_obj_stock+0x94/0x278 obj_cgroup_charge+0x164/0x1d8 kmem_cache_alloc+0xac/0x528 __sigqueue_alloc+0x150/0x308 __send_signal+0x260/0x550 send_signal+0x7e/0x348 force_sig_info_to_task+0x104/0x180 force_sig_fault+0x48/0x58 __do_pgm_check+0x120/0x1f0 pgm_check_handler+0x11e/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sighand->siglock); lock(css_set_lock); lock(&sighand->siglock); lock(css_set_lock); *** DEADLOCK *** 2 locks held by mmap1/202299: #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168 stack backtrace: CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Hardware name: IBM 3906 M04 704 (LPAR) Call Trace: dump_stack_lvl+0x76/0x98 check_noncircular+0x136/0x158 check_prev_add+0xe0/0xed8 validate_chain+0x736/0xb20 __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 obj_cgroup_release+0x4a/0xe0 percpu_ref_put_many.constprop.0+0x150/0x168 drain_obj_stock+0x94/0xe8 refill_obj_stock+0x94/0x278 obj_cgroup_charge+0x164/0x1d8 kmem_cache_alloc+0xac/0x528 __sigqueue_alloc+0x150/0x308 __send_signal+0x260/0x550 send_signal+0x7e/0x348 force_sig_info_to_task+0x104/0x180 force_sig_fault+0x48/0x58 __do_pgm_check+0x120/0x1f0 pgm_check_handler+0x11e/0x180 INFO: lockdep is turned off. In this example a slab allocation from __send_signal() caused a refilling and draining of a percpu objcg stock, resulted in a releasing of another non-related objcg. Objcg release path requires taking the css_set_lock, which is used to synchronize objcg lists. This can create a circular dependency with the sighandler lock, which is taken with the locked css_set_lock by the freezer code (to freeze a task). In general it seems that using css_set_lock to synchronize objcg lists makes any slab allocations and deallocation with the locked css_set_lock and any intervened locks risky. To fix the problem and make the code more robust let's stop using css_set_lock to synchronize objcg lists and use a new dedicated spinlock instead. Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-12bpf: Fix a bpf_timer initialization issueYonghong Song1-4/+2
The patch in [1] intends to fix a bpf_timer related issue, but the fix caused existing 'timer' selftest to fail with hang or some random errors. After some debug, I found an issue with check_and_init_map_value() in the hashtab.c. More specifically, in hashtab.c, we have code l_new = bpf_map_kmalloc_node(&htab->map, ...) check_and_init_map_value(&htab->map, l_new...) Note that bpf_map_kmalloc_node() does not do initialization so l_new contains random value. The function check_and_init_map_value() intends to zero the bpf_spin_lock and bpf_timer if they exist in the map. But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed. With [1], later copy_map_value() skips copying of bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused random failures for 'timer' selftest. Without [1], for both bpf_spin_lock and bpf_timer case, bpf_timer will be zero'ed, so 'timer' self test is okay. For check_and_init_map_value(), why bpf_spin_lock is zero'ed properly while bpf_timer not. In bpf uapi header, we have struct bpf_spin_lock { __u32 val; }; struct bpf_timer { __u64 :64; __u64 :64; } __attribute__((aligned(8))); The initialization code: *(struct bpf_spin_lock *)(dst + map->spin_lock_off) = (struct bpf_spin_lock){}; *(struct bpf_timer *)(dst + map->timer_off) = (struct bpf_timer){}; It appears the compiler has no obligation to initialize anonymous fields. For example, let us use clang with bpf target as below: $ cat t.c struct bpf_timer { unsigned long long :64; }; struct bpf_timer2 { unsigned long long a; }; void test(struct bpf_timer *t) { *t = (struct bpf_timer){}; } void test2(struct bpf_timer2 *t) { *t = (struct bpf_timer2){}; } $ clang -target bpf -O2 -c -g t.c $ llvm-objdump -d t.o ... 0000000000000000 <test>: 0: 95 00 00 00 00 00 00 00 exit 0000000000000008 <test2>: 1: b7 02 00 00 00 00 00 00 r2 = 0 2: 7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2 3: 95 00 00 00 00 00 00 00 exit gcc11.2 does not have the above issue. But from INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x Programming languages — C http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf page 157: Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization. To fix the problem, let use memset for bpf_timer case in check_and_init_map_value(). For consistency, memset is also used for bpf_spin_lock case. [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/ Fixes: 68134668c17f3 ("bpf: Add map side support for bpf timers.") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
2022-02-12bpf: Fix crash due to incorrect copy_map_valueKumar Kartikeya Dwivedi1-1/+2
When both bpf_spin_lock and bpf_timer are present in a BPF map value, copy_map_value needs to skirt both objects when copying a value into and out of the map. However, the current code does not set both s_off and t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock is placed in map value with bpf_timer, as bpf_map_update_elem call will be able to overwrite the other timer object. When the issue is not fixed, an overwriting can produce the following splat: [root@(none) bpf]# ./test_progs -t timer_crash [ 15.930339] bpf_testmod: loading out-of-tree module taints kernel. [ 16.037849] ================================================================== [ 16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325 [ 16.039399] [ 16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G OE 5.16.0+ #278 [ 16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014 [ 16.040485] Call Trace: [ 16.040645] <TASK> [ 16.040805] dump_stack_lvl+0x59/0x73 [ 16.041069] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.041427] kasan_report.cold+0x116/0x11b [ 16.041673] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.042040] __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.042328] ? memcpy+0x39/0x60 [ 16.042552] ? pv_hash+0xd0/0xd0 [ 16.042785] ? lockdep_hardirqs_off+0x95/0xd0 [ 16.043079] __bpf_spin_lock_irqsave+0xdf/0xf0 [ 16.043366] ? bpf_get_current_comm+0x50/0x50 [ 16.043608] ? jhash+0x11a/0x270 [ 16.043848] bpf_timer_cancel+0x34/0xe0 [ 16.044119] bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81 [ 16.044500] bpf_trampoline_6442477838_0+0x36/0x1000 [ 16.044836] __x64_sys_nanosleep+0x5/0x140 [ 16.045119] do_syscall_64+0x59/0x80 [ 16.045377] ? lock_is_held_type+0xe4/0x140 [ 16.045670] ? irqentry_exit_to_user_mode+0xa/0x40 [ 16.046001] ? mark_held_locks+0x24/0x90 [ 16.046287] ? asm_exc_page_fault+0x1e/0x30 [ 16.046569] ? asm_exc_page_fault+0x8/0x30 [ 16.046851] ? lockdep_hardirqs_on+0x7e/0x100 [ 16.047137] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.047405] RIP: 0033:0x7f9e4831718d [ 16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48 [ 16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023 [ 16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d [ 16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0 [ 16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0 [ 16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30 [ 16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 16.051608] </TASK> [ 16.051762] ================================================================== Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
2022-02-11Merge tag 'acpi-5.17-rc4' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These revert two commits that turned out to be problematic and fix two issues related to wakeup from suspend-to-idle on x86. Specifics: - Revert a recent change that attempted to avoid issues with conflicting address ranges during PCI initialization, because it turned out to introduce a regression (Hans de Goede). - Revert a change that limited EC GPE wakeups from suspend-to-idle to systems based on Intel hardware, because it turned out that systems based on hardware from other vendors depended on that functionality too (Mario Limonciello). - Fix two issues related to the handling of wakeup interrupts and wakeup events signaled through the EC GPE during suspend-to-idle on x86 (Rafael Wysocki)" * tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems" PM: s2idle: ACPI: Fix wakeup interrupts handling ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
2022-02-09Merge tag 'nfsd-5.17-2' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd fixes from Chuck Lever: "Ensure that NFS clients cannot send file size or offset values that can cause the NFS server to crash or to return incorrect or surprising results. In particular, fix how the NFS server handles values larger than OFFSET_MAX" * tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Deprecate NFS_OFFSET_MAX NFSD: Fix offset type in I/O trace points NFSD: COMMIT operations must not return NFS?ERR_INVAL NFSD: Clamp WRITE offsets NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes NFSD: Fix ia_size underflow NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09NFSD: Deprecate NFS_OFFSET_MAXChuck Lever1-8/+0
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09cpufreq: Reintroduce ready() callbackBjorn Andersson1-0/+3
This effectively revert '4bf8e582119e ("cpufreq: Remove ready() callback")', in order to reintroduce the ready callback. This is needed in order to be able to leave the thermal pressure interrupts in the Qualcomm CPUfreq driver disabled during initialization, so that it doesn't fire while related_cpus are still 0. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> [ Viresh: Added the Chinese translation as well and updated commit msg ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2022-02-08Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-0/+1
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - Fix initialization of nfs_client cl_flags Other Fixes: - Fix performance issues with uncached readdir calls - Fix potential pointer dereferences in rpcrdma_ep_create - Fix nfs4_proc_get_locations() kernel-doc comment - Fix locking during sunrpc sysfs reads - Update my email address in the MAINTAINERS file to my new kernel.org email" * tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: lock against ->sock changing during sysfs read MAINTAINERS: Update my email address NFS: Fix nfs4_proc_get_locations() kernel-doc comment xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create NFS: Fix initialisation of nfs_client cl_flags field NFS: Avoid duplicate uncached readdir calls on eof NFS: Don't skip directory entries when doing uncached readdir NFS: Don't overfill uncached readdir pages
2022-02-07PM: s2idle: ACPI: Fix wakeup interrupts handlingRafael J. Wysocki1-2/+2
After commit e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") wakeup interrupts occurring immediately after the one discarded by acpi_s2idle_wake() may be missed. Moreover, if the SCI triggers again immediately after the rearming in acpi_s2idle_wake(), that wakeup may be missed too. The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup() when pm_wakeup_irq is 0, but that's not the case any more after the interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is cleared by the pm_wakeup_clear() call in s2idle_loop(). However, there may be wakeup interrupts occurring in that time frame and if that happens, they will be missed. To address that issue first move the clearing of pm_wakeup_irq to the point at which it is known that the interrupt causing acpi_s2idle_wake() to tun will be discarded, before rearming the SCI for wakeup. Moreover, because that only reduces the size of the time window in which the issue may manifest itself, allow pm_system_irq_wakeup() to register two second wakeup interrupts in a row and, when discarding the first one, replace it with the second one. [Of course, this assumes that only one wakeup interrupt can be discarded in one go, but currently that is the case and I am not aware of any plans to change that.] Fixes: e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-07Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)Michael Kelley1-0/+1
Using DMA_BIT_MASK(64) as an initializer for a global variable causes problems with Clang 12.0.1. The compiler doesn't understand that value 64 is excluded from the shift at compile time, resulting in a build error. While this is a compiler problem, avoid the issue by setting up the dma_mask memory as part of struct hv_device, and initialize it using dma_set_mask(). Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Vitaly Chikunov <vt@altlinux.org> Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 743b237c3a7b ("scsi: storvsc: Add Isolation VM support for storvsc driver") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-02-07ata: libata-core: Fix ata_dev_config_cpr()Damien Le Moal1-1/+1
The concurrent positioning ranges log page 47h is a general purpose log page and not a subpage of the indentify device log. Using ata_identify_page_supported() to test for concurrent positioning ranges support is thus wrong. ata_log_supported() must be used. Furthermore, unlike other advanced ATA features (e.g. NCQ priority), accesses to the concurrent positioning ranges log page are not gated by a feature bit from the device IDENTIFY data. Since many older drives react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued to read device log pages, avoid problems with older drives by limiting the concurrent positioning ranges support detection to drives implementing at least the ACS-4 ATA standard (major version 11). This additional condition effectively turns ata_dev_config_cpr() into a nop for older drives, avoiding problems in the field. Fixes: fe22e1c2f705 ("libata: support concurrent positioning ranges log") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519 Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Abderraouf Adjal <adjal.arf@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-06Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-9/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Various bug fixes for ext4 fast commit and inline data handling. Also fix regression introduced as part of moving to the new mount API" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: fs/ext4: fix comments mentioning i_mutex ext4: fix incorrect type issue during replay_del_range jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}() ext4: fix potential NULL pointer dereference in ext4_fill_super() jbd2: refactor wait logic for transaction updates into a common function jbd2: cleanup unused functions declarations from jbd2.h ext4: fix error handling in ext4_fc_record_modified_inode() ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin() ext4: fix error handling in ext4_restore_inline_data() ext4: fast commit may miss file actions ext4: fast commit may not fallback for ineligible commit ext4: modify the logic of ext4_mb_new_blocks_simple ext4: prevent used blocks from being allocated during fast commit replay
2022-02-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-3/+109
Pull kvm fixes from Paolo Bonzini: "ARM: - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step erratum RISC-V: - Make CY, TM, and IR counters accessible in VU mode - Fix SBI implementation version x86: - Report deprecation of x87 features in supported CPUID - Preparation for fixing an interrupt delivery race on AMD hardware - Sparse fix All except POWER and s390: - Rework guest entry code to correctly mark noinstr areas and fix vtime' accounting (for x86, this was already mostly correct but not entirely; for ARM, MIPS and RISC-V it wasn't)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer KVM: x86: Report deprecated x87 features in supported CPUID KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs KVM: arm64: Avoid consuming a stale esr value when SError occur RISC-V: KVM: Fix SBI implementation version RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode kvm/riscv: rework guest entry logic kvm/arm64: rework guest entry logic kvm/x86: rework guest entry logic kvm/mips: rework guest entry logic kvm: add guest_state_{enter,exit}_irqoff() KVM: x86: Move delivery of non-APICv interrupt into vendor code kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
2022-02-05Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+2
Pull iomap fix from Darrick Wong: "A single bugfix for iomap. The fix should eliminate occasional complaints about stall warnings when a lot of writeback IO completes all at once and we have to then go clearing status on a large number of folios. Summary: - Limit the length of ioend chains in writeback so that we don't trip the softlockup watchdog and to limit long tail latency on clearing PageWriteback" * tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs, iomap: limit individual ioend chain lengths in writeback
2022-02-05Merge tag 'kvmarm-fixes-5.17-2' of ↵Paolo Bonzini324-5381/+8994
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #2 - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step[ erratum
2022-02-04Merge tag 'ata-5.17-rc3' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fixes from Damien Le Moal: - Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver and PATA drivers. Update the MAINTAINERS file accordingly. - Regression fix: add a horkage flag to prevent accessing the log directory log page with SATADOM-ML 3ME SATA devices as they react badly to reading that log page (from Anton). * tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer MAINTAINERS: add myself as PATA drivers reviewer
2022-02-04Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-1/+1
Pull drm fixes from Dave Airlie: "Regular fixes for the week. Daniel has agreed to bring back the fbcon hw acceleration under a CONFIG option for the non-drm fbdev users, we don't advise turning this on unless you are in the niche that is old fbdev drivers, Since it's essentially a revert and shouldn't be high impact seemed like a good time to do it now. Otherwise, i915 and amdgpu fixes are most of it, along with some minor fixes elsewhere. fbdev: - readd fbcon acceleration i915: - fix DP monitor via type-c dock - fix for engine busyness and read timeout with GuC - use ALLOW_FAIL for error capture buffer allocs - don't use interruptible lock on error paths - smatch fix to reject zero sized overlays. amdgpu: - mGPU fan boost fix for beige goby - S0ix fixes - Cyan skillfish hang fix - DCN fixes for DCN 3.1 - DCN fixes for DCN 3.01 - Apple retina panel fix - ttm logic inversion fix dma-buf: - heaps: fix potential spectre v1 gadget kmb: - fix potential oob access mxsfb: - fix NULL ptr deref nouveau: - fix potential oob access during BIOS decode" * tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits) drm: mxsfb: Fix NULL pointer dereference drm/amdgpu: fix logic inversion in check drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels drm/amd/display: revert "Reset fifo after enable otg" drm/amd/display: watermark latencies is not enough on DCN31 drm/amd/display: Update watermark values for DCN301 drm/amdgpu: fix a potential GPU hang on cyan skillfish drm/amd: Only run s3 or s0ix if system is configured properly drm/amd: add support to check whether the system is set to s3 fbcon: Add option to enable legacy hardware acceleration Revert "fbcon: Disable accelerated scrolling" Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" drm/i915/pmu: Fix KMD and GuC race on accessing busyness dma-buf: heaps: Fix potential spectre v1 gadget drm/amd: Warn users about potential s0ix problems drm/amd/pm: correct the MGpuFanBoost support for Beige Goby drm/nouveau: fix off by one in BIOS boundary checking drm/i915/adlp: Fix TypeC PHY-ready status readout drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference ...
2022-02-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-0/+20
Merge misc fixes from Andrew Morton: "10 patches. Subsystems affected by this patch series: ipc, MAINTAINERS, and mm (vmscan, debug, pagemap, kmemleak, and selftests)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner" MAINTAINERS: update rppt's email mm/kmemleak: avoid scanning potential huge holes ipc/sem: do not sleep with a spin lock held mm/pgtable: define pte_index so that preprocessor could recognize it mm/page_table_check: check entries at pmd levels mm/khugepaged: unify collapse pmd clear, flush and free mm/page_table_check: use unsigned long for page counters and cleanup mm/debug_vm_pgtable: remove pte entry from the page table Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
2022-02-04Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds2-0/+6
Pull ceph fixes from Ilya Dryomov: "A patch to make it possible to disable zero copy path in the messenger to avoid checksum or authentication tag mismatches and ensuing session resets in case the destination buffer isn't guaranteed to be stable" * tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client: libceph: optionally use bounce buffer on recv path in crc mode libceph: make recv path in secure mode work the same as send path
2022-02-04Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-0/+7
Pull cifs fixes from Steve French: "SMB3 client fixes including: - multiple fscache related fixes, reenabling ability to read/write to cached files for cifs.ko (that was temporarily disabled for cifs.ko a few weeks ago due to the recent fscache changes) - also includes a new fscache helper function ("query_occupancy") used by above - fix for multiuser mounts and NTLMSSP auth (workstation name) for stable - fix locking ordering problem in multichannel code - trivial malformed comment fix" * tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix workstation_name for multiuser mounts Invalidate fscache cookie only when inode attributes are changed. cifs: Fix the readahead conversion to manage the batch when reading from cache cifs: Implement cache I/O by accessing the cache directly netfs, cachefiles: Add a method to query presence of data in the cache cifs: Transition from ->readpages() to ->readahead() cifs: unlock chan_lock before calling cifs_put_tcp_session Fix a warning about a malformed kernel doc comment in cifs