summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-08-24mmc: sdio: Don't warn about vendor CIS tuplesSean Anderson1-5/+15
CIS tuples in the range 0x80-0x8F are reserved for vendors. Some devices have tuples in this range which get warned about every boot. Since this is normal behavior, don't print these tuples unless debug is enabled. Unfortunately, we cannot use a variable for the format string since it gets pasted by pr_*_ratelimited. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Link: https://lore.kernel.org/r/20210726163654.1110969-1-sean.anderson@seco.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24memstick: ms_block: Fix spelling contraction "cant" -> "can't"Colin Ian King1-1/+1
There is a spelling mistake in a pr_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210728103254.171546-1-colin.king@canonical.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: core: Only print retune error when we don't check for card removalWolfram Sang1-5/+7
Skip printing a retune error when we scan for a removed card because we then expect a failed command. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20210630041658.7574-1-wsa+renesas@sang-engineering.com [Ulf: Rebased patch] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: core: Store pointer to bio_crypt_ctx in mmc_requestEric Biggers3-15/+10
Make 'struct mmc_request' contain a pointer to the request's 'struct bio_crypt_ctx' directly, instead of extracting a 32-bit DUN from it which is a cqhci-crypto specific detail. This keeps the cqhci crypto specific details in the cqhci module, and it makes mmc_core and mmc_block ready for MMC crypto hardware that accepts the DUN and/or key in a way that is more flexible than that which will be specified by the eMMC v5.2 standard. Exynos SoCs are an example of such hardware, as their inline encryption hardware takes keys directly (it has no concept of keyslots) and supports 128-bit DUNs. Note that the 32-bit DUN length specified by the standard is very restrictive, so it is likely that more hardware will support longer DUNs despite it not following the standard. Thus, limiting the scope of the 32-bit DUN assumption to the place that actually needs it is warranted. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20210721154738.3966463-1-ebiggers@kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-esdhc-imx: Remove unneeded mmc-esdhc-imx.h headerFabio Estevam2-43/+32
After the i.MX conversion to a DT-only platform, the mmc-esdhc-imx.h header file is no longer used outside the driver, so move its content to the sdhci-esdhc-imx driver and remove the header. Signed-off-by: Fabio Estevam <festevam@gmail.com> Link: https://lore.kernel.org/r/20210719193413.3792615-1-festevam@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: core: Avoid hogging the CPU while polling for busy after I/O writesUlf Hansson2-40/+30
When mmc_blk_card_busy() calls card_busy_detect() to poll for the card's state with CMD13, this is done without any delays in between the commands being sent. Rather than fixing card_busy_detect() in this regards, let's instead convert into using the common __mmc_poll_for_busy(), which also helps us to avoid open-coding. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210702134229.357717-4-ulf.hansson@linaro.org
2021-08-24mmc: core: Avoid hogging the CPU while polling for busy for mmc ioctlsUlf Hansson1-1/+2
When __mmc_blk_ioctl_cmd() calls card_busy_detect() to verify that the card's states moves back into transfer state, the polling with CMD13 is done without any delays in between the commands being sent. Rather than fixing card_busy_detect() in this regards, let's instead convert into using the common mmc_poll_for_busy(), which also helps us to avoid open-coding. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210702134229.357717-3-ulf.hansson@linaro.org
2021-08-24mmc: core: Avoid hogging the CPU while polling for busy in the I/O err pathUlf Hansson3-2/+5
When mmc_blk_fix_state() sends a CMD12 to try to move the card into the transfer state, it calls card_busy_detect() to poll for the card's state with CMD13. This is done without any delays in between the commands being sent. Rather than fixing card_busy_detect() in this regards, let's instead convert into using the common mmc_poll_for_busy(), which also helps us to avoid open-coding. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20210702134229.357717-2-ulf.hansson@linaro.org
2021-08-24mmc: dw_mmc: Add data CRC error injectionVincent Whitchurch2-0/+80
This driver has had problems when handling data errors. Add fault injection support so that the abort handling can be easily triggered and regression-tested. A hrtimer is used to indicate a data CRC error at various points during the data transfer. Note that for the recent problem with hangs in the case of some data CRC errors, a udelay(10) inserted at the start of send_stop_abort() greatly helped in triggering the error, but I've not included this as part of the fault injection support since it seemed too specific. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Jaehoon Chung <jh80.chung@samsung.com> Link: https://lore.kernel.org/r/20210701080534.23138-1-vincent.whitchurch@axis.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: mmc_spi: Simplify busy loop in mmc_spi_skip()Andy Shevchenko1-11/+4
Infinite loops are hard to read and understand because of hidden main loop condition. Simplify such one in mmc_spi_skip(). Using schedule() to schedule (and be friendly to others) is discouraged and cond_resched() should be used instead. Hence, replace schedule() with cond_resched() at the same time. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210623101731.87885-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: mmci: De-assert reset on probeLinus Walleij1-0/+3
If we find a reset handle when probing the MMCI block, make sure the reset is de-asserted. It could happen that a hardware has reset asserted at boot. Cc: Russell King <linux@armlinux.org.uk> Cc: Yann Gautier <yann.gautier@foss.st.com> Cc: Ludovic Barre <ludovic.barre@st.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20210630102408.3543024-1-linus.walleij@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: usdhi6rol0: use proper DMAENGINE API for terminationWolfram Sang1-2/+2
dmaengine_terminate_all() is deprecated in favor of explicitly saying if it should be sync or async. Here, we want dmaengine_terminate_sync() because there is no other synchronization code in the driver to handle an async case. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20210623095734.3046-4-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sh_mmcif: use proper DMAENGINE API for terminationWolfram Sang1-2/+2
dmaengine_terminate_all() is deprecated in favor of explicitly saying if it should be sync or async. Here, we want dmaengine_terminate_sync() because there is no other synchronization code in the driver to handle an async case. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20210623095734.3046-3-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: renesas_sdhi_sys_dmac: use proper DMAENGINE API for terminationWolfram Sang1-2/+2
dmaengine_terminate_all() is deprecated in favor of explicitly saying if it should be sync or async. Here, we want dmaengine_terminate_sync() because there is no other synchronization code in the driver to handle an async case. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20210623095734.3046-2-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24dt-bindings: mmc: sdhci-msm: Add compatible string for sc7280Shaik Sajida Bhanu1-0/+1
Add sc7280 SoC specific compatible strings for qcom-sdhci controller. Signed-off-by: Shaik Sajida Bhanu <sbhanu@codeaurora.org> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/1623835207-29462-1-git-send-email-sbhanu@codeaurora.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: arasan: Fix the issue in reading tap values from DTSai Krishna Potthuri1-2/+4
'of_property_read_variable_u32_array' function returns number of elements read on success. This patch updates the condition check in the driver to overwrite the tap values from DT if exist. Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-8-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Modify data type of the clk_phase arrayManish Narani1-1/+1
Modify the data type of the clk_phase array to u32 to make it compatible with the argument requirement of "of_property_read_variable_u32_array". Addresses-coverity: ("incompatible_param") Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-7-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Use appropriate type of division macroManish Narani1-1/+1
The division macro DIV_ROUND_CLOSEST takes int values as the argument. However the code here uses unsigned int values for this, which is causing the values comparison with 0 as always true. We can use DIV_ROUND_CLOSEST_ULL instead for the same. Addresses-coverity: ("result_independent_of_operands") Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-6-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Check return value of non-void funtionsManish Narani1-3/+15
At a couple of places, the return values of the non-void functions were not getting checked. This was reported by the coverity tool. Modify the code to check the return values of the same. Addresses-Coverity: ("check_return") Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-5-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Skip Auto tuning for DDR50 mode in ZynqMP platformManish Narani1-0/+4
ZynqMP platform does not perform auto tuning in DDR50 mode. Skip the same while the card is operating in DDR50 mode. Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-4-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Add "SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12" quirk.Manish Narani1-0/+1
Arasan controller supports AUTO CMD12, this patch adds "SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12" quirk to enable auto cmd12 feature. By using auto cmd12 we can also avoid following error message "Got data interrupt even though no data operation in progress" Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-3-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: sdhci-of-arasan: Modified SD default speed to 19MHz for ZynqMPManish Narani1-0/+18
SD standard speed timing was met only at 19MHz and not 25 MHz, that's why changing driver to 19MHz. The reason for this is when a level shifter is used on the board, timing was met for standard speed only at 19MHz. Since this level shifter is commonly required for high speed modes, the driver is modified to use standard speed of 19Mhz. Signed-off-by: Manish Narani <manish.narani@xilinx.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/1623753837-21035-2-git-send-email-manish.narani@xilinx.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: host: factor out clearing the retune stateWolfram Sang3-4/+8
We have this in two places, so let's have a dedicated function. It is also more readable. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/20210624151616.38770-4-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24mmc: host: add kdoc for mmc_retune_{en|dis}ableWolfram Sang1-0/+10
I wanted to use it in a wrong way, so document the intended way. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Link: https://lore.kernel.org/r/20210624151616.38770-3-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-24platform-msi: Add ABI to show msi_irqs of platform devicesBarry Song2-5/+29
PCI devices expose the associated MSI interrupts via sysfs, but platform devices which utilize MSI interrupts do not. This information is important for user space tools to optimize affinity settings. Utilize the generic MSI sysfs facility to expose this information for platform MSI. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210813035628.6844-3-21cnbao@gmail.com
2021-08-24genirq/msi: Move MSI sysfs handling from PCI to MSI coreBarry Song3-115/+148
Move PCI's MSI sysfs code to the irq core so that other busses such as platform can reuse it. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210813035628.6844-2-21cnbao@gmail.com
2021-08-24genirq/cpuhotplug: Demote debug printk to KERN_DEBUGLee Jones1-1/+1
This sort of information is only generally useful when debugging. No need to have these sprinkled through the kernel log otherwise. Real world problem: During pre-release testing these have an affect on performance on real products. To the point where so much logging builds up, that it sets off the watchdog(s) on some high profile consumer devices. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210816134817.1503661-1-lee.jones@linaro.org
2021-08-24Keep read and write fds with each nlm_fileJ. Bruce Fields7-44/+111
We shouldn't really be using a read-only file descriptor to take a write lock. Most filesystems will put up with it. But NFS, for example, won't. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-24ucounts: Increase ucounts reference counter before the security hookAlexey Gladkov1-6/+6
We need to increment the ucounts reference counter befor security_prepare_creds() because this function may fail and abort_creds() will try to decrement this reference. [ 96.465056][ T8641] FAULT_INJECTION: forcing a failure. [ 96.465056][ T8641] name fail_page_alloc, interval 1, probability 0, space 0, times 0 [ 96.478453][ T8641] CPU: 1 PID: 8641 Comm: syz-executor668 Not tainted 5.14.0-rc6-syzkaller #0 [ 96.487215][ T8641] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ 96.497254][ T8641] Call Trace: [ 96.500517][ T8641] dump_stack_lvl+0x1d3/0x29f [ 96.505758][ T8641] ? show_regs_print_info+0x12/0x12 [ 96.510944][ T8641] ? log_buf_vmcoreinfo_setup+0x498/0x498 [ 96.516652][ T8641] should_fail+0x384/0x4b0 [ 96.521141][ T8641] prepare_alloc_pages+0x1d1/0x5a0 [ 96.526236][ T8641] __alloc_pages+0x14d/0x5f0 [ 96.530808][ T8641] ? __rmqueue_pcplist+0x2030/0x2030 [ 96.536073][ T8641] ? lockdep_hardirqs_on_prepare+0x3e2/0x750 [ 96.542056][ T8641] ? alloc_pages+0x3f3/0x500 [ 96.546635][ T8641] allocate_slab+0xf1/0x540 [ 96.551120][ T8641] ___slab_alloc+0x1cf/0x350 [ 96.555689][ T8641] ? kzalloc+0x1d/0x30 [ 96.559740][ T8641] __kmalloc+0x2e7/0x390 [ 96.563980][ T8641] ? kzalloc+0x1d/0x30 [ 96.568029][ T8641] kzalloc+0x1d/0x30 [ 96.571903][ T8641] security_prepare_creds+0x46/0x220 [ 96.577174][ T8641] prepare_creds+0x411/0x640 [ 96.581747][ T8641] __sys_setfsuid+0xe2/0x3a0 [ 96.586333][ T8641] do_syscall_64+0x3d/0xb0 [ 96.590739][ T8641] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 96.596611][ T8641] RIP: 0033:0x445a69 [ 96.600483][ T8641] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 [ 96.620152][ T8641] RSP: 002b:00007f1054173318 EFLAGS: 00000246 ORIG_RAX: 000000000000007a [ 96.628543][ T8641] RAX: ffffffffffffffda RBX: 00000000004ca4c8 RCX: 0000000000445a69 [ 96.636600][ T8641] RDX: 0000000000000010 RSI: 00007f10541732f0 RDI: 0000000000000000 [ 96.644550][ T8641] RBP: 00000000004ca4c0 R08: 0000000000000001 R09: 0000000000000000 [ 96.652500][ T8641] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004ca4cc [ 96.660631][ T8641] R13: 00007fffffe0b62f R14: 00007f1054173400 R15: 0000000000022000 Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Reported-by: syzbot+01985d7909f9468f013c@syzkaller.appspotmail.com Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/97433b1742c3331f02ad92de5a4f07d673c90613.1629735352.git.legion@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-08-24ucounts: Fix regression preventing increasing of rlimits in init_user_nsEric W. Biederman1-4/+4
"Ma, XinjianX" <xinjianx.ma@intel.com> reported: > When lkp team run kernel selftests, we found after these series of patches, testcase mqueue: mq_perf_tests > in kselftest failed with following message. > > # selftests: mqueue: mq_perf_tests > # > # Initial system state: > # Using queue path: /mq_perf_tests > # RLIMIT_MSGQUEUE(soft): 819200 > # RLIMIT_MSGQUEUE(hard): 819200 > # Maximum Message Size: 8192 > # Maximum Queue Size: 10 > # Nice value: 0 > # > # Adjusted system state for testing: > # RLIMIT_MSGQUEUE(soft): (unlimited) > # RLIMIT_MSGQUEUE(hard): (unlimited) > # Maximum Message Size: 16777216 > # Maximum Queue Size: 65530 > # Nice value: -20 > # Continuous mode: (disabled) > # CPUs to pin: 3 > # ./mq_perf_tests: mq_open() at 296: Too many open files > not ok 2 selftests: mqueue: mq_perf_tests # exit=1 > ``` > > Test env: > rootfs: debian-10 > gcc version: 9 After investigation the problem turned out to be that ucount_max for the rlimits in init_user_ns was being set to the initial rlimit value. The practical problem is that ucount_max provides a limit that applications inside the user namespace can not exceed. Which means in practice that rlimits that have been converted to use the ucount infrastructure were not able to exceend their initial rlimits. Solve this by setting the relevant values of ucount_max to RLIM_INIFINITY. A limit in init_user_ns is pointless so the code should allow the values to grow as large as possible without riscking an underflow or an overflow. As the ltp test case was a bit of a pain I have reproduced the rlimit failure and tested the fix with the following little C program: > #include <stdio.h> > #include <fcntl.h> > #include <sys/stat.h> > #include <mqueue.h> > #include <sys/time.h> > #include <sys/resource.h> > #include <errno.h> > #include <string.h> > #include <stdlib.h> > #include <limits.h> > #include <unistd.h> > > int main(int argc, char **argv) > { > struct mq_attr mq_attr; > struct rlimit rlim; > mqd_t mqd; > int ret; > > ret = getrlimit(RLIMIT_MSGQUEUE, &rlim); > if (ret != 0) { > fprintf(stderr, "getrlimit(RLIMIT_MSGQUEUE) failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > printf("RLIMIT_MSGQUEUE %lu %lu\n", > rlim.rlim_cur, rlim.rlim_max); > rlim.rlim_cur = RLIM_INFINITY; > rlim.rlim_max = RLIM_INFINITY; > ret = setrlimit(RLIMIT_MSGQUEUE, &rlim); > if (ret != 0) { > fprintf(stderr, "setrlimit(RLIMIT_MSGQUEUE, RLIM_INFINITY) failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > > memset(&mq_attr, 0, sizeof(struct mq_attr)); > mq_attr.mq_maxmsg = 65536 - 1; > mq_attr.mq_msgsize = 16*1024*1024 - 1; > > mqd = mq_open("/mq_rlimit_test", O_RDONLY|O_CREAT, 0600, &mq_attr); > if (mqd == (mqd_t)-1) { > fprintf(stderr, "mq_open failed: %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > ret = mq_close(mqd); > if (ret) { > fprintf(stderr, "mq_close failed; %s\n", strerror(errno)); > exit(EXIT_FAILURE); > } > > return EXIT_SUCCESS; > } Fixes: 6e52a9f0532f ("Reimplement RLIMIT_MSGQUEUE on top of ucounts") Fixes: d7c9e99aee48 ("Reimplement RLIMIT_MEMLOCK on top of ucounts") Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Reported-by: kernel test robot lkp@intel.com Acked-by: Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/87eeajswfc.fsf_-_@disp2133 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-08-24bpf: Fix ringbuf helper function compatibilityDaniel Borkmann1-2/+6
Commit 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") extended check_map_func_compatibility() by enforcing map -> helper function match, but not helper -> map type match. Due to this all of the bpf_ringbuf_*() helper functions could be used with a wrong map type such as array or hash map, leading to invalid access due to type confusion. Also, both BPF_FUNC_ringbuf_{submit,discard} have ARG_PTR_TO_ALLOC_MEM as argument and not a BPF map. Therefore, their check_map_func_compatibility() presence is incorrect since it's only for map type checking. Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Ryota Shiga (Flatt Security) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-08-23io_uring: add support for IORING_OP_LINKATDmitry Kadashev4-1/+76
IORING_OP_LINKAT behaves like linkat(2) and takes the same flags and arguments. In some internal places 'hardlink' is used instead of 'link' to avoid confusion with the SQE links. Name 'link' conflicts with the existing 'link' member of io_kiocb. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-12-dkadashev@gmail.com [axboe: add splice_fd_in check] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23io_uring: add support for IORING_OP_SYMLINKATDmitry Kadashev4-2/+70
IORING_OP_SYMLINKAT behaves like symlinkat(2) and takes the same flags and arguments. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/io-uring/20210514145259.wtl4xcsp52woi6ab@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-11-dkadashev@gmail.com [axboe: add splice_fd_in check] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23bio: improve kerneldoc documentation for bio_alloc_kiocb()Jens Axboe1-1/+4
We're missing a description for the 'nr_vecs' parameter. While in there, clarify that freeing a bio allocated through this function must be done from process context. Fixes: 1cbbd31c4ada ("bio: add allocation cache abstraction") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23block: provide bio_clear_hipri() helperJens Axboe3-6/+10
Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE, as non-polled IO may complete through hard/soft IRQ and hence isn't safe for our polled bio alloc cache. Provide a helper that does just that, and use it in the merging code as well if we split a bio and turn off polling. Fixes: be863b9e4348 ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23block: use the percpu bio cache in __blkdev_direct_IOChristoph Hellwig1-2/+4
Use bio_alloc_kiocb to dip into the percpu cache of bios when the caller asks for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23io_uring: enable use of bio alloc cacheJens Axboe1-1/+1
Mark polled IO as being safe for dipping into the bio allocation cache, in case the targeted bio_set has it enabled. This brings an IOPOLL gen2 Optane QD=128 workload from ~3.2M IOPS to ~3.5M IOPS. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23block: clear BIO_PERCPU_CACHE flag if polling isn't supportedJens Axboe1-1/+4
The bio alloc cache relies on the fact that a polled bio will complete in process context, clear the cacheable flag if we disable polling for a given bio. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23bio: add allocation cache abstractionJens Axboe4-14/+135
Add a per-cpu bio_set cache for bio allocations, enabling us to quickly recycle them instead of going through the slab allocator. This cache isn't IRQ safe, and hence is only really suitable for polled IO. Very simple - keeps a count of bio's in the cache, and maintains a max of 512 with a slack of 64. If we get above max + slack, we drop slack number of bio's. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23fs: add kiocb alloc cache flagJens Axboe1-0/+2
If this kiocb can safely use the polled bio allocation cache, then this flag must be set. Generally this can be set for polled IO, where we will not see IRQ completions of the request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23bio: optimize initialization of a bioJens Axboe1-2/+30
The memset() used is measurably slower in targeted benchmarks, wasting about 1% of the total runtime, or 50% of the (later) hot path cached bio alloc. Get rid of it and fill in the bio manually. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23io_uring: fix io_try_cancel_userdata race for iowqPavel Begunkov1-2/+3
WARNING: CPU: 1 PID: 5870 at fs/io_uring.c:5975 io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975 CPU: 0 PID: 5870 Comm: iou-wrk-5860 Not tainted 5.14.0-rc6-next-20210820-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975 Call Trace: io_async_cancel fs/io_uring.c:6014 [inline] io_issue_sqe+0x22d5/0x65a0 fs/io_uring.c:6407 io_wq_submit_work+0x1dc/0x300 fs/io_uring.c:6511 io_worker_handle_work+0xa45/0x1840 fs/io-wq.c:533 io_wqe_worker+0x2cc/0xbb0 fs/io-wq.c:582 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 io_try_cancel_userdata() can be called from io_async_cancel() executing in the io-wq context, so the warning fires, which is there to alert anyone accessing task->io_uring->io_wq in a racy way. However, io_wq_put_and_exit() always first waits for all threads to complete, so the only detail left is to zero tctx->io_wq after the context is removed. note: one little assumption is that when IO_WQ_WORK_CANCEL, the executor won't touch ->io_wq, because io_wq_destroy() might cancel left pending requests in such a way. Cc: stable@vger.kernel.org Reported-by: syzbot+b0c9d1588ae92866515f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dfdd37a80cfa9ffd3e59538929c99cdd55d8699e.1629721757.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23io_uring: add support for IORING_OP_MKDIRATDmitry Kadashev2-0/+61
IORING_OP_MKDIRAT behaves like mkdirat(2) and takes the same flags and arguments. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-10-dkadashev@gmail.com [axboe: add splice_fd_in check] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: update do_*() helpers to return intsDmitry Kadashev2-8/+8
Update the following to return int rather than long, for uniformity with the rest of the do_* helpers in namei.c: * do_rmdir() * do_unlinkat() * do_mkdirat() * do_mknodat() * do_symlinkat() Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210514143202.dmzfcgz5hnauy7ze@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-9-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: make do_linkat() take struct filenameDmitry Kadashev1-16/+29
Pass in the struct filename pointers instead of the user string, for uniformity with do_renameat2, do_unlinkat, do_mknodat, etc. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-8-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: add getname_uflags()Dmitry Kadashev3-6/+11
There are a couple of places where we already open-code the (flags & AT_EMPTY_PATH) check and io_uring will likely add another one in the future. Let's just add a simple helper getname_uflags() that handles this directly and use it. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210415100815.edrn4a7cy26wkowe@wittgenstein/ Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-7-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: make do_symlinkat() take struct filenameDmitry Kadashev1-11/+12
Pass in the struct filename pointers instead of the user string, for uniformity with the recently converted do_mkdnodat(), do_unlinkat(), do_renameat(), do_mkdirat(). Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-6-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: make do_mknodat() take struct filenameDmitry Kadashev1-8/+11
Pass in the struct filename pointers instead of the user string, for uniformity with the recently converted do_unlinkat(), do_renameat(), do_mkdirat(). Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/20210330071700.kpjoyp5zlni7uejm@wittgenstein/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-5-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: make do_mkdirat() take struct filenameDmitry Kadashev2-7/+20
Pass in the struct filename pointers instead of the user string, and update the three callers to do the same. This is heavily based on commit dbea8d345177 ("fs: make do_renameat2() take struct filename"). This behaves like do_unlinkat() and do_renameat2(). Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20210708063447.3556403-4-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23namei: change filename_parentat() calling conventionsDmitry Kadashev1-55/+53
Since commit 5c31b6cedb675 ("namei: saner calling conventions for filename_parentat()") filename_parentat() had the following behavior WRT the passed in struct filename *: * On error the name is consumed (putname() is called on it); * On success the name is returned back as the return value; Now there is a need for filename_create() and filename_lookup() variants that do not consume the passed filename, and following the same "consume the name only on error" semantics is proven to be hard to reason about and result in confusing code. Hence this preparation change splits filename_parentat() into two: one that always consumes the name and another that never consumes the name. This will allow to implement two filename_create() variants in the same way, and is a consistent and hopefully easier to reason about approach. Link: https://lore.kernel.org/io-uring/CAOKbgA7MiqZAq3t-HDCpSGUFfco4hMA9ArAE-74fTpU+EkvKPw@mail.gmail.com/ Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> Link: https://lore.kernel.org/r/20210708063447.3556403-3-dkadashev@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>