summaryrefslogtreecommitdiff
path: root/drivers/nvdimm
AgeCommit message (Collapse)AuthorFilesLines
2021-09-09Merge tag 'cxl-for-5.15' of ↵Linus Torvalds4-164/+356
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) updates from Dan Williams: - Fix detection of CXL host bridges to filter out disabled ACPI0016 devices in the ACPI DSDT. - Fix kernel lockdown integration to disable raw commands when raw PCI access is disabled. - Fix a broken debug message. - Add support for "Get Partition Info". I.e. enumerate the split between volatile and persistent capacity on bi-modal CXL memory expanders. - Re-factor the core by subject area. This is a work in progress. - Prepare libnvdimm to understand CXL labels in addition to EFI labels. This is a work in progress. * tag 'cxl-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (25 commits) cxl/registers: Fix Documentation warning cxl/pmem: Fix Documentation warning cxl/uapi: Fix defined but not used warnings cxl/pci: Fix debug message in cxl_probe_regs() cxl/pci: Fix lockdown level cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports libnvdimm/labels: Add claim class helpers libnvdimm/labels: Add type-guid helpers libnvdimm/labels: Add blk special cases for nlabel and position helpers libnvdimm/labels: Add blk isetcookie set / validation helpers libnvdimm/labels: Add a checksum calculation helper libnvdimm/labels: Introduce label setter helpers libnvdimm/labels: Add isetcookie validation helper libnvdimm/labels: Introduce getters for namespace label fields cxl/mem: Adjust ram/pmem range to represent DPA ranges cxl/mem: Account for partitionable space in ram/pmem ranges cxl/pci: Store memory capacity values cxl/pci: Simplify register setup cxl/pci: Ignore unknown register block types cxl/core: Move memdev management to core ...
2021-09-09Merge tag 'libnvdimm-for-5.15' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Fix a race condition in the teardown path of raw mode pmem namespaces. - Cleanup the code that filesystems use to detect filesystem-dax capabilities of their underlying block device. * tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: remove bdev_dax_supported xfs: factor out a xfs_buftarg_is_dax helper dax: stub out dax_supported for !CONFIG_FS_DAX dax: remove __generic_fsdax_supported dax: move the dax_read_lock() locking into dax_supported dax: mark dax_get_by_host static dm: use fs_dax_get_by_bdev instead of dax_get_by_host dax: stop using bdevname fsdax: improve the FS_DAX Kconfig description and help text libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
2021-09-01Merge tag 'driver-core-5.15-rc1' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core patches for 5.15-rc1. These do change a number of different things across different subsystems, and because of that, there were 2 stable tags created that might have already come into your tree from different pulls that did the following - changed the bus remove callback to return void - sysfs iomem_get_mapping rework Other than those two things, there's only a few small things in here: - kernfs performance improvements for huge numbers of sysfs users at once - tiny api cleanups - other minor changes All of these have been in linux-next for a while with no reported problems, other than the before-mentioned merge issue" * tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits) MAINTAINERS: Add dri-devel for component.[hc] driver core: platform: Remove platform_device_add_properties() ARM: tegra: paz00: Handle device properties with software node API bitmap: extend comment to bitmap_print_bitmask/list_to_buf drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI topology: use bin_attribute to break the size limitation of cpumap ABI lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list sysfs: Rename struct bin_attribute member to f_mapping sysfs: Invoke iomem_get_mapping() from the sysfs open callback debugfs: Return error during {full/open}_proxy_open() on rmmod zorro: Drop useless (and hardly used) .driver member in struct zorro_dev zorro: Simplify remove callback sh: superhyway: Simplify check in remove callback nubus: Simplify check in remove callback nubus: Make struct nubus_driver::remove return void kernfs: dont call d_splice_alias() under kernfs node lock kernfs: use i_lock to protect concurrent inode updates kernfs: switch kernfs to use an rwsem kernfs: use VFS negative dentry caching ...
2021-08-25libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbindsumiyawang1-2/+2
There is a use after free crash when the pmem driver tears down its mapping while I/O is still inbound. This is triggered by driver unbind, "ndctl destroy-namespace", while I/O is in flight. Fix the sequence of blk_cleanup_queue() vs memunmap(). The crash signature is of the form: BUG: unable to handle page fault for address: ffffc90080200000 CPU: 36 PID: 9606 Comm: systemd-udevd Call Trace: ? pmem_do_bvec+0xf9/0x3a0 ? xas_alloc+0x55/0xd0 pmem_rw_page+0x4b/0x80 bdev_read_page+0x86/0xb0 do_mpage_readpage+0x5d4/0x7a0 ? lru_cache_add+0xe/0x10 mpage_readpages+0xf9/0x1c0 ? bd_link_disk_holder+0x1a0/0x1a0 blkdev_readpages+0x1d/0x20 read_pages+0x67/0x1a0 ndctl Call Trace in vmcore: PID: 23473 TASK: ffff88c4fbbe8000 CPU: 1 COMMAND: "ndctl" __schedule schedule blk_mq_freeze_queue_wait blk_freeze_queue blk_cleanup_queue pmem_release_queue devm_action_release release_nodes devres_release_all device_release_driver_internal device_driver_detach unbind_store Cc: <stable@vger.kernel.org> Signed-off-by: sumiyawang <sumiyawang@tencent.com> Reviewed-by: yongduan <yongduan@tencent.com> Link: https://lore.kernel.org/r/1629632949-14749-1-git-send-email-sumiyawang@tencent.com Fixes: 50f44ee7248a ("mm/devm_memremap_pages: fix final page put race") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add claim class helpersDan Williams4-19/+28
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. CXL labels do have the concept of a claim class represented by an "abstraction" identifier. It turns out both label implementations use the same ids, but EFI encodes them as GUIDs and CXL labels encode them as UUIDs. For now abstract out the claim class such that the UUID vs GUID distinction can later be hidden in the helper. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982116719.1124374.9917866609080940364.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add type-guid helpersDan Williams3-19/+28
In preparation for CXL label support, which does not have the type-guid concept, wrap the existing users with nsl_set_type_guid, and nsl_validate_type_guid. Recall that the type-guid is a value in the ACPI NFIT table to indicate how the memory range is used / should be presented to upper layers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982116208.1124374.13938280892226800953.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add blk special cases for nlabel and position helpersDan Williams1-17/+29
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. Finish off the BLK-mode specific helper conversion with the nlabel and position behaviour that is specific to EFI v1.2 labels and not the original v1.1 definition. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982115698.1124374.10182273478536799613.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add blk isetcookie set / validation helpersDan Williams3-9/+34
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. Given BLK-mode is not even supported on CXL push hide the BLK-mode specific details inside the helpers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982115185.1124374.13459190993792729776.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add a checksum calculation helperDan Williams1-33/+35
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. CXL labels support checksums by default, but early versions of the EFI labels did not. Add a validate function that can return true in the case the label format does not implement a checksum. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982114637.1124374.6966639787307077105.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Introduce label setter helpersDan Williams3-33/+99
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982114123.1124374.17153270107594686116.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Add isetcookie validation helperDan Williams2-5/+10
In preparation to handle CXL labels with the same code that handles EFI labels, add a specific interleave-set-cookie validation helper rather than a getter since the CXL label type does not support this concept. The answer for CXL labels will always be true. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982113550.1124374.206762177785773038.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-24libnvdimm/labels: Introduce getters for namespace label fieldsDan Williams3-46/+110
In preparation for LIBNVDIMM to manage labels on CXL devices deploy helpers that abstract the label type from the implementation. The CXL label format is mostly similar to the EFI label format with concepts / fields added, like dynamic region creation and label type guids, and other concepts removed like BLK-mode and interleave-set-cookie ids. In addition to nsl_get_* helpers there is the nsl_ref_name() helper that returns a pointer to a label field rather than copying the data. Where changes touch the old whitespace style, update to clang-format expectations. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162982113002.1124374.15922077050771304490.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-11libnvdimm/region: Fix label activation vs errorsDan Williams1-6/+11
There are a few scenarios where init_active_labels() can return without registering deactivate_labels() to run when the region is disabled. In particular label error injection creates scenarios where a DIMM is disabled, but labels on other DIMMs in the region become activated. Arrange for init_active_labels() to always register deactivate_labels(). Reported-by: Krzysztof Kensicki <krzysztof.kensicki@intel.com> Cc: <stable@vger.kernel.org> Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation.") Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/162766356450.3223041.1183118139023841447.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-07-21bus: Make remove callback return voidUwe Kleine-König1-2/+1
The driver core ignores the return value of this callback because there is only little it can do when a device disappears. This is the final bit of a long lasting cleanup quest where several buses were converted to also return void from their remove callback. Additionally some resource leaks were fixed that were caused by drivers returning an error code in the expectation that the driver won't go away. With struct bus_type::remove returning void it's prevented that newly implemented buses return an ignored error code and so don't anticipate wrong expectations for driver authors. Reviewed-by: Tom Rix <trix@redhat.com> (For fpga) Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> (For drivers/s390 and drivers/vfio) Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> (For ARM, Amba and related parts) Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Chen-Yu Tsai <wens@csie.org> (for sunxi-rsb) Acked-by: Pali Rohár <pali@kernel.org> Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> (for media) Acked-by: Hans de Goede <hdegoede@redhat.com> (For drivers/platform) Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-By: Vinod Koul <vkoul@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> (For xen) Acked-by: Lee Jones <lee.jones@linaro.org> (For mfd) Acked-by: Johannes Thumshirn <jth@kernel.org> (For mcb) Acked-by: Johan Hovold <johan@kernel.org> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> (For slimbus) Acked-by: Kirti Wankhede <kwankhede@nvidia.com> (For vfio) Acked-by: Maximilian Luz <luzmaximilian@gmail.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> (For ulpi and typec) Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (For ipack) Acked-by: Geoff Levand <geoff@infradead.org> (For ps3) Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com> (For thunderbolt) Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> (For intel_th) Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> (For pcmcia) Acked-by: Rafael J. Wysocki <rafael@kernel.org> (For ACPI) Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> (rpmsg and apr) Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> (For intel-ish-hid) Acked-by: Dan Williams <dan.j.williams@intel.com> (For CXL, DAX, and NVDIMM) Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> (For isa) Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (For firewire) Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> (For hid) Acked-by: Thorsten Scherer <t.scherer@eckelmann.de> (For siox) Acked-by: Sven Van Asbroeck <TheSven73@gmail.com> (For anybuss) Acked-by: Ulf Hansson <ulf.hansson@linaro.org> (For MMC) Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210713193522.1770306-6-u.kleine-koenig@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-04Merge tag 'cxl-for-5.14' of ↵Linus Torvalds2-23/+59
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) updates from Dan Williams: "This subsystem is still in the build-out phase as the bulk of the update is improvements to enumeration and fleshing out the device model. In terms of new features, more mailbox commands have been added to the allowed-list in support of persistent memory provisioning support targeting v5.15. The critical update from an enumeration perspective is support for the CXL Fixed Memory Window Structure that indicates to Linux which system physical address ranges decode to the CXL Host Bridges in the system. This allows the driver to detect which address ranges have been mapped by firmware and what address ranges are available for future hotplug. So, again, mostly skeleton this round, with more meat targeting v5.15. Summary: - Add support for the CXL Fixed Memory Window Structure, a recent extension of the ACPI CEDT (CXL Early Discovery Table) - Add infrastructure for component registers - Add HDM (Host-managed device memory) decoder definitions - Define a device model for an HDM decoder tree - Bridge CXL persistent memory capabilities to an NVDIMM bus / device-model - Switch to fine grained mapping of CXL MMIO registers to allow different drivers / system software to own individual register blocks - Enable media provisioning commands, and publish the label storage area size in sysfs - Miscellaneous cleanups and fixes" * tag 'cxl-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (34 commits) cxl/pci: Rename CXL REGLOC ID cxl/acpi: Use the ACPI CFMWS to create static decoder objects cxl/acpi: Add the Host Bridge base address to CXL port objects cxl/pmem: Register 'pmem' / cxl_nvdimm devices libnvdimm: Drop unused device power management support libnvdimm: Export nvdimm shutdown helper, nvdimm_delete() cxl/pmem: Add initial infrastructure for pmem support cxl/core: Add cxl-bus driver infrastructure cxl/pci: Add media provisioning required commands cxl/component_regs: Fix offset cxl/hdm: Fix decoder count calculation cxl/acpi: Introduce cxl_decoder objects cxl/acpi: Enumerate host bridge root ports cxl/acpi: Add downstream port data to cxl_port instances cxl/Kconfig: Default drivers to CONFIG_CXL_BUS cxl/acpi: Introduce the root of a cxl_port topology cxl/pci: Fixup devm_cxl_iomap_block() to take a 'struct device *' cxl/pci: Add HDM decoder capabilities cxl/pci: Reserve individual register block regions cxl/pci: Map registers based on capabilities ...
2021-06-16libnvdimm: Drop unused device power management supportDan Williams1-8/+37
LIBNVDIMM device objects register sysfs power attributes despite nothing requiring that support. Clean up sysfs remove the power/ attribute group. This requires a device_create() and a device_register() usage to be converted to the device_initialize() + device_add() pattern. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379910795.2993820.10130417680551632288.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-16libnvdimm: Export nvdimm shutdown helper, nvdimm_delete()Dan Williams2-15/+22
CXL is a hotplug bus and arranges for nvdimm devices to be dynamically discovered and removed. The libnvdimm core manages shutdown of nvdimm security operations when the device is unregistered. That functionality is moved to nvdimm_delete() and invoked by the CXL-to-nvdimm glue code. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379910271.2993820.2955889139842401250.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09libnvdimm/pmem: Fix blk_cleanup_disk() usageDan Williams1-3/+3
The queue_to_disk() helper can not be used after del_gendisk() communicate @disk via the pgmap->owner. Otherwise, queue_to_disk() returns NULL resulting in the splat below. Kernel attempted to read user page (330) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000330 Faulting instruction address: 0xc000000000906344 Oops: Kernel access of bad area, sig: 11 [#1] [..] NIP [c000000000906344] pmem_pagemap_cleanup+0x24/0x40 LR [c0000000004701d4] memunmap_pages+0x1b4/0x4b0 Call Trace: [c000000022cbb9c0] [c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable) [c000000022cbb9e0] [c0000000004701d4] memunmap_pages+0x1b4/0x4b0 [c000000022cbba90] [c0000000008b28a0] devm_action_release+0x30/0x50 [c000000022cbbab0] [c0000000008b39c8] release_nodes+0x2f8/0x3e0 [c000000022cbbb60] [c0000000008ac440] device_release_driver_internal+0x190/0x2b0 [c000000022cbbba0] [c0000000008a8450] unbind_store+0x130/0x170 Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Fixes: 87eb73b2ca7c ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk") Link: http://lore.kernel.org/r/DFB75BA8-603F-4A35-880B-C5B23EF8FA7D@linux.vnet.ibm.com Cc: Christoph Hellwig <hch@lst.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com [axboe: fold in compile warning fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvme-multipath: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig1-1/+0
Convert the nvme-multipath driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig1-10/+5
Convert the nvdimm-pmem driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvdimm-btt: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig2-19/+7
Convert the nvdimm-btt driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01nvdimm-blk: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig1-20/+6
Convert the nvdimm-blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01block: automatically enable GENHD_FL_EXT_DEVTChristoph Hellwig3-3/+0
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated without an explicit number of minors. This is what all new block drivers should do, so make sure it is the default without boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-07include: remove pagemap.h from blkdev.hMatthew Wilcox (Oracle)2-0/+2
My UEK-derived config has 1030 files depending on pagemap.h before this change. Afterwards, just 326 files need to be rebuilt when I touch pagemap.h. I think blkdev.h is probably included too widely, but untangling that dependency is harder and this solves my problem. x86 allmodconfig builds, but there may be implicit include problems on other architectures. Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> [nvdimm] Acked-by: Jens Axboe <axboe@kernel.dk> [block] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> [scsi] Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-10libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNCVaibhav Jain1-2/+7
In case a platform doesn't provide explicit flush-hints but provides an explicit flush callback via ND_REGION_ASYNC region flag, then nvdimm_has_flush() still returns '0' indicating that writes do not require flushing. This happens on PPC64 with patch at [1] applied, where 'deep_flush' of a region was denied even though an explicit flush function was provided. Fix this by adding a condition to nvdimm_has_flush() to test for the ND_REGION_ASYNC flag on the region and see if a 'region->flush' callback is assigned. Link: http://lore.kernel.org/r/161703936121.36.7260632399582101498.stgit@e1fbed493c87 [1] Fixes: c5d4355d10d4 ("libnvdimm: nd_region flush callback support") Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Link: https://lore.kernel.org/r/20210402092555.208590-1-vaibhav@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-04-09libnvdimm: Notify disk drivers to revalidate region read-onlyDan Williams3-12/+46
Previous kernels allowed the BLKROSET to override the disk's read-only status. With that situation fixed the pmem driver needs to rely on notification events to reevaluate the disk read-only status after the host region has been marked read-write. Recall that when libnvdimm determines that the persistent memory has lost persistence (for example lack of energy to flush from DRAM to FLASH on an NVDIMM-N device) it marks the region read-only, but that state can be overridden by the user via: echo 0 > /sys/bus/nd/devices/regionX/read_only ...to date there is no notification that the region has restored persistence, so the user override is the only recovery. Fixes: 52f019d43c22 ("block: add a hard-readonly flag to struct gendisk") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/161534060720.528671.2341213328968989192.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-24Merge tag 'libnvdimm-for-5.12' of ↵Linus Torvalds5-22/+9
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and device-dax updates from Dan Williams: - Fix the error code polarity for the device-dax/mapping attribute - For the device-dax and libnvdimm bus implementations stop implementing a useless return code for the remove() callback. - Miscellaneous cleanups * tag 'libnvdimm-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax-device: Make remove callback return void device-dax: Drop an empty .remove callback device-dax: Fix error path in dax_driver_register device-dax: Properly handle drivers without remove callback device-dax: Prevent registering drivers without probe callback libnvdimm: Make remove callback return void libnvdimm/dimm: Simplify nvdimm_remove() device-dax: Fix default return code of range_parse()
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds3-6/+6
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-17libnvdimm: Make remove callback return voidUwe Kleine-König5-19/+9
All drivers return 0 in their remove callback and the driver core ignores the return value of nvdimm_bus_remove() anyhow. So simplify by changing the driver remove callback to return void and return 0 unconditionally to the upper layer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210212171043.2136580-2-u.kleine-koenig@pengutronix.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17libnvdimm/dimm: Simplify nvdimm_remove()Uwe Kleine-König1-3/+0
nvdimm_remove is only ever called after nvdimm_probe() returned successfully. In this case driver data is always set to a non-NULL value so the check for driver data being NULL can go away as it's always false. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210212171043.2136580-1-u.kleine-koenig@pengutronix.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-02libnvdimm/dimm: Avoid race between probe and available_slots_show()Dan Williams1-3/+15
Richard reports that the following test: (while true; do cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null done) & while true; do for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind done for i in $(seq 0 4); do echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind done done ...fails with a crash signature like: divide error: 0000 [#1] SMP KASAN PTI RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm] [..] Call Trace: available_slots_show+0x4e/0x120 [libnvdimm] dev_attr_show+0x42/0x80 ? memset+0x20/0x40 sysfs_kf_seq_show+0x218/0x410 The root cause is that available_slots_show() consults driver-data, but fails to synchronize against device-unbind setting up a TOCTOU race to access uninitialized memory. Validate driver-data under the device-lock. Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure") Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Coly Li <colyli@suse.com> Reported-by: Richard Palethorpe <rpalethorpe@suse.com> Acked-by: Richard Palethorpe <rpalethorpe@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-01-28libnvdimm/namespace: Fix visibility of namespace resource attributeDan Williams1-5/+5
Legacy pmem namespaces lost support for the "resource" attribute when the code was cleaned up to put the permission visibility in the declaration. Restore this by listing 'resource' in the default attributes. A new ndctl regression test for pfn_to_online_page() corner cases builds on this fix. Fixes: bfd2e9140656 ("libnvdimm: Simplify root read-only definition for the 'resource' attribute") Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/161052334995.1805594.12054873528154362921.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-01-25block: store a block_device pointer in struct bioChristoph Hellwig3-6/+6
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-11libnvdimm/pmem: Remove unused headerJianpeng Ma1-1/+0
'commit a8b456d01cd6 ("bdi: remove BDI_CAP_SYNCHRONOUS_IO")' forgot remove the related header file. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20201229002635.42555-1-jianpeng.ma@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-12-17libnvdimm/label: Return -ENXIO for no slot in __blk_label_updateZhang Qilong1-1/+3
Forget to set error code when nd_label_alloc_slot failed, and we add it to avoid overwritten error code. Fixes: 0ba1c634892b ("libnvdimm: write blk label set") Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201205115056.2076523-1-zhangqilong3@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-12-16libnvdimm: Cleanup include of badblocks.hEnrico Weigelt3-2/+3
* drivers/nvdimm/core.c doesn't use anything from badblocks.h on its own, thus including it isn't needed. There's indeed indirect use, via funcs in nd.h, but this one already includes badblocks.h. * drivers/nvdimm/claim.c calls stuff from badblocks.h and therefore should include it on its own (instead of relying any other header doing that) * drivers/nvdimm/btt.h doesn't really need anything from badblocks.h and can easily live with a forward declaration of struct badblocks (just having pointers to it, but not dereferencing it anywhere) Signed-off-by: Enrico Weigelt <info@metux.net> Link: https://lore.kernel.org/r/20201215163531.21446-1-info@metux.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-11-20libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labelsDan Williams1-0/+9
A recent change to ndctl to attempt to reconfigure namespaces in place uncovered a label accounting problem in block-window-type namespaces. The ndctl "create.sh" test is able to trigger this signature: WARNING: CPU: 34 PID: 9167 at drivers/nvdimm/label.c:1100 __blk_label_update+0x9a3/0xbc0 [libnvdimm] [..] RIP: 0010:__blk_label_update+0x9a3/0xbc0 [libnvdimm] [..] Call Trace: uuid_store+0x21b/0x2f0 [libnvdimm] kernfs_fop_write+0xcf/0x1c0 vfs_write+0xcc/0x380 ksys_write+0x68/0xe0 When allocated capacity for a namespace is renamed (new UUID) the labels with the old UUID need to be deleted. The ndctl behavior to always destroy namespaces on reconfiguration hid this problem. The immediate impact of this bug is limited since block-window-type namespaces only seem to exist in the specification and not in any shipping products. However, the label handling code is being reused for other technologies like CXL region labels, so there is a benefit to making sure both vertical labels sets (block-window) and horizontal label sets (pmem) have a functional reference implementation in libnvdimm. Fixes: c4703ce11c23 ("libnvdimm/namespace: Fix label tracking error") Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-10-14mm/memremap_pages: support multiple ranges per invocationDan Williams2-0/+2
In support of device-dax growing the ability to front physically dis-contiguous ranges of memory, update devm_memremap_pages() to track multiple ranges with a single reference counter and devm instance. Convert all [devm_]memremap_pages() users to specify the number of ranges they are mapping in their 'struct dev_pagemap' instance. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.co Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-14mm/memremap_pages: convert to 'struct range'Dan Williams6-46/+55
The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused wasted space. This is in preparation for introducing a multi-range extension of devm_memremap_pages(). The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily, and replacing instances of 'struct dev_pagemap'.res with 'struct dev_pagemap'.range. P2PDMA had a minor usage of the resource flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()] Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen] Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds5-16/+7
Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...
2020-10-12Merge tag 'ras_updates_for_v5.10' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Extend the recovery from MCE in kernel space also to processes which encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. - memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. - Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. - Misc fixes, improvements and cleanups, as always. * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Allow for copy_mc_fragile symbol checksum to be generated x86/mce: Decode a kernel instruction to determine if it is copying from user x86/mce: Recover from poison found while copying from user space x86/mce: Avoid tail copy when machine check terminated a copy from user x86/mce: Add _ASM_EXTABLE_CPY for copy user access x86/mce: Provide method to find out the type of an exception handler x86/mce: Pass pointer to saved pt_regs to severity calculation routines x86/copy_mc: Introduce copy_mc_enhanced_fast_string() x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list x86/mce: Add Skylake quirk for patrol scrub reported errors RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE() x86/mce: Annotate mce_rd/wrmsrl() with noinstr x86/mce/dev-mcelog: Do not update kflags on AMD systems x86/mce: Stop mce_reign() from re-computing severity for every CPU x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR x86/mce: Increase maximum number of banks to 64 x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check() x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap RAS/CEC: Fix cec_init() prototype
2020-10-06x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams2-4/+4
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-09-24bdi: remove BDI_CAP_SYNCHRONOUS_IOChristoph Hellwig2-3/+0
BDI_CAP_SYNCHRONOUS_IO is only checked in the swap code, and used to decided if ->rw_page can be used on a block device. Just check up for the method instead. The only complication is that zram needs a second set of block_device_operations as it can switch between modes that actually support ->rw_page and those who don't. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-02nvdimm: simplify revalidate_disk handlingChristoph Hellwig5-13/+7
The nvdimm block driver abuse revalidate_disk in a strange way, and totally unrelated to what other drivers do. Simplify this by just calling nvdimm_revalidate_disk (which seems rather misnamed) from the probe routines, as the additional bdev size revalidation is pointless at this point, and remove the revalidate_disk methods given that it can only be triggered from add_disk, which is right before the manual calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-17libnvdimm: KASAN: global-out-of-bounds Read in internal_create_groupZqiang1-0/+1
Because the last member of the "nvdimm_firmware_attributes" array was not assigned a null ptr, when traversal of "grp->attrs" array is out of bounds in "create_files" func. func: create_files: ->for (i = 0, attr = grp->attrs; *attr && !error; i++, attr++) ->.... BUG: KASAN: global-out-of-bounds in create_files fs/sysfs/group.c:43 [inline] BUG: KASAN: global-out-of-bounds in internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 Read of size 8 at addr ffffffff8a2e4cf0 by task kworker/u17:10/959 CPU: 2 PID: 959 Comm: kworker/u17:10 Not tainted 5.8.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x18f/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 create_files fs/sysfs/group.c:43 [inline] internal_create_group+0x9d8/0xb20 fs/sysfs/group.c:149 internal_create_groups.part.0+0x90/0x140 fs/sysfs/group.c:189 internal_create_groups fs/sysfs/group.c:185 [inline] sysfs_create_groups+0x25/0x50 fs/sysfs/group.c:215 device_add_groups drivers/base/core.c:2024 [inline] device_add_attrs drivers/base/core.c:2178 [inline] device_add+0x7fd/0x1c40 drivers/base/core.c:2881 nd_async_device_register+0x12/0x80 drivers/nvdimm/bus.c:506 async_run_entry_fn+0x121/0x530 kernel/async.c:123 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 The buggy address belongs to the variable: nvdimm_firmware_attributes+0x10/0x40 Link: https://lore.kernel.org/r/20200812085501.30963-1-qiang.zhang@windriver.com Link: https://lore.kernel.org/r/20200814150509.225615-1-vaibhav@linux.ibm.com Fixes: 48001ea50d17f ("PM, libnvdimm: Add runtime firmware activation support") Reported-by: syzbot+1cf0ffe61aecf46f588f@syzkaller.appspotmail.com Reported-by: Sandipan Das <sandipan@linux.ibm.com> Reported-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-08-15mm: add thp_sizeMatthew Wilcox (Oracle)2-7/+3
This function returns the number of bytes in a THP. It is like page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE is disabled. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-2/+2
Pull virtio updates from Michael Tsirkin: - IRQ bypass support for vdpa and IFC - MLX5 vdpa driver - Endianness fixes for virtio drivers - Misc other fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits) vdpa/mlx5: fix up endian-ness for mtu vdpa: Fix pointer math bug in vdpasim_get_config() vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config() vdpa/mlx5: fix memory allocation failure checks vdpa/mlx5: Fix uninitialised variable in core/mr.c vdpa_sim: init iommu lock virtio_config: fix up warnings on parisc vdpa/mlx5: Add VDPA driver for supported mlx5 devices vdpa/mlx5: Add shared memory registration code vdpa/mlx5: Add support library for mlx5 VDPA implementation vdpa/mlx5: Add hardware descriptive header file vdpa: Modify get_vq_state() to return error code net/vdpa: Use struct for set/get vq state vdpa: remove hard coded virtq num vdpasim: support batch updating vhost-vdpa: support IOTLB batching hints vhost-vdpa: support get/set backend features vhost: generialize backend features setting/getting vhost-vdpa: refine ioctl pre-processing vDPA: dont change vq irq after DRIVER_OK ...
2020-08-11Merge tag 'libnvdimm-for-5.9' of ↵Linus Torvalds8-10/+298
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updayes from Vishal Verma: "You'd normally receive this pull request from Dan Williams, but he's busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm this cycle. This adds a new feature in libnvdimm - 'Runtime Firmware Activation', and a few small cleanups and fixes in libnvdimm and DAX. I'd originally intended to make separate topic-based pull requests - one for libnvdimm, and one for DAX, but some of the DAX material fell out since it wasn't quite ready. Summary: - add 'Runtime Firmware Activation' support for NVDIMMs that advertise the relevant capability - misc libnvdimm and DAX cleanups" * tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr libnvdimm/security: the 'security' attr never show 'overwrite' state libnvdimm/security: fix a typo ACPI: NFIT: Fix ARS zero-sized allocation dax: Fix incorrect argument passed to xas_set_err() ACPI: NFIT: Add runtime firmware activate support PM, libnvdimm: Add runtime firmware activation support libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO() drivers/dax: Expand lock scope to cover the use of addresses fs/dax: Remove unused size parameter dax: print error message by pr_info() in __generic_fsdax_supported() driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW} tools/testing/nvdimm: Emulate firmware activation commands tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation tools/testing/nvdimm: Add command debug messages tools/testing/nvdimm: Cleanup dimm index passing ACPI: NFIT: Define runtime firmware activation commands ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor libnvdimm: Validate command family indices
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds2-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-08-05virtio_pmem: convert to LE accessorsMichael S. Tsirkin1-2/+2
Virtio pmem is modern-only. Use LE accessors for config space. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>