summaryrefslogtreecommitdiff
path: root/drivers/nvdimm
AgeCommit message (Collapse)AuthorFilesLines
2018-04-20Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"Dan Williams1-2/+1
With commit df3f126482db ("libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()") it is now possible to allow of_pmem to be built as a module as originally implemented. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-20libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()Rob Herring1-1/+1
Remove the direct dependency on of_node_to_nid() by using dev_to_node() instead. Any DT platform device will have its NUMA node id set when the device is created. With this, commit 291717b6fbdb ("libnvdimm, of_pmem: workaround OF_NUMA=n build error") can be reverted. Fixes: 717197608952 ("libnvdimm: Add device-tree based driver") Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: linux-nvdimm@lists.01.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-16libnvdimm, dimm: handle EACCES failures from label readsDan Williams1-10/+12
The new support for the standard _LSR and _LSW methods neglected to also update the nvdimm_init_config_data() and nvdimm_set_config_data() to return the translated error code from failed commands. This precision is necessary because the locked status that was previously returned on ND_CMD_GET_CONFIG_SIZE commands is now returned on ND_CMD_{GET,SET}_CONFIG_DATA commands. If the kernel misses this indication it can inadvertently fall back to label-less mode when it should otherwise avoid all access to locked regions. Cc: <stable@vger.kernel.org> Fixes: 4b27db7e26cd ("acpi, nfit: add support for the _LSI, _LSR, and...") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds18-124/+254
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-09Merge branch 'for-4.17/dax' into libnvdimm-for-nextDan Williams1-1/+1
2018-04-09Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-nextDan Williams18-123/+253
2018-04-09libnvdimm, of_pmem: workaround OF_NUMA=n build errorDan Williams1-1/+2
Stephen reports that an x86 allmodconfig build fails to build the of_pmem driver due to a missing definition of of_node_to_nid(). That helper is currently only exported in the OF_NUMA=y case. In other cases, ppc and sparc, it is a weak symbol, and outside of those platforms it is a static inline. Until an OF_NUMA=n configuration can reliably support usage of of_node_to_nid() in modules across architectures, mark this driver as 'bool' instead of 'tristate'. Cc: Rob Herring <robh@kernel.org> Cc: Oliver O'Halloran <oohall@gmail.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07libnvdimm: Add device-tree based driverOliver O'Halloran3-0/+130
This patch adds peliminary device-tree bindings for persistent memory regions. The driver registers a libnvdimm bus for each pmem-region node and each address range under the node is converted to a region within that bus. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07libnvdimm: Add of_node to region and bus descriptorsOliver O'Halloran2-0/+2
We want to be able to cross reference the region and bus devices with the device tree node that they were spawned from. libNVDIMM handles creating the actual devices for these internally, so we need to pass in a pointer to the relevant node in the descriptor. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07libnvdimm, region: quiet region probeDan Williams1-2/+2
The message about constraining number of online cpus to be less than or equal to ND_MAX_LANES (256) is only useful for block-aperture configurations and BTT. Make it debug since it is only relevant when debugging performance. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07libnvdimm, namespace: use a safe lookup for dimm device nameDan Williams1-2/+2
The following NULL dereference results from incorrectly assuming that ndd is valid in this print: struct nvdimm_drvdata *ndd = to_ndd(&nd_region->mapping[i]); /* * Give up if we don't find an instance of a uuid at each * position (from 0 to nd_region->ndr_mappings - 1), or if we * find a dimm with two instances of the same uuid. */ dev_err(&nd_region->dev, "%s missing label for %pUb\n", dev_name(ndd->dev), nd_label->uuid); BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: nd_region_register_namespaces+0xd67/0x13c0 [libnvdimm] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 43 PID: 673 Comm: kworker/u609:10 Not tainted 4.16.0-rc4+ #1 [..] RIP: 0010:nd_region_register_namespaces+0xd67/0x13c0 [libnvdimm] [..] Call Trace: ? devres_add+0x2f/0x40 ? devm_kmalloc+0x52/0x60 ? nd_region_activate+0x9c/0x320 [libnvdimm] nd_region_probe+0x94/0x260 [libnvdimm] ? kernfs_add_one+0xe4/0x130 nvdimm_bus_probe+0x63/0x100 [libnvdimm] Switch to using the nvdimm device directly. Fixes: 0e3b0d123c8f ("libnvdimm, namespace: allow multiple pmem...") Cc: <stable@vger.kernel.org> Reported-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07libnvdimm, dimm: fix dpa reservation vs uninitialized label areaDan Williams1-3/+5
At initialization time the 'dimm' driver caches a copy of the memory device's label area and reserves address space for each of the namespaces defined. However, as can be seen below, the reservation occurs even when the index blocks are invalid: nvdimm nmem0: nvdimm_init_config_data: len: 131072 rc: 0 nvdimm nmem0: config data size: 131072 nvdimm nmem0: __nd_label_validate: nsindex0 labelsize 1 invalid nvdimm nmem0: __nd_label_validate: nsindex1 labelsize 1 invalid nvdimm nmem0: : pmem-6025e505: 0x1000000000 @ 0xf50000000 reserve <-- bad Gate dpa reservation on the presence of valid index blocks. Cc: <stable@vger.kernel.org> Fixes: 4a826c83db4e ("libnvdimm: namespace indices: read and validate") Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-06Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds4-6/+5
Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
2018-04-03libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'Dan Williams2-1/+8
For debug, it is useful for bus providers to be able to retrieve the 'struct device' associated with an nd_region instance that it registered. We already have to_nd_region() to perform the reverse cast operation, in fact its duplicate declaration can be removed from the private drivers/nvdimm/nd.h header. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03dax: introduce CONFIG_DAX_DRIVERDan Williams1-1/+1
In support of allowing device-mapper to compile out idle/dead code when there are no dax providers in the system, introduce the DAX_DRIVER symbol. This is selected by all leaf drivers that device-mapper might be layered on top. This allows device-mapper to conditionally 'select DAX' only when a provider is present. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-22libnvdimm, nfit: fix persistence domain reportingDan Williams1-4/+6
The persistence domain is a point in the platform where once writes reach that destination the platform claims it will make them persistent relative to power loss. In the ACPI NFIT this is currently communicated as 2 bits in the "NFIT - Platform Capabilities Structure". The bits comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM Durability on Power Loss Capable". Commit 96c3a239054a "libnvdimm: expose platform persistence attr..." shows the persistence domain as flags, but it's really an enumerated hierarchy. Fix this newly introduced user ABI to show the closest available persistence domain before userspace develops dependencies on seeing, or needing to develop code to tolerate, the raw NFIT flags communicated through the libnvdimm-generic region attribute. Fixes: 96c3a239054a ("libnvdimm: expose platform persistence attr...") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-22libnvdimm, region: hide persistence_domain when unknownDan Williams1-0/+7
Similar to other region attributes, do not emit the persistence_domain attribute if its contents are empty. Fixes: 96c3a239054a ("libnvdimm: expose platform persistence attr...") Cc: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-17block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>Bart Van Assche1-1/+0
It happens often while I'm preparing a patch for a block driver that I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT available for this driver? Do I have to introduce definitions of these constants before I can use these constants? To avoid this confusion, move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the <linux/blkdev.h> header file such that these become available for all block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h header file conditional to avoid that including that header file after <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Cc: David S. Miller <davem@davemloft.net> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-15libnvdimm, label: change nvdimm_num_label_slots per UEFI 2.7Toshi Kani1-10/+24
sizeof_namespace_index() fails when NVDIMM devices have the minimum 1024 bytes label storage area. nvdimm_num_label_slots() returns 3 slots while the area is only big enough for 2 slots. Change nvdimm_num_label_slots() to calculate a number of label slots according to UEFI 2.7 spec. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-15libnvdimm, label: change min label storage size per UEFI 2.7Toshi Kani1-1/+1
UEFI 2.7 defines in page 758 that: Initial Label Storage Area Configuration : The minimum size of the Label Storage Area is large enough to hold 2 index blocks and 2 labels. The mininum index block size is 256 bytes, and the minimum label size is also 256 bytes. Change ND_LABEL_MIN_SIZE to (256 * 4) so that NVDIMM devices with the minimum label storage area do not fail with the size check in nvdimm_init_config_data(). Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-15libnvdimm, pmem: use module_nd_driverJohannes Thumshirn1-11/+1
Use module_nd_driver() instead of having module_init() and module_exit() callbacks which just call nd_driver_register() and nd_driver_unregister(). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-14libnvdimm: remove redundant assignment to pointer 'dev'Colin Ian King1-1/+1
Pointer dev is being assigned a value that is never read, it is being re-assigned the same value later on, hence the initialization is redundant and can be removed. Cleans up clang warning: drivers/nvdimm/pfn_devs.c:307:17: warning: Value stored to 'dev' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-09block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()Bart Van Assche3-4/+4
This patch has been generated as follows: for verb in set_unlocked clear_unlocked set clear; do replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \ $(git grep -lw queue_flag_${verb} drivers block/bsg*) done Except for protecting all queue flag changes with the queue lock this patch does not change any functionality. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-08libnvdimm, {btt, blk}: do integrity setup before add_disk()Vishal Verma2-4/+2
Prior to 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") we needed to temporarily add a zero-capacity disk before registering for blk-integrity. But adding a zero-capacity disk caused the partition table scanning to bail early, and this resulted in partitions not coming up after a probe of the BTT or blk namespaces. We can now register for integrity before the disk has been added, and this fixes the rescan problems. Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk") Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-06libnvdimm: remove redundant __func__ in dev_dbgDan Williams10-93/+77
Dynamic debug can be instructed to add the function name to the debug output using the +f switch, so there is no need for the libnvdimm modules to do it again. If a user decides to add the +f switch for libnvdimm's dynamic debug this results in double prints of the function name. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-03libnvdimm: re-enable deep flush for pmem devices via fsync()Dave Jiang1-2/+1
Re-enable deep flush so that users always have a way to be sure that a write makes it all the way out to media. Writes from the PMEM driver always arrive at the NVDIMM since movnt is used to bypass the cache, and the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to flush write buffers on power failure. The Deep Flush mechanism is there to explicitly write buffers to protect against (rare) ADR failure. This change prevents a regression in deep flush behavior so that applications can continue to depend on fsync() as a mechanism to trigger deep flush in the filesystem-DAX case. Fixes: 06e8ccdab15f4 ("acpi: nfit: Add support for detect platform CPU cache...") Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-02-28block: Add 'lock' as third argument to blk_alloc_queue_node()Bart Van Assche1-1/+1
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-03Merge branch 'for-4.16/nfit' into libnvdimm-for-nextRoss Zwisler2-1/+16
2018-02-03Merge branch 'for-4.16/dax' into libnvdimm-for-nextRoss Zwisler6-74/+268
2018-02-03libnvdimm, namespace: remove redundant initialization of 'nd_mapping'Colin Ian King1-1/+1
Pointer nd_mapping is being initialized to a value that is never read, instead it is being updated to a new value in all the cases where it is being read afterwards, hence the initialization is redundant and can be removed. Cleans up clang warning: drivers/nvdimm/namespace_devs.c:2411:21: warning: Value stored to 'nd_mapping' during its initialization is never rea Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
2018-02-02libnvdimm: expose platform persistence attribute for nd_regionDave Jiang1-0/+13
Providing a sysfs attribute for nd_region that shows the persistence capabilities for the platform. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
2018-02-02acpi: nfit: Add support for detect platform CPU cache flush on power lossDave Jiang1-1/+3
In ACPI 6.2a the platform capability structure has been added to the NFIT tables. That provides software the ability to determine whether a system supports the auto flushing of CPU caches on power loss. If the capability is supported, we do not need to do dax_flush(). Plumbing the path to set the property on per region from the NFIT tables. This patch depends on the ACPI NFIT 6.2a platform capabilities support code in include/acpi/actbl1.h. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
2018-01-20libnvdimm, btt: fix uninitialized err_lockJeff Moyer1-1/+1
When a sector mode namespace is initially created, the arena's err_lock is not initialized. If, on the other hand, the namespace already exists, the mutex is initialized. To fix the issue, I moved the mutex initialization into the arena_alloc, which is called by both discover_arenas and create_arenas. This was discovered on an older kernel where mutex_trylock checks the count to determine whether the lock is held. Because the data structure is kzalloc-d, that count was 0 (held), and I/O to the device would hang forever waiting for the lock to be released (see btt_write_pg, for example). Current kernels have a different mutex implementation that checks for a non-null owner, and so this doesn't show up as a problem. If that lock were ever contended, it might cause issues, but you'd have to be really unlucky, I think. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08memremap: change devm_memremap_pages interface to use struct dev_pagemapChristoph Hellwig4-34/+40
This new interface is similar to how struct device (and many others) work. The caller initializes a 'struct dev_pagemap' as required and calls 'devm_memremap_pages'. This allows the pagemap structure to be embedded in another structure and thus container_of can be used. In this way application specific members can be stored in a containing struct. This will be used by the P2P infrastructure and HMM could probably be cleaned up to use it as well (instead of having it's own, similar 'hmm_devmem_pages_create' function). Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-22libnvdimm, btt: Fix an incompatibility in the log layoutVishal Verma2-35/+211
Due to a spec misinterpretation, the Linux implementation of the BTT log area had different padding scheme from other implementations, such as UEFI and NVML. This fixes the padding scheme, and defaults to it for new BTT layouts. We attempt to detect the padding scheme in use when probing for an existing BTT. If we detect the older/incompatible scheme, we continue using it. Reported-by: Juston Li <juston.li@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Fixes: 5212e11fde4d ("nd_btt: atomic sector updates") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-22libnvdimm, btt: add a couple of missing kernel-doc linesVishal Verma1-0/+2
Recent updates to btt.h neglected to add corresponding kernel-doc lines for new structure members. Add them. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-20libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignmentDan Williams1-3/+12
The following namespace configuration attempt: # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f libndctl: ndctl_dax_enable: dax0.1: failed to enable Error: namespace0.0: failed to enable failed to reconfigure namespace: No such device or address ...fails when the backing memory range is not physically aligned to 1G: # cat /proc/iomem | grep Persistent 210000000-30fffffff : Persistent Memory (legacy) In the above example the 4G persistent memory range starts and ends on a 256MB boundary. We handle this case correctly when needing to handle cases that violate section alignment (128MB) collisions against "System RAM", and we simply need to extend that padding/truncation for the 1GB alignment use case. Cc: <stable@vger.kernel.org> Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...") Reported-and-tested-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-20libnvdimm, pfn: fix start_pad handling for aligned namespacesDan Williams1-2/+3
The alignment checks at pfn driver startup fail to properly account for the 'start_pad' in the case where the namespace is misaligned relative to its internal alignment. This is typically triggered in 1G aligned namespace, but could theoretically trigger with small namespace alignments. When this triggers the kernel reports messages of the form: dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000 Cc: <stable@vger.kernel.org> Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...") Reported-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-12-04nfit, libnvdimm: deprecate the generic SMART ioctlDan Williams1-3/+0
The kernel's ND_IOCTL_SMART_THRESHOLD command is based on a payload definition that has become broken / out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL definition. Deprecate the use of the ND_IOCTL_SMART_THRESHOLD command in favor of the ND_CMD_CALL approach taken by NVDIMM_FAMILY_{HPE,MSFT}, where we can manage the per-vendor variance in userspace. In a couple years, when the new scheme is widely deployed in userspace packages, the ND_IOCTL_SMART_THRESHOLD support can be removed. For now we prevent new binaries from compiling against the kernel header definitions, but kernel still compatible with old binaries. The libndctl.h [1] header is now the authoritative interface definition for NVDIMM SMART. [1]: https://github.com/pmem/ndctl Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-17Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds12-283/+351
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
2017-11-16bdi: introduce BDI_CAP_SYNCHRONOUS_IOMinchan Kim2-0/+5
As discussed at https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com> someday we will remove rw_page(). If so, we need something to detect such super-fast storage on which synchronous IO operations like the current rw_page are always a win. Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices. With it, we could use various optimization techniques. Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual rocket-science from trivial tree for 4.15" * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: MAINTAINERS: relinquish kconfig MAINTAINERS: Update my email address treewide: Fix typos in Kconfig kfifo: Fix comments init/Kconfig: Fix module signing document location misc: ibmasm: Return error on error path HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback" MAINTAINERS: Correct path to uDraw PS3 driver tracing: Fix doc mistakes in trace sample tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig mm/huge_memory.c: fixup grammar in comment lib/xz: Add fall-through comments to a switch statement
2017-11-02libnvdimm, badrange: remove a WARN for list_emptyVishal Verma1-1/+0
Now that we're reusing the badrange functions for nfit_test, and that exposes badrange injection/clearing to userspace via the DSM paths, it is plausible that a user may call the clear DSM multiple times. Since it is harmless to do so, we can remove the WARN in badrange_forget. Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-02libnvdimm: move poison list functions to a new 'badrange' fileDave Jiang6-277/+311
nfit_test needs to use the poison list manipulation code as well. Make it more generic and in the process rename poison to badrange, and move all the related helpers to a new file. Signed-off-by: Dave Jiang <dave.jiang@intel.com> [vishal: Add badrange.o to nfit_test's Kbuild] [vishal: add a missed include in bus.c for the new badrange functions] [vishal: rename all instances of 'be' to 'bre'] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2-0/+2
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12treewide: Fix typos in KconfigMasanari Iida1-1/+1
This patch fixes some spelling typos found in Kconfig files. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-10-07libnvdimm, namespace: make a couple of functions staticColin Ian King1-2/+2
The functions create_namespace_pmem and create_namespace_blk are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: symbol 'create_namespace_pmem' was not declared. Should it be static? symbol 'create_namespace_blk' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-10-07libnvdimm: introduce 'flags' attribute for DIMM 'lock' and 'alias' statusDan Williams1-0/+12
Given that we now how have two mechanisms for a DIMM to indicate that it is locked: * NVDIMM_FAMILY_INTEL 'get_config_size' _DSM command * ACPI 6.2 Label Storage Read / Write commands ...export the generic libnvdimm DIMM status in a new 'flags' attribute. This attribute can also reflect the 'alias' state which indicates whether the nvdimm core is enforcing labels for aliased-region-capacity that the given dimm is an interleave-set member. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-10-07acpi, nfit: add support for the _LSI, _LSR, and _LSW label methodsDan Williams1-0/+2
ACPI 6.2 adds support for named methods to access the label storage area of an NVDIMM. We prefer these new methods if available and otherwise fallback to the NVDIMM_FAMILY_INTEL _DSMs. The kernel ioctls, ND_IOCTL_{GET,SET}_CONFIG_{SIZE,DATA}, remain generic and the driver translates the 'package' payloads into the NVDIMM_FAMILY_INTEL 'buffer' format to maintain compatibility with existing userspace and keep the output buffer parsing code in the driver common. The output payloads are mostly compatible save for the 'label area locked' status that moves from the 'config_size' (_LSI) command to the 'config_read' (_LSR) command status. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-28libnvdimm, namespace: fix label initialization to use valid seq numbersDan Williams1-1/+1
The set of valid sequence numbers is {1,2,3}. The specification indicates that an implementation should consider 0 a sign of a critical error: UEFI 2.7: 13.19 NVDIMM Label Protocol Software never writes the sequence number 00, so a correctly check-summed Index Block with this sequence number probably indicates a critical error. When software discovers this case it treats it as an invalid Index Block indication. While the expectation is that the invalid block is just thrown away, the Robustness Principle says we should fix this to make both sequence numbers valid. Fixes: f524bf271a5c ("libnvdimm: write pmem label set") Cc: <stable@vger.kernel.org> Reported-by: Juston Li <juston.li@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>