summaryrefslogtreecommitdiff
path: root/drivers/lightnvm/pblk-read.c
AgeCommit message (Collapse)AuthorFilesLines
2019-05-06lightnvm: pblk: simplify partial read pathIgor Konopko1-254/+85
This patch changes the approach to handling partial read path. In old approach merging of data from round buffer and drive was fully made by drive. This had some disadvantages - code was complex and relies on bio internals, so it was hard to maintain and was strongly dependent on bio changes. In new approach most of the handling is done mostly by block layer functions such as bio_split(), bio_chain() and generic_make request() and generally is less complex and easier to maintain. Below some more details of the new approach. When read bio arrives, it is cloned for pblk internal purposes. All the L2P mapping, which includes copying data from round buffer to bio and thus bio_advance() calls is done on the cloned bio, so the original bio is untouched. If we found that we have partial read case, we still have original bio untouched, so we can split it and continue to process only first part of it in current context, when the rest will be called as separate bio request which is passed to generic_make_request() for further processing. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06lightnvm: pblk: IO path reorganizationIgor Konopko1-45/+38
This patch is made in order to prepare read path for new approach to partial read handling, which is simpler in compare with previous one. The most important change is to move the handling of completed and failed bio from the pblk_make_rq() to particular read and write functions. This is needed, since after partial read path changes, sometimes completed/failed bio will be different from original one, so we cannot do this any longer in pblk_make_rq(). Other changes are small read path refactor in order to reduce the size of the following patch with partial read changes. Generally the goal of this patch is not to change the functionality, but just to prepare the code for the following changes. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06lightnvm: pblk: GC error handlingIgor Konopko1-1/+0
Currently when there is an IO error (or similar) on GC read path, pblk still move the line, which was currently under GC process to free state. Such a behaviour can lead to silent data mismatch issue. With this patch, the line which was under GC process on which some IO errors occurred, will be putted back to closed state (instead of free state as it was without this patch) and the L2P mapping for such a failed sectors will not be updated. Then in case of any user IOs to such a failed sectors, pblk would be able to return at least real IO error instead of stale data as it is right now. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06lightnvm: pblk: set proper read status in bioIgor Konopko1-6/+5
Currently in case of read errors, bi_status is not set properly which leads to returning inproper data to layers above. This patch fix that by setting proper status in case of read errors. Also remove unnecessary warn_once(), which does not make sense in that place, since user bio is not used for interation with drive and thus bi_status will not be set here. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-06lightnvm: pblk: reduce L2P memory footprintIgor Konopko1-1/+1
Currently L2P map size is calculated based on the total number of available sectors, which is redundant, since it contains mapping for overprovisioning as well (11% by default). Change this size to the real capacity and thus reduce the memory footprint significantly - with default op value it is approx. 110MB of DRAM less for every 1TB of media. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-10lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecsHans Holmberg1-22/+28
The introduction of multipage bio vectors broke pblk's partial read logic due to it not being prepared for multipage bio vectors. Use bio vector iterators instead of direct bio vector indexing. Fixes: 07173c3ec276 ("block: enable multipage bvecs") Reported-by: Klaus Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Updated description. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11lightnvm: pblk: support packed metadataIgor Konopko1-0/+6
pblk performs recovery of open lines by storing the LBA in the per LBA metadata field. Recovery therefore only works for drives that has this field. This patch adds support for packed metadata, which store l2p mapping for open lines in last sector of every write unit and enables drives without per IO metadata to recover open lines. After this patch, drives with OOB size <16B will use packed metadata and metadata size larger than16B will continue to use the device per IO metadata. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11lightnvm: pblk: add helpers for OOB metadataIgor Konopko1-17/+31
pblk currently assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive in the future. After this patch only OOB metadata equal to 16 bytes is supported. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-11lightnvm: pblk: move lba list to partial read contextIgor Konopko1-15/+5
Currently DMA allocated memory is reused on partial read for lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move this arrays into pblk_pr_ctx structures. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: add SPDX license tagJavier González1-0/+1
Add GLP-2.0 SPDX license tag to all pblk files Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: encapsulate rqd dma allocationsJavier González1-21/+10
dma allocations for ppa_list and meta_list in rqd are replicated in several places across the pblk codebase. Make helpers to encapsulate creation and deletion to simplify the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: fix up prints in pblk_read_check_randHans Holmberg1-2/+2
The prefix when printing ppas in pblk_read_check_rand should be "rnd" not "seq", so fix this so we can differentiate between lba missmatches in random and sequential reads. Also change the print order so we align with pblk_read_check_seq, printing read lba first. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: introduce nvm_rq_to_ppa_listHans Holmberg1-7/+4
There is a number of places in the lightnvm subsystem where the user iterates over the ppa list. Before iterating, the user must know if it is a single or multiple LBAs due to vector commands using either the nvm_rq ->ppa_addr or ->ppa_list fields on command submission, which leads to open-coding the if/else statement. Instead of having multiple if/else's, move it into a function that can be called by its users. A nice side effect of this cleanup is that this patch fixes up a bunch of cases where we don't consider the single-ppa case in pblk. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: improve line helpersJavier González1-2/+2
The current helper to obtain a line from a ppa returns the line id, which requires its users to explicitly retrieve the pointer to the line with the id. Make 2 different helpers: one returning the line id and one returning the line directly. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: refactor put line fn on read completionMatias Bjørling1-17/+1
The read completion path uses the put_line variable to decide whether the reference on a line should be released. The function name used for that is pblk_read_put_rqd_kref, which could lead one to believe that it is the rqd that is releasing the reference, while it is the line reference that is put. Rename and also split the function in two to account for either rqd or single ppa callers and move it to core, such that it later can be used in the write path as well. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Reviewed-by: Heiner Litz <hlitz@ucsc.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: remove size and out of bounds read checkMatias Bjørling1-7/+0
The I/O size and capacity checks are already done by the block layer. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: pblk: unify vector max req constantsMatias Bjørling1-3/+3
Both NVM_MAX_VLBA and PBLK_MAX_REQ_ADDRS define how many LBAs that are available in a vector command. pblk uses them interchangeably in its implementation. Use NVM_MAX_VLBA as the main one and remove usages of PBLK_MAX_REQ_ADDRS. Also remove the power representation that only has one user, and instead calculate it at runtime. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-09lightnvm: combine 1.2 and 2.0 command flagsMatias Bjørling1-7/+1
Add nvm_set_flags helper to enable core to appropriately set the command flags for read/write/erase depending on which version a drive supports. The flags arguments can be distilled into the access hint, scrambling, and program/erase suspend. Replace the access hint with a "is_seq" parameter. The rest of the flags are dependent on the command opcode, which is trivial to detect and set. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-18block: Add and use op_stat_group() for indexing disk_stat fields.Michael Callahan1-2/+3
Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13lightnvm: pblk: add asynchronous partial readHeiner Litz1-63/+120
In the read path, partial reads are currently performed synchronously which affects performance for workloads that generate many partial reads. This patch adds an asynchronous partial read path as well as the required partial read ctx. Signed-off-by: Heiner Litz <hlitz@ucsc.edu> Reviewed-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13lightnvm: pblk: expose generic disk name on pr_* msgsMatias Bjørling1-12/+13
The error messages in pblk does not say which pblk instance that a message occurred from. Update each error message to reflect the instance it belongs to, and also prefix it with pblk, so we know the message comes from the pblk module. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13lightnvm: pblk: fix read_bitmap for 32bit archsMatias Bjørling1-7/+7
If using pblk on a 32bit architecture, and there is a need to perform a partial read, the partial read bitmap will only have allocated 32 entries, where as 64 are needed. Make sure that the read_bitmap is initialized to 64bits on 32bit architectures as well. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-13lightnvm: move NVM_DEBUG to pblkMatias Bjørling1-14/+14
There is no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable only for pblk. Also fix up the CONFIG_NVM_PBLK entry to follow the code style for Kconfig files. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01lightnvm: pblk: remove unnecessary bio_get/putJavier González1-37/+28
In the read path, pblk gets a reference to the incoming bio and puts it after ending the bio. Though this behavior is correct, it is unnecessary since pblk is the one putting the bio, therefore, it cannot disappear underneath it. Removing this reference, allows to clean up rqd->bio and avoids pointer bouncing for the different read paths. Now, the incoming bio always resides in the read context and pblk's internal bios (if any) reside in rqd->bio. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01lightnvm: fix partial read error pathIgor Konopko1-4/+4
When error occurs during bio_add_page on partial read path, pblk tries to free pages twice. Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01lightnvm: pblk: remove unnecessary indirectionJavier González1-12/+2
Call nvm_submit_io directly and remove an unnecessary indirection on the read path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01lightnvm: pblk: improve error msg on corrupted LBAsJavier González1-10/+32
In the event of a mismatch between the read LBA and the metadata pointer reported by the device, improve the error message to be able to detect the offending physical address (PPA) mapped to the corrupted LBA. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01lightnvm: pblk: check read lba on gc pathJavier González1-6/+33
Check that the lba stored in the LBA metadata is correct in the GC path too. This requires a new helper function to check random reads in the vector read. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-31lightnvm: convert to bioset_init()/mempool_init()Kent Overstreet1-2/+2
Convert lightnvm to embedded bio sets. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-30lightnvm: simplify geometry structureJavier González1-1/+1
Currently, the device geometry is stored redundantly in the nvm_id and nvm_geo structures at a device level. Moreover, when instantiating targets on a specific number of LUNs, these structures are replicated and manually modified to fit the instance channel and LUN partitioning. Instead, create a generic geometry around nvm_geo, which can be used by (i) the underlying device to describe the geometry of the whole device, and (ii) instances to describe their geometry independently. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: pblk: add iostat supportJavier González1-12/+19
Since pblk registers its own block device, the iostat accounting is not automatically done for us. Therefore, add the necessary accounting logic to satisfy the iostat interface. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05lightnvm: pblk: compress and reorder helper functionsJavier González1-2/+2
Through time, we have generated some redundant helper functions. Refactor them to eliminate redundant and unnecessary code. Also, reorder them to improve readability Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: implement generic path for sync I/OJavier González1-18/+3
Implement a generic path for sending sync I/O on LightNVM. This allows to reuse the standard synchronous path trough blk_execute_rq(), instead of implementing a wait_for_completion on the target side (e.g., pblk). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: remove redundant check on read pathJavier González1-2/+2
A partial read I/O in pblk is an I/O where some sectors reside in the write buffer in main memory and some are persisted on the device. Such an I/O must at least contain 2 lbas, therefore checking for the case where a single lba is mapped is not necessary. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: guarantee line integrity on readsJavier González1-18/+53
When a line is recycled during garbage collection, reads can still be issued to the line. If the line is freed in the middle of this process, data corruption might occur. This patch guarantees that lines are not freed in the middle of reads that target them (lines). Specifically, we use the existing line reference to decide when a line is eligible for being freed after the recycle process. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: check lba sanity on read pathJavier González1-3/+48
As part of pblk's recovery scheme, we store the lba mapped to each physical sector on the device's out-of-bound (OOB) area. On the read path, we can use this information to validate that the data being delivered to the upper layers corresponds to the lba being requested. The cost of this check is an extra copy on the DMA region on the device and an extra comparison in the host, given that (i) the OOB area is being read together with the data in the media, and (ii) the DMA region allocated for the ppa list can be reused for the metadata stored on the OOB area. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: use rqd->end_io for completionJavier González1-3/+2
For consistency with the rest of pblk, use rqd->end_io to point to the function taking care of ending the request on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: refactor rqd alloc/freeJavier González1-2/+0
Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: improve naming for internal req.Javier González1-3/+3
Each request type sent to the LightNVM subsystem requires different metadata. Until now, we have tailored this metadata based on write, read and erase commands. However, pblk uses different metadata for internal writes that do not hit the write buffer. Instead of abusing the metadata for reads, create a new request type - internal write to improve code readability. In the process, create internal values for each I/O type instead of abusing the READ/WRITE macros, as suggested by Christoph. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: put bio on bio completionJavier González1-1/+0
Simplify put bio by doing it on bio end_io instead of manually putting it on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: simplify data validity check on GCJavier González1-36/+43
When a line is selected for recycling by the garbage collector (GC), the line state changes and the invalid bitmap is frozen, preventing invalidations from happening. Throughout the GC, the L2P map is checked to verify that not data being recycled has been updated. The last check is done before the new map is being stored on the L2P table. Though this algorithm works, it requires a number of corner cases to be checked each time the L2P table is being updated. This complicates readability and is error prone in case that the recycling algorithm is modified. Instead, this patch makes the invalid bitmap accessible even when the line is being recycled. When recycled data is being remapped, it is enough to check the invalid bitmap for the line before updating the L2P table. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: refactor read lba sanity checkJavier González1-19/+10
Refactor lba sanity check on read path to avoid code duplication. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: remove checks on mempool alloc.Javier González1-8/+0
As part of the mempool audit on pblk, remove unnecessary mempool allocation checks on mempools. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: fix min size for page mempoolJavier González1-1/+1
pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing valid data (and with a large enough bioset allocated). In both cases, the maximum number of pages per bio is defined by the maximum number of physical sectors supported by the underlying device. This patch fixes a bad mempool allocation, where the min_nr of elements on the pool was fixed (to 16), which is lower than the maximum number of sectors supported by NVMe (as of the time for this patch). Instead, use the maximum number of allowed sectors reported by the device. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13lightnvm: pblk: use right flag for GC allocationJavier González1-2/+5
The data buffer for the GC path allocates virtual memory through vmalloc. When this change was introduced, a flag signaling kmalloc'ed memory was wrongly introduced. Use the right flag when creating a bio from this buffer. Fixes: de54e703a422 ("lightnvm: pblk: use vmalloc for GC data buffer") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-28lightnvm: pblk: advance bio according to lba indexJavier González1-7/+16
When a lba either hits the cache or corresponds to an empty entry in the L2P table, we need to advance the bio according to the position in which the lba is located. Otherwise, we will copy data in the wrong page, thus causing data corruption for the application. In case of a cache hit, we assumed that bio->bi_iter.bi_idx would contain the correct index, but this is no necessarily true. Instead, use the local bio advance counter and iterator. This guarantees that lbas hitting the cache are copied into the right bv_page. In case of an empty L2P entry, we omitted to advance the bio. In the cases when the same I/O also contains a cache hit, data corresponding to this lba will be copied to the wrong bv_page. Fix this by advancing the bio as we do in the case of a cache hit. Fixes: a4bd217b4326 lightnvm: physical block device (pblk) target Signed-off-by: Javier González <javier@javigon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: verify that cache read is still validJavier González1-2/+1
When a read is directed to the cache, we risk that the lba has been updated during the time we made the L2P table lookup and the time we are actually reading form the cache. We intentionally not hold the L2P lock not to block other threads. While strict ordering is not a guarantee at this level (unless REQ_FLUSH has been previously issued), we have experience that some databases that have recently implemented direct I/O support, issue metadata reads very close to the writes, without issuing a fsync in the middle. An easy way to support them while they is to make an extra effort and check the L2P map right before reading the cache. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: use vmalloc for GC data bufferJavier González1-2/+2
For now, we allocate a per I/O buffer for GC data. Since the potential size of the buffer is 256KB and GC is not in the fast path, do this allocation with vmalloc. This puts lets pressure on the memory allocator at no performance cost. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27lightnvm: pblk: fail gracefully on irrec. errorJavier González1-0/+3
Due to user writes being decoupled from media writes because of the need of an intermediate write buffer, irrecoverable media write errors lead to pblk stalling; user writes fill up the buffer and end up in an infinite retry loop. In order to let user writes fail gracefully, it is necessary for pblk to keep track of its own internal state and prevent further writes from being placed into the write buffer. This patch implements a state machine to keep track of internal errors and, in case of failure, fail further user writes in an standard way. Depending on the type of error, pblk will do its best to persist buffered writes (which are already acknowledged) and close down on a graceful manner. This way, data might be recovered by re-instantiating pblk. Such state machine paves out the way for a state-based FTL log. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27lightnvm: pblk: set metadata list for all I/OsJavier González1-24/+21
Set a dma area for all I/Os in order to read/write from/to the metadata stored on the per-sector out-of-bound area. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>