summaryrefslogtreecommitdiff
path: root/drivers/block/rbd.c
AgeCommit message (Collapse)AuthorFilesLines
2018-06-15Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-4/+7
Pull ceph updates from Ilya Dryomov: "The main piece is a set of libceph changes that revamps how OSD requests are aborted, improving CephFS ENOSPC handling and making "umount -f" actually work (Zheng and myself). The rest is mostly mount option handling cleanups from Chengguang and assorted fixes from Zheng, Luis and Dongsheng. * tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits) rbd: flush rbd_dev->watch_dwork after watch is unregistered ceph: update description of some mount options ceph: show ino32 if the value is different with default ceph: strengthen rsize/wsize/readdir_max_bytes validation ceph: fix alignment of rasize ceph: fix use-after-free in ceph_statfs() ceph: prevent i_version from going back ceph: fix wrong check for the case of updating link count libceph: allocate the locator string with GFP_NOFAIL libceph: make abort_on_full a per-osdc setting libceph: don't abort reads in ceph_osdc_abort_on_full() libceph: avoid a use-after-free during map check libceph: don't warn if req->r_abort_on_full is set libceph: use for_each_request() in ceph_osdc_abort_on_full() libceph: defer __complete_request() to a workqueue libceph: move more code into __complete_request() libceph: no need to call flush_workqueue() before destruction ceph: flush pending works before shutdown super ceph: abort osd requests on force umount libceph: introduce ceph_osdc_abort_requests() ...
2018-06-04rbd: flush rbd_dev->watch_dwork after watch is unregisteredDongsheng Yang1-1/+1
There is a problem if we are going to unmap a rbd device and the watch_dwork is going to queue delayed work for watch: unmap Thread watch Thread timer do_rbd_remove cancel_tasks_sync(rbd_dev) queue_delayed_work for watch destroy_workqueue(rbd_dev->task_wq) drain_workqueue(wq) destroy other resources in wq call_timer_fn __queue_work() Then the delayed work escape the cancel_tasks_sync() and destroy_workqueue() and we will get an user-after-free call trace: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Modules linked in: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__queue_work+0x6a/0x3b0 RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086 RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000 RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00 R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970 R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0 Call Trace: <IRQ> ? __queue_work+0x3b0/0x3b0 call_timer_fn+0x2d/0x130 run_timer_softirq+0x16e/0x430 ? tick_sched_timer+0x37/0x70 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 [ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch() either bails out early because the watch is UNREGISTERED at that point or just gets cancelled. ] Cc: stable@vger.kernel.org Fixes: 99d1694310df ("rbd: retry watch re-registration periodically") Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04libceph, rbd: add error handling for osd_req_op_cls_init()Chengguang Xu1-3/+6
Add proper error handling for osd_req_op_cls_init() to replace BUG_ON statement when failing from memory allocation. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds1-22/+22
Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
2018-05-24block drivers/block: Use octal not symbolic permissionsJoe Perches1-22/+22
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-10libceph: add osd_req_op_extent_osd_data_bvecs()Ilya Dryomov1-1/+3
... and store num_bvecs for client code's convenience. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-04-16rbd: notrim map optionIlya Dryomov1-5/+14
Add an option to turn off discard and write zeroes offload support to avoid deprovisioning a fully provisioned image. When enabled, discard requests will fail with -EOPNOTSUPP, write zeroes requests will fall back to manually zeroing. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com>
2018-04-16rbd: adjust queue limits for "fancy" stripingIlya Dryomov1-9/+8
In order to take full advantage of merging in ceph_file_to_extents(), allow object set sized I/Os. If the layout is not "fancy", an object set consists of just one object. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: avoid Wreturn-type warningsArnd Bergmann1-3/+3
In some configurations gcc cannot see that rbd_assert(0) leads to an unreachable code path: drivers/block/rbd.c: In function 'rbd_img_is_write': drivers/block/rbd.c:1397:1: error: control reaches end of non-void function [-Werror=return-type] drivers/block/rbd.c: In function '__rbd_obj_handle_request': drivers/block/rbd.c:2499:1: error: control reaches end of non-void function [-Werror=return-type] drivers/block/rbd.c: In function 'rbd_obj_handle_write': drivers/block/rbd.c:2471:1: error: control reaches end of non-void function [-Werror=return-type] As the rbd_assert() here shows has no extra information beyond the verbose BUG(), we can simply use BUG() directly in its place. This is reliably detected as not returning on any architecture, since it doesn't depend on the unlikely() comparison that confused gcc. Fixes: 3da691bf4366 ("rbd: new request handling code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: support timeout in rbd_wait_state_locked()Dongsheng Yang1-1/+21
currently, the rbd_wait_state_locked() will wait forever if we can't get our state locked. Example: rbd map --exclusive test1 --> /dev/rbd0 rbd map test1 --> /dev/rbd1 dd if=/dev/zero of=/dev/rbd1 bs=1M count=1 --> IO blocked To avoid this problem, this patch introduce a timeout design in rbd_wait_state_locked(). Then rbd_wait_state_locked() will return error when we reach a timeout. This patch allow user to set the lock_timeout in rbd mapping. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: refactor rbd_wait_state_locked()Ilya Dryomov1-17/+26
In preparation for lock_timeout option, make rbd_wait_state_locked() return error codes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-10Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-1472/+980
Pull ceph updates from Ilya Dryomov: "The big ticket items are: - support for rbd "fancy" striping (myself). The striping feature bit is now fully implemented, allowing mapping v2 images with non-default striping patterns. This completes support for --image-format 2. - CephFS quota support (Luis Henriques and Zheng Yan). This set is based on the new SnapRealm code in the upcoming v13.y.z ("Mimic") release. Quota handling will be rejected on older filesystems. - memory usage improvements in CephFS (Chengguang Xu). Directory specific bits have been split out of ceph_file_info and some effort went into improving cap reservation code to avoid OOM crashes. Also included a bunch of assorted fixes all over the place from Chengguang and others" * tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits) ceph: quota: report root dir quota usage in statfs ceph: quota: add counter for snaprealms with quota ceph: quota: cache inode pointer in ceph_snap_realm ceph: fix root quota realm check ceph: don't check quota for snap inode ceph: quota: update MDS when max_bytes is approaching ceph: quota: support for ceph.quota.max_bytes ceph: quota: don't allow cross-quota renames ceph: quota: support for ceph.quota.max_files ceph: quota: add initial infrastructure to support cephfs quotas rbd: remove VLA usage rbd: fix spelling mistake: "reregisteration" -> "reregistration" ceph: rename function drop_leases() to a more descriptive name ceph: fix invalid point dereference for error case in mdsc destroy ceph: return proper bool type to caller instead of pointer ceph: optimize memory usage ceph: optimize mds session register libceph, ceph: add __init attribution to init funcitons ceph: filter out used flags when printing unused open flags ceph: don't wait on writeback when there is no more dirty pages ...
2018-04-02rbd: remove VLA usageKyle Spiers1-4/+4
As part of the effort to remove VLAs from the kernel[1], this moves the literal values into the stack array calculation instead of using a variable for the sizing. The resulting size can be found from sizeof(buf). [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Kyle Spiers <kyle@spiers.me> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: fix spelling mistake: "reregisteration" -> "reregistration"Colin Ian King1-1/+1
Trivial fix to spelling mistake in rdb_warn message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: get the latest osdmap when using an existing clientIlya Dryomov1-36/+33
Currently we request the latest osdmap only if ceph_pg_poolid_by_name() fails with -ENOENT. This is effective with newly created pools, but we also want to avoid attempting to map from pools that were recently deleted and report "pool does not exist" instead. (Such an attempt eventually fails in the OSD client after map check code kicks in, but the error message is confusing.) Request the latest osdmap unconditionally after bumping a ref on an existing client in rbd_client_find(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move rbd_get_client() below rbd_put_client()Ilya Dryomov1-20/+20
... to avoid a forward declaration in the next commit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove redundant declaration of rbd_spec_put()Ilya Dryomov1-1/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: allow "fancy" stripingIlya Dryomov1-27/+2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jason Dillaman <dillaman@redhat.com>
2018-04-02rbd: introduce OWN_BVECS data typeIlya Dryomov1-7/+149
If the layout is "fancy", we need to be able to rearrange the provided bio_vecs in stripe unit chunks to make it possible for the messenger to read/write directly from/to the provided data buffer, without employing a temporary data buffer for assembling the result. Higher level bio_vec arrays are generally immutable, so this requires copying into a private array. Only the bio_vecs themselves are shuffled around, not the actual data. OWN_BVECS doesn't own any pages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove rbd_parent_request_{create,destroy}()Ilya Dryomov1-68/+6
rbd_parent_request_create() takes a ref on obj_req for child_img_req. There is no point in doing that because child_img_req is created on behalf of obj_req -- obj_req is the initiator and can't be completed before child_img_req. Open-code the rest of rbd_parent_request_create() and remove it along with rbd_parent_request_destroy(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: get rid of img_req->{offset,length}Ilya Dryomov1-18/+8
These are set, but no longer used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove rbd_img_request_fill() and helpersIlya Dryomov1-98/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: switch to common striping frameworkIlya Dryomov1-23/+168
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: create+truncate for whole-object layered discardsIlya Dryomov1-1/+6
A whole-object layered discard is implemented as a truncate rather than a delete: a dummy object is needed to prevent the CoW machinery from kicking in. However, a truncate on a non-existent object is a no-op. If the object doesn't exist in HEAD, a discard request is effectively ignored, which violates our "discard zeroes data" promise and breaks REQ_OP_WRITE_ZEROES implementation. A non-exclusive create on an existing object is also a no-op, so the fix is to do a compound create+truncate instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move to obj_req->img_extentsIlya Dryomov1-52/+98
In preparation for rbd "fancy" striping, replace obj_req->img_offset with obj_req->img_extents. A single starting offset isn't sufficient because we want only one OSD request per object and will merge adjacent object extents in ceph_file_to_extents(). The final object extent may map into multiple different byte ranges in the image. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: incorporate ceph_object_extentIlya Dryomov1-37/+34
obj_req->object_no -> obj_req->ex.oe_objno obj_req->offset -> obj_req->ex.oe_off obj_req->length -> obj_req->ex.oe_len ... and use ex for linking object requests to image requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: store data_type in img_req instead of obj_reqIlya Dryomov1-26/+8
All object requests are associated with an image request now -- avoid duplicating the same info in each object request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove obj_req->flags fieldIlya Dryomov1-35/+0
There are no standalone (!IMG_DATA) object requests anymore. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request completion codeIlya Dryomov1-172/+3
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request completion codeIlya Dryomov1-13/+55
Do away with partial request completions and all the associated complexity. Individual object requests no longer need to be completed in order -- when the last one becomes ready, we complete the entire higher level request all at once. This also wraps up the conversion to a state machine model and eliminates the recursion described in commit 6d69bb536bac ("rbd: prevent kernel stack blow up on rbd map"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: update rbd_img_request_submit() signatureIlya Dryomov1-10/+3
It should be void now. Also, object requests are unlinked only in image request destructor, which can't run before rbd_img_request_put(), so no need for _safe. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: add img_req->op_type fieldIlya Dryomov1-63/+12
Store op_type in its own field instead of packing it into flags. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: simplify rbd_osd_req_create()Ilya Dryomov1-45/+14
No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags can be set in rbd_osd_req_format_{read,write}(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request handling codeIlya Dryomov1-730/+4
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request handling codeIlya Dryomov1-77/+601
The notable changes are: - instead of explicitly stat'ing the object to see if it exists before issuing the write, send the write optimistically along with the stat in a single OSD request - zero copyup optimization - all object requests are associated with an image request and have a valid ->img_request pointer; there are no standalone (!IMG_DATA) object requests anymore - code is structured as a state machine (vs a bunch of callbacks with implicit state) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move from raw pages to bvec data descriptorsIlya Dryomov1-78/+77
In preparation for rbd "fancy" striping which requires bio_vec arrays, wire up BVECS data type and kill off PAGES data type. There is nothing wrong with using page vectors for copyup requests -- it's just less iterator boilerplate code to write for the new striping framework. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: get rid of img_req->copyup_pagesIlya Dryomov1-34/+9
The initiating object request is the proper owner -- save a bit of space. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: don't (ab)use obj_req->pages for stat requestsIlya Dryomov1-10/+5
obj_req->pages is for provided data buffers. stat requests are internal and should be NODATA. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: remove bio cloning helpersIlya Dryomov1-141/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02libceph, rbd: new bio handling code (aka don't clone bios)Ilya Dryomov1-27/+40
The reason we clone bios is to be able to give each object request (and consequently each ceph_osd_data/ceph_msg_data item) its own pointer to a (list of) bio(s). The messenger then initializes its cursor with cloned bio's ->bi_iter, so it knows where to start reading from/writing to. That's all the cloned bios are used for: to determine each object request's starting position in the provided data buffer. Introduce ceph_bio_iter to do exactly that -- store position within bio list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: start enums at 1 instead of 0Ilya Dryomov1-2/+4
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: set max_segment_size to UINT_MAXIlya Dryomov1-1/+1
Commit 21acdf45f495 ("rbd: set max_segments to USHRT_MAX") removed the limit on max_segments. Remove the limit on max_segment_size as well. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-03-17block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>Bart Van Assche1-9/+0
It happens often while I'm preparing a patch for a block driver that I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT available for this driver? Do I have to introduce definitions of these constants before I can use these constants? To avoid this confusion, move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the <linux/blkdev.h> header file such that these become available for all block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h header file conditional to avoid that including that header file after <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE redefinition. Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have not been removed from uapi header files nor from NAND drivers in which these constants are used for another purpose than converting block layer offsets and sizes into a number of sectors. Cc: David S. Miller <davem@davemloft.net> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-09block: Use blk_queue_flag_*() in drivers instead of queue_flag_*()Bart Van Assche1-2/+2
This patch has been generated as follows: for verb in set_unlocked clear_unlocked set clear; do replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \ $(git grep -lw queue_flag_${verb} drivers block/bsg*) done Except for protecting all queue flag changes with the queue lock this patch does not change any functionality. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-29rbd: whitelist RBD_FEATURE_OPERATIONS feature bitIlya Dryomov1-1/+3
This feature bit restricts older clients from performing certain maintenance operations against an image (e.g. clone, snap create). krbd does not perform maintenance operations. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-01-29rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()Ilya Dryomov1-2/+0
If rbd_img_request_submit() fails, parent_request->obj_request is NULLed out, triggering an assert in rbd_obj_request_put(): rbd_img_request_put(parent_request) rbd_parent_request_destroy rbd_obj_request_put(NULL) Just remove it -- parent_request->obj_request will be put in rbd_parent_request_destroy(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29rbd: use kmem_cache_zalloc() in rbd_img_request_create()Ilya Dryomov1-7/+2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29rbd: obj_request->completion is unusedIlya Dryomov1-6/+1
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-09rbd: set max_segments to USHRT_MAXIlya Dryomov1-1/+1
Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped max_segments (unsigned short) to max_hw_sectors (unsigned int). max_hw_sectors is set to the number of 512-byte sectors in an object and overflows unsigned short for 32M (largest possible) objects, making the block layer resort to handing us single segment (i.e. single page or even smaller) bios in that case. Cc: stable@vger.kernel.org Fixes: d3834fefcfe5 ("rbd: bump queue_max_segments") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-01-09rbd: reacquire lock should update lock owner client idFlorian Margaine1-5/+11
Otherwise, future operations on this RBD using exclusive-lock are going to require the lock from a non-existent client id. Cc: stable@vger.kernel.org Fixes: 14bb211d324d ("rbd: support updating the lock cookie without releasing the lock") Link: http://tracker.ceph.com/issues/19929 Signed-off-by: Florian Margaine <florian@platform.sh> [idryomov@gmail.com: rbd_set_owner_cid() call, __rbd_lock() helper] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>