kernel/linux.git/drivers/block/rbd.c, branch v4.4.171

rbd: set max_segments to USHRT_MAX

2018-01-17T08:35:30+00:00

commit 21acdf45f4958135940f0b4767185cf911d4b010 upstream.

Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped max_segments (unsigned short) to max_hw_sectors (unsigned int). max_hw_sectors is set to the number of 512-byte sectors in an object and overflows unsigned short for 32M (largest possible) objects, making the block layer resort to handing us single segment (i.e. single page or even smaller) bios in that case.

Fixes: d3834fefcfe5 ("rbd: bump queue_max_segments")
Signed-off-by: Ilya Dryomov
Reviewed-by: Alex Elder
Signed-off-by: Greg Kroah-Hartman
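The overflow described in this message can be checked with plain arithmetic. The following is a minimal userspace sketch, not driver code: the helper names are hypothetical, and it assumes a 16-bit unsigned short (USHRT_MAX == 65535), which holds on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>

/* 32M is the largest object size named in the commit message. */
#define OBJ_BYTES_32M (32u << 20)
#define SECTOR_BYTES  512u

/* Number of 512-byte sectors in an object (the max_hw_sectors value). */
unsigned int sectors_per_object(unsigned int obj_bytes)
{
	return obj_bytes / SECTOR_BYTES;
}

/*
 * Hypothetical helper: models handing an unsigned int to a parameter
 * declared as unsigned short. The value is reduced modulo USHRT_MAX + 1.
 */
unsigned short as_max_segments(unsigned int value)
{
	return (unsigned short)value;
}
```

For a 32M object, sectors_per_object() gives 65536, one more than a 16-bit unsigned short can hold, so the converted value wraps to 0. A 4M object gives 8192 sectors, which fits, so the problem only shows up at the largest object size.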

rbd: use GFP_NOIO for parent stat and data requests

2017-11-15T16:13:13+00:00

commit 1e37f2f84680fa7f8394fd444b6928e334495ccc upstream.

rbd_img_obj_exists_submit() and rbd_img_obj_parent_read_full() are on the writeback path for cloned images -- we attempt a stat on the parent object to see if it exists and potentially read it in to call copyup. GFP_NOIO should be used instead of GFP_KERNEL here.

Link: http://tracker.ceph.com/issues/22014
Signed-off-by: Ilya Dryomov
Reviewed-by: David Disseldorp
[idryomov@gmail.com: backport to < 4.9: context]
Signed-off-by: Greg Kroah-Hartman

rbd: use GFP_NOIO consistently for request allocations

2016-04-20T06:42:09+00:00

commit 2224d879c7c0f85c14183ef82eb48bd875ceb599 upstream.

As of 5a60e87603c4c533492c515b7f62578189b03c9c, RBD object request allocations are made via rbd_obj_request_create() with GFP_NOIO. However, subsequent OSD request allocations in rbd_osd_req_create*() use GFP_ATOMIC.

With heavy page cache usage (e.g. OSDs running on same host as krbd client), rbd_osd_req_create() order-1 GFP_ATOMIC allocations have been observed to fail, where direct reclaim would have allowed GFP_NOIO allocations to succeed.

Suggested-by: Vlastimil Babka
Suggested-by: Neil Brown
Signed-off-by: David Disseldorp
Signed-off-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman

rbd: don't put snap_context twice in rbd_queue_workfn()

2015-12-04T13:29:18+00:00

Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size safely") moved ceph_get_snap_context() out of rbd_img_request_create() and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the error path in rbd_queue_workfn(). However, rbd_img_request_create() consumes a ref on snapc, so calling ceph_put_snap_context() after a successful rbd_img_request_create() leads to an extra put. Fix it.

Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov
Reviewed-by: Josh Durgin
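The ownership rule behind this fix (a successful create consumes the caller's reference, so the caller must not put it again) can be modelled in userspace. This is only a sketch under stated assumptions: struct snapc, img_request_create() and queue_work() are hypothetical stand-ins, not the rbd or libceph functions, and the reference count is a plain int rather than a kref.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted snapshot context. */
struct snapc {
	int refs;
};

struct img_request {
	struct snapc *snapc;
};

void snapc_get(struct snapc *s) { s->refs++; }
void snapc_put(struct snapc *s) { s->refs--; }

/*
 * On success the request takes over the caller's reference (no extra get).
 * On failure (NULL) the caller still owns its reference.
 */
struct img_request *img_request_create(struct snapc *s, int fail)
{
	struct img_request *req;

	if (fail)
		return NULL;
	req = malloc(sizeof(*req));
	if (!req)
		return NULL;
	req->snapc = s;
	return req;
}

/* Dropping the request drops the reference it consumed, exactly once. */
void img_request_destroy(struct img_request *req)
{
	snapc_put(req->snapc);
	free(req);
}

/* Models the fixed flow: returns 0 on success, -1 on any error. */
int queue_work(struct snapc *s, int fail_create, int fail_later)
{
	struct img_request *req;

	snapc_get(s);
	req = img_request_create(s, fail_create);
	if (!req) {
		snapc_put(s);	/* create failed: the reference is still ours */
		return -1;
	}
	/* From here on the request owns the reference; no direct put. */
	img_request_destroy(req);
	return fail_later ? -1 : 0;
}
```

In this model the count returns to its starting value on every path. Adding a snapc_put() to the late error path, after a successful create, would drop it one below the starting value, which is the extra put the message describes.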

rbd: remove duplicate calls to rbd_dev_mapping_clear()

2015-11-02T22:36:48+00:00

Commit d1cf5788450e ("rbd: set mapping info earlier") defined rbd_dev_mapping_clear(), but, just a few days after, commit f35a4dee14c3 ("rbd: set the mapping size and features later") moved rbd_dev_mapping_set() calls and added another rbd_dev_mapping_clear() call instead of moving the old one. Around the same time, another duplicate was introduced in rbd_dev_device_release() - kill both.

Signed-off-by: Ilya Dryomov

rbd: set device_type::release instead of device::release

2015-11-02T22:36:48+00:00

No point in providing an empty device_type::release callback and then setting device::release for each rbd_dev dynamically.

Signed-off-by: Ilya Dryomov

rbd: don't free rbd_dev outside of the release callback

2015-11-02T22:36:48+00:00

struct rbd_device has struct device embedded in it, which means it's part of kobject universe and has an unpredictable life cycle. Freeing its memory outside of the release callback is flawed, yet commits 200a6a8be5db ("rbd: don't destroy rbd_dev in device release function") and 8ad42cd0c002 ("rbd: don't have device release destroy rbd_dev") moved rbd_dev_destroy() out to rbd_dev_image_release().

This commit reverts most of that, the key points are:

- rbd_dev->dev is initialized in rbd_dev_create(), making it possible to use rbd_dev_destroy() - which is just a put_device() - both before we register with device core and after.

- rbd_dev_release() (the release callback) is the only place we kfree(rbd_dev). It's also where we do module_put(), keeping the module unload race window as small as possible.

- We pin the module in rbd_dev_create(), but only for mapping rbd_dev-s.

Moving image related stuff out of struct rbd_device into another struct which isn't tied with sysfs and device core is long overdue, but until that happens, this will keep rbd module refcount (which users can observe with lsmod) sane.

Fixes: http://tracker.ceph.com/issues/12697
Cc: Alex Elder
Signed-off-by: Ilya Dryomov
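The pattern this message argues for (the containing structure is freed only from the release callback, and "destroy" is just a put) can be sketched in userspace. Everything below is a hypothetical, heavily simplified model: struct dev stands in for struct device, the refcount is a plain int, and none of the names are the driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct device. */
struct dev {
	int refs;
	void (*release)(struct dev *d);
};

struct my_rbd_dev {
	int id;
	struct dev dev;	/* embedded: its refcount governs the container */
};

int live_objects;	/* containers allocated and not yet freed */

void dev_get(struct dev *d) { d->refs++; }

/* The last put invokes the release callback. */
void dev_put(struct dev *d)
{
	if (--d->refs == 0)
		d->release(d);
}

/* The only place the container is freed. */
void my_rbd_dev_release(struct dev *d)
{
	/* Recover the container from the embedded member. */
	struct my_rbd_dev *r =
		(struct my_rbd_dev *)((char *)d - offsetof(struct my_rbd_dev, dev));

	free(r);
	live_objects--;
}

/* The embedded device is initialized at creation time. */
struct my_rbd_dev *my_rbd_dev_create(int id)
{
	struct my_rbd_dev *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->id = id;
	r->dev.refs = 1;
	r->dev.release = my_rbd_dev_release;
	live_objects++;
	return r;
}

/* "Destroy" is just a put, so it is safe while others hold references. */
void my_rbd_dev_destroy(struct my_rbd_dev *r)
{
	dev_put(&r->dev);
}
```

If some other holder has taken a reference with dev_get(), calling my_rbd_dev_destroy() does not free the memory; the free happens only when the last reference is dropped. Freeing the container directly instead would leave that other holder with a dangling pointer.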

rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails

2015-11-02T22:36:48+00:00

Returning pool id (i.e. >= 0) from a sysfs ->store() callback makes userspace think it needs to retry the write. Fix it - it's a leftover from the times when the equivalent of rbd_dev_create() was the first action in rbd_add().

Signed-off-by: Ilya Dryomov
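Why a non-negative return value triggers a retry can be illustrated with a model of a typical userspace "write everything" loop, which treats a short positive return as a partial write. This is a sketch only: store_buggy(), store_fixed() and write_all() are hypothetical, the pool id of 2 is made up, and the retry cap exists just to keep the model bounded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*store_fn)(const char *buf, size_t count);

/* Models the bug: creation failed, but a pool id (here 2) is returned. */
long store_buggy(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return 2;
}

/* Models the fix: failure is reported as a negative errno. */
long store_fixed(const char *buf, size_t count)
{
	(void)buf;
	(void)count;
	return -ENOMEM;
}

/*
 * Typical write loop: a return smaller than the remaining length is
 * taken as a partial write and the rest is resubmitted.
 * Returns 0 once everything is "written", or a negative errno.
 */
int write_all(store_fn store, const char *buf, size_t count, int *calls)
{
	size_t done = 0;

	*calls = 0;
	while (done < count && *calls < 16) {
		long ret = store(buf + done, count - done);

		(*calls)++;
		if (ret < 0)
			return (int)ret;
		done += (size_t)ret;
	}
	return done == count ? 0 : -EIO;
}
```

With a 10-byte write, the buggy callback is invoked five times and the loop ends up reporting success even though nothing was created, while the fixed callback fails once with -ENOMEM.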

rbd: drop null test before destroy functions

2015-11-02T22:36:47+00:00

Remove unneeded NULL test.

The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)

//
@@ expression x; @@
-if (x != NULL) {
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
  x = NULL;
-}
//

Signed-off-by: Julia Lawall
Signed-off-by: Ilya Dryomov

rbd: require stable pages if message data CRCs are enabled

2015-10-30T18:25:02+00:00

rbd requires stable pages, as it performs a crc of the page data before it is sent to the OSDs.

But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0 "mm: only enforce stable page writes if the backing device requires it") it is not assumed anymore that block devices require stable pages.

This patch sets the necessary flag to get stable pages back for rbd.

In a ceph installation that provides multiple ext4 formatted rbd devices, "bad crc" messages appeared regularly (ca. 1 message every 1-2 minutes on every OSD that provided the data for the rbd) in the OSD logs before this patch. After this patch these messages are pretty much gone (only ca. 1-2 / month / OSD).

Cc: stable@vger.kernel.org # 3.9+, needs backporting
Signed-off-by: Ronny Hegewald
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: Ilya Dryomov
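The failure mode behind the "bad crc" messages can be modelled in a few lines: the checksum is computed, the page changes before the data goes out, and the receiver's checksum no longer matches. This is a sketch under assumptions: send_page() is hypothetical, and crc32c() is a plain bitwise CRC-32C with the standard initial value and final inversion, which is not necessarily the seed convention the ceph messenger uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const unsigned char *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical model of sending one page with a data CRC.
 * modify_in_flight simulates the page being rewritten after the CRC
 * was taken but before the data left the host.
 * Returns 1 if the receiver's CRC matches the sender's, 0 otherwise.
 */
int send_page(unsigned char *page, size_t len, int modify_in_flight)
{
	uint32_t sent_crc = crc32c(page, len);

	if (modify_in_flight)
		page[0] ^= 0xFF;	/* unstable page: contents change */

	return crc32c(page, len) == sent_crc;
}
```

With stable pages the modification cannot happen between the two steps, which corresponds to calling send_page() with modify_in_flight set to 0.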