| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 upstream.
When the block device is opened with FMODE_EXCL, ref_count is set to -1.
This value doesn't get reset when the device is closed which means the
device cannot be opened again. Fix this by checking for refcount <= 0
in the release method.
Reported-and-tested-by: Stan Johnson <userm57@yahoo.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit de7b75d82f70c5469675b99ad632983c50b6f7e7 upstream.
LKP recently reported a hang at bootup in the floppy code:
[ 245.678853] INFO: task mount:580 blocked for more than 120 seconds.
[ 245.679906] Tainted: G T 4.19.0-rc6-00172-ga9f38e1 #1
[ 245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.682181] mount D 6372 580 1 0x00000004
[ 245.683023] Call Trace:
[ 245.683425] __schedule+0x2df/0x570
[ 245.683975] schedule+0x2d/0x80
[ 245.684476] schedule_timeout+0x19d/0x330
[ 245.685090] ? wait_for_common+0xa5/0x170
[ 245.685735] wait_for_common+0xac/0x170
[ 245.686339] ? do_sched_yield+0x90/0x90
[ 245.686935] wait_for_completion+0x12/0x20
[ 245.687571] __floppy_read_block_0+0xfb/0x150
[ 245.688244] ? floppy_resume+0x40/0x40
[ 245.688844] floppy_revalidate+0x20f/0x240
[ 245.689486] check_disk_change+0x43/0x60
[ 245.690087] floppy_open+0x1ea/0x360
[ 245.690653] __blkdev_get+0xb4/0x4d0
[ 245.691212] ? blkdev_get+0x1db/0x370
[ 245.691777] blkdev_get+0x1f3/0x370
[ 245.692351] ? path_put+0x15/0x20
[ 245.692871] ? lookup_bdev+0x4b/0x90
[ 245.693539] blkdev_get_by_path+0x3d/0x80
[ 245.694165] mount_bdev+0x2a/0x190
[ 245.694695] squashfs_mount+0x10/0x20
[ 245.695271] ? squashfs_alloc_inode+0x30/0x30
[ 245.695960] mount_fs+0xf/0x90
[ 245.696451] vfs_kern_mount+0x43/0x130
[ 245.697036] do_mount+0x187/0xc40
[ 245.697563] ? memdup_user+0x28/0x50
[ 245.698124] ksys_mount+0x60/0xc0
[ 245.698639] sys_mount+0x19/0x20
[ 245.699167] do_int80_syscall_32+0x61/0x130
[ 245.699813] entry_INT80_32+0xc7/0xc7
showing that we never complete that read request. The reason is that
the completion setup is racy - it initializes the completion event
AFTER submitting the IO, which means that the IO could complete
before/during the init. If it does, we are passing garbage to
complete() and we may sleep forever waiting for the event to
occur.
Fixes: 7b7b68bba5ef ("floppy: bail out in open() if drive is not responding to block0 read")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit bc811f05d77f47059c197a98b6ad242eb03999cb upstream.
syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
We need proper validation of the input here. Not just if it's
zero, but also if the value is a power-of-2 and in a valid
range. Add that.
Reported-by: syzbot <syzbot+25dbecbec1e62c6b0dd4@syzkaller.appspotmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 65eea8edc315589d6c993cf12dbb5d0e9ef1fe4e upstream.
The final field of a floppy_struct is the field "name", which is a pointer
to a string in kernel memory. The kernel pointer should not be copied to
user memory. The FDGETPRM ioctl copies a floppy_struct to user memory,
including this "name" field. This pointer cannot be used by the user
and it will leak a kernel address to user-space, which will reveal the
location of kernel code and data and undermine KASLR protection.
Model this code after the compat ioctl which copies the returned data
to a previously cleared temporary structure on the stack (excluding the
name pointer) and copy out to userspace from there. As we already have
an inparam union with an appropriate member and that memory is already
cleared even for read only calls make use of that as a temporary store.
Based on an initial patch by Brian Belleville.
CVE-2018-7755
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Broke up long line.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e573427a440fd67d3f522357d7ac901d59281948 upstream.
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5a0ec388ef0f6e33841aeb810d7fa23f049ec4cd upstream.
Commit 523e1d399ce0 ("block: make gendisk hold a reference to its queue")
modified add_disk() and disk_release() but did not update any of the
error paths that trigger a put_disk() call after disk->queue has been
assigned. That introduced the following behavior in the pktcdvd driver
if pkt_new_dev() fails:
Kernel BUG at 00000000e98fd882 [verbose debug info unavailable]
Since disk_release() calls blk_put_queue() anyway if disk->queue != NULL,
fix this by removing the blk_cleanup_queue() call from the pkt_setup_dev()
error path.
Fixes: commit 523e1d399ce0 ("block: make gendisk hold a reference to its queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1e37f2f84680fa7f8394fd444b6928e334495ccc upstream.
rbd_img_obj_exists_submit() and rbd_img_obj_parent_read_full() are on
the writeback path for cloned images -- we attempt a stat on the parent
object to see if it exists and potentially read it in to call copyup.
GFP_NOIO should be used instead of GFP_KERNEL here.
Link: http://tracker.ceph.com/issues/22014
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5fbd545cd3fd311ea1d6e8be4cedddd0ee5684c7 upstream.
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 7277cc67b3916eed47558c64f9c9c0de00a35cda upstream.
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
dump_stack+0x63/0x84
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
refcount_sub_and_test+0x70/0x80
refcount_dec_and_test+0x11/0x20
kobject_put+0x1f/0x50
blk_put_queue+0x15/0x20
disk_release+0xae/0xf0
device_release+0x32/0x90
kobject_release+0x67/0x170
kobject_put+0x2b/0x50
put_disk+0x17/0x20
skd_destruct+0x5c/0x890 [skd]
skd_pci_probe+0x124d/0x13a0 [skd]
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x19e/0x470
worker_thread+0x1dc/0x4a0
kthread+0x125/0x140
ret_from_fork+0x25/0x30
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2630628b2dbc3fc320aafaf84836119e4e3d62f1 upstream.
Apparently we now implicitly get definitions for BITS_PER_PAGE and
BITS_PER_PAGE_MASK from the pid_namespace.h
Instead of renaming our defines, I chose to define only if not yet
defined, but to double check the value if already defined.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 946e87981942552e526aca9cb6204f02a6c847cb upstream.
The verbose module parameter can be set to 2 for extremely verbose
messages so the type should be int instead of bool.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tim Waugh <tim@cyberelk.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 089bc0143f489bd3a4578bdff5f4ca68fb26f341 upstream.
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other backends do.
Build on the fact that all response structure flavors are actually
identical (the old code did make this assumption too).
This is XSA-216.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e upstream.
The copy_page is optimized memcpy for page-alinged address. If it is
used with non-page aligned address, it can corrupt memory which means
system corruption. With zram, it can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removes it by "same page merge" feature so
it's hard to backport this patch to -stable tree.
I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16, replacing clear_page() as advised]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d61b7f972dab2a7d187c38254845546dfc8eed85 upstream.
A user noticed that write performance was horrible over loopback and we
traced it to an inversion of when we need to set MSG_MORE. It should be
set when we have more bvec's to send, not when we are on the last bvec.
This patch made the test go from 20 iops to 78k iops.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: 429a787be679 ("nbd: fix use-after-free of rq/bio in the xmit path")
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 429a787be6793554ee02aacc7e1f11ebcecc4453 upstream.
For writes, we can get a completion in while we're still iterating
the request and bio chain. If that happens, we're reading freed
memory and we can crash.
Break out after the last segment and avoid having the iterator
read freed memory.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d48756228ee9161ac8836b346589a43fabdc9f3c upstream.
The nvme_remove function tears down all allocated resources in the correct
order, so no need to free queues on error during initialization. This
fixes possible use-after-free errors when queues are still associated
with a blk-mq hctx.
Reported-by: Scott Bauer <scott.bauer@intel.com>
Tested-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.16:
- Adjust filename, context
- Only nvme_setup_io_queues() needs to be fixed]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 811c6688774613a78bfa020f64b570b73f6974c8 upstream.
A while ago, commit 9875201e1049 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().
A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup(). Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
Call Trace:
[<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
[<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
[<ffffffff8109d5db>] process_one_work+0x17b/0x470
[<ffffffff8109e3ab>] worker_thread+0x11b/0x400
[<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5acf>] kthread+0xcf/0xe0
[<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
[<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645dd8>] ret_from_fork+0x58/0x90
[<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]
If an error occurs during rbd map, we have to error out, potentially
tearing down a watch. Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:
Assertion failure in rbd_dev_header_info() at line 4722:
rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
Call Trace:
[<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
[<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
[<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
[<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
[<ffffffff81107799>] process_one_work+0x689/0x1a80
[<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
[<ffffffff81132065>] ? finish_task_switch+0x225/0x640
[<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
[<ffffffff81108c69>] worker_thread+0xd9/0x1320
[<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
[<ffffffff8111b02d>] kthread+0x21d/0x2e0
[<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
[<ffffffff82022802>] ret_from_fork+0x22/0x40
[<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
RIP [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30
To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section. The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.
Fixes: http://tracker.ceph.com/issues/15490
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit dec63a4dec2d6d01346fd5d96062e67c0636852b upstream.
gcc-6.0 found an ancient bug in the paride driver, which had a
"module_param(verbose, bool, 0);" since before 2.6.12, but actually uses
it to accept '0', '1' or '2' as arguments:
drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
#define DBMSG(msg) ((verbose>1)?(msg):NULL)
In 2012, Rusty did a cleanup patch that also changed the type of the
variable to 'bool', which introduced what is now a gcc warning.
This changes the type back to 'int' and adapts the module_param() line
instead, so it should work as documented in case anyone ever cares about
running the ancient driver with debugging.
Fixes: 90ab5ee94171 ("module_param: make bool parameters really bool (drivers & misc)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Rusty Russell <rusty@rustcorp.com.au>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d8a18d2d8f5de55666c6011ed175939d22c8e3d8 upstream.
Prevent standby immediate command from being issued in remove,
suspend and shutdown paths, while drive is in FTL rebuild process.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e35b94738a2f7caa12017f69ef385cb6b8028965 upstream.
Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in
mtip_handle_tfe() as they are redundant. Also avoid waking
up service thread from mtip_handle_tfe() because it is
already woken up in case of taskfile error.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cfc05bd31384c4898bf2437a4de5557f3cf9803a upstream.
Service thread does not detect the need for taskfile error hanlding. Fixed the
flag condition to process taskfile error.
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit da6ccaaa79caca4f38b540b651238f87215217a2 upstream.
Make the "Attempted send on closed socket" error messages generated in
nbd_request_handler() ratelimited.
When the nbd socket is shutdown, the nbd_request_handler() function emits
an error message for every request remaining in its queue. If the queue
is large, this will spam a large amount of messages to the log. There's
no need for a separate error message for each request, so this patch
ratelimits it.
In the specific case this was found, the system was virtual and the error
messages were logged to the serial port, which overwhelmed it.
Fixes: 4d48a542b427 ("nbd: fix I/O hang on disconnected nbds")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d913897abace843bba20249f3190167f7895e9c3 upstream.
When we're using LZ4 multi compression streams for zram swap, we found
out page allocation failure message in system running test. That was
not only once, but a few(2 - 5 times per test). Also, some failure
cases were continually occurring to try allocation order 3.
In order to make parallel compression private data, we should call
kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
2/3 size memory to allocate in that time, page allocation fails. This
patch makes to use vmalloc() as fallback of kmalloc(), this prevents
page alloc failure warning.
After using this, we never found warning message in running test, also
It could reduce process startup latency about 60-120ms in each case.
For reference a call trace :
Binder_1: page allocation failure: order:3, mode:0x10c0d0
CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
Call trace:
dump_backtrace+0x0/0x270
show_stack+0x10/0x1c
dump_stack+0x1c/0x28
warn_alloc_failed+0xfc/0x11c
__alloc_pages_nodemask+0x724/0x7f0
__get_free_pages+0x14/0x5c
kmalloc_order_trace+0x38/0xd8
zcomp_lz4_create+0x2c/0x38
zcomp_strm_alloc+0x34/0x78
zcomp_strm_multi_find+0x124/0x1ec
zcomp_strm_find+0xc/0x18
zram_bvec_rw+0x2fc/0x780
zram_make_request+0x25c/0x2d4
generic_make_request+0x80/0xbc
submit_bio+0xa4/0x15c
__swap_writepage+0x218/0x230
swap_writepage+0x3c/0x4c
shrink_page_list+0x51c/0x8d0
shrink_inactive_list+0x3f8/0x60c
shrink_lruvec+0x33c/0x4cc
shrink_zone+0x3c/0x100
try_to_free_pages+0x2b8/0x54c
__alloc_pages_nodemask+0x514/0x7f0
__get_free_pages+0x14/0x5c
proc_info_read+0x50/0xe4
vfs_read+0xa0/0x12c
SyS_read+0x44/0x74
DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
[minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
[sergey.senozhatsky@gmail.com: tweak comments and styles]
Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 3d5fe03a3ea013060ebba2a811aeb0f23f56aefa upstream.
We can end up allocating a new compression stream with GFP_KERNEL from
within the IO path, which may result is nested (recursive) IO
operations. That can introduce problems if the IO path in question is a
reclaimer, holding some locks that will deadlock nested IOs.
Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.
An example:
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
(jbd2_handle){+.+.?.}, at: start_this_handle+0x4ca/0x555
{IN-RECLAIM_FS-W} state was registered at:
__lock_acquire+0x8da/0x117b
lock_acquire+0x10c/0x1a7
start_this_handle+0x52d/0x555
jbd2__journal_start+0xb4/0x237
__ext4_journal_start_sb+0x108/0x17e
ext4_dirty_inode+0x32/0x61
__mark_inode_dirty+0x16b/0x60c
iput+0x11e/0x274
__dentry_kill+0x148/0x1b8
shrink_dentry_list+0x274/0x44a
prune_dcache_sb+0x4a/0x55
super_cache_scan+0xfc/0x176
shrink_slab.part.14.constprop.25+0x2a2/0x4d3
shrink_zone+0x74/0x140
kswapd+0x6b7/0x930
kthread+0x107/0x10f
ret_from_fork+0x3f/0x70
irq event stamp: 138297
hardirqs last enabled at (138297): debug_check_no_locks_freed+0x113/0x12f
hardirqs last disabled at (138296): debug_check_no_locks_freed+0x33/0x12f
softirqs last enabled at (137818): __do_softirq+0x2d3/0x3e9
softirqs last disabled at (137813): irq_exit+0x41/0x95
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(jbd2_handle);
<Interrupt>
lock(jbd2_handle);
*** DEADLOCK ***
5 locks held by git/20158:
#0: (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
#1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
#2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
#3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
#4: (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
stack backtrace:
CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
Call Trace:
dump_stack+0x4c/0x6e
mark_lock+0x384/0x56d
mark_held_locks+0x5f/0x76
lockdep_trace_alloc+0xb2/0xb5
kmem_cache_alloc_trace+0x32/0x1e2
zcomp_strm_alloc+0x25/0x73 [zram]
zcomp_strm_multi_find+0xe7/0x173 [zram]
zcomp_strm_find+0xc/0xe [zram]
zram_bvec_rw+0x2ca/0x7e0 [zram]
zram_make_request+0x1fa/0x301 [zram]
generic_make_request+0x9c/0xdb
submit_bio+0xf7/0x120
ext4_io_submit+0x2e/0x43
ext4_bio_write_page+0x1b7/0x300
mpage_submit_page+0x60/0x77
mpage_map_and_submit_buffers+0x10f/0x21d
ext4_writepages+0xc8c/0xe1b
do_writepages+0x23/0x2c
__filemap_fdatawrite_range+0x84/0x8b
filemap_flush+0x1c/0x1e
ext4_alloc_da_blocks+0xb8/0x117
ext4_rename+0x132/0x6dc
? mark_held_locks+0x5f/0x76
ext4_rename2+0x29/0x2b
vfs_rename+0x540/0x636
SyS_renameat2+0x359/0x44d
SyS_rename+0x1e/0x20
entry_SYSCALL_64_fastpath+0x12/0x6f
[minchan@kernel.org: add stable mark]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 18779149101c0dd43ded43669ae2a92d21b6f9cb upstream.
Since indirect descriptors are in memory shared with the frontend, the
frontend could alter the first_sect and last_sect values after they have
been validated but before they are recorded in the request. This may
result in I/O requests that overflow the foreign page, possibly
overwriting local pages when the I/O request is executed.
When parsing indirect descriptors, only read first_sect and last_sect
once.
This is part of XSA155.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ luis: backported to 3.16:
- Use ACCESS_ONCE instead of READ_ONCE
- Use PAGE_SIZE instead of XEN_PAGE_SIZE ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 1f13d75ccb806260079e0679d55d9253e370ec8a upstream.
A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.
When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().
This is part of XSA155.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ luis: backported to 3.16:
- replaced READ_ONCE() by ACCESS_ONCE() ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit bae818ee1577c27356093901a0ea48f672eda514 upstream.
rbd requires stable pages, as it performs a crc of the page data before
they are send to the OSDs.
But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.
This patch sets the necessary flag to get stable pages back for rbd.
In a ceph installation that provides multiple ext4 formatted rbd
devices "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD-logs before this patch. After this patch this messages are pretty
much gone (only ca 1-2 / month / OSD).
Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
[idryomov@gmail.com: require stable pages only in crc case, changelog]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[idryomov@gmail.com: backport to 3.9-3.17: context]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 6d69bb536bac0d403d83db1ca841444981b280cd upstream.
Mapping an image with a long parent chain (e.g. image foo, whose parent
is bar, whose parent is baz, etc) currently leads to a kernel stack
overflow, due to the following recursion in the reply path:
rbd_osd_req_callback()
rbd_obj_request_complete()
rbd_img_obj_callback()
rbd_img_parent_read_callback()
rbd_obj_request_complete()
...
Limit the parent chain to 16 images, which is ~5K worth of stack. When
the above recursion is eliminated, this limit can be lifted.
Fixes: http://tracker.ceph.com/issues/12538
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
[idryomov@gmail.com: backport to 3.14: rbd_dev->opts, context]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 1f2c6651f69c14d0d3a9cfbda44ea101b02160ba upstream.
Currently we leak parent_spec and trigger a "parent reference
underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
The problem is we take the !parent out_err branch and that only drops
refcounts; parent_spec that would've been freed had we called
rbd_dev_unparent() remains and triggers rbd_warn() in
rbd_dev_parent_put() - at that point we have parent_spec != NULL and
parent_ref == 0, so counter ends up being -1 after the decrement.
Redo rbd_dev_probe_parent() to fix this.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: backport to < 4.2: rbd_dev->opts]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit a54c8f0f2d7df525ff997e2afe71866a1a013064 upstream.
xen-blkfront will crash if the check to talk_to_blkback()
in blkback_changed()(XenbusStateInitWait) returns an error.
The driver data is freed and info is set to NULL. Later during
the close process via talk_to_blkback's call to xenbus_dev_fatal()
the null pointer is passed to and dereference in blkfront_closing.
Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 3ebe138ac642a195c7f2efdb918f464734421fd6 upstream.
If rbd_dev_image_probe() in rbd_dev_probe_parent() fails, header_name
is freed twice: once in rbd_dev_probe_parent() and then in its caller
rbd_dev_image_probe() (rbd_dev_image_probe() is called recursively to
handle parent images).
rbd_dev_probe_parent() is responsible for probing the parent, so it
shouldn't muck with clone's fields.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream.
zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.
While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
could return other error codes. Function documentation updated
accordingly.
Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 53bc7dc004fecf39e0ba70f2f8d120a1444315d3 upstream.
The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge
work haven't finished.
There is a work_pending() before this BUG_ON, but it doesn't account if the work
is still currently running.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 7b0767502b5db11cb1f0daef2d01f6d71b1192dc upstream.
We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.
When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 2761713d35e370fd640b5781109f753066b746c4 upstream.
For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.
rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run. Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurally
marked done request:
<obj_request-1/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
rbd_obj_request_complete()
rbd_img_obj_copyup_callback()
rbd_img_obj_callback()
<obj_request-2/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
for_each_obj_request(obj_request->img_request) {
rbd_img_obj_end_request(obj_request-1/2)
rbd_img_obj_end_request(obj_request-2/2) <--
}
Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xfferred is 0. We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.
To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request). So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().
Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().
Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: backport to < 3.18: context]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 5a60e87603c4c533492c515b7f62578189b03c9c upstream.
rbd_obj_request_create() is called on the main I/O path, so we need to
use GFP_NOIO to make sure allocation doesn't blow back on us. Not all
callers need this, but I'm still hardcoding the flag inside rather than
making it a parameter because a) this is going to stable, and b) those
callers shouldn't really use rbd_obj_request_create() and will be fixed
in the future.
More memory allocation fixes will follow.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 082a75dad84d79d1c15ea9e50f31cb4bb4fa7fd6 upstream.
When we end I/O struct request with error, we need to pass
obj_request->length as @nr_bytes so that the entire obj_request worth
of bytes is completed. Otherwise block layer ends up confused and we
trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
in rbd_img_obj_callback() due to more being true no matter what. We
already do it in most cases but we are missing some, in particular
those where we don't even get a chance to submit any obj_requests, due
to an early -ENOMEM for example.
A number of obj_request->xferred assignments seem to be redundant but
I haven't touched any of obj_request->xferred stuff to keep this small
and isolated.
Cc: Alex Elder <elder@linaro.org>
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit ff6b8090e26ef7649ef0cc6b42389141ef48b0cf upstream.
we have already allocated memory for nbd_dev, but we were not
releasing that memory and just returning the error value.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Paul Clements <Paul.Clements@SteelEye.com>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit e69b8d414f948c242ad9f3eb2b7e24fba783dbbd upstream.
This effectively reverts the last hunk of 392a9dad7e77 ("rbd: detect
when clone image is flattened").
The problem with parent_overlap != 0 condition is that it's possible
and completely valid to have an image with parent_overlap == 0 whose
parent state needs to be cleaned up on unmap. The next commit, which
drops the "clone image now standalone" logic, opens up another window
of opportunity to hit this, but even without it
# cat parent-ref.sh
#!/bin/bash
rbd create --image-format 2 --size 1 foo
rbd snap create foo@snap
rbd snap protect foo@snap
rbd clone foo@snap bar
rbd resize --allow-shrink --size 0 bar
rbd resize --size 1 bar
DEV=$(rbd map bar)
rbd unmap $DEV
leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client
hanging around.
My thinking behind calling rbd_dev_parent_put() unconditionally is that
there shouldn't be any requests in flight at that point in time as we
are deep into unmap sequence. Hence, even if rbd_dev_unparent() caused
by flatten is delayed by in-flight requests, it will have finished by
the time we reach rbd_dev_unprobe() caused by unmap, thus turning
unconditional rbd_dev_parent_put() into a no-op.
Fixes: http://tracker.ceph.com/issues/10352
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit ae43e9d05eb4bd324155292f889fbd001c4faea8 upstream.
The comment for rbd_dev_parent_get() said
* We must get the reference before checking for the overlap to
* coordinate properly with zeroing the parent overlap in
* rbd_dev_v2_parent_info() when an image gets flattened. We
* drop it again if there is no overlap.
but the "drop it again if there is no overlap" part was missing from
the implementation. This lead to absurd parent_ref values for images
with parent_overlap == 0, as parent_ref was incremented for each
img_request and virtually never decremented.
Fix this by leveraging the fact that refresh path calls
rbd_dev_v2_parent_info() under header_rwsem and use it for read in
rbd_dev_parent_get(), instead of messing around with atomics. Get rid
of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
they'd pair with now and I suspect we are in a pretty miserable
situation as far as proper locking goes regardless.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 3b9d35d744bb5139f9fed57f38c019bb8c7d351c upstream.
This was not noticed for many years. Affects operation if
md raid is used a backing device for DRBD.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 85b0c6e62c48bb9179fd5b3e954f362fb346cbd5 upstream.
The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
VTOC label, otherwise it will fail. In particular, it will return error
48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
handled by directly reading the disk in block/partitions/sun.c (enabled by
CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
unused in the driver, remove the call and the field.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit d0aedcd4f14a22e23b313f42b7e6e6ebfc0fbc31 upstream.
vio_dring_avail() will allow use of every dring entry, but when the last
entry is allocated then dr->prod == dr->cons which is indistinguishable from
the ring empty condition. This causes the next allocation to reuse an entry.
When this happens in sunvdc, the server side vds driver begins nack'ing the
messages and ends up resetting the ldc channel. This problem does not effect
sunvnet since it checks for < 2.
The fix here is to just never allocate the very last dring slot so that full
and empty are not the same condition. The request start path was changed to
check for the ring being full a bit earlier, and to stop the blk_queue if
there is no space left. The blk_queue will be restarted once the ring is
only half full again. The number of ring entries was increased to 512 which
matches the sunvnet and Solaris vdc drivers, and greatly reduces the
frequency of hitting the ring full condition and the associated blk_queue
stop/starting. The checks in sunvent were adjusted to account for
vio_dring_avail() returning 1 less.
Orabug: 19441666
OraBZ: 14983
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 5eed69ffd248c9f68f56c710caf07db134aef28b upstream.
ldc_map_sg() could fail its check that the number of pages referred to
by the sg scatterlist was <= the number of cookies.
This fixes the issue by doing a similar thing to the xen-blkfront driver,
ensuring that the scatterlist will only ever contain a segment count <=
port->ring_cookies, and each segment will be page aligned, and <= page
size. This ensures that the scatterlist is always mappable.
Orabug: 19347817
OraBZ: 15945
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit de5b73f08468b4fc5e2f6d1505f650262622f78b upstream.
The LDom diskserver doesn't return reliable geometry data. In addition,
the types for all fields in the vio_disk_geom are u16, which were being
truncated in the cast into the u8's of the Linux struct hd_geometry.
Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
manner consistent with xen-blkfront::blkif_getgeo().
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 9bce21828d54a95143f1b74619705c2dd8e88b92 upstream.
Interpret the media type from v1.1 protocol to support CDROM/DVD.
For v1.0 protocol, a disk's size continues to be calculated from the
geometry returned by the vdisk server. The geometry returned by the server
can be less than the actual number of sectors available in the backing
image/device due to the rounding in the division used to compute the
geometry in the vdisk server.
In v1.1 protocol a disk's actual size in sectors is returned during the
handshake. Use this size when v1.1 protocol is negotiated. Since this size
will always be larger than the former geometry computed size, disks created
under v1.0 will be forwards compatible to v1.1, but not vice versa.
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit c406515239376fc93a30d5d03192182160cbd3fb upstream.
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io. The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 792c3a914910bd34302c5345578f85cfcb5e2c01 upstream.
Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 255939e783d8f45f8c58487dfc18957c44ea9871 upstream.
Fix to return -ENOMEM from the workqueue alloc error handling
case instead of 0, as done elsewhere in this function.
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|
|
commit 58d1362b50dc87ebf18cd137e7a879fd99b7e721 upstream.
drivers/block/rbd.c: In function ‘rbd_dev_device_setup’:
drivers/block/rbd.c:5090:19: warning: format not a string literal and no format arguments [-Wformat-security]
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|