Age | Commit message (Collapse) | Author | Files | Lines |
|
Makoto report a below KASAN error: zram does out-of-bounds read. Because
strscpy copies from source up to count bytes unconditionally. It could
cause out-of-bounds read on next object in slab.
To prevent it, use strlcpy which checks source's length automatically.
BUG: KASAN: slab-out-of-bounds in strscpy+0x68/0x154
Read of size 8 at addr ffffffc0c3495a00 by task system_server/1314
..
Call trace:
strscpy+0x68/0x154
idle_store+0xc4/0x34c
dev_attr_store+0x50/0x6c
sysfs_kf_write+0x98/0xb4
kernfs_fop_write+0x198/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Allocated by task 1314:
__kmalloc+0x280/0x318
kernfs_fop_write+0xac/0x260
__vfs_write+0x10c/0x338
vfs_write+0x114/0x238
SyS_write+0xc8/0x168
__sys_trace_return+0x0/0x4
Freed by task 2855:
kfree+0x138/0x630
kernfs_put_open_node+0x10c/0x124
kernfs_fop_release+0xd8/0x114
__fput+0x130/0x2a4
____fput+0x1c/0x28
task_work_run+0x16c/0x1c8
do_notify_resume+0x2bc/0x107c
work_pending+0x8/0x10
The buggy address belongs to the object at ffffffc0c3495a00
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 0 bytes inside of
128-byte region [ffffffc0c3495a00, ffffffc0c3495a80)
The buggy address belongs to the page:
page:ffffffbf030d2500 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffffc0c3495900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0c3495980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffffc0c3495a00: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffffc0c3495a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffffc0c3495b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Link: http://lkml.kernel.org/r/20190319231911.145968-1-minchan@kernel.org
Cc: <stable@vger.kernel.org> [5.0]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Makoto Wu <makotowu@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull block fixes from Jens Axboe:
"A set of fixes/changes that should go into this series. This contains:
- Kernel doc / comment updates (Bart, Shenghui)
- Un-export of core-only used function (Bart)
- Fix race on loop file access (Dongli)
- pf/pcd queue cleanup fixes (me)
- Use appropriate helper for RESTART bit set (Yufen)
- Use named identifier for classic poll (Yufen)"
* tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
blkcg: Fix kernel-doc warnings
blk-iolatency: #include "blk.h"
block: Unexport blk_mq_add_to_requeue_list()
block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
loop: access lo_backing_file only when the loop device is Lo_bound
blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
paride/pcd: cleanup queues when detection fails
paride/pf: cleanup queues when detection fails
|
|
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
Now that we have alloc_size that controls our discard behavior, it
doesn't make sense to have these set to object (set) size. alloc_size
defaults to 64k, but because discard_granularity is likely 4M, only
ranges that are equal to or bigger than 4M can be considered during
fstrim. A smaller io_min is also more likely to be met, resulting in
fewer deferred writes on bluestore OSDs.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
Commit 758a58d0bc67 ("loop: set GENHD_FL_NO_PART_SCAN after
blkdev_reread_part()") separates "lo->lo_backing_file = NULL" and
"lo->lo_state = Lo_unbound" into different critical regions protected by
loop_ctl_mutex.
However, there is below race that the NULL lo->lo_backing_file would be
accessed when the backend of a loop is another loop device, e.g., loop0's
backend is a file, while loop1's backend is loop0.
loop0's backend is file loop1's backend is loop0
__loop_clr_fd()
mutex_lock(&loop_ctl_mutex);
lo->lo_backing_file = NULL; --> set to NULL
mutex_unlock(&loop_ctl_mutex);
loop_set_fd()
mutex_lock_killable(&loop_ctl_mutex);
loop_validate_file()
f = l->lo_backing_file; --> NULL
access if loop0 is not Lo_unbound
mutex_lock(&loop_ctl_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&loop_ctl_mutex);
lo->lo_backing_file should be accessed only when the loop device is
Lo_bound.
In fact, the problem has been introduced already in commit 7ccd0791d985
("loop: Push loop_ctl_mutex down into loop_clr_fd()") after which
loop_validate_file() could see devices in Lo_rundown state with which it
did not count. It was harmless at that point but still.
Fixes: 7ccd0791d985 ("loop: Push loop_ctl_mutex down into loop_clr_fd()")
Reported-by: syzbot+9bdc1adc1c55e7fe765b@syzkaller.appspotmail.com
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The driver allocates queues for all the units it potentially
supports. But if we fail to detect any drives, then we fail
loading the module without cleaning up those queues. This is
now evident with the switch to blk-mq, though the bug has
been there forever as far as I can tell.
Also fix cleanup through regular module exit.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The driver allocates queues for all the units it potentially
supports. But if we fail to detect any drives, then we fail
loading the module without cleaning up those queues. This is
now evident with the switch to blk-mq, though the bug has
been there forever as far as I can tell.
Also fix cleanup through regular module exit.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull more block layer changes from Jens Axboe:
"This is a collection of both stragglers, and fixes that came in after
I finalized the initial pull. This contains:
- An MD pull request from Song, with a few minor fixes
- Set of NVMe patches via Christoph
- Pull request from Konrad, with a few fixes for xen/blkback
- pblk fix IO calculation fix (Javier)
- Segment calculation fix for pass-through (Ming)
- Fallthrough annotation for blkcg (Mathieu)"
* tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
blkcg: annotate implicit fall through
nvme-tcp: support C2HData with SUCCESS flag
nvmet: ignore EOPNOTSUPP for discard
nvme: add proper write zeroes setup for the multipath device
nvme: add proper discard setup for the multipath device
nvme: remove nvme_ns_config_oncs
nvme: disable Write Zeroes for qemu controllers
nvmet-fc: bring Disconnect into compliance with FC-NVME spec
nvmet-fc: fix issues with targetport assoc_list list walking
nvme-fc: reject reconnect if io queue count is reduced to zero
nvme-fc: fix numa_node when dev is null
nvme-fc: use nr_phys_segments to determine existence of sgl
nvme-loop: init nvmet_ctrl fatal_err_work when allocate
nvme: update comment to make the code easier to read
nvme: put ns_head ref if namespace fails allocation
nvme-trace: fix cdw10 buffer overrun
nvme: don't warn on block content change effects
nvme: add get-feature to admin cmds tracer
md: Fix failed allocation of md_register_thread
It's wrong to add len to sector_nr in raid10 reshape twice
...
|
|
lzo-rle gives higher performance and similar compression ratios to lzo.
Link: http://lkml.kernel.org/r/20190205155944.16007-4-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull ceph updates from Ilya Dryomov:
"The highlights are:
- rbd will now ignore discards that aren't aligned and big enough to
actually free up some space (myself). This is controlled by the new
alloc_size map option and can be disabled if needed.
- support for rbd deep-flatten feature (myself). Deep-flatten allows
"rbd flatten" to fully disconnect the clone image and its snapshots
from the parent and make the parent snapshot removable.
- a new round of cap handling improvements (Zheng Yan). The kernel
client should now be much more prompt about releasing its caps and
it is possible to put a limit on the number of caps held.
- support for getting ceph.dir.pin extended attribute (Zheng Yan)"
* tag 'ceph-for-5.1-rc1' of git://github.com/ceph/ceph-client: (26 commits)
Documentation: modern versions of ceph are not backed by btrfs
rbd: advertise support for RBD_FEATURE_DEEP_FLATTEN
rbd: whole-object write and zeroout should copyup when snapshots exist
rbd: copyup with an empty snapshot context (aka deep-copyup)
rbd: introduce rbd_obj_issue_copyup_ops()
rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()
rbd: factor out __rbd_osd_req_create()
rbd: clear ->xferred on error from rbd_obj_issue_copyup()
rbd: remove experimental designation from kernel layering
ceph: add mount option to limit caps count
ceph: periodically trim stale dentries
ceph: delete stale dentry when last reference is dropped
ceph: remove dentry_lru file from debugfs
ceph: touch existing cap when handling reply
ceph: pass inclusive lend parameter to filemap_write_and_wait_range()
rbd: round off and ignore discards that are too small
rbd: handle DISCARD and WRITE_ZEROES separately
rbd: get rid of obj_req->obj_request_count
libceph: use struct_size() for kmalloc() in crush_decode()
ceph: send cap releases more aggressively
...
|
|
Pull virtio updates from Michael Tsirkin:
"Several fixes, most notably fix for virtio on swiotlb systems"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: silence an unused-variable warning
virtio: hint if callbacks surprisingly might sleep
virtio-ccw: wire up ->bus_name callback
s390/virtio: handle find on invalid queue gracefully
virtio-ccw: diag 500 may return a negative cookie
virtio_balloon: remove the unnecessary 0-initialization
virtio-balloon: improve update_balloon_size_func
virtio-blk: Consider virtio_max_dma_size() for maximum segment size
virtio: Introduce virtio_max_dma_size()
dma: Introduce dma_max_mapping_size()
swiotlb: Add is_swiotlb_active() function
swiotlb: Introduce swiotlb_max_mapping_size()
|
|
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:
- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)
- Series of bcache fixes (via Coly)
- Series of lightnvm fixes (via Mathias)
- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).
- BFQ series (Paolo)
- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)
- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)
- Partition scan loop fixes (Dongli)
- mtip32xx conversion from managed resource API (Christoph)
- cdrom registration race fix (Guenter)
- MD pull from Song, two minor fixes.
- Various documentation fixes (Marcos)
- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)
- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
|
|
To prevent any issues with persistent data, separate lzo-rle from lzo so
that it is treated as a separate algorithm, and lzo is still available.
Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Cc: Matt Sealey <matt.sealey@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <nitingupta910@gmail.com>
Cc: Richard Purdie <rpurdie@openedhand.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Sonny Rao <sonnyrao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independant
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
driver core: platform: remove misleading err_alloc label
platform: set of_node in platform_device_register_full()
firmware: hardcode the debug message for -ENOENT
driver core: Add missing description of new struct device_link field
driver core: Fix PM-runtime for links added during consumer probe
drivers/component: kerneldoc polish
async: Add cmdline option to specify drivers to be async probed
driver core: Fix possible supplier PM-usage counter imbalance
PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
driver: platform: Support parsing GpioInt 0 in platform_get_irq()
selftests: firmware: fix verify_reqs() return value
Revert "selftests: firmware: remove use of non-standard diff -Z option"
Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
device: Fix comment for driver_data in struct device
kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
sysfs: remove unused include of kernfs-internal.h
driver core: Postpone DMA tear-down until after devres release
driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
PM-runtime: Take suppliers into account in __pm_runtime_set_status()
device.h: Add __cold to dev_<level> logging functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-5.1/block-post
Pull two xen blkback fixes from Konrad.
* 'stable/for-jens-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront
xen/blkback: add stack variable 'blkif' in connect_ring()
|
|
Segments can't be larger than the maximum DMA mapping size
supported on the platform. Take that into account when
setting the maximum segment size for a block device.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
All these places for replacement were found by running the following
grep patterns on the entire kernel code. Please let me know if this
might have missed some instances. This might also have replaced some
false positives. I will appreciate suggestions, inputs and review.
1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"
This patch (of 2):
At present there are multiple places where invalid node number is
encoded as -1. Even though implicitly understood it is always better to
have macros in there. Replace these open encodings for an invalid node
number with the global macro NUMA_NO_NODE. This helps remove NUMA
related assumptions like 'invalid node' from various places redirecting
them to a common definition.
Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe]
Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx]
Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband]
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
All copyups perform deep-copyup regardless of whether deep-flatten
feature is enabled. The feature bit is used to ensure that image is
written to only by new-enough clients that always perform deep-copyup.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Otherwise, once the parent snapshot is removed, the clone's snapshot
wouldn't reflect the state of the clone prior to whole-object write or
zeroout because a deep-copyup was never done ("rbd flatten" wouldn't do
it because the modified object would exist in HEAD).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This is the core of deep-flatten feature: sending a copyup request
(i.e. a guarded write of the data read from the parent) with an empty
snapshot context (snaps = [], seq = 0) causes the OSD to reflect the
write in all existing snapshots. This allows "rbd flatten" to fully
disconnect the clone image and its snapshots from the parent and make
the parent snapshot removable.
The actual modification request is sent only after deep-copyup request
is completed. Waiting for deep-copyup reply is unnecessary, this will
be improved in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for deep-flatten feature, split rbd_obj_issue_copyup()
into two functions and add a new write state to make the state machine
slightly more clear. Make the copyup op optional and start using that
for when the overlap goes to 0.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for deep-flatten feature, stop copying num_osd_ops from
the original request in rbd_obj_issue_copyup(). Split the calculation
into count_{write,zeroout}_ops() respectively and determine whether the
assert_exists guard is needed with the new rbd_obj_copyup_enabled().
As a nice side effect, we no longer guard in the writefull case as the
copyup'ed object is always fully overwritten.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Allow passing a custom snapshot context: NULL for read and an empty
snapshot context for deep-copyup.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Otherwise the assert in rbd_obj_end_request() is triggered.
Fixes: 3da691bf4366 ("rbd: new request handling code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Support for kernel layering hasn't been considered experimental for
a few years now. All the issues that I'm aware of were shaken out in
2014 and early 2015. Moreover, most of that code was rewritten with
the addition of support for fancy striping.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
If, after rounding off, the discard request is smaller than alloc_size,
drop it on the floor in __rbd_img_fill_request().
Default alloc_size to 64k. This should cover both HDD and SSD based
bluestore OSDs and somewhat improve things for filestore. For OSDs on
filestore with filestore_punch_hole = false, alloc_size is best set to
object size in order to allow deletes and truncates and disallow zero
op.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
With discard_zeroes_data gone in commit 48920ff2a5a9 ("block: remove
the discard_zeroes_data flag"), continuing to provide this guarantee is
pointless: applications can't query it and discards can only be used
for deallocating.
Add OBJ_OP_ZEROOUT and move the existing logic under it. As the first
step to divorcing OBJ_OP_DISCARD, stop worrying about copyups but keep
special casing whole-object layered discards.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
It is effectively unused.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
genlmsg_reply can fail, so propagate its return code
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/block/floppy.c: In function 'request_done':
drivers/block/floppy.c:2233:24: warning:
variable 'q' set but not used [-Wunused-but-set-variable]
It's never used and can be removed.
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
null_handle_bio() erroneously uses the bio_op macro
which masks respective request flag bits including REQ_FUA
out thus failing the check.
Fix by checking bio->bi_opf directly.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
'ring-page-order' set by malicious blkfront
The xenstore 'ring-page-order' is used globally for each blkback queue and
therefore should be read from xenstore only once. However, it is obtained
in read_per_ring_refs() which might be called multiple times during the
initialization of each blkback queue.
If the blkfront is malicious and the 'ring-page-order' is set in different
value by blkfront every time before blkback reads it, this may end up at
the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in
xen_blkif_disconnect() when frontend is destroyed.
This patch reworks connect_ring() to read xenstore 'ring-page-order' only
once.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Commit 0da03cab87e6
("loop: Fix deadlock when calling blkdev_reread_part()") moves
blkdev_reread_part() out of the loop_ctl_mutex. However,
GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
__blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
will not rescan the loop device to delete all partitions.
Below are steps to reproduce the issue:
step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
step2 # losetup -P /dev/loop0 tmp.raw
step3 # parted /dev/loop0 mklabel gpt
step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
step5 # losetup -d /dev/loop0
Step5 will not be able to delete /dev/loop0p1 (introduced by step4) and
there is below kernel warning message:
[ 464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)
This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().
Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Do not print warn message when the partition scan returns 0.
Fixes: d57f3374ba48 ("loop: Move special partition reread handling in loop_clr_fd()")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
iov_iter is implemented on bvec itererator helpers, so it is safe to pass
multi-page bvec to it, and this way is much more efficient than passing one
page in each bvec.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
floppy_check_events() is supposed to return bit flags to say which
events occured. We should return zero to say that no event flags are
set. Only BIT(0) and BIT(1) are used in the caller. And .check_events
interface also expect to return an unsigned int value.
However, after commit a0c80efe5956, it may return -EINTR (-4u).
Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't
affect runtime, but it obviously is still worth fixing.
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We need the debugfs fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We have various helpers for setting/clearing this flag, and also
a helper to check if the queue supports queueable flushes or not.
But nobody uses them anymore, kill it with fire.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The mtip32xx driver uses managed resources for DMA coherent memory
and irqs, but then always pairs them with free calls anyway, making
the resource tracking rather pointless. Given some DMA allocations
are transient anyway, the irq freeing seems to require ordering vs
other hardware access the best solution seems to be to stop using
the managed resource API entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We are trying to get rid of BUS_ATTR() and the usage of that in rbd.c
can be trivially converted to use BUS_ATTR_WO and RO, so use those
macros instead.
Cc: Sage Weil <sage@redhat.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull block fixes from Jens Axboe:
- block size setting fixes for loop/nbd (Jan Kara)
- md bio_alloc_mddev() cleanup (Marcos)
- Ensure we don't lose the REQ_INTEGRITY flag (Ming)
- Two NVMe fixes by way of Christoph:
- Fix NVMe IRQ calculation (Ming)
- Uninitialized variable in nvmet-tcp (Sagi)
- BFQ comment fix (Paolo)
- License cleanup for recently added blk-mq-debugfs-zoned (Thomas)
* tag 'for-linus-20190118' of git://git.kernel.dk/linux-block:
block: Cleanup license notice
nvme-pci: fix nvme_setup_irqs()
nvmet-tcp: fix uninitialized variable access
block: don't lose track of REQ_INTEGRITY flag
blockdev: Fix livelocks on loop device
nbd: Use set_blocksize() to set device blocksize
md: Make bio_alloc_mddev use bio_alloc_bioset
block, bfq: fix comments on __bfq_deactivate_entity
|
|
As 'be->blkif' is used for many times in connect_ring(), the stack variable
'blkif' is added to substitute 'be-blkif'.
Suggested-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
NBD can update block device block size implicitely through
bd_set_size(). Make it explicitely set blocksize with set_blocksize() as
this behavior of bd_set_size() is going away.
CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph, with little fixes all over the map
- Loop caching fix for offset/bs change (Jaegeuk Kim)
- Block documentation tweaks (Jeff, Jon, Weiping, John)
- null_blk zoned tweak (John)
- ahch mvebu suspend/resume support. Should have gone into the merge
window, but there was some confusion on which tree had it. (Miquel)
* tag 'for-linus-20190112' of git://git.kernel.dk/linux-block: (22 commits)
ata: ahci: mvebu: request PHY suspend/resume for Armada 3700
ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAM
ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
ata: ahci: mvebu: remove stale comment
ata: libahci_platform: comply to PHY framework
loop: drop caches if offset or block_size are changed
block: fix kerneldoc comment for blk_attempt_plug_merge()
nvme: don't initlialize ctrl->cntlid twice
nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN
nvme: pad fake subsys NQN vid and ssvid with zeros
nvme-multipath: zero out ANA log buffer
nvme-fabrics: unset write/poll queues for discovery controllers
nvme-tcp: don't ask if controller is fabrics
nvme-tcp: remove dead code
nvme-pci: fix out of bounds access in nvme_cqe_pending
nvme-pci: rerun irq setup on IO queue init errors
nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
nvme-pci: fix the wrong setting of nr_maps
block: doc: add slice_idle_us to bfq documentation
block: clarify documentation for blk_{start|finish}_plug
...
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma_zalloc_coherent() removal from Christoph Hellwig:
"We've always had a weird situation around dma_zalloc_coherent. To
safely support mapping the allocations to userspace major
architectures like x86 and arm have always zeroed allocations from
dma_alloc_coherent, but a couple other architectures were missing that
zeroing either always or in corner cases.
Then later we grew anothe dma_zalloc_coherent interface to explicitly
request zeroing, but that just added __GFP_ZERO to the allocation
flags, which for some allocators that didn't end up using the page
allocator ended up being a no-op and still not zeroing the
allocations.
So for this merge window I fixed up all remaining architectures to
zero the memory in dma_alloc_coherent, and made dma_zalloc_coherent a
no-op wrapper around dma_alloc_coherent, which fixes all of the above
issues.
dma_zalloc_coherent is now pointless and can go away, and Luis helped
me writing a cocchinelle script and patch series to kill it, which I
think we should apply now just after -rc1 to finally settle these
issue"
* tag 'remove-dma_zalloc_coherent-5.0' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: remove dma_zalloc_coherent()
cross-tree: phase out dma_zalloc_coherent() on headers
cross-tree: phase out dma_zalloc_coherent()
|
|
Pull ceph updates from Ilya Dryomov:
"A patch to allow setting abort_on_full and a fix for an old "rbd
unmap" edge case, marked for stable"
* tag 'ceph-for-5.0-rc2' of git://github.com/ceph/ceph-client:
rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
ceph: use vmf_error() in ceph_filemap_fault()
libceph: allow setting abort_on_full for rbd
|
|
There is a window between when RBD_DEV_FLAG_REMOVING is set and when
the device is removed from rbd_dev_list. During this window, we set
"already" and return 0.
Returning 0 from write(2) can confuse userspace tools because
0 indicates that nothing was written. In particular, "rbd unmap"
will retry the write multiple times a second:
10:28:05.463299 write(4, "0", 1) = 0
10:28:05.463509 write(4, "0", 1) = 0
10:28:05.463720 write(4, "0", 1) = 0
10:28:05.463942 write(4, "0", 1) = 0
10:28:05.464155 write(4, "0", 1) = 0
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
|
|
If we don't drop caches used in old offset or block_size, we can get old data
from new offset/block_size, which gives unexpected data to user.
For example, Martijn found a loopback bug in the below scenario.
1) LOOP_SET_FD loads first two pages on loop file
2) LOOP_SET_STATUS64 changes the offset on the loop file
3) mount is failed due to the cached pages having wrong superblock
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reported-by: Martijn Coenen <maco@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch includes some fixes and cleanup for idle-page writeback.
1. writeback_limit interface
Now writeback_limit interface is rather conusing. For example, once
writeback limit budget is exausted, admin can see 0 from
/sys/block/zramX/writeback_limit which is same semantic with disable
writeback_limit at this moment. IOW, admin cannot tell that zero came
from disable writeback limit or exausted writeback limit.
To make the interface clear, let's sepatate enable of writeback limit to
another knob - /sys/block/zram0/writeback_limit_enable
* before:
while true :
# to re-enable writeback limit once previous one is used up
echo 0 > /sys/block/zram0/writeback_limit
echo $((200<<20)) > /sys/block/zram0/writeback_limit
..
.. # used up the writeback limit budget
* new
# To enable writeback limit, from the beginning, admin should
# enable it.
echo $((200<<20)) > /sys/block/zram0/writeback_limit
echo 1 > /sys/block/zram/0/writeback_limit_enable
while true :
echo $((200<<20)) > /sys/block/zram0/writeback_limit
..
.. # used up the writeback limit budget
It's much strightforward.
2. fix condition check idle/huge writeback mode check
The mode in writeback_store is not bit opeartion any more so no need to
use bit operations. Furthermore, current condition check is broken in
that it does writeback every pages regardless of huge/idle.
3. clean up idle_store
No need to use goto.
[minchan@kernel.org: missed spin_lock_init]
Link: http://lkml.kernel.org/r/20190103001601.GA255139@google.com
Link: http://lkml.kernel.org/r/20181224033529.19450-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: John Dias <joaodias@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: Srinivas Paladugu <srnvs@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|