Age | Commit message (Collapse) | Author | Files | Lines |
|
Use bvec_virt instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use bvec_virt instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use bvec_virt instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__ebs_rw_bvec use page_address on the submitted bios data, and thus
can't deal with highmem. Disable the target on highmem configs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the bvec_virt helper to clean up the bio integrity processing a
little bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@kernel.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210804095634.460779-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a helper to get the virtual address for a bvec. This avoids that
all callers need to know about the page + offset representation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210804095634.460779-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
inode_detach_wb references the "main" bdi of the inode. With the
recent change to move the bdi from the request_queue to the gendisk
this causes a guaranteed use after free when using certain cgroup
configurations. The big itself is older through as any non-default
inode reference (e.g. an open file descriptor) could have injected
this use after free even before that.
Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Reported-by: syzbot <syzbot+1fb38bb7d3ce0fa3e1c4@syzkaller.appspotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816122614.601358-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The dev_t is used as the inode hash, so we should only released it
once then block device inode is gone from the inode cache. Move it
to bdev_free_inode to ensure that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816122614.601358-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After patch 54efd50 (block: make generic_make_request handle
arbitrarily sized bios), the IO through io-throttle may be larger,
and these IOs may be further split into more small IOs. However,
IOPS throttle does not seem to be aware of this change, which
makes the calculation of IOPS of large IOs incomplete, resulting
in disk-side IOPS that does not meet expectations. Maybe we should
fix this problem.
We can reproduce it by set max_sectors_kb of disk to 128, set
blkio.write_iops_throttle to 100, run a dd instance inside blkio
and use iostat to watch IOPS:
dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
As a result, without this change the average IOPS is 1995, with
this change the IOPS is 98.
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/65869aaad05475797d63b4c3fed4f529febe3c26.1627876014.git.brookxu@tencent.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bdev_resize_partition can only operate on the whole device. Make that clear
by passing a gendisk instead of a block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210810154512.1809898-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bdev_del_partition can only operate on the whole device. Make that clear
by passing a gendisk instead of a block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210810154512.1809898-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bdev_add_partition can only operate on the whole device. Make that clear
by passing a gendisk instead of a block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210810154512.1809898-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Partition scanning only happens on the whole device, so pass a
struct gendisk instead of the whole device block_device to the scanners.
This allows to simplify printing the device name in various places as the
disk name is available in disk->name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810154512.1809898-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just check inode_unhashed on the whole device bdev inode instead,
and provide a helper to check for that information.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Let the callers call del_gendisk so that we can check if add_disk
has been called properly for the cached device case instead of relying
on the block layer internal GENHD_FL_UP flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Except for the IDA none of the allocations in bcache_device_init is
unwound on error, fix that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove usage of the block layer internal GENHD_FL_UP by just looking
at the host state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the nvme-internal NVME_NSHEAD_DISK_LIVE flag instead of abusing
the block layer state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Early probe failure never reaches nvme_ns_remove, so GENHD_FL_UP must
be set at this point. Remove the check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Restructure mmc_blk_probe to avoid a failure path with a half created
disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210809064028.1198327-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass the attribute group for the attributes on the gendisk to
device_add_disk so that they are created atomically with the
disk creation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210809064028.1198327-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the
generic header file to remove redundancy.
Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix the !CONFIG_BLOCK build after the recent cleanup.
Fixes: 5ed964f8e54e ("mm: hide laptop_mode_wb_timer entirely behind the BDI API")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When merging one bio to request, if they are discard IO and the queue
supports multi-range discard, we need to return ELEVATOR_DISCARD_MERGE
because both block core and related drivers(nvme, virtio-blk) doesn't
handle mixed discard io merge(traditional IO merge together with
discard merge) well.
Fix the issue by returning ELEVATOR_DISCARD_MERGE in this situation,
so both blk-mq and drivers just need to handle multi-range discard.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 2705dfb20947 ("block: fix discard request merge")
Link: https://lore.kernel.org/r/20210729034226.1591070-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just retrieve the bdi from the disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The backing device information only makes sense for file system I/O,
and thus belongs into the gendisk and not the lower level request_queue
structure. Move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a helper to check if a gendisk is associated with a request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
.. and rename the function to disk_update_readahead. This is in
preparation for moving the BDI from the request_queue to the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't leak the detaŃ–ls of the timer into the block layer, instead
initialize the timer in bdi_alloc and delete it in bdi_unregister.
Note that this means the timer is initialized (but not armed) for
non-block queues as well now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that device mapper has been changed to register the disk once
it is fully ready all this code is unused.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
device mapper is currently the only outlier that tries to call
register_disk after add_disk, leading to fairly inconsistent state
of these block layer data structures. Instead change device-mapper
to just register the gendisk later now that the holder mechanism
can cope with that.
Note that this introduces a user visible change: the dm kobject is
now only visible after the initial table has been loaded.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move setting md->type from both callers into dm_setup_md_queue.
This ensures that md->type is only set to a valid value after the queue
has been fully setup, something we'll rely on future changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
md->queue is now always set when md->disk is set, so simplify the
conditionals a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
device mapper needs to register holders before it is ready to do I/O.
Currently it does so by registering the disk early, which can leave
the disk and queue in a weird half state where the queue is registered
with the disk, except for sysfs and the elevator. And this state has
been a bit promlematic before, and will get more so when sorting out
the responsibilities between the queue and the disk.
Support registering holders on an initialized but not registered disk
instead by delaying the sysfs registration until the disk is registered.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Invert they way the holder relations are tracked. This very
slightly reduces the memory overhead for partitioned devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804094147.459763-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit 0d02129e76ed ("block: merge struct block_device and struct
hd_struct") there is no way for the bdev to go away as long as there is
a holder, so remove the extra references.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move the block holder code into a separate file as it is not in any way
related to the other block_dev.c code, and add a new selectable config
option for it so that we don't have to build it without any remapped
drivers selected.
The Kconfig symbol contains a _DEPRECATED suffix to match the comments
added in commit 49731baa41df
("block: restore multiple bd_link_disk_holder() support").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We noticed that the user interface of Android devices becomes very slow
under memory pressure. This is because Android uses the zram driver on top
of the loop driver for swapping, because under memory pressure the swap
code alternates reads and writes quickly, because mq-deadline is the
default scheduler for loop devices and because mq-deadline delays writes by
five seconds for such a workload with default settings. Fix this by making
the kernel select I/O scheduler 'none' from inside add_disk() for loop
devices. This default can be overridden at any time from user space,
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.
This patch changes the default I/O scheduler for loop devices from
'mq-deadline' into 'none'.
Additionally, this patch reduces the Android boot time on my test setup
with 0.5 seconds compared to configuring the loop I/O scheduler from user
space.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
elevator_get_default() uses the following algorithm to select an I/O
scheduler from inside add_disk():
- In case of a single hardware queue or if sharing hardware queues across
multiple request queues (BLK_MQ_F_TAG_HCTX_SHARED), use mq-deadline.
- Otherwise, use 'none'.
This is a good choice for most but not for all block drivers. Make it
possible to override the selection of mq-deadline with a new flag,
namely BLK_MQ_F_NO_SCHED_BY_DEFAULT.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In block/blk-mq-sysfs.c, struct blk_mq_ctx_sysfs_entry is not used to
define any attribute since the "mq" sysfs directory contains only
sub-directories (no attribute files). As a result, blk_mq_sysfs_show(),
blk_mq_sysfs_store(), and struct sysfs_ops blk_mq_sysfs_ops are all
unused and unnecessary. Remove all this unused code.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/20210713081837.524422-1-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make the loop device raise a DISK_MEDIA_CHANGE event on attach or detach.
# udevadm monitor -up |grep -e DISK_MEDIA_CHANGE -e DEVNAME &
# losetup -f zero
[ 7.454235] loop0: detected capacity change from 0 to 16384
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop0
DEVNAME=/dev/loop0
DEVNAME=/dev/loop0
# losetup -f zero
[ 10.205245] loop1: detected capacity change from 0 to 16384
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop1
DEVNAME=/dev/loop1
DEVNAME=/dev/loop1
# losetup -f zero2
[ 13.532368] loop2: detected capacity change from 0 to 40960
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop2
DEVNAME=/dev/loop2
# losetup -D
DEVNAME=/dev/loop1
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop1
DEVNAME=/dev/loop2
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop2
DEVNAME=/dev/loop0
DISK_MEDIA_CHANGE=1
DEVNAME=/dev/loop0
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-7-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Refactor disk_check_events() and move some code into disk_event_uevent().
Then add disk_force_media_change(), a helper which will be used by
devices to force issuing a DISK_EVENT_MEDIA_CHANGE event.
Co-developed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-6-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new sysfs handle to export the new diskseq value.
Place it in <sysfs>/block/<disk>/diskseq and document it.
$ grep . /sys/class/block/*/diskseq
/sys/class/block/loop0/diskseq:13
/sys/class/block/loop1/diskseq:14
/sys/class/block/loop2/diskseq:5
/sys/class/block/loop3/diskseq:6
/sys/class/block/ram0/diskseq:1
/sys/class/block/ram1/diskseq:2
/sys/class/block/vda/diskseq:7
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-5-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new BLKGETDISKSEQ ioctl which retrieves the disk sequence number
from the genhd structure.
# ./getdiskseq /dev/loop*
/dev/loop0: 13
/dev/loop0p1: 13
/dev/loop0p2: 13
/dev/loop0p3: 13
/dev/loop1: 14
/dev/loop1p1: 14
/dev/loop1p2: 14
/dev/loop2: 5
/dev/loop3: 6
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-4-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Export the newly introduced diskseq in uevents:
$ udevadm info /sys/class/block/* |grep -e DEVNAME -e DISKSEQ
E: DEVNAME=/dev/loop0
E: DISKSEQ=1
E: DEVNAME=/dev/loop1
E: DISKSEQ=2
E: DEVNAME=/dev/loop2
E: DISKSEQ=3
E: DEVNAME=/dev/loop3
E: DISKSEQ=4
E: DEVNAME=/dev/loop4
E: DISKSEQ=5
E: DEVNAME=/dev/loop5
E: DISKSEQ=6
E: DEVNAME=/dev/loop6
E: DISKSEQ=7
E: DEVNAME=/dev/loop7
E: DISKSEQ=8
E: DEVNAME=/dev/nvme0n1
E: DISKSEQ=9
E: DEVNAME=/dev/nvme0n1p1
E: DISKSEQ=9
E: DEVNAME=/dev/nvme0n1p2
E: DISKSEQ=9
E: DEVNAME=/dev/nvme0n1p3
E: DISKSEQ=9
E: DEVNAME=/dev/nvme0n1p4
E: DISKSEQ=9
E: DEVNAME=/dev/nvme0n1p5
E: DISKSEQ=9
E: DEVNAME=/dev/sda
E: DISKSEQ=10
E: DEVNAME=/dev/sda1
E: DISKSEQ=10
E: DEVNAME=/dev/sda2
E: DISKSEQ=10
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-3-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Associating uevents with block devices in userspace is difficult and racy:
the uevent netlink socket is lossy, and on slow and overloaded systems
has a very high latency.
Block devices do not have exclusive owners in userspace, any process can
set one up (e.g. loop devices). Moreover, device names can be reused
(e.g. loop0 can be reused again and again). A userspace process setting
up a block device and watching for its events cannot thus reliably tell
whether an event relates to the device it just set up or another earlier
instance with the same name.
Being able to set a UUID on a loop device would solve the race conditions.
But it does not allow to derive orderings from uevents: if you see a
uevent with a UUID that does not match the device you are waiting for,
you cannot tell whether it's because the right uevent has not arrived yet,
or it was already sent and you missed it. So you cannot tell whether you
should wait for it or not.
Associating a unique, monotonically increasing sequential number to the
lifetime of each block device, which can be retrieved with an ioctl
immediately upon setting it up, allows to solve the race conditions with
uevents, and also allows userspace processes to know whether they should
wait for the uevent they need or if it was dropped and thus they should
move on.
Additionally, increment the disk sequence number when the media change,
i.e. on DISK_EVENT_MEDIA_CHANGE event.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-2-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
cmdline-parser.c is only used by the cmdline faux partition format,
so merge the code into that and avoid an indirect call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210728053756.409654-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the disk_name function now that all users are gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210727062518.122108-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
disk_name for partition 0 just copies out the disk_name field. Replace
the call to disk_name with a %s format specifier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210727062518.122108-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Printk ->disk_name directly for the disk and use the %pg format specifier
for the block device, which is equivalent to a bdevname call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210727062518.122108-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|