summaryrefslogtreecommitdiff
path: root/arch/um/drivers/ubd_kern.c
AgeCommit message (Collapse)AuthorFilesLines
2021-01-06um: ubd: Submit all data segments atomicallyGabriel Krisman Bertazi1-76/+115
[ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ] Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ | cmd_header | segment 0 | segment 1 | segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <chris.obbard@collabora.com> Reported-by: Martyn Welch <martyn@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-23um: ubd: Prevent buffer overrun on command completionGabriel Krisman Bertazi1-1/+3
[ Upstream commit 6e682d53fc1ef73a169e2a5300326cb23abb32ee ] On the hypervisor side, when completing commands and the pipe is full, we retry writing only the entries that failed, by offsetting io_req_buffer, but we don't reduce the number of bytes written, which can cause a buffer overrun of io_req_buffer, and write garbage to the pipe. Cc: Martyn Welch <martyn.welch@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29um-ubd: Entrust re-queue to the upper layersAnton Ivanov1-2/+6
Fixes crashes due to ubd requeue logic conflicting with the block-mq logic. Crash is reproducible in 5.0 - 5.3. Fixes: 53766defb8c8 ("um: Clean-up command processing in UML UBD driver") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-15um: Add SPDX headers for files in arch/um/driversAlex Dewar1-1/+1
Convert files to use SPDX header. All files are licensed under the GPLv2. Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-05-08um: Do not unlock mutex that is not hold.Daniel Walter1-2/+2
Return error instead of trying to unlock a mutex that is not hold. Signed-off-by: Daniel Walter <dwalter@google.com> Reviewed-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2019-03-07um: Fix for a possible OOPS in ubd initializationAnton Ivanov1-3/+3
If the ubd device failed to allocate a queue during initialization it tried call blk_cleanup_queue resulting in an oops. This patch simplifies the cleanup logic and ensures that blk_queue_cleanup is called only if there is a valid queue. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-12-28um: Remove obsolete reenable_XX callsAnton Ivanov1-1/+0
reenable_fd has been a NOP since the introduction of the EPOLL based interrupt controller. reenable_channel() is no longer needed as the flow control is now handled via the write IRQs on the channel. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-12-28um: Add support for DISCARD in the UBD DriverAnton Ivanov1-11/+54
Support for DISCARD and WRITE_ZEROES in the ubd driver using fallocate. DISCARD is enabled by default and can be disabled using a new UBD command line flag. If the underlying fs on which the UBD image is stored does not support DISCARD the support for both DISCARD and WRITE_ZEROES is turned off. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-12-28um: Remove unsafe printks from the io threadAnton Ivanov1-25/+17
Printk out of the io thread has been proven to be unsafe. This is not surprising as the thread is part of the UML hypervisor code. It is not supposed to invoke any kernel code/resources. It is necesssary to pass the error to the block IO layer and let it Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-12-28um: Clean-up command processing in UML UBD driverAnton Ivanov1-30/+47
Clean-up command processing and return BLK_STS_NOTSUP for uknown commands. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-12-28um: Switch to block-mq constants in the UML UBD driverAnton Ivanov1-28/+38
Switch to block mq-constants for both commands, error codes and various computations. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2018-11-08ubd: fix missing initialization of io_reqAnton Ivanov1-2/+1
The SYNC path doesn't initialize io_req->error, which can cause random errors. Before the conversion to blk-mq, we always completed requests with BLK_STS_OK status, but now we actually look at the error field and this issue becomes apparent. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> [axboe: fixed up commit message to explain what is actually going on] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-08ubd: fix missing lock around request issueJens Axboe1-2/+7
We need to hold the device lock (and disable interrupts) while writing new commands, or we could be interrupted while that is happening and read invalid requests in the completion path. Fixes: 4e6da0fe8058 ("um: Convert ubd driver to blk-mq") Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-19ubd: remove use of blk_rq_map_sgChristoph Hellwig1-104/+54
There is no good reason to create a scatterlist in the ubd driver, it can just iterate the request directly. Signed-off-by: Christoph Hellwig <hch@lst.de> [rw: Folded in improvements as discussed with hch and jens] Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-14um: Convert ubd driver to blk-mqRichard Weinberger1-85/+93
Convert the driver to the modern blk-mq framework. As byproduct we get rid of our open coded restart logic and let blk-mq handle it. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-28block: genhd: add 'groups' argument to device_add_diskHannes Reinecke1-1/+1
Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-13treewide: kmalloc() -> kmalloc_array()Kees Cook1-6/+6
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-16proc: introduce proc_create_single{,_data}Christoph Hellwig1-14/+2
Variants of proc_create{,_data} that directly take a seq_file show callback and drastically reduces the boilerplate code in the callers. All trivial callers converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-02-19Epoll based IRQ controllerAnton Ivanov1-2/+2
1. Removes the need to walk the IRQ/Device list to determine who triggered the IRQ. 2. Improves scalability (up to several times performance improvement for cases with 10s of devices). 3. Improves UML baseline IO performance for one disk + one NIC use case by up to 10%. 4. Introduces write poll triggered IRQs. 5. Prerequisite for introducing high performance mmesg family of functions in network IO. 6. Fixes RNG shutdown which was leaking a file descriptor Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-06-09block: introduce new block status code typeChristoph Hellwig1-1/+1
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-15um: UBD ImprovementsAnton Ivanov1-26/+142
UBD at present is extremely slow because it handles only one request at a time in the IO thread and IRQ handler. The single request at a time is replaced by handling multiple requests as well as necessary workarounds for short reads/writes. Resulting performance improvement in disk IO - 30% Signed-off-by: Anton Ivanov <aivanov@kot-begemot.co.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-06-22um: track 'parent' device in a local variableDan Williams1-2/+3
In preparation for the removal of 'driverfs_dev' from 'struct gendisk' use a local variable to track the parented vs un-parented case in ubd_disk_register(). Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie1-1/+1
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-13um: switch to using blk_queue_write_cache()Jens Axboe1-1/+1
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-01-10um: Update UBD to use pread/pwrite family of functionsAnton Ivanov1-22/+5
This decreases the number of syscalls per read/write by half. Signed-off-by: Anton Ivanov <aivanov@brocade.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2014-10-13um: ubd: Fix for processes stuck in D state foreverThorsten Knabe1-2/+3
Starting with Linux 3.12 processes get stuck in D state forever in UserModeLinux under sync heavy workloads. This bug was introduced by commit 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport). Fix bug by adding a check if FLUSH request was successfully submitted to the I/O thread and keeping the FLUSH request on the request queue on submission failures. Fixes: 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport) Signed-off-by: Thorsten Knabe <linux@thorsten-knabe.de> Cc: stable@kernel.org # >= 3.12 Signed-off-by: Richard Weinberger <richard@nod.at>
2013-09-07um: Cleanup SIGTERM handlingRichard Weinberger1-1/+2
Richard reported that some UML processes survive if the UML main process receives a SIGTERM. This issue was caused by a wrongly placed signal(SIGTERM, SIG_DFL) in init_new_thread_signals(). It disabled the UML exit handler accidently for some processes. The correct solution is to disable the fatal handler for all UML helper threads/processes. Such that last_ditch_exit() does not get called multiple times and all processes can exit due to SIGTERM. Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2013-09-07um: ubd: Introduce submit_request()Richard Weinberger1-13/+19
Just a clean-up patch to remove the open coded variants and to ensure that all requests are submitted the same way. Signed-off-by: Richard Weinberger <richard@nod.at>
2013-09-07um: ubd: Add REQ_FLUSH suppportRichard Weinberger1-1/+40
UML's block device driver does not support write barriers, to support this this patch adds REQ_FLUSH suppport. Every time the block layer sends a REQ_FLUSH we fsync() now our backing file to guarantee data consistency. Reported-and-tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2013-05-07block_device_operations->release() should return voidAl Viro1-3/+2
The value passed is 0 in all but "it can never happen" cases (and those only in a couple of drivers) *and* it would've been lost on the way out anyway, even if something tried to pass something meaningful. Just don't bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-10um: get rid of pointless include "..." where include <...> will doAl Viro1-4/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-08-02um: fix ubd_file_size for read-only filesMartin Pärtel1-1/+1
Made ubd_file_size not request write access. Fixes use of read-only images. Signed-off-by: Martin Pärtel <martin.partel@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-03-25um: clean up the includes in ubdAl Viro1-29/+15
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2012-03-25um: irq: Remove IRQF_DISABLEDYong Zhang1-1/+1
Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled], We run all interrupt handlers with interrupts disabled and we even check and yell when an interrupt handler returns with interrupts enabled (see commit [b738a50a: genirq: Warn when handler enables interrupts]). So now this flag is a NOOP and can be removed. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02um: fix ubd cow sizeRichard Weinberger1-1/+30
ubd_file_size() cannot use ubd_dev->cow.file because at this time ubd_dev->cow.file is not initialized. Therefore, ubd_file_size() will always report a wrong disk size when COW files are used. Reading from /dev/ubd* would crash the kernel. We have to read the correct disk size from the COW file's backing file. Signed-off-by: Richard Weinberger <richard@nod.at> CC: stable@kernel.org
2011-11-02um: kill shared/mem_kern.hAl Viro1-1/+0
... nothing declared there exists Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02um: don't include kern.h unless it's neededAl Viro1-1/+0
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2011-01-27um: Replace deprecated spinlock initializationThomas Gleixner1-1/+1
SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jeff Dike <jdike@addtoit.com>
2010-10-22Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bklLinus Torvalds1-5/+6
* 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: introduce CONFIG_BKL. dabusb: remove the BKL sunrpc: remove the big kernel lock init/main.c: remove BKL notations blktrace: remove the big kernel lock rtmutex-tester: make it build without BKL dvb-core: kill the big kernel lock dvb/bt8xx: kill the big kernel lock tlclk: remove big kernel lock fix rawctl compat ioctls breakage on amd64 and itanic uml: kill big kernel lock parisc: remove big kernel lock cris: autoconvert trivial BKL users alpha: kill big kernel lock isapnp: BKL removal s390/block: kill the big kernel lock hpet: kill BKL, add compat_ioctl
2010-10-19uml: kill big kernel lockArnd Bergmann1-5/+6
Three uml device drivers still use the big kernel lock, but all of them can be safely converted to using a per-driver mutex instead. Most likely this is not even necessary, so after further review these can and should be removed as well. The exec system call no longer requires the BKL either, so remove it from there, too. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Dike <jdike@addtoit.com> Cc: user-mode-linux-devel@lists.sourceforge.net
2010-10-15ubd: fix incorrect sector handling during request restartTejun Heo1-4/+5
Commit f81f2f7c (ubd: drop unnecessary rq->sector manipulation) dropped request->sector manipulation in preparation for global request handling cleanup; unfortunately, it incorrectly assumed that the updated sector wasn't being used. ubd tries to issue as many requests as possible to io_thread. When issuing fails due to memory pressure or other reasons, the device is put on the restart list and issuing stops. On IO completion, devices on the restart list are scanned and IO issuing is restarted. ubd issues IOs sg-by-sg and issuing can be stopped in the middle of a request, so each device on the restart queue needs to remember where to restart in its current request. ubd needs to keep track of the issue position itself because, * blk_rq_pos(req) is now updated by the block layer to keep track of _completion_ position. * Multiple io_req's for the current request may be in flight, so it's difficult to tell where blk_rq_pos(req) currently is. Add ubd->rq_pos to keep track of the issue position and use it to correctly restart io_req issue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Tested-by: Chris Frey <cdfrey@foursquare.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07block: push down BKL into .open and .releaseArnd Bergmann1-1/+6
The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-26block: Consolidate phys_segment and hw_segment limitsMartin K. Petersen1-1/+1
Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-26block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectorsMartin K. Petersen1-1/+1
The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to appropriately reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compability. This can be removed after the merge window is closed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-15uml: convert to seq_file/proc_fopsAlexey Dobriyan1-18/+18
Convert code away from ->read_proc/->write_proc interfaces. Switch to proc_create()/proc_create_data() which make addition of proc entries reliable wrt NULL ->proc_fops, NULL ->data and so on. Problem with ->read_proc et al is described here commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries" Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22const: make block_device_operations constAlexey Dobriyan1-1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16uml: remove driver_data direct access of struct deviceGreg Kroah-Hartman1-2/+2
In the near future, the driver core is going to not allow direct access to the driver_data pointer in struct device. Instead, the functions dev_get_drvdata() and dev_set_drvdata() should be used. These functions have been around since the beginning, so are backwards compatible with all older kernel versions. Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-11block: implement and enforce request peek/start/fetchTejun Heo1-2/+1
Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Miller <mike.miller@hp.com> Cc: unsik Kim <donari75@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Laurent Vivier <Laurent@lvivier.info> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11block: convert to pos and nr_sectors accessorsTejun Heo1-1/+1
With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>