summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2015-07-24xen-blkfront: introduce blkfront_gather_backend_features()Bob Liu1-54/+68
There is a bug when migrate from !feature-persistent host to feature-persistent host, because domU still thinks new host/backend doesn't support persistent. Dmesg like: backed has not unmapped grant: 839 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 839 The fix is to recheck feature-persistent of new backend in blkif_recover(). See: https://lkml.org/lkml/2015/5/25/469 As Roger suggested, we can split the part of blkfront_connect that checks for optional features, like persistent grants, indirect descriptors and flush/barrier features to a separate function and call it from both blkfront_connect and blkif_recover Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-22drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialisingBob Liu1-0/+2
Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b "drivers: xen-blkback: delay pending_req allocation to connect_ring" exposed an problem that Xen blkfront has. There is a race with XenStored and the drivers such that we can see two: vbd vbd-268440320: blkfront:blkback_changed to state 2. vbd vbd-268440320: blkfront:blkback_changed to state 2. vbd vbd-268440320: blkfront:blkback_changed to state 4. state changes to XenbusStateInitWait ('2'). The end result is that blkback_changed() receives two notify and calls twice setup_blkring(). While the backend driver may only get the first setup_blkring() which is wrong and reads out-dated (or reads them as they are being updated with new ring-ref values). The end result is that the ring ends up being incorrectly set. The other drivers in the tree have such checks already in. Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-06xen/block: add multi-page ring supportBob Liu4-59/+180
Extend xen/block to support multi-page ring, so that more requests can be issued by using more than one pages as the request ring between blkfront and backend. As a result, the performance can get improved significantly. We got some impressive improvements on our highend iscsi storage cluster backend. If using 64 pages as the ring, the IOPS increased about 15 times for the throughput testing and above doubled for the latency testing. The reason was the limit on outstanding requests is 32 if use only one-page ring, but in our case the iscsi lun was spread across about 100 physical drives, 32 was really not enough to keep them busy. Changes in v2: - Rebased to 4.0-rc6. - Document on how multi-page ring feature working to linux io/blkif.h. Changes in v3: - Remove changes to linux io/blkif.h and follow the protocol defined in io/blkif.h of XEN tree. - Rebased to 4.1-rc3 Changes in v4: - Turn to use 'ring-page-order' and 'max-ring-page-order'. - A few comments from Roger. Changes in v5: - Clarify with 4k granularity to comment - Address more comments from Roger Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-06driver: xen-blkfront: move talk_to_blkback to a more suitable placeBob Liu1-8/+6
The major responsibility of talk_to_blkback() is allocate and initialize the request ring and write the ring info to xenstore. But this work should be done after backend entered 'XenbusStateInitWait' as defined in the protocol file. See xen/include/public/io/blkif.h in XEN git tree: Front Back ================================= ===================================== XenbusStateInitialising XenbusStateInitialising o Query virtual device o Query backend device identification properties. data. o Setup OS device instance. o Open and validate backend device. o Publish backend features and transport parameters. | | V XenbusStateInitWait o Query backend features and transport parameters. o Allocate and initialize the request ring. There is no problem with this yet, but it is an violation of the design and furthermore it would not allow frontend/backend to negotiate 'multi-page' and 'multi-queue' features. Changes in v2: - Re-write the commit message to be more clear. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-06drivers: xen-blkback: delay pending_req allocation to connect_ringBob Liu2-45/+39
This is a pre-patch for multi-page ring feature. In connect_ring, we can know exactly how many pages are used for the shared ring, delay pending_req allocation here so that we won't waste too much memory. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-05NVMe: Automatic namespace rescanKeith Busch1-32/+127
Namespaces may be dynamically allocated and deleted or attached and detached. This has the driver rescan the device for namespace changes after each device reset or namespace change asynchronous event. There could potentially be many detached namespaces that we don't want polluting /dev/ with unusable block handles, so this will delete disks if the namespace is not active as indicated by the response from identify namespace. This also skips adding the disk if no capacity is provisioned to the namespace in the first place. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05NVMe: Memory barrier before queue_count is incrementedJon Derrick1-1/+4
Protects against reordering and/or preempting which would allow the kthread to access the queue descriptor before it is set up Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-05NVMe: add sysfs and ioctl controller resetKeith Busch1-0/+53
We need the ability to perform an nvme controller reset as discussed on the mailing list thread: http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html This adds a sysfs entry that when written to will reset perform an NVMe controller reset if the controller was successfully initialized in the first place. This also adds locking around resetting the device in the async probe method so the driver can't schedule two resets. Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Brandon Schultz <brandon.schulz@hgst.com> Cc: David Sariel <david.sariel@pmcs.com> Updated by Jens to: 1) Merge this with the ioctl reset patch from David Sariel. The ioctl path now shares the reset code from the sysfs path. 2) Don't flush work if we fail issuing the reset. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02null_blk: restart request processing on completion handlerAkinobu Mita1-0/+12
When irqmode=2 (IRQ completion handler is timer) and queue_mode=1 (Block interface to use is rq), the completion handler should restart request handling for any pending requests on a queue because request processing stops when the number of commands are queued more than hw_queue_depth (null_rq_prep_fn returns BLKPREP_DEFER). Without this change, the following command cannot finish. # modprobe null_blk irqmode=2 queue_mode=1 hw_queue_depth=1 # fio --name=t --rw=read --size=1g --direct=1 \ --ioengine=libaio --iodepth=64 --filename=/dev/nullb0 Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02null_blk: prevent timer handler running on a different CPU where startedAkinobu Mita1-1/+1
When irqmode=2 (IRQ completion handler is timer), timer handler should be called on the same CPU where the timer has been started. Since completion_queues are per-cpu and the completion handler only touches completion_queue for local CPU, we need to prevent the handler from running on a different CPU where the timer has been started. Otherwise, the IO cannot be completed until another completion handler is executed on that CPU. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01NVMe: Remove hctx reliance for multi-namespaceKeith Busch1-33/+19
The driver needs to track shared tags to support multiple namespaces that may be dynamically allocated or deleted. Relying on the first request_queue's hctx's is not appropriate as we cannot clear outstanding tags for all namespaces using this handle, nor can the driver easily track all request_queue's hctx as namespaces are attached/detached. Instead, this patch uses the nvme_dev's tagset to get the shared tag resources instead of through a request_queue hctx. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-01Merge branch 'for-4.2/core' into for-4.2/driversJens Axboe9-159/+64
2015-05-29NVMe: End sync requests immediately on failureKeith Busch1-0/+1
Do not retry failed sync commands so the original status may be seen without issuing unnecessary retries. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-29NVMe: Use requested sync command timeoutKeith Busch1-1/+1
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22NVMe: Fix obtaining command resultKeith Busch1-4/+6
Replaces req->sense_len usage, which is not owned by the LLD, to req->special to contain the command result for driver created commands, and sets the result unconditionally on completion. Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@fb.com> Fixes: d29ec8241c10 ("nvme: submit internal commands through the block layer") Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22block, dm: don't copy bios for request clonesChristoph Hellwig3-142/+59
Currently dm-multipath has to clone the bios for every request sent to the lower devices, which wastes cpu cycles and ties down memory. This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio to not complete bios attached to a request, which we set on clone requests similar to bios in a flush sequence. With this change I/O errors on a path failure only get propagated to dm-multipath, which can then either resubmit the I/O or complete the bios on the original request. I've done some basic testing of this on a Linux target with ALUA support, and it survives path failures during I/O nicely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22block: remove management of bi_remaining when restoring original bi_end_ioMike Snitzer6-17/+5
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") regressed all existing callers that followed this pattern: 1) saving a bio's original bi_end_io 2) wiring up an intermediate bi_end_io 3) restoring the original bi_end_io from intermediate bi_end_io 4) calling bio_endio() to execute the restored original bi_end_io The regression was due to BIO_CHAIN only ever getting set if bio_inc_remaining() is called. For the above pattern it isn't set until step 3 above (step 2 would've needed to establish BIO_CHAIN). As such the first bio_endio(), in step 2 above, never decremented __bi_remaining before calling the intermediate bi_end_io -- leaving __bi_remaining with the value 1 instead of 0. When bio_inc_remaining() occurred during step 3 it brought it to a value of 2. When the second bio_endio() was called, in step 4 above, it should've called the original bi_end_io but it didn't because there was an extra reference that wasn't dropped (due to atomic operations being optimized away since BIO_CHAIN wasn't set upfront). Fix this issue by removing the __bi_remaining management complexity for all callers that use the above pattern -- bio_chain() is the only interface that _needs_ to be concerned with __bi_remaining. For the above pattern callers just expect the bi_end_io they set to get called! Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls that aren't associated with the bio_chain() interface. Also, the bio_inc_remaining() interface has been moved local to bio.c. Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: submit internal commands through the block layerChristoph Hellwig2-530/+291
Use block layer queues with an internal cmd_type to submit internally generated NVMe commands. This both simplifies the code a lot and allow for a better structure. For example now the LighNVM code can construct commands without knowing the details of the underlying I/O descriptors. Or a future NVMe over network target could inject commands, as well as could the SCSI translation and ioctl code be reused for such a beast. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: fail SCSI read/write command with unsupported protection bitChristoph Hellwig1-0/+7
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: report the DPOFUA in MODE_SENSEChristoph Hellwig1-2/+2
NVMe device always support the FUA bit, and the SCSI translations accepts the DPO bit, which doesn't have much of a meaning for us. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: simplify and cleanup the READ/WRITE SCSI CDB parsing codeChristoph Hellwig1-54/+24
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: first round at deobsfucating the SCSI translation codeChristoph Hellwig1-244/+82
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: fix scsi translation error handlingChristoph Hellwig1-251/+127
Erorr handling for the scsi translation was completely broken, as there were two different positive error number spaces overlapping. Fix this up by removing one of them, and centralizing the generation of the other positive values in a single place. Also fix up a few places that didn't handle the NVMe error codes properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: split nvme_trans_send_fw_cmdChristoph Hellwig1-46/+47
This function handles two totally different opcodes, so split it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: store a struct device pointer in struct nvme_devChristoph Hellwig2-95/+78
Most users want the generic device, so store that in struct nvme_dev instead of the pci_dev. This also happens to be a nice step towards making some code reusable for non-PCI transports. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22nvme: consolidate synchronous command submission helpersChristoph Hellwig2-72/+41
Note that we keep the unused timeout argument, but allow callers to pass 0 instead of a timeout if they want the default. This will allow adding a timeout to the pass through path later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20loop: remove (now) unused 'out' labelJens Axboe1-1/+0
gcc, righfully, complains: drivers/block/loop.c:1369:1: warning: label 'out' defined but not used [-Wunused-label] Kill it. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20s390/block/dasd: remove obsolete while -EBUSY loopJarod Wilson1-9/+3
With the mutex_trylock bit gone from blkdev_reread_part(), the retry logic in dasd_scan_partitions() shouldn't be necessary. CC: Christoph Hellwig <hch@infradead.org> CC: Jens Axboe <axboe@kernel.dk> CC: Tejun Heo <tj@kernel.org> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Markus Pargmann <mpa@pengutronix.de> CC: Stefan Weinhuber <wein@de.ibm.com> CC: Stefan Haberland <stefan.haberland@de.ibm.com> CC: Sebastian Ott <sebott@linux.vnet.ibm.com> CC: Fabian Frederick <fabf@skynet.be> CC: Ming Lei <ming.lei@canonical.com> CC: David Herrmann <dh.herrmann@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: nbd-general@lists.sourceforge.net CC: linux-s390@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20block: dasd_genhd: convert to blkdev_reread_partMing Lei1-6/+3
Also remove the obsolete comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20block: nbd: convert to blkdev_reread_part()Ming Lei1-1/+1
Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20block: loop: fix another reread part failureMing Lei1-4/+26
loop_clr_fd() can be run piggyback with lo_release(), and under this situation, reread partition may always fail because bd_mutex has been held already. This patch detects the situation by the reference count, and call __blkdev_reread_part() to avoid acquiring the lock again. In the meantime, this patch switches to new kernel APIs of blkdev_reread_part() and __blkdev_reread_part(). Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-20block: loop: don't hold lo_ctl_mutex in lo_openMing Lei2-10/+13
The lo_ctl_mutex is held for running all ioctl handlers, and in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for rereading partitions, which requires bd_mutex. So it is easy to cause failure because trylock(bd_mutex) may fail inside blkdev_reread_part(), and follows the lock context: blkid or other application: ->open() ->mutex_lock(bd_mutex) ->lo_open() ->mutex_lock(lo_ctl_mutex) losetup(set fd ioctl): ->mutex_lock(lo_ctl_mutex) ->ioctl_by_bdev(BLKRRPART) ->trylock(bd_mutex) This patch trys to eliminate the ABBA lock dependency by removing lo_ctl_mutext in lo_open() with the following approach: 1) make lo_refcnt as atomic_t and avoid acquiring lo_ctl_mutex in lo_open(): - for open vs. add/del loop, no any problem because of loop_index_mutex - freeze request queue during clr_fd, so I/O can't come until clearing fd is completed, like the effect of holding lo_ctl_mutex in lo_open - both open() and release() have been serialized by bd_mutex already 2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in lo_release(), then lo_ctl_mutex is only required for the last release. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19nvme: disable irqs in nvme_freeze_queuesChristoph Hellwig1-2/+2
The queue_lock needs to be taken with irqs disabled. This is mostly due to the old pre blk-mq usage pattern, but we've also picked it up in most of the few places where we use the queue_lock with blk-mq. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19cciss: correct the non-resettable board listTomas Henzl1-2/+21
The hpsa driver carries a more recent version, copy the table from there. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-19cciss: remove duplicate entries from board_type structTomas Henzl1-4/+0
and devices not supported by this driver from unresettable list Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05block: loop: avoiding too many pending per work I/OMing Lei1-1/+1
If there are too many pending per work I/O, too many high priority work thread can be generated so that system performance can be effected. This patch limits the max_active parameter of workqueue as 16. This patch fixes Fedora 22 live booting performance regression when it is booted from squashfs over dm based on loop, and looks the following reasons are related with the problem: - not like other filesyststems(such as ext4), squashfs is a bit special, and I observed that increasing I/O jobs to access file in squashfs only improve I/O performance a little, but it can make big difference for ext4 - nested loop: both squashfs.img and ext3fs.img are mounted as loop block, and ext3fs.img is inside the squashfs - during booting, lots of tasks may run concurrently Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778 Cc: stable@vger.kernel.org (v4.0) Cc: Justin M. Forbes <jforbes@fedoraproject.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05block: loop: convert to per-device workqueueMing Lei2-16/+15
Documentation/workqueue.txt: If there is dependency among multiple work items used during memory reclaim, they should be queued to separate wq each with WQ_MEM_RECLAIM. Loop devices can be stacked, so we have to convert to per-device workqueue. One example is Fedora live CD. Fixes: b5dd2f6047ca108001328aac0e8588edd15f1778 Cc: stable@vger.kernel.org (v4.0) Cc: Justin M. Forbes <jforbes@fedoraproject.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05nbd: stop using req->cmdChristoph Hellwig1-25/+23
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05block: move PM request support to IDEChristoph Hellwig4-19/+49
This removes the request types and hacks from the block code and into the old IDE driver. There is a small amunt of code duplication due to this, but it's not too bad. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05block: move REQ_TYPE_SENSE to the ide driverChristoph Hellwig4-9/+9
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05block: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIVChristoph Hellwig14-24/+24
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05bio: skip atomic inc/dec of ->bi_cnt for most use casesJens Axboe1-1/+1
Struct bio has a reference count that controls when it can be freed. Most uses cases is allocating the bio, which then returns with a single reference to it, doing IO, and then dropping that single reference. We can remove this atomic_dec_and_test() in the completion path, if nobody else is holding a reference to the bio. If someone does call bio_get() on the bio, then we flag the bio as now having valid count and that we must properly honor the reference count when it's being put. Tested-by: Robert Elliott <elliott@hp.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05bio: skip atomic inc/dec of ->bi_remaining for non-chainsJens Axboe4-5/+5
Struct bio has an atomic ref count for chained bio's, and we use this to know when to end IO on the bio. However, most bio's are not chained, so we don't need to always introduce this atomic operation as part of ending IO. Add a helper to elevate the bi_remaining count, and flag the bio as now actually needing the decrement at end_io time. Rename the field to __bi_remaining to catch any current users of this doing the incrementing manually. For high IOPS workloads, this reduces the overhead of bio_endio() substantially. Tested-by: Robert Elliott <elliott@hp.com> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-9/+9
Pull crypto fixes from Herbert Xu: "This fixes a build problem with bcm63xx and yet another fix to the memzero_explicit function to ensure that the memset is not elided" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm63xx - Fix driver compilation lib: make memzero_explicit more robust against dead store elimination
2015-05-05Merge tag 'media/v4.1-3' of ↵Linus Torvalds7-17/+40
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Three driver fixes: - fix for omap4, fixing a regression due to a subsystem API that got removed for 4.1 (commit efde234674d9); - fix for one of the formats supported by Marvel ccic driver; - fix rcar_vin driver that, when stopping abnormally, the driver can't return from wait_for_completion" * tag 'media/v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] v4l: omap4iss: Replace outdated OMAP4 control pad API with syscon [media] media: soc_camera: rcar_vin: Fix wait_for_completion [media] marvell-ccic: fix Y'CbCr ordering
2015-05-04hwrng: bcm63xx - Fix driver compilationÁlvaro Fernández Rojas1-9/+9
- s/clk_didsable_unprepare/clk_disable_unprepare - s/prov/priv - s/error/ret (bcm63xx_rng_probe) Fixes: 6229c16060fe ("hwrng: bcm63xx - make use of devm_hwrng_register") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-05-04Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds14-94/+110
Pull drm fixes from Dave Airlie: "One intel fix, one rockchip fix, and a bunch of radeon fixes for some regressions from audio rework and vm stability" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/i915/chv: Implement WaDisableShadowRegForCpd drm/radeon: fix userptr return value checking (v2) drm/radeon: check new address before removing old one drm/radeon: reset BOs address after clearing it. drm/radeon: fix lockup when BOs aren't part of the VM on release drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5 drm/radeon: adjust pll when audio is not enabled drm/radeon: only enable audio streams if the monitor supports it drm/radeon: only mark audio as connected if the monitor supports it (v3) drm/radeon/audio: don't enable packets until the end drm/radeon: drop dce6_dp_enable drm/radeon: fix ordering of AVI packet setup drm/radeon: Use drm_calloc_ab for CS relocs drm/rockchip: fix error check when getting irq MAINTAINERS: add entry for Rockchip drm drivers
2015-05-04Merge tag 'drm-intel-fixes-2015-04-30' of ↵Dave Airlie2-0/+10
git://anongit.freedesktop.org/drm-intel into drm-fixes Just a single intel fix * tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel: drm/i915/chv: Implement WaDisableShadowRegForCpd
2015-05-04Merge branch 'drm-next0420' of ↵Dave Airlie1-4/+5
https://github.com/markyzq/kernel-drm-rockchip into drm-fixes one fix and maintainers update * 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip: drm/rockchip: fix error check when getting irq MAINTAINERS: add entry for Rockchip drm drivers
2015-05-03Merge tag 'scsi-fixes' of ↵Linus Torvalds9-146/+47
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is three logical fixes (as 5 patches). The 3ware class of drivers were causing an oops with multiqueue by tearing down the command mappings after completing the command (where the variables in the command used to tear down the mapping were no-longer valid). There's also a fix for the qnap iscsi target which was choking on us sending it commands that were too long and a fix for the reworked aha1542 allocating GFP_KERNEL under a lock" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: 3w-9xxx: fix command completion race 3w-xxxx: fix command completion race 3w-sas: fix command completion race aha1542: Allocate memory before taking a lock SCSI: add 1024 max sectors black list flag