path: root/drivers/block
Age | Commit message | Author | Files | Lines
2013-02-07 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds | 6 | -18/+44
Pull block layer updates from Jens Axboe: "I've got a few bits pending for 3.8 final, that I better get sent out. It's all been sitting for a while, I consider it safe. It contains: - Two bug fixes for mtip32xx, fixing a driver hang and a crash. - A few-liner protocol error fix for drbd. - A few fixes for the xen block front/back driver, fixing a potential data corruption issue. - A race fix for disk_clear_events(), causing spurious warnings. Out of the Chrome OS base. - A deadlock fix for disk_clear_events(), moving it to an unfreezable workqueue. Also from the Chrome OS base." * 'for-linus' of git://git.kernel.dk/linux-block: drbd: fix potential protocol error and resulting disconnect/reconnect mtip32xx: fix for crash when the device surprise removed during rebuild mtip32xx: fix for driver hang after a command timeout block: prevent race/cleanup block: remove deadlock in disk_clear_events xen-blkfront: handle bvecs with partial data llist/xen-blkfront: implement safe version of llist_for_each_entry xen-blkback: implement safe iterator for the list of persistent grants
2013-01-22 | Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus | Jens Axboe | 3 | -1/+9
2013-01-22 | drbd: fix potential protocol error and resulting disconnect/reconnect | Lars Ellenberg | 3 | -1/+9
When we notice a disk failure on the receiving side, we stop sending it new incoming writes. Depending on exact timing of various events, the same transfer log epoch could end up containing both replicated (before we noticed the failure) and local-only requests (after we noticed the failure). The sanity checks in tl_release(), called when receiving a P_BARRIER_ACK, check that the ack'ed transfer log epoch matches the expected epoch, and the number of contained writes matches the number of ack'ed writes. In this case, they counted both replicated and local-only writes, but the peer only acknowledges those it has seen. We get a mismatch, resulting in a protocol error and disconnect/reconnect cycle. Messages logged are "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n". A similar issue can also be triggered when starting a resync while having a healthy replication link, by invalidating one side, forcing a full sync, or attaching to a diskless node. Fix this by closing the current epoch if the state changes in a way that would change the replication intent of the next write. Epochs now contain either only non-replicated, or only replicated writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
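The epoch rule described in this commit message can be illustrated with a small userspace sketch. It is not DRBD code: `struct epoch`, `struct tlog`, `tlog_init()` and `tlog_add_write()` are hypothetical names, and the only behaviour modelled is "close the current epoch when the replication intent of the next write differs".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transfer-log epoch; not DRBD's real structure. */
struct epoch {
    unsigned int n_writes;  /* writes recorded in this epoch */
    bool replicated;        /* replication intent shared by all of them */
};

struct tlog {
    struct epoch current;
    size_t closed_epochs;   /* number of epochs closed so far */
};

void tlog_init(struct tlog *tl)
{
    tl->current.n_writes = 0;
    tl->current.replicated = false;
    tl->closed_epochs = 0;
}

/* Record one write; close the current epoch first if the intent differs. */
void tlog_add_write(struct tlog *tl, bool replicate)
{
    if (tl->current.n_writes > 0 && tl->current.replicated != replicate) {
        tl->closed_epochs++;        /* epoch boundary: intent changed */
        tl->current.n_writes = 0;
    }
    tl->current.replicated = replicate;
    tl->current.n_writes++;
}
```

With this rule every epoch holds only replicated or only local-only writes, so a per-epoch count of replicated writes can match what the peer acknowledges.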
2013-01-21 | Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 1 | -1/+6
Pull module fixes and a virtio block fix from Rusty Russell: "Various minor fixes, but a slightly more complex one to fix the per-cpu overload problem introduced recently by kvm id changes." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: put modules in list much earlier. module: add new state MODULE_STATE_UNFORMED. module: prevent warning when finit_module a 0 sized file virtio-blk: Don't free ida when disk is in use
2013-01-11 | mtip32xx: fix for crash when the device surprise removed during rebuild | Asai Thambi S P | 1 | -2/+13
While a rebuild is in progress, disk->queue has not yet been created. Surprise-removing the device calls remove() -> del_gendisk(), and del_gendisk() expects disk->queue to be non-NULL. The fix is to call put_disk() when disk->queue is NULL. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-11 | mtip32xx: fix for driver hang after a command timeout | Asai Thambi S P | 1 | -4/+5
If an I/O command times out when a PIO command is active, MTIP_PF_EH_ACTIVE_BIT is not cleared. This results in I/O hang in the driver. Fix is to clear this bit. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-01-04 | Drivers: block: remove __dev* attributes. | Greg Kroah-Hartman | 11 | -100/+87
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Mike Miller <mike.miller@hp.com> Cc: Chirag Kantharia <chirag.kantharia@hp.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: NeilBrown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tao Guo <Tao.Guo@emc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-02 | virtio-blk: Don't free ida when disk is in use | Alexander Graf | 1 | -1/+6
When a file system is mounted on a virtio-blk disk and we remove the disk and then reattach it, the reattached disk gets the same disk name and ids as the hot-removed one. This leads to very nasty effects, mostly rendering the newly attached device completely unusable. Testing the same thing with a USB device, I saw that the sd node simply doesn't get freed when a device gets forcefully removed. Imitate the same behavior for vd devices. This way broken vd devices simply are never freed and newly attached ones keep working just fine. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
2012-12-21 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds | 2 | -428/+963
Pull Ceph update from Sage Weil: "There are a few different groups of commits here. The largest is Alex's ongoing work to enable the coming RBD features (cloning, striping). There is some cleanup in libceph that goes along with it. Cyril and David have fixed some problems with NFS reexport (leaking dentries and page locks), and there is a batch of patches from Yan fixing problems with the fs client when running against a clustered MDS. There are a few bug fixes mixed in for good measure, many of which will be going to the stable trees once they're upstream. My apologies for the late pull. There is still a gremlin in the rbd map/unmap code and I was hoping to include the fix for that as well, but we haven't been able to confirm the fix is correct yet; I'll send that in a separate pull once it's nailed down." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits) rbd: get rid of rbd_{get,put}_dev() libceph: register request before unregister linger libceph: don't use rb_init_node() in ceph_osdc_alloc_request() libceph: init event->node in ceph_osdc_create_event() libceph: init osd->o_node in create_osd() libceph: report connection fault with warning libceph: socket can close in any connection state rbd: don't use ENOTSUPP rbd: remove linger unconditionally rbd: get rid of RBD_MAX_SEG_NAME_LEN libceph: avoid using freed osd in __kick_osd_requests() ceph: don't reference req after put rbd: do not allow remove of mounted-on image libceph: Unlock unprocessed pages in start_read() error path ceph: call handle_cap_grant() for cap import message ceph: Fix __ceph_do_pending_vmtruncate ceph: Don't add dirty inode to dirty list if caps is in migration ceph: Fix infinite loop in __wake_requests ceph: Don't update i_max_size when handling non-auth cap bdi_register: add __printf verification, fix arg mismatch ...
2012-12-20 | rbd: get rid of rbd_{get,put}_dev() | Alex Elder | 1 | -12/+2
The functions rbd_get_dev() and rbd_put_dev() are trivial wrappers that add no value, and their existence suggests they may do more than what they do. Get rid of them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
2012-12-19 | Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus | Jens Axboe | 2 | -11/+17
Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.8 which has a bug-fix to the xen-blkfront and xen-blkback driver when using the persistent mode. An issue was discovered where LVM disks could not be read correctly and this fixes it. There is also a change in llist.h which has been blessed by akpm.
2012-12-18 | Merge branch 'akpm' (Andrew's patch-bomb) | Linus Torvalds | 7 | -321/+822
Merge misc patches from Andrew Morton: "Incoming: - lots of misc stuff - backlight tree updates - lib/ updates - Oleg's percpu-rwsem changes - checkpatch - rtc - aoe - more checkpoint/restart support I still have a pile of MM stuff pending - Pekka should be merging later today after which that is good to go. A number of other things are twiddling thumbs awaiting maintainer merges." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits) scatterlist: don't BUG when we can trivially return a proper error. docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output fs, fanotify: add @mflags field to fanotify output docs: add documentation about /proc/<pid>/fdinfo/<fd> output fs, notify: add procfs fdinfo helper fs, exportfs: add exportfs_encode_inode_fh() helper fs, exportfs: escape nil dereference if no s_export_op present fs, epoll: add procfs fdinfo helper fs, eventfd: add procfs fdinfo helper procfs: add ability to plug in auxiliary fdinfo providers tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test breakpoint selftests: print failure status instead of cause make error kcmp selftests: print fail status instead of cause make error kcmp selftests: make run_tests fix mem-hotplug selftests: print failure status instead of cause make error cpu-hotplug selftests: print failure status instead of cause make error mqueue selftests: print failure status instead of cause make error vm selftests: print failure status instead of cause make error ubifs: use prandom_bytes mtd: nandsim: use prandom_bytes ...
2012-12-18 | xen-blkfront: handle bvecs with partial data | Roger Pau Monne | 1 | -3/+4
Currently blkfront fails to handle cases in blkif_completion like the following: 1st loop in rq_for_each_segment: bv_offset: 3584, bv_len: 512, offset += bv_len, i: 0. 2nd loop: bv_offset: 0, bv_len: 512, i: 0. In the second loop i should be 1, since we assume we only wanted to read a part of the previous page. This patch fixes the cases where only a part of the shared page is read: blkif_completion now assumes that if the bv_offset of a bvec is less than the previous bv_offset plus the previous bv_len, we have to switch to the next shared page. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: linux-kernel@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
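The page-switch rule in this message can be checked with a short userspace sketch. It is an illustration only: `struct seg` and `assign_shared_pages()` are hypothetical stand-ins, not the xen-blkfront code, and they model just the comparison of each segment's offset against the end of the previous segment.

```c
#include <stddef.h>

/* Simplified stand-in for the two bio_vec fields the commit message mentions. */
struct seg {
    unsigned int bv_offset;  /* offset of the data within its page */
    unsigned int bv_len;     /* number of bytes in this segment */
};

/*
 * Fill page_idx[k] with the shared-page index for segment k and return the
 * number of shared pages touched. A segment whose offset lies before the end
 * of the previous segment cannot be on the same page, so the index advances.
 */
size_t assign_shared_pages(const struct seg *segs, size_t n, size_t *page_idx)
{
    size_t i = 0;
    unsigned int prev_end = 0;  /* previous bv_offset + bv_len */

    for (size_t k = 0; k < n; k++) {
        if (segs[k].bv_offset < prev_end)
            i++;
        page_idx[k] = i;
        prev_end = segs[k].bv_offset + segs[k].bv_len;
    }
    return n ? i + 1 : 0;
}
```

For the two segments from the message (offset 3584 then offset 0, both 512 bytes long), the second offset is below the previous end of 4096, so the second segment is assigned page index 1.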
2012-12-18 | llist/xen-blkfront: implement safe version of llist_for_each_entry | Roger Pau Monne | 1 | -1/+2
Implement a safe version of llist_for_each_entry, and use it in blkif_free. Previously grants were freed while iterating the list, which led to dereferences of freed memory when trying to fetch the next item. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> [v2: Move the llist_for_each_entry_safe in llist.h] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
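The idea behind a "safe" list iterator is to read the next pointer before the current entry is freed. The following userspace sketch shows that pattern on a plain singly linked list; `struct grant`, `grant_push()` and `grant_free_all()` are hypothetical names for illustration and are not the kernel's llist API.

```c
#include <stdlib.h>

/* Hypothetical list node; the kernel code uses struct llist_node instead. */
struct grant {
    int ref;
    struct grant *next;
};

/* Push a node on the front; returns the new head (old head if malloc fails). */
struct grant *grant_push(struct grant *head, int ref)
{
    struct grant *g = malloc(sizeof(*g));

    if (!g)
        return head;
    g->ref = ref;
    g->next = head;
    return g;
}

/* Free every node and return how many were freed. */
size_t grant_free_all(struct grant *head)
{
    size_t freed = 0;
    struct grant *pos, *n;

    for (pos = head; pos != NULL; pos = n) {
        n = pos->next;  /* fetch the successor before pos is freed */
        free(pos);
        freed++;
    }
    return freed;
}
```

An unsafe loop would evaluate `pos->next` in the loop increment, after `free(pos)`, which is the kind of access the commit message describes.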
2012-12-18 | aoe: fix use after free in aoedev_by_aoeaddr() | Dan Carpenter | 1 | -0/+1
We should return NULL on failure instead of returning a freed pointer. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: update internal version number to 81 | Ed Cashin | 1 | -2/+1
This version number is printed to the console on module initialization and is available in sysfs, which is where the userland aoe-version tool looks for it. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: identify source of runt AoE packets | Ed Cashin | 1 | -3/+7
This change only affects experimental AoE storage networks. It modifies the console message about runt packets detected so that the AoE major and minor addresses of the AoE target that generated the runt are mentioned. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: allow comma separator in aoe_iflist value | Ed Cashin | 2 | -2/+2
By default, the aoe driver uses any ethernet interface for AoE, but the aoe_iflist module parameter provides a convenient way to limit AoE traffic to a specific list of local network interfaces. This change allows a list to be specified using the comma character as a separator. For example, modprobe aoe aoe_iflist=eth2,eth3 Before, it was inconvenient to get the quoting right in shell scripts when setting aoe_iflist to have more than one network interface. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
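A minimal userspace sketch of list matching with the comma accepted as a separator follows. It assumes the list was previously whitespace-separated and that an empty list means "any interface"; `SEPARATORS` and `iface_allowed()` are hypothetical names, not the driver's own.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed separator set: whitespace plus the newly allowed comma. */
#define SEPARATORS " \t\v\f\n,"

/* Return true if ifname appears in iflist, or if iflist is empty. */
bool iface_allowed(const char *iflist, const char *ifname)
{
    const char *p = iflist;
    size_t want = strlen(ifname);

    if (*iflist == '\0')
        return true;  /* no restriction configured */

    for (;;) {
        size_t len;

        p += strspn(p, SEPARATORS);  /* skip leading separators */
        if (*p == '\0')
            return false;            /* end of list, no match */
        len = strcspn(p, SEPARATORS);  /* length of this token */
        if (len == want && memcmp(p, ifname, len) == 0)
            return true;
        p += len;
    }
}
```

Comparing the token length as well as its bytes keeps `eth2` from matching an entry such as `eth22`.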
2012-12-18 | aoe: allow user to disable target failure timeout | Ed Cashin | 1 | -1/+3
With this change, the aoe driver treats the value zero as special for the aoe_deadsecs module parameter. Normally, this value specifies the number of seconds during which the driver will continue to attempt retransmits to an unresponsive AoE target. After aoe_deadsecs has elapsed, the aoe driver marks the aoe device as "down" and fails all I/O. The new meaning of an aoe_deadsecs of zero is for the driver to retransmit commands indefinitely. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
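The special case for zero can be written as a single early return. This is a sketch under assumptions: `target_is_dead()` is a hypothetical helper, and whether the driver compares with `>=` or `>` is not stated in the commit message.

```c
#include <stdbool.h>

/*
 * Decide whether an unresponsive target should be marked "down".
 * deadsecs == 0 means "never give up": retransmit indefinitely.
 */
bool target_is_dead(unsigned int deadsecs, unsigned int waited_secs)
{
    if (deadsecs == 0)
        return false;
    return waited_secs >= deadsecs;  /* assumed comparison */
}
```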
2012-12-18 | aoe: use dynamic number of remote ports for AoE storage target | Ed Cashin | 4 | -21/+49
Many AoE targets have four or fewer network ports, but some existing storage devices have many, and the AoE protocol sets no limit. This patch allows the use of more than eight remote MAC addresses per AoE target, while reducing the amount of memory used by the aoe driver in cases where there are many AoE targets with fewer than eight MAC addresses each. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: avoid races between device destruction and discovery | Ed Cashin | 3 | -63/+146
This change avoids a race that could result in a NULL pointer dereference following a WARNing from kobject_add_internal, "don't try to register things with the same name in the same directory." The problem was found with a test that forgets and discovers an aoe device in a loop: while test ! -r /tmp/stop; do aoe-flush -a; aoe-discover; done The race was between aoedev_flush taking aoedevs out of the devlist, allowing a new discovery of the same AoE target to take place before the driver gets around to calling sysfs_remove_group. Fixing that one revealed another race between do_open and add_disk, and this patch avoids that, too. The fix required some care, because for flushing (forgetting) an aoedev, some of the steps must be performed under lock and some must be able to sleep. Also, for discovering a new aoedev, some steps might sleep. The check for a bad aoedev pointer remains from a time when about half of this patch was done, and it was possible for the bdev->bd_disk->private_data to become corrupted. The check should be removed eventually, but it is not expected to add significant overhead, occurring in the aoeblk_open routine. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: improve handling of misbehaving network paths | Ed Cashin | 3 | -121/+268
An AoE target can have multiple network ports used for AoE, and in the aoe driver, those are tracked by the aoetgt struct. These changes allow the aoe driver to handle network paths, or aoetgts, that are not working well, compared to the others. Paths that do not get responses despite the retransmission of AoE commands are marked as "tainted", and non-tainted paths are preferred. Meanwhile, the aoe driver attempts to "probe" the tainted path in the background by issuing reads of LBA 0 that are padded out to full (possibly jumbo-frame) size. If the probes get responses, then the path is "redeemed", and its taint is removed. This mechanism has been shown to be helpful in transparently handling and recovering from real-world network "brown outs" in ways that the earlier "shoot the help-needing target in the head" mechanism could not. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: return real minor number for static minors | Ed Cashin | 1 | -1/+1
The value returned by the static minor device number allocator is the real minor number, so it must be multiplied by the supported number of partitions per aoedev. Without this fix the support for systems without udev is incomplete, and the few users of aoe on such systems will have surprising results when device node names do not match the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: initialize sysminor to avoid compiler warning | Ed Cashin | 1 | -1/+1
Because the minor_get and related functions use the return values for errors, the compiler doesn't know that sysminor will always either 1) be initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be unused as the "goto out" is executed. This patch avoids the compiler warning. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: make error messages more specific in static minor allocation | Ed Cashin | 1 | -11/+20
For some special-purpose systems where udev isn't present, static allocation of minor numbers is desirable. This update distinguishes different failure scenarios, to help the user understand what went wrong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: remove call to request handler from I/O completion | Ed Cashin | 1 | -2/+0
There is no need to call the request handler function in the I/O completion routine. The user impact of not doing it is a more "nice" aoe driver that is less susceptible to causing soft lockups. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: cleanup: correct comment for aoetgt nout | Ed Cashin | 1 | -1/+1
A misplaced comment was attached to the nout member of the aoetgt. This change corrects the comment. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: increase default cap on outstanding AoE commands in the network | Ed Cashin | 1 | -1/+1
The aoe driver will never be waiting for more than aoe_maxout AoE commands from a given remote network port on an AoE target. Increasing the cap increases performance. Users can tighten the setting to reduce the amount of memory used for handling AoE traffic or the network bandwidth used for AoE. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: remove vestigial request queue allocation | Ed Cashin | 1 | -13/+4
Before the aoe driver was an I/O request handler, it was a make_request-style block driver. Even so, there was a problem where sysfs expected a request queue to exist, so one was provided in commit 7135a71b19be ("aoe: allocate unused request_queue for sysfs"). During the transition to the request-handler style, a patch was merged that was based on a driver without the noop queue, and the noop queue remained in place after the patch was merged, even though a new functional queue was introduced by the patch, allocated through blk_init_queue. The user impact is a memory leak proportional to the number of AoE targets discovered. This patch removes the memory leak and cleans up vestiges of the old do-nothing queue from the aoeblk_gdalloc function. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: copy fallback timing information on destination failover | Ed Cashin | 1 | -0/+1
Commit f3b8e07af774 ("aoe: commands in retransmit queue use new destination on failure") omits the copying of the coarse-grained time when an AoE command was sent during the failover from one destination MAC address on the AoE target to another. The coarse-grained timing is only used when the system time changes or an unlikely length of time has passed since the sending of the AoE command. Users will not be impacted unless their system clock is very inaccurate or something unusual (e.g., 10 GbE link reset) happens during the period when the aoe driver is handling the failure of a port on the AoE target. Being affected will mean that an AoE target could be considered "down" too eagerly. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: update driver-internal version to 64+ | Ed Cashin | 1 | -1/+2
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: commands in retransmit queue use new destination on failure | Ed Cashin | 3 | -33/+75
When one remote MAC address isn't working as a destination for AoE commands, the frames used to track information associated with the AoE commands are moved to a new aoetgt (defined by the tuple of {AoE major, AoE minor, target MAC address}). This patch makes sure that the frames on the queue for retransmits that need to be done are updated to use the new destination, so that retransmits will be sent through a working network path. Without this change, packets on the retransmit queue will be needlessly retransmitted to the unresponsive destination MAC, possibly causing premature target failure before there's time for the retransmit timer to run again, decide to retransmit again, and finally update the destination to a working MAC address on the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: use high-resolution RTTs with fallback to low-res | Ed Cashin | 2 | -11/+55
These changes improve the accuracy of the decision about whether it's time to retransmit an AoE command by using the microsecond-resolution gettimeofday instead of jiffies. Because the system time can jump suddenly, the decision reverts to using jiffies if the high-resolution time difference is relatively large. Otherwise the AoE targets could be considered failed inappropriately. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
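The fallback described here can be sketched as a pure function over two pairs of timestamps. This is an illustration, not the driver's code: `elapsed_usec()` and its parameters are hypothetical, and the quarter-second threshold `HIRES_LIMIT_USEC` is an assumed value for "relatively large".

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000LL
/* Assumed threshold above which the high-resolution difference is distrusted. */
#define HIRES_LIMIT_USEC (USEC_PER_SEC / 4)

/*
 * Time since a command was sent, in microseconds. The wall-clock difference
 * is preferred; if it is implausibly large (for example after the system
 * time jumped), fall back to the coarse monotonic tick counter.
 * ticks_per_sec must be positive.
 */
int64_t elapsed_usec(int64_t sent_hr_usec, int64_t now_hr_usec,
                     int64_t sent_ticks, int64_t now_ticks,
                     int64_t ticks_per_sec)
{
    int64_t d = now_hr_usec - sent_hr_usec;

    if (d < 0)
        d = -d;
    if (d > HIRES_LIMIT_USEC) {
        d = now_ticks - sent_ticks;
        if (d < 0)
            d = -d;
        d *= USEC_PER_SEC / ticks_per_sec;  /* ticks to microseconds */
    }
    return d;
}
```

With a tick rate of 1000 per second, a one-hour wall-clock jump over two elapsed ticks yields 2000 microseconds instead of an hour, so the target is not declared failed because of the clock change.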
2012-12-18 | aoe: manipulate aoedev network stats under lock | Ed Cashin | 1 | -2/+2
With this bugfix in place the calculation of the criterion for "lateness" is performed under lock. Without the lock, there is a chance that one of the non-atomic operations performed on the round trip time statistics could be incomplete, such that an incorrect lateness criterion would be calculated. Without this change, the effect of the bug would be rare, unnecessary, but benign retransmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: err device: include MAC addresses for unexpected responses | Ed Cashin | 1 | -2/+4
The /dev/etherd/err character device provides low-level information about normal but sometimes interesting AoE command retransmits and "unexpected responses", i.e., responses for packets that have already been retransmitted. This change adds MAC addresses to the messages about unexpected responses, so that when they occur, it is easier to determine the network paths to which they belong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: improve network congestion handling | Ed Cashin | 3 | -74/+121
The aoe driver already had some congestion handling, but it was limited in its ability to cope with the kind of congestion that can arise on more complex networks such as those involving paths through multiple ethernet switches. Some of the lessons from TCP's history of development can be applied to improving the congestion control and avoidance on AoE storage networks. These changes use familiar concepts from Van Jacobson's "Congestion Avoidance and Control" paper from '88, without adding significant overhead. This patch depends on an upcoming patch that covers the failover case when AoE commands being retransmitted are transferred from one retransmit queue to another. Another upcoming patch increases the timing accuracy. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
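One concept from that line of work is estimating the round-trip time together with its mean deviation, and deriving a retransmit timeout from both. The sketch below uses the commonly cited form of the estimator (gains of 1/8 and 1/4, timeout of mean plus four deviations, as in RFC 6298). The commit message does not say which constants the aoe driver uses, so `struct rtt_est`, `rtt_update()` and `rtt_timeout()` are illustrative assumptions.

```c
/* Hypothetical estimator state; all values in microseconds. */
struct rtt_est {
    long srtt;    /* smoothed round-trip time */
    long rttvar;  /* smoothed mean deviation of the round-trip time */
};

/* Fold one measured round-trip time into the running estimates. */
void rtt_update(struct rtt_est *e, long sample)
{
    long err = sample - e->srtt;

    e->srtt += err / 8;                  /* gain 1/8 on the mean */
    if (err < 0)
        err = -err;
    e->rttvar += (err - e->rttvar) / 4;  /* gain 1/4 on the deviation */
}

/* Retransmit timeout: mean plus four deviations. */
long rtt_timeout(const struct rtt_est *e)
{
    return e->srtt + 4 * e->rttvar;
}
```

Including the deviation term lets the timeout grow quickly when response times become erratic, which is what reduces spurious retransmissions on a congested path.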
2012-12-18 | aoe: provide ATA identify device content to user on request | Ed Cashin | 3 | -0/+47
Make the aoe driver follow expected behavior when the user uses ioctl to get the ATA device identify information, allowing access to model, serial number, etc. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: update driver-internal version number to 60 | Ed Cashin | 1 | -1/+1
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: whitespace cleanup | Ed Cashin | 5 | -8/+8
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: cleanup: remove unused ata_scnt function | Ed Cashin | 1 | -10/+0
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: "payload" sysfs file exports per-AoE-command data transfer size | Ed Cashin | 1 | -0/+10
The userland aoetools package includes an "aoe-stat" command that can display a "payload size" column when the aoe driver exports this information. Users can quickly see what amount of user data is transferred inside each AoE command on the network, network headers excluded. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: support larger I/O requests via aoe_maxsectors module param | Ed Cashin | 1 | -0/+9
The GPFS filesystem is an example of an aoe user that requires the aoe driver to support I/O request sizes larger than the default. Most users will not need large I/O request sizes, because they would need to be split up into multiple AoE commands anyway. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: support the forgetting (flushing) of a user-specified AoE target | Ed Cashin | 1 | -6/+38
Users sometimes want to cause the aoe driver to forget a particular previously discovered device when it is no longer online. The aoetools provide an "aoe-flush" command that users run to perform this administrative task. The changes below provide the support needed in the driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: update cap on outstanding commands based on config query response | Ed Cashin | 2 | -4/+8
The ATA over Ethernet config query response contains a "buffer count" field reflecting the AoE target's capacity to buffer incoming AoE commands. By taking the current value of this field into account, we increase performance throughput or avoid network congestion, when the value has increased or decreased, respectively. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: print warning regarding a common reason for dropped transmits | Ed Cashin | 1 | -2/+7
Dropped transmits are not common, but when they do occur, increasing the transmit queue length often helps. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | aoe: describe the behavior of the "err" character device | Ed Cashin | 1 | -0/+5
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18 | Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block | Linus Torvalds | 26 | -8086/+11141
Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an event with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partition UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...
2012-12-17 | rbd: don't use ENOTSUPP | Alex Elder | 1 | -1/+1
ENOTSUPP is not a standard errno (it shows up as "Unknown error 524" in an error message). This is what was getting produced when the local rbd code does not implement features required by a discovered rbd image. Change the error code returned in this case to ENXIO. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
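A feature check that reports the standard `ENXIO` might look like the userspace sketch below. `check_image_features()` and the bitmask representation of features are assumptions made for illustration; the commit message only states which error code is returned.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if every feature bit the image requires is supported locally,
 * or -ENXIO (a standard errno that userspace can render as a message)
 * if the image requires something that is not implemented.
 */
int check_image_features(uint64_t image_features, uint64_t supported)
{
    if (image_features & ~supported)
        return -ENXIO;
    return 0;
}
```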
2012-12-17 | rbd: get rid of RBD_MAX_SEG_NAME_LEN | Alex Elder | 2 | -5/+3
RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object name (i.e., one of the objects providing storage backing an rbd image). Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to define the maximum length of any object name in an osd request. Right now they disagree, with RBD_MAX_SEG_NAME_LEN being too big. There's no real benefit at this point to defining the rbd object name length limit separate from any other object name, so just get rid of RBD_MAX_SEG_NAME_LEN and use MAX_OBJ_NAME_SIZE in its place. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-17 | rbd: do not allow remove of mounted-on image | Alex Elder | 1 | -0/+13
There is no check in rbd_remove() to see if anybody holds open the image being removed. That's not cool. Add a simple open count that goes up and down with opens and closes (releases) of the device, and don't allow an rbd image to be removed if the count is non-zero. Protect the updates of the open count value with ctl_mutex to ensure the underlying rbd device doesn't get removed while concurrently being opened. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
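The open-count guard can be sketched in a few lines of userspace C. The names `struct image_dev`, `image_open()`, `image_release()` and `image_remove()` are hypothetical, returning `-EBUSY` is an assumption, and the locking is omitted: in the driver these updates are protected by ctl_mutex, as the message explains.

```c
#include <errno.h>

/* Hypothetical device state; a real driver would hold a lock around it. */
struct image_dev {
    unsigned long open_count;  /* opens not yet matched by a release */
    int removed;               /* set once removal has succeeded */
};

/* Called on every open of the device. */
void image_open(struct image_dev *d)
{
    d->open_count++;
}

/* Called on every close (release) of the device. */
void image_release(struct image_dev *d)
{
    if (d->open_count > 0)
        d->open_count--;
}

/* Refuse removal while anybody still holds the device open. */
int image_remove(struct image_dev *d)
{
    if (d->open_count > 0)
        return -EBUSY;  /* assumed error code */
    d->removed = 1;
    return 0;
}
```

Without a lock around the count, an open could slip in between the check and the removal, which is the race that taking ctl_mutex is meant to close.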