Age | Commit message (Collapse) | Author | Files | Lines |
|
Check that the ring does not have an insane amount of requests
(more than there could fit on the ring).
If we detect this case we will stop processing the requests
and wait until the XenBus disconnects the ring.
The existing check RING_REQUEST_CONS_OVERFLOW which checks for how
many responses we have created in the past (rsp_prod_pvt) vs
requests consumed (req_cons) and whether said difference is greater or
equal to the size of the ring, does not catch this case.
Wha the condition does check if there is a need to process more
as we still have a backlog of responses to finish. Note that both
of those values (rsp_prod_pvt and req_cons) are not exposed on the
shared ring.
To understand this problem a mini crash course in ring protocol
response/request updates is in place.
There are four entries: req_prod and rsp_prod; req_event and rsp_event
to track the ring entries. We are only concerned about the first two -
which set the tone of this bug.
The req_prod is a value incremented by frontend for each request put
on the ring. Conversely the rsp_prod is a value incremented by the backend
for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
pushing the responses on the ring). Both values can
wrap and are modulo the size of the ring (in block case that is 32).
Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details.
The culprit here is that if the difference between the
req_prod and req_cons is greater than the ring size we have a problem.
Fortunately for us, the '__do_block_io_op' loop:
rc = blk_rings->common.req_cons;
rp = blk_rings->common.sring->req_prod;
while (rc != rp) {
..
blk_rings->common.req_cons = ++rc; /* before make_response() */
}
will loop up to the point when rc == rp. The macros inside of the
loop (RING_GET_REQUEST) is smart and is indexing based on the modulo
of the ring size. If the frontend has provided a bogus req_prod value
we will loop until the 'rc == rp' - which means we could be processing
already processed requests (or responses) often.
The reason the RING_REQUEST_CONS_OVERFLOW is not helping here is
b/c it only tracks how many responses we have internally produced
and whether we would should process more. The astute reader will
notice that the macro RING_REQUEST_CONS_OVERFLOW provides two
arguments - more on this later.
For example, if we were to enter this function with these values:
blk_rings->common.sring->req_prod = X+31415 (X is the value from
the last time __do_block_io_op was called).
blk_rings->common.req_cons = X
blk_rings->common.rsp_prod_pvt = X
The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
is doing:
req_cons - rsp_prod_pvt >= 32
Which is,
X - X >= 32 or 0 >= 32
And that is false, so we continue on looping (this bug).
If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in the rp
instead (sring->req_prod) of rc, the this macro can do the check:
req_prod - rsp_prov_pvt >= 32
Which is,
X + 31415 - X >= 32 , or 31415 >= 32
which is true, so we can error out and break out of the function.
Unfortunatly the difference between rsp_prov_pvt and req_prod can be
at 32 (which would error out in the macro). This condition exists when
the backend is lagging behind with the responses and still has not finished
responding to all of them (so make_response has not been called), and
the rsp_prov_pvt + 32 == req_cons. This ends up with us not being able
to use said macro.
Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does
a simple check of:
req_prod - rsp_prod_pvt > RING_SIZE
And with the X values from above:
X + 31415 - X > 32
Returns true. Also not that if the ring is full (which is where
the RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the
same condition:
X + 32 - X > 32
Which is false.
Lets use that macro.
Note that in v5 of this patchset the macro was different - we used an
earlier version.
Cc: stable@vger.kernel.org
[v1: Move the check outside the loop]
[v2: Add a pr_warn as suggested by David]
[v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
[v4: Move wake_up after kthread_stop as suggested by Jan]
[v5: Use RING_REQUEST_PROD_OVERFLOW instead]
[v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
gadsa
|
|
We need to make sure that the device is not RO or that
the request is not past the number of sectors we want to
issue the DISCARD operation for.
This fixes CVE-2013-2140.
Cc: stable@vger.kernel.org
Acked-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
[v1: Made it pr_warn instead of pr_debug]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Currently xen-blkback passes the logical sector size over xenbus and
xen-blkfront sets up the paravirt disk with that logical block size.
But newer drives usually have the logical sector size set to 512 for
compatibility reasons and would show the actual sector size only in
physical sector size.
This results in the device being partitioned and accessed in dom0 with
the correct sector size, but the guest thinks 512 bytes is the correct
block size. And that results in poor performance.
To fix this, blkback gets modified to pass also physical-sector-size
over xenbus and blkfront to use both values to set up the paravirt
disk. I did not just change the passed in sector-size because I am
not sure having a bigger logical sector size than the physical one
is valid (and that would happen if a newer dom0 kernel hits an older
domU kernel). Also this way a domU set up before should still be
accessible (just some tools might detect the unaligned setup).
[v2: Make xenbus write failure non-fatal]
[v3: Use xenbus_scanf instead of xenbus_gather]
[v4: Rebased against segment changes]
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
indirect segments.
The max module parameter (by default 32) is the maximum number of
segments that the frontend will negotiate with the backend for indirect
descriptors. Higher value means more potential throughput but more
memory usage. The backend picks the minimum of the frontend and its
default backend value.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
In blkif_queue_request blkfront iterates over the scatterlist in order
to set the segments of the request, and in blkif_completion blkfront
iterates over the raw request, which makes it hard to know the exact
position of the source and destination memory positions.
This can be solved by allocating a scatterlist for each request, that
will be keep until the request is finished, allowing us to copy the
data back to the original memory without having to iterate over the
raw request.
Oracle-Bug: 16660413 - LARGE ASYNCHRONOUS READS APPEAR BROKEN ON 2.6.39-400
CC: stable@vger.kernel.org
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-and-Tested-by: Anne Milicia <anne.milicia@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Allocate pending requests in smaller chunks instead of allocating them
all at the same time.
This change also removes the global array of pending_reqs, it is no
longer necessay.
Variables related to the grant mapping have been grouped into a struct
called "grant_page", this allows to allocate them in smaller chunks,
and also improves memory locality.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Indirect descriptors introduce a new block operation
(BLKIF_OP_INDIRECT) that passes grant references instead of segments
in the request. This grant references are filled with arrays of
blkif_request_segment_aligned, this way we can send more segments in a
request.
The proposed implementation sets the maximum number of indirect grefs
(frames filled with blkif_request_segment_aligned) to 256 in the
backend and 32 in the frontend. The value in the frontend has been
chosen experimentally, and the backend value has been set to a sane
value that allows expanding the maximum number of indirect descriptors
in the frontend if needed.
The migration code has changed from the previous implementation, in
which we simply remapped the segments on the shared ring. Now the
maximum number of segments allowed in a request can change depending
on the backend, so we have to requeue all the requests in the ring and
in the queue and split the bios in them if they are bigger than the
new maximum number of segments.
[v2: Fixed minor comments by Konrad.
[v1: Added padding to make the indirect request 64bit aligned.
Added some BUGs, comments; fixed number of indirect pages in
blkif_get_x86_{32/64}_req. Added description about the indirect operation
in blkif.h]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[v3: Fixed spaces and tabs mix ups]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Preparatory change for implementing indirect descriptors. Change
xen_blkbk_{map/unmap} in order to be able to map/unmap a random amount
of grants (previously it was limited to
BLKIF_MAX_SEGMENTS_PER_REQUEST). Also, remove the usage of pending_req
in the map/unmap functions, so we can map/unmap grants without needing
to pass a pending_req.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Remove the last dependency from blkbk by moving the list of free
requests to blkif. This change reduces the contention on the list of
available requests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Moving grant ref handles from blkbk to pending_req will allow us to
get rid of the shared blkbk structure.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
This mechanism allows blkback to change the number of grants
persistently mapped at run time.
The algorithm uses a simple LRU mechanism that removes (if needed) the
persistent grants that have not been used since the last LRU run, or
if all grants have been used it removes the first grants in the list
(that are not in use).
The algorithm allows the user to change the maximum number of
persistent grants, by changing max_persistent_grants in sysfs.
Since we are storing the persistent grants used inside the request
struct (to be able to mark them as "unused" when unmapping), we no
longer need the bitmap (unmap_seg).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Using balloon pages for all granted pages allows us to simplify the
logic in blkback, especially in the xen_blkbk_map function, since now
we can decide if we want to map a grant persistently or not after we
have actually mapped it. This could not be done before because
persistent grants used ballooned pages, whereas non-persistent grants
used pages from the kernel.
This patch also introduces several changes, the first one is that the
list of free pages is no longer global, now each blkback instance has
it's own list of free pages that can be used to map grants. Also, a
run time parameter (max_buffer_pages) has been added in order to tune
the maximum number of free pages each blkback instance will keep in
it's buffer.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xen.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xen.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Pull block fixes from Jens Axboe:
"I've got a few smaller fixes queued up for 3.9 that should go in. The
major one is the loop regression, the others are nice fixes on their
own though. It contains:
- Fix for unitialized var in the block sysfs code, courtesy of Arnd
and gcc-4.8.
- Two fixes for mtip32xx, fixing probe and command timeout. Also a
debug measure that could have waited for 3.10, but it's driver
only, so I let it slip in.
- Revert the loop partition cleanup fix, it could cause a deadlock on
auto-teardown as part of umount. The fix is clear, but at this
point we just want to revert it and get a real fix in for 3.10."
* tag 'for-linus-20130409' of git://git.kernel.dk/linux-block:
Revert "loop: cleanup partitions when detaching loop device"
mtip32xx: fix two smatch warnings
mtip32xx: Add debugfs entry device_status
mtip32xx: return 0 from pci probe in case of rebuild
mtip32xx: recovery from command timeout
block: avoid using uninitialized value in from queue_var_store
|
|
This reverts commit 8761a3dc1f07b163414e2215a2cadbb4cfe2a107.
There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Dan reports:
New smatch warnings:
drivers/block/mtip32xx/mtip32xx.c:2728 show_device_status() warn: variable dereferenced before check 'dd' (see line 2727)
drivers/block/mtip32xx/mtip32xx.c:2758 show_device_status() warn: variable dereferenced before check 'dd' (see line 2757)
which are checking if dd == NULL, in a list_for_each_entry() type loop.
Get rid of the check, dd can never be NULL here.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch adds a new debugfs entry 'device_status' in
/sys/kernel/debug/rssd. The value of this entry shows
devices online and those in the process of removing.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix to return 0 from pci probe in case of rebuild. If not, pci consider
probe has failed, and crash during rmmod.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To recover from command timeouts, reset the device. In addition
to that improved timeout handling of PIO commands.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) sadb_msg prepared for IPSEC userspace forgets to initialize the
satype field, fix from Nicolas Dichtel.
2) Fix mac80211 synchronization during station removal, from Johannes
Berg.
3) Fix IPSEC sequence number notifications when they wrap, from Steffen
Klassert.
4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an
error pointer, from Johannes Berg.
5) In mac80211, don't call into the channel context code with the
interface list mutex held. From Johannes Berg.
6) In mac80211, if we don't actually associate, do not restart the STA
timer, otherwise we can crash. From Ben Greear.
7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e. From
Christoph Paasch.
8) Fix sja1000 driver defines to not conflict with SH port, from Marc
Kleine-Budde.
9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian
King.
10) Suspend/Resume in the FEC driver fail because the buffer descriptors
are not initialized at all the moments in which they should. Fix
from Frank Li.
11) cpsw and davinci_emac drivers both use the wrong interface to
restart a stopped TX queue. Use netif_wake_queue not
netif_start_queue, the latter is for initialization/bringup not
active management of the queue. From Mugunthan V N.
12) Fix regression in rate calculations done by
psched_ratecfg_precompute(), missing u64 type promotion. From
Sergey Popovich.
13) Fix length overflow in tg3 VPD parsing, from Kees Cook.
14) AOE driver fails to allocate enough headroom, resulting in crashes.
Fix from Eric Dumazet.
15) RX overflow happens too quickly in sky2 driver because pause packet
thresholds are not programmed correctly. From Mirko Lindner.
16) Bonding driver manages arp_interval and miimon settings incorrectly,
disabling one unintentionally disables both. Fix from Nikolay
Aleksandrov.
17) smsc75xx drivers don't program the RX mac properly for jumbo frames.
Fix from Steve Glendinning.
18) Fix off-by-one in Codel packet scheduler. From Vijay Subramanian.
19) Fix packet corruption in atl1c by disabling MSI support, from Hannes
Frederic Sowa.
20) netdev_rx_handler_unregister() needs a synchronize_net() to fix
crashes in bonding driver unload stress tests. From Eric Dumazet.
21) rxlen field of ks8851 RX packet descriptors not interpreted
correctly (it is 12 bits not 16 bits, so needs to be masked after
shifting the 32-bit value down 16 bits). Fix from Max Nekludov.
22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link
change indications. From Sergei Shtylyov.
23) Fix crashes during spurious ECI interrupts in sh_eth driver, also
from Sergei Shtylyov.
24) dm9000 driver initialization is done wrong for revision B devices
with DSP PHY, from Joseph CHANG.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
DM9000B: driver initialization upgrade
sh_eth: make 'link' field of 'struct sh_eth_private' *int*
sh_eth: workaround for spurious ECI interrupt
sh_eth: fix handling of no LINK signal
ks8851: Fix interpretation of rxlen field.
net: add a synchronize_net() in netdev_rx_handler_unregister()
MAINTAINERS: Update netxen_nic maintainers list
atl1e: drop pci-msi support because of packet corruption
net: fq_codel: Fix off-by-one error
net: calxedaxgmac: Wake-on-LAN fixes
net: calxedaxgmac: fix rx ring handling when OOM
net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
smsc75xx: fix jumbo frame support
net: fix the use of this_cpu_ptr
bonding: fix disabling of arp_interval and miimon
ipv6: don't accept node local multicast traffic from the wire
sky2: Threshold for Pause Packet is set wrong
sky2: Receive Overflows not counted
aoe: reserve enough headroom on skbs
line up comment for ndo_bridge_getlink
...
|
|
Pull block fixes from Jens Axboe:
"Alright, this time from 10K up in the air.
Collection of fixes that have been queued up since the merge window
opened, hence postponed until later in the cycle. The pull request
contains:
- A bunch of fixes for the xen blk front/back driver.
- A round of fixes for the new IBM RamSan driver, fixing various
nasty issues.
- Fixes for multiple drives from Wei Yongjun, bad handling of return
values and wrong pointer math.
- A fix for loop properly killing partitions when being detached."
* tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
mg_disk: fix error return code in mg_probe()
rsxx: remove unused variable
rsxx: enable error return of rsxx_eeh_save_issued_dmas()
block: removes dynamic allocation on stack
Block: blk-flush: Fixed indent code style
cciss: fix invalid use of sizeof in cciss_find_cfgtables()
loop: cleanup partitions when detaching loop device
loop: fix error return code in loop_add()
mtip32xx: fix error return code in mtip_pci_probe()
xen-blkfront: remove frame list from blk_shadow
xen-blkfront: pre-allocate pages for requests
xen-blkback: don't store dev_bus_addr
xen-blkfront: switch from llist to list
xen-blkback: fix foreach_grant_safe to handle empty lists
xen-blkfront: replace kmalloc and then memcpy with kmemdup
xen-blkback: fix dispatch_rw_block_io() error path
rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
Adding in EEH support to the IBM FlashSystem 70/80 device driver
block: IBM RamSan 70/80 error message bug fix.
block: IBM RamSan 70/80 branding changes.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fix from Sage Weil:
"This fixes a regression introduced during the last merge window when
mapping non-existent images."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: don't zero-fill non-image object requests
|
|
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image. Similarly, a short read for such an object indicates that
the remainder of the read should be interpreted a full read with
zeros filling out the end of the request.
This behavior is not correct for objects that are not backing rbd
image data. Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.
Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects. Encapsulate that special handling in its own
function. Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).
This resolves a problem identified here:
http://tracker.ceph.com/issues/4559
The regression was introduced by bf0d5f503dc11d6314c0503591d258d60ee9c944.
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
|
|
Some network drivers use a non default hard_header_len
Transmitted skb should take into account dev->hard_header_len, or risk
crashes or expensive reallocations.
In the case of aoe, lets reserve MAX_HEADER bytes.
David reported a crash in defxx driver, solved by this patch.
Reported-by: David Oostdyk <daveo@ll.mit.edu>
Tested-by: David Oostdyk <daveo@ll.mit.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit d8d595df introduced a bug where we did not check for a NULL
return from kmalloc(). Make rsxx_eeh_save_issued_dmas() return an
error for that case, and make the callers handle that.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch removes dynamic allocation on the stack error.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull NVMe driver update from Matthew Wilcox:
"These patches have mostly been baking for a few months; sorry I didn't
get them in during the merge window. They're all bug fixes, except
for the addition of the SMART log and the addition to MAINTAINERS."
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Add namespaces with no LBA range feature
MAINTAINERS: Add entry for the NVMe driver
NVMe: Initialize iod nents to 0
NVMe: Define SMART log
NVMe: Add result to nvme_get_features
NVMe: Set result from user admin command
NVMe: End queued bio requests when freeing queue
NVMe: Free cmdid on nvme_submit_bio error
|
|
The LBA Range Type feature is optional in the NVMe specification,
so we should continue with adding namespaces for controllers that do
not implement this feature.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
|
|
sizeof() when applied to a pointer typed expression gives the
size of the pointer, not that of the pointed data.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: scameron@beardog.cce.hp.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Any partitions added by user space to the loop device were being
left in place after detaching the loop device. This was because
the detach path issued a BLKRRPART to clean up partitions if
LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
scanned on attach. Replace this BLKRRPART with code that
unconditionally cleans up partitions on detach instead.
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Modified by Jens to export delete_partition().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix to return a negative error code from the error handling
case, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Konrad writes:
[the branch] has a bunch of fixes. They vary from being able to deal
with unknown requests, overflow in statistics, compile warnings, bug in
the error path, removal of unnecessary logic. There is also one
performance fix - which is to allocate pages for requests when the
driver loads - instead of doing it per request
|
|
We already have the frame (pfn of the grant page) stored inside struct
grant, so there's no need to keep an aditional list of mapped frames
for a specific request. This reduces memory usage in blkfront.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
This prevents us from having to call alloc_page while we are preparing
the request. Since blkfront was calling alloc_page with a spinlock
held we used GFP_ATOMIC, which can fail if we are requesting a lot of
pages since it is using the emergency memory pools.
Allocating all the pages at init prevents us from having to call
alloc_page, thus preventing possible failures.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
dev_bus_addr returned in the grant ref map operation is the mfn of the
passed page, there's no need to store it in the persistent grant
entry, since we can always get it provided that we have the page.
This reduces the memory overhead of persistent grants in blkback.
While at it, rename the 'seg[i].buf' to be 'seg[i].offset' as
it makes much more sense - as we use that value in bio_add_page
which as the fourth argument expects the offset.
We hadn't used the physical address as part of this at all.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: xen-devel@lists.xen.org
[v1: s/buf/offset/]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
The git commit f84adf4921ae3115502f44ff467b04bf2f88cf04
(xen-blkfront: drop the use of llist_for_each_entry_safe)
was a stop-gate to fix a GCC4.1 bug. The appropiate way
is to actually use an list instead of using an llist.
As such this patch replaces the usage of llist with an
list.
Since we always manipulate the list while holding the io_lock, there's
no need for additional locking (llist used previously is safe to use
concurrently without additional locking).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
CC: stable@vger.kernel.org
[v1: Redid the git commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
We may use foreach_grant_safe in the future with empty lists, so make
sure we can handle them.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xen.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
The benefits are:
* code is cleaner
* kmemdup adds additional debugging info useful for tracking the real
place where memory was allocated (CONFIG_DEBUG_SLAB).
Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Commit 7708992 ("xen/blkback: Seperate the bio allocation and the bio
submission") consolidated the pendcnt updates to just a single write,
neglecting the fact that the error path relied on it getting set to 1
up front (such that the decrement in __end_block_io_op() would actually
drop the count to zero, triggering the necessary cleanup actions).
Also remove a misleading and a stale (after said commit) comment.
CC: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
Spotted by Fenguan Wu's super build robot.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Changes in v2 include:
o Fixed spelling of guarantee.
o Fixed potential memory leak if slot reset fails out.
o Changed list_for_each_entry_safe with list_for_each_entry.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch includes a simple change to the rsxx_pci_remove
function that caused error messages because traffic was halted
too early.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch includes changing the hardware branding name from
IBM RamSan to IBM FlashSystem.
v2 Changes include:
o Removing the unnecessary IBM Vendor ID #define
v1 Changes include:
o Changed all references of RamSan to FlashSystem.
o Changed the vendor/device IDs for the product.
o Changed driver version number.
o Updated the MAINTAINERS file.
o Various other little things.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch includes changes to the cregs locking scheme. Before,
inconsistant locking would occur because of misuse of spin_lock,
spin_lock_bh, and counter parts.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch includes trivial changes that were recommended by
different members of the Linux Community.
Changes include:
o Removing the redundant wmb().
o Formatting
o Various other little things.
Signed-off-by: Philip J Kelleher <pjk1939@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
These values shouldn't be negative, but after an overflow their value
can turn into negative, if they are signed. xentop can show bogus
values in this case.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reported-by: Ichiro Ogino <ichiro.ogino@citrix.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|