path: root/drivers
Age  Commit message  (Author)  Files changed, -/+ lines
2009-06-22  serial: samsung.c: mark s3c24xx_serial_remove as __devexit  (Peter Korsgaard)  8 files, -8/+8
Mark the remove function as __devexit so it gets eliminated in CONFIG_HOTPLUG=n builds. Saves ~100 bytes. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
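The pattern looks roughly like the sketch below; the driver and function names here are illustrative placeholders, not the actual samsung.c code:

    #include <linux/module.h>
    #include <linux/platform_device.h>

    static int __devinit foo_serial_probe(struct platform_device *pdev)
    {
            return 0;
    }

    /* marked __devexit: the whole function is discarded when CONFIG_HOTPLUG=n */
    static int __devexit foo_serial_remove(struct platform_device *pdev)
    {
            return 0;
    }

    static struct platform_driver foo_serial_driver = {
            .probe  = foo_serial_probe,
            /* __devexit_p() becomes NULL when the section is dropped, so the
             * driver struct never points at discarded code */
            .remove = __devexit_p(foo_serial_remove),
            .driver = {
                    .name  = "foo-uart",
                    .owner = THIS_MODULE,
            },
    };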
2009-06-22  tty: fix some bogons in the serqt_usb2 driver  (Alan Cox)  1 file, -19/+10
Remove the replicated urban legends from the comments and fix a couple of other silly calls. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  ppp: Fix throttling bugs  (Alan Cox)  2 files, -2/+0
The ppp layer goes around calling the unthrottle method from non-sleeping paths. This isn't safe because the unthrottle methods in the tty layer need to be able to sleep (consider a USB dongle). Until now this didn't show up because the ppp layer never actually throttled a port, so the unthrottle was always a no-op. Now it is a mutex-taking path, so warnings are spewed if the unthrottle occurs via certain paths. Fix this by removing the unnecessary unthrottle calls. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  vt_ioctl: fix lock imbalance  (Jiri Slaby)  1 file, -1/+2
Don't return from switch/case directly in vt_ioctl. Set ret and break instead so that we unlock BKL. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
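A minimal sketch of the before/after pattern (the cm4000 fix below has the same shape); the ioctl number and helper are made up for illustration:

    #include <linux/errno.h>
    #include <linux/fs.h>
    #include <linux/smp_lock.h>

    static long do_something(unsigned long arg)
    {
            return 0;
    }

    static long foo_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
            long ret = -ENOTTY;

            lock_kernel();
            switch (cmd) {
            case 0x1234:    /* made-up ioctl number */
                    /* buggy form:  return do_something(arg);  -- skips unlock_kernel() */
                    ret = do_something(arg);        /* fixed: set ret and break */
                    break;
            }
            unlock_kernel();        /* single exit point always drops the BKL */
            return ret;
    }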
2009-06-22  pcmcia/cm4000: fix lock imbalance  (Jiri Slaby)  1 file, -1/+2
Don't return from switch/case, break instead, so that we unlock BKL. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  n_r3964: fix lock imbalance  (Jiri Slaby)  1 file, -12/+14
A BKL unlock is missing on some paths in r3964_read. Centralize the exit paths to one point with a single unlock. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  serial: fix off-by-one errors  (Roel Kluin)  3 files, -7/+7
zs_console_putchar() contains: if (zs_transmit_drain(zport, irq)) write_zsdata(zport, ch); However, if zs_transmit_drain() never finds an empty Tx buffer, limit reaches -1, which is still true, so the write occurs anyway. This patch changes postfix to prefix decrements in this and similar functions to prevent the same mistake in the future. This reduces the number of iterations by one, but the chosen loop count was arbitrary anyway. In sunhv, limit reaches -1, not 0, so the test is off by one. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
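A sketch of the difference (names are illustrative, not the real driver code); tx_fifo_empty() is assumed to poll the hardware:

    #include <linux/delay.h>

    extern int tx_fifo_empty(void);

    /* postfix decrement: on timeout "loops" ends at -1, which the caller
     * treats as success */
    static int tx_drain_buggy(void)
    {
            int loops = 10000;

            while (!tx_fifo_empty() && loops--)
                    udelay(2);
            return loops;           /* -1 on timeout -> still "true" */
    }

    /* prefix decrement: on timeout "loops" ends at 0, so the caller's
     * "if (tx_drain(...)) write_char(ch);" correctly skips the write */
    static int tx_drain_fixed(void)
    {
            int loops = 10000;

            while (!tx_fifo_empty() && --loops)
                    udelay(2);
            return loops;           /* 0 on timeout */
    }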
2009-06-22  serial: bfin_5xx: fix building as module when early printk is enabled  (Mike Frysinger)  1 file, -0/+4
Since early printk only makes sense/works when the serial driver is built into the kernel, disable the option for this driver when it is going to be built as a module. Otherwise we get build failures due to the ifdef handling. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  serial: bfin_5xx: add missing spin_lock init  (Mike Frysinger)  1 file, -0/+1
The Blackfin serial driver never initialized the spin_lock that is part of the serial core structure. We never noticed because spinlocks are rarely enabled on UP systems; lockdep and friends, however, do notice. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
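A one-line fix of this kind typically looks like the sketch below (the setup function is an illustrative stand-in):

    #include <linux/serial_core.h>
    #include <linux/spinlock.h>

    static void bfin_port_setup(struct uart_port *port)
    {
            spin_lock_init(&port->lock);    /* the previously missing init */
            port->iotype = UPIO_MEM;
            /* ... remaining port fields ... */
    }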
2009-06-22  bfin_jtag_comm: clean up printk usage  (Mike Frysinger)  1 file, -15/+15
The original patch garnered some feedback and a v2 was posted, but that version seems to have been missed when merging the driver. At any rate, this cleans up the printk usage as suggested by Jiri Slaby. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  parport_pc: properly set the dma_mask for the parport_pc device  (FUJITA Tomonori)  1 file, -0/+3
parport_pc_probe_port() creates its own 'parport_pc' device if the device argument is NULL, but it doesn't initialize the dma_mask and coherent_dma_mask of that device before calling dma_alloc_coherent with it. dma_alloc_coherent fails because dma_alloc_coherent() doesn't accept an uninitialized dma_mask: http://lkml.org/lkml/2009/6/16/150 Long ago, X86_32 and X86_64 had their own dma_alloc_coherent implementations; X86_32 accepted a device with an uninitialized dma_mask but X86_64 didn't. When we merged them, we chose to prohibit devices with an uninitialized dma_mask. I think it's good to require drivers to set up dma_mask (and coherent_dma_mask) properly if they want DMA. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reported-by: Malcom Blaney <malcolm.blaney@maptek.com.au> Tested-by: Malcom Blaney <malcolm.blaney@maptek.com.au> Cc: stable@kernel.org Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
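A hedged sketch of the fix: give the self-created device a usable DMA mask before any dma_alloc_coherent() call. The 24-bit mask reflects ISA-style parallel-port DMA; treat the exact value and the registration call as illustrative rather than the actual parport_pc code.

    #include <linux/dma-mapping.h>
    #include <linux/err.h>
    #include <linux/platform_device.h>

    static struct device *parport_pc_make_dev(void)
    {
            struct platform_device *pdev;

            pdev = platform_device_register_simple("parport_pc", 0, NULL, 0);
            if (IS_ERR(pdev))
                    return NULL;

            /* without these two assignments dma_alloc_coherent() rejects
             * the device because its dma_mask is uninitialized */
            pdev->dev.coherent_dma_mask = DMA_BIT_MASK(24);
            pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;

            return &pdev->dev;
    }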
2009-06-22  parport_pc: after superio probing restore original register values  (Jens Rottmann)  1 file, -6/+25
CONFIG_PARPORT_PC_SUPERIO probes for various superio chips by writing byte sequences to a set of different potential I/O ranges. But the probed ranges are not exclusive to parallel ports. Some of our boards just happen to have a watchdog in one of them. It took us almost a week to figure out why some distros reboot without warning after running flawlessly for 3 hours. For exactly 170 = 0xAA minutes, that is ... Fixed by restoring the original values after probing. Also fixed a too-small request_region() in detect_and_report_it87(). Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: <stable@kernel.org> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
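The idea, sketched with an illustrative probe routine (0x2e is one of the config ports typically probed; the key bytes are an example unlock sequence, not the exact driver code):

    #include <asm/io.h>
    #include <linux/types.h>

    static void probe_one_superio_range(unsigned int port)
    {
            u8 origval = inb(port);         /* save whatever was there */

            outb(0x87, port);               /* example probe/key sequence */
            outb(0x87, port);
            /* ... read and report the chip id ... */

            outb(origval, port);            /* restore: don't upset an unrelated
                                             * device (e.g. a watchdog) living here */
    }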
2009-06-22  Revert "char: moxa, prevent opening unavailable ports"  (Linus Torvalds)  1 file, -6/+1
This reverts commit a90b037583d5f1ae3e54e9c687c79df82d1d34a4, which already got fixed as commit f0e8527726b9e56649b9eafde3bc0fbc4dd2dd47: the same patch (trivial differences) got applied twice. Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22  fusion: mptsas, fix lock imbalance  (Jiri Slaby)  1 file, -2/+2
Fix two typos in mptsas_not_responding_devices: mutex_lock was used where mutex_unlock was intended. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-22  [S390] dasd: fix refcounting in dasd_change_state  (Sebastian Ott)  1 file, -3/+5
To set a DASD online, dasd_change_state is called twice. The first cycle will schedule the initial analysis of the device, set rc to -EAGAIN and not touch the device state any more. The initial analysis will in turn call dasd_change_state to raise the state to the final DASD_STATE_ONLINE. If the dasd_change_state on the second thread outruns the first one, both finish with the state set to DASD_STATE_ONLINE and the device refcount is decreased by 2. Fix this by leaving dasd_change_state early when rc == -EAGAIN, so that the refcount is always decreased by 1. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] driver_data access  (Martin Schwidefsky)  7 files, -13/+13
Replace the remaining direct accesses to the driver_data pointer with calls to the dev_get_drvdata() and dev_set_drvdata() functions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
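The conversion is mechanical; a sketch with an illustrative driver-private structure:

    #include <linux/device.h>

    struct foo_private {
            int state;
    };

    static void foo_attach(struct device *dev, struct foo_private *priv)
    {
            /* before: dev->driver_data = priv; */
            dev_set_drvdata(dev, priv);
    }

    static struct foo_private *foo_get(struct device *dev)
    {
            /* before: return dev->driver_data; */
            return dev_get_drvdata(dev);
    }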
2009-06-22  [S390] dasd_pm: fix stop flag handling  (Stefan Haberland)  2 files, -10/+12
The stop flags are handled in the generic restore function so the stop flag is removed also for FBA and DIAG devices. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] ap/zcrypt: Suspend/Resume ap bus and zcrypt  (Felix Beck)  1 file, -2/+83
Add Suspend/Resume support to ap bus and zcrypt. All enhancements are done in the ap bus. No changes in the crypto card specific part are necessary. Signed-off-by: Felix Beck <felix.beck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: Sanitize do_QDIO sanity checks  (Jan Glauber)  1 file, -7/+2
Remove unneeded sanity checks from do_QDIO since this is the hot path. Change the type of bufnr and count to unsigned int so the check for the maximum value works. Reported-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: leave inbound SBALs primed  (Jan Glauber)  1 file, -7/+0
It is not required to change the state of primed SBALs. Leaving them primed saves a SQBS instruction under z/VM. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: merge AI tasklet into interrupt handler  (Jan Glauber)  1 file, -44/+21
Since the adapter interrupt tasklet only schedules the queue tasklets and contains no code that requires serialization, it can be merged with the adapter interrupt handler. That possibly saves some CPU cycles. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: extract all primed SBALs at once  (Jan Glauber)  1 file, -28/+6
For devices without QIOASSIST, primed SBALs were extracted in a loop. Remove the loop, since get_buf_states can already return more than one primed SBAL. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: fix check for running under z/VM  (Jan Glauber)  1 file, -35/+13
The check whether qdio runs under z/VM was incorrect since SIGA-Sync is not set if the device runs with QIOASSIST. Use MACHINE_IS_VM instead to prevent polling under z/VM. Merge qdio_inbound_q_done and tiqdio_is_inbound_q_done. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] qdio: move adapter interrupt tasklet code  (Jan Glauber)  4 files, -80/+75
Move the adapter interrupt tasklet function to the qdio main code since all the functions used by the tasklet are located there. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] Use del_timer instead of del_timer_sync  (Michael Holzheu)  1 file, -1/+1
When syncing the sclp console queue, we call del_timer_sync() while holding the "sclp_con_lock" spinlock. This lock is also taken in the timer function "sclp_console_timeout". Therefore the sync version of del_timer() cannot be used here. Because the synchronous deletion of the timer is only needed in the suspend callback and in that case only one CPU is remaining and therefore it is not possible that the timer function is running in parallel, we can safely use del_timer() instead of del_timer_sync(). Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
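A sketch of the deadlock the text describes; the lock and timer names are illustrative stand-ins for the sclp console code:

    #include <linux/spinlock.h>
    #include <linux/timer.h>

    static DEFINE_SPINLOCK(con_lock);
    static struct timer_list con_timer;

    static void con_timeout(unsigned long data)
    {
            spin_lock(&con_lock);           /* the same lock held below */
            /* ... flush queued output ... */
            spin_unlock(&con_lock);
    }

    static void con_sync_queue(void)
    {
            spin_lock(&con_lock);
            /*
             * del_timer_sync() would wait here for con_timeout() to finish,
             * but con_timeout() needs con_lock -> deadlock.  In the suspend
             * callback only one CPU is left, so the handler cannot be running
             * concurrently and plain del_timer() is safe.
             */
            del_timer(&con_timer);
            spin_unlock(&con_lock);
    }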
2009-06-22  [S390] vt220 console: convert from bootmem to slab  (Heiko Carstens)  1 file, -13/+5
The slab allocator is available earlier now, so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] sclp console: convert from bootmem to slab  (Heiko Carstens)  1 file, -3/+2
The slab allocator is available earlier now, so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] 3270 console: convert from bootmem to slab  (Heiko Carstens)  2 files, -38/+7
The slab allocator is available earlier now, so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-06-22  [S390] 3215 console: convert from bootmem to slab  (Heiko Carstens)  1 file, -11/+7
The slab allocator is available earlier now, so convert the bootmem allocations to slab/gfp allocations. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
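The conversion pattern used by the four console patches above is roughly the following sketch; GFP_DMA is only needed where the buffer must stay in low memory, so treat the flags as illustrative:

    #include <linux/slab.h>

    static void *alloc_console_buffer(size_t size)
    {
            /* before: buf = alloc_bootmem_low(size); */
            return kzalloc(size, GFP_KERNEL | GFP_DMA);
    }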
2009-06-22  s6gmac: xtensa s6000 on-chip ethernet driver  (Oskar Schirmer)  3 files, -0/+1085
The s6000 on-chip MAC supports 10/100/1000Mbit and is connected to an external PHY via MII or RGMII interface. [jw@emlix.com: don't use device->bus_id directly] Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Daniel Glockner <dg@emlix.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Zankel <chris@zankel.net>
2009-06-22  dm mpath: change to be request based  (Kiyoshi Ueda)  1 file, -65/+128
This patch converts the dm-multipath target from bio-based to request-based. Basically, the patch just converts the I/O unit from struct bio to struct request. In the course of the conversion, it also changes the I/O queueing mechanism. The change in the I/O queueing is described in detail below.

I/O queueing mechanism change
-----------------------------
In I/O submission, map_io(), there is no mechanism change from bio-based, since the clone request is ready for retry as it is. However, in I/O completion, do_end_io(), there is a mechanism change from bio-based, since the clone request is not ready for retry.

In do_end_io() of bio-based, the clone bio has all the memory needed for resubmission, so the target driver can queue it and resubmit it later without memory allocations. The mechanism has almost no overhead.

On the other hand, in do_end_io() of request-based, the clone request doesn't have clone bios, so the target driver can't resubmit it as it is. To resubmit the clone request, memory allocation for clone bios is needed, and that takes some overhead. To avoid that overhead just for queueing, the target driver doesn't queue the clone request inside itself. Instead, the target driver asks dm core to queue and remap the original request of the clone request, since the overhead for queueing is then just freeing memory for the clone request. As a result, the target driver doesn't need to record/restore the information of the original request for resubmitting the clone request, so dm_bio_details in dm_mpath_io is removed.

multipath_busy()
----------------
The target driver returns "busy" only when both of the following hold:
  o the target driver will map I/Os if the map() function is called, and
  o the mapped I/Os will wait on the underlying device's queue due to congestion if the map() function is called now.
In other cases the target driver doesn't return "busy". Otherwise, dm core will keep the I/Os and the target driver can't do what it wants (e.g. the target driver can't map I/Os now, so it wants to kill them). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm: disable interrupt when taking map_lock  (Kiyoshi Ueda)  1 file, -6/+9
This patch disables interrupts when taking map_lock, to avoid lockdep warnings in request-based dm. Request-based dm takes map_lock after taking queue_lock with interrupts disabled:
    spin_lock_irqsave(queue_lock)
      q->request_fn() == dm_request_fn()
        => dm_get_table()
          => read_lock(map_lock)
while queue_lock could be (but isn't) taken in interrupt context. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
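A sketch of the change; the lock and table variables are illustrative stand-ins for the dm core fields:

    #include <linux/spinlock.h>

    static DEFINE_RWLOCK(map_lock);
    static void *live_map;

    static void *dm_get_live_map(void)
    {
            unsigned long flags;
            void *map;

            /* before: read_lock(&map_lock); */
            read_lock_irqsave(&map_lock, flags);
            map = live_map;
            read_unlock_irqrestore(&map_lock, flags);

            return map;
    }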
2009-06-22  dm: do not set QUEUE_ORDERED_DRAIN if request based  (Kiyoshi Ueda)  3 files, -1/+16
Request-based dm doesn't have barrier support yet, so we need to set QUEUE_ORDERED_DRAIN only for bio-based dm. Since the device type is decided at the first table loading time, setting the flag is deferred until then. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm: enable request based option  (Kiyoshi Ueda)  4 files, -26/+285
This patch enables request-based dm.

o Request-based dm and bio-based dm coexist, since there are some target drivers which are better suited to bio-based dm. Also, there are other bio-based devices in the kernel (e.g. md, loop). Since a bio-based device can't receive struct request, there are some limitations on device stacking between bio-based and request-based:

                        type of underlying device
                        bio-based    request-based
       ----------------------------------------------
        bio-based          OK             OK
        request-based      --             OK

  The device type is recognized by the queue flag in the kernel, so dm follows that.

o The type of a dm device is decided at the first table binding time. Once the type of a dm device is decided, the type can't be changed.

o Mempool allocations are deferred to the table loading time, since mempools for request-based dm are different from those for bio-based dm and the needed mempool type is fixed by the type of the table.

o Currently, request-based dm supports only tables that have a single target. To support multiple targets, we need to support request splitting or prevent bios/requests from spanning multiple targets. The former needs lots of changes in the block layer, and the latter needs all target drivers to support a merge() function. Both will take time.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm: prepare for request based option  (Kiyoshi Ueda)  3 files, -4/+716
This patch adds core functions for request-based dm.

When struct mapped_device (md) is initialized, md->queue has an I/O scheduler and the following functions are used for request-based dm as the queue functions:
    make_request_fn:  dm_make_request()
    prep_rq_fn:       dm_prep_fn()
    request_fn:       dm_request_fn()
    softirq_done_fn:  dm_softirq_done()
    lld_busy_fn:      dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).

Below is a brief summary of how request-based dm behaves, including:
  - making a request from a bio
  - cloning, mapping and dispatching a request
  - completing a request and bio
  - suspending the md
  - resuming the md

bio to request
==============
md->queue->make_request_fn() (dm_make_request()) calls __make_request() for a bio submitted to the md. Then, the bio is kept in the queue as a new request or merged into another request in the queue if possible.

Cloning and Mapping
===================
Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()), when requests are dispatched after they are sorted by the I/O scheduler.

dm_request_fn() checks the busy state of underlying devices using the target's busy() function and stops dispatching requests to keep them on the dm device's queue if busy. This helps I/O merging, since no merging is done for a request once it has been dispatched to underlying devices.

Actual cloning and mapping are done in dm_prep_fn() and map_request() called from dm_request_fn(). dm_prep_fn() clones not only the request but also the bios of the request, so that dm can hold bio completion in error cases and prevent the bio submitter from noticing the error. (See the "Completion" section below for details.) After the cloning, the clone is mapped by the target's map_rq() function and inserted into the underlying device's queue using blk_insert_cloned_request().

Completion
==========
Request completion can be hooked by rq->end_io(), but then all bios in the request will have been completed, even in error cases, and the bio submitter will have noticed the error. To prevent bio completion in error cases, request-based dm clones both the bio and the request and hooks both bio->bi_end_io() and rq->end_io():
    bio->bi_end_io(): end_clone_bio()
    rq->end_io():     end_clone_request()

A summary of the request completion flow is below:

    blk_end_request() for a clone request
      => blk_update_request()
         => bio->bi_end_io() == end_clone_bio() for each clone bio
            => Free the clone bio
            => Success: Complete the original bio (blk_update_request())
               Error:   Don't complete the original bio
      => blk_finish_request()
         => rq->end_io() == end_clone_request()
            => blk_complete_request()
               => dm_softirq_done()
                  => Free the clone request
                  => Success: Complete the original request (blk_end_request())
                     Error:   Requeue the original request

end_clone_bio() completes the original request on the size of the original bio in successful cases. Even if all bios in the original request are completed by that completion, the original request must not be completed yet, to keep the ordering of request completion for the stacking. So end_clone_bio() uses blk_update_request() instead of blk_end_request(). In error cases, end_clone_bio() doesn't complete the original bio. It just frees the clone bio and hands the error handling over to end_clone_request().

end_clone_request(), which is called with the queue lock held, completes the clone request and the original request in a softirq context (dm_softirq_done()), which has no queue lock, to avoid a deadlock on submission of another request during the completion:
  - The submitted request may be mapped to the same device
  - Request submission requires the queue lock, but the queue lock is already held by itself and it doesn't know that
The clone request has no clone bios when dm_softirq_done() is called, so target drivers can't resubmit it, even in error cases. Instead, they can ask dm core to requeue and remap the original request in that case.

suspend
=======
Request-based dm uses stopping md->queue as suspend of the md. For noflush suspend, it just stops md->queue. For flush suspend, it inserts a marker request at the tail of md->queue and dispatches all requests in md->queue until the marker comes to the front of md->queue. Then it stops dispatching requests and waits for all dispatched requests to complete. After that, it completes the marker request, stops md->queue and wakes up the waiter on the suspend queue, md->wait.

resume
======
Starts md->queue.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm raid1: add userspace log  (Jonathan Brassow)  5 files, -0/+1004
This patch contains a device-mapper mirror log module that forwards requests to userspace for processing. The structures used for communication between kernel and userspace are located in include/linux/dm-log-userspace.h. Due to the frequency, diversity, and 2-way communication nature of the exchanges between kernel and userspace, 'connector' was chosen as the interface for communication. The first log implementations written in userspace - "clustered-disk" and "clustered-core" - support clustered shared storage. A userspace daemon (in the LVM2 source code repository) uses openAIS/corosync to process requests in an ordered fashion with the rest of the nodes in the cluster so as to prevent log state corruption. Other implementations with no association to LVM or openAIS/corosync are certainly possible. (Imagine if two machines are writing to the same region of a mirror. They would both mark the region dirty, but you need a cluster-aware entity that can handle properly marking the region clean when they are done. Otherwise, you might clear the region when the first machine is done, not the second.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm: calculate queue limits during resume not load  (Mike Snitzer)  3 files, -87/+115
Currently, device-mapper maintains a separate instance of 'struct queue_limits' for each table of each device. When the configuration of a device is to be changed, first its table is loaded and this structure is populated, then the device is 'resumed' and the calculated queue_limits are applied. This places restrictions on how userspace may process related devices, where it is often advantageous to 'load' tables for several devices at once before 'resuming' them together. As the new queue_limits only take effect after the 'resume', if they are changing and one device uses another, the latter must be 'resumed' before the former may be 'loaded'. This patch moves the calculation of these queue_limits out of the 'load' operation into 'resume'. Since we are no longer pre-calculating this struct, we no longer need to maintain copies within our dm structs. dm_set_device_limits() now passes the 'start' of the device's data area (aka pe_start) as the 'offset' to blk_stack_limits(). init_valid_queue_limits() is replaced by blk_set_default_limits(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: martin.petersen@oracle.com Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm log: fix create_log_context to use logical_block_size of log device  (Mike Snitzer)  1 file, -3/+4
create_log_context() must use the logical_block_size from the log disk, where the I/O happens, not the target's logical_block_size. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm targets: introduce iterate devices fn  (Mike Snitzer)  6 files, -6/+94
Add .iterate_devices to 'struct target_type' to allow a function to be called for all devices in a DM target. Implemented it for all targets except those in dm-snap.c (origin and snapshot). (The raid1 version number jumps to 1.12 because we originally reserved 1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors' instead.) Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: martin.petersen@oracle.com
2009-06-22  dm table: establish queue limits by copying table limits  (Mike Snitzer)  1 file, -10/+2
Copy the table's queue_limits to the DM device's request_queue. This properly initializes the queue's topology limits and also avoids having to track the evolution of 'struct queue_limits' in dm_table_set_restrictions(). Also fixes a bug that was introduced in dm_table_set_restrictions() via commit ae03bf639a5027d27270123f5f6e3ee6a412781d. In addition to establishing 'bounce_pfn' in the queue's limits, blk_queue_bounce_limit() also performs an allocation to set up the ISA DMA pool. This allocation resulted in "sleeping function called from invalid context" when called from dm_table_set_restrictions(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm table: replace struct io_restrictions with struct queue_limits  (Mike Snitzer)  1 file, -95/+43
Use blk_stack_limits() to stack block limits (including topology) rather than duplicate the equivalent within Device Mapper. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm table: validate device logical_block_size  (Mike Snitzer)  1 file, -0/+69
Impose necessary and sufficient conditions on a device's table such that any incoming bio which respects its logical_block_size can be processed successfully. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm table: ensure targets are aligned to logical_block_size  (Mike Snitzer)  1 file, -14/+44
Ensure I/O is aligned to the logical block size of target devices. Rename check_device_area() to device_area_is_valid() for clarity and establish the device limits including the logical block size prior to calling it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
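A simplified sketch of the sort of check device_area_is_valid() performs: the target's start and length, in 512-byte sectors, must line up with the underlying device's logical block size. The helper name and exact form here are illustrative.

    #include <linux/types.h>

    static int area_is_aligned(sector_t start, sector_t len,
                               unsigned int logical_block_size)
    {
            unsigned int sectors_per_block = logical_block_size >> 9;

            if (sectors_per_block <= 1)
                    return 1;       /* 512-byte blocks: any alignment is fine */
            if (start & (sectors_per_block - 1))
                    return 0;       /* start not on a logical block boundary */
            if (len & (sectors_per_block - 1))
                    return 0;       /* length not a whole number of blocks */
            return 1;
    }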
2009-06-22  dm ioctl: support cookies for udev  (Milan Broz)  3 files, -11/+31
Add support for passing a 32 bit "cookie" into the kernel with the DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls. The (unsigned) value of this cookie is returned to userspace alongside the uevents issued by these ioctls in the variable DM_COOKIE. This means the userspace process issuing these ioctls can be notified by udev after udev has completed any actions triggered. To minimise the interface extension, we pass the cookie into the kernel in the event_nr field which is otherwise unused when calling these ioctls. Incrementing the version number allows userspace to determine in advance whether or not the kernel supports the cookie. If the kernel does support this but userspace does not, there should be no impact as the new variable will just get ignored. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm: sysfs add suspended attribute  (Peter Rajnoha)  1 file, -0/+9
Add a file named 'suspended' to each device-mapper device directory in sysfs. It holds the value 1 while the device is suspended. Otherwise it holds 0. Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
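A sketch of the show routine behind such an attribute, in the style dm's sysfs code uses; dm_suspended() is assumed to be the dm-core query and the attribute wiring is abridged:

    #include <linux/device.h>
    #include <linux/kernel.h>

    struct mapped_device;                           /* dm core's opaque type */
    extern int dm_suspended(struct mapped_device *md);

    static ssize_t dm_attr_suspended_show(struct mapped_device *md, char *buf)
    {
            return sprintf(buf, "%d\n", dm_suspended(md) ? 1 : 0);
    }

Userspace can then read the file (typically /sys/block/dm-N/dm/suspended) to see whether a device is currently suspended.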
2009-06-22  dm table: improve warning message when devices not freed before destruction  (Jonathan Brassow)  1 file, -5/+3
Report any devices that have not been freed before a table is destroyed. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm mpath: add service time load balancer  (Kiyoshi Ueda)  3 files, -0/+350
This patch adds a service time oriented dynamic load balancer, dm-service-time, which selects the path with the shortest estimated service time for the incoming I/O. The service time is estimated by dividing the in-flight I/O size by a performance value of each path. The performance value can be given as a table argument at the table loading time. If no performance value is given, all paths are considered equal. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
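A sketch of the selection rule the text describes: the estimated service time is the in-flight size divided by the path's relative performance, and the path with the smallest estimate wins. The structure and field names below are illustrative; the comparison cross-multiplies to stay in integer arithmetic, and the real selector also has to worry about overflow, which is elided here.

    struct st_path {
            unsigned long long in_flight_size;      /* bytes currently in flight */
            unsigned int relative_throughput;       /* optional table argument, default 1 */
    };

    /* < 0: a is the better path, > 0: b is better, 0: equal */
    static int st_compare_load(const struct st_path *a, const struct st_path *b)
    {
            unsigned long long va = a->in_flight_size * b->relative_throughput;
            unsigned long long vb = b->in_flight_size * a->relative_throughput;

            return va < vb ? -1 : (va > vb ? 1 : 0);
    }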
2009-06-22  dm mpath: add queue length load balancer  (Kiyoshi Ueda)  3 files, -0/+273
This patch adds a dynamic load balancer, dm-queue-length, which balances the number of in-flight I/Os across the paths. The code is based on the patch posted by Stefan Bader: https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22  dm mpath: add start_io and nr_bytes to path selectors  (Kiyoshi Ueda)  3 files, -13/+25
This patch makes two additions to the dm path selector interface for dynamic load balancers:
  o a new hook, start_io()
  o a new parameter 'nr_bytes' to select_path()/start_io()/end_io() to pass the size of the I/O
start_io() is called when a target driver actually submits I/O to the selected path. Path selectors can use it to start accounting of the I/O (e.g. counting the number of in-flight I/Os). The start_io hook is based on the patch posted by Stefan Bader: https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html nr_bytes, the size of the I/O, is passed so that path selectors can take it into account when deciding which path to use; dm-service-time uses it to estimate service time, for example. (The nr_bytes member was added to dm_mpath_io instead of using the existing details.bi_size, since the request-based dm patch deletes it.) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
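A sketch of the extended interface described above: select_path()/start_io()/end_io() now see the I/O size, and start_io() fires when the clone is actually submitted. Only the hooks named in the text are shown; the struct is abridged and illustrative rather than the exact header.

    #include <linux/types.h>

    struct dm_path;
    struct path_selector;

    struct path_selector_type_sketch {
            struct dm_path *(*select_path)(struct path_selector *ps,
                                           unsigned *repeat_count,
                                           size_t nr_bytes);
            /* called when the I/O is actually dispatched to the path */
            int (*start_io)(struct path_selector *ps, struct dm_path *path,
                            size_t nr_bytes);
            /* called on completion, with the same size for symmetric accounting */
            int (*end_io)(struct path_selector *ps, struct dm_path *path,
                          size_t nr_bytes);
    };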
2009-06-22  dm snapshot: use barrier when writing exception store  (Mikulas Patocka)  1 file, -1/+1
Send barrier requests when updating the exception area. Exception area updates need to be ordered w.r.t. data writes, so that the writes are not reordered in hardware disk cache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
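A sketch of the kind of one-flag change involved (not the dm-snapshot code itself; submit_bio with the 2009-era rw flags is shown purely for illustration):

    #include <linux/bio.h>
    #include <linux/fs.h>

    static void submit_metadata_write(struct bio *bio)
    {
            /* before: submit_bio(WRITE, bio); */
            submit_bio(WRITE_BARRIER, bio);     /* ordered w.r.t. data writes */
    }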