kernel/linux.git/drivers/md, branch linux-2.6.35.y

md: avoid endless recovery loop when waiting for fail device to complete.

2011-08-01T20:54:58+00:00

commit 4274215d24633df7302069e51426659d4759c5ed upstream. If a device fails in a way that causes pending request to take a while to complete, md will not be able to immediately remove it from the array in remove_and_add_spares. It will then incorrectly look like a spare device and md will try to recover it even though it is failed. This leads to a recovery process starting and instantly aborting over and over again. We should check if the device is faulty before considering it to be a spare. This will avoid trying to start a recovery that cannot proceed. This bug was introduced in 2.6.26 so that patch is suitable for any kernel since then. Reported-by: Jim Paradis Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

md/raid5: fix raid5_set_bi_hw_segments

2011-08-01T20:54:56+00:00

commit 9b2dc8b665932a8e681a7ab3237f60475e75e161 upstream. The @bio->bi_phys_segments consists of active stripes count in the lower 16 bits and processed stripes count in the upper 16 bits. So logical-OR operator should be bitwise one. This bug has been present since 2.6.27 and the fix is suitable for any -stable kernel since then. Fortunately the bad code is only used on error paths and is relatively unlikely to be hit. Signed-off-by: Namhyung Kim Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

md: check ->hot_remove_disk when removing disk

2011-08-01T20:54:56+00:00

commit 01393f3d5836b7d62e925e6f4658a7eb22b83a11 upstream. Check pers->hot_remove_disk instead of pers->hot_add_disk in slot_store() during disk removal. The linear personality only has ->hot_add_disk and no ->hot_remove_disk, so that removing disk in the array resulted to following kernel bug: $ sudo mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/loop[0-3] $ echo none | sudo tee /sys/block/md0/md/dev-loop2/slot BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD c9f5d067 PUD 8575a067 PMD 0 Oops: 0010 [#1] SMP CPU 2 Modules linked in: linear loop bridge stp llc kvm_intel kvm asus_atk0110 sr_mod cdrom sg Pid: 10450, comm: tee Not tainted 3.0.0-rc1-leonard+ #173 System manufacturer System Product Name/P5G41TD-M PRO RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff880085757df0 EFLAGS: 00010282 RAX: ffffffffa00168e0 RBX: ffff8800d1431800 RCX: 000000000000006e RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88008543c000 RBP: ffff880085757e48 R08: 0000000000000002 R09: 000000000000000a R10: 0000000000000000 R11: ffff88008543c2e0 R12: 00000000ffffffff R13: ffff8800b4641000 R14: 0000000000000005 R15: 0000000000000000 FS: 00007fe8c9e05700(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000b4502000 CR4: 00000000000406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tee (pid: 10450, threadinfo ffff880085756000, task ffff8800c9f08000) Stack: ffffffff8138496a ffff8800b4641000 ffff88008543c268 0000000000000000 ffff8800b4641000 ffff88008543c000 ffff8800d1431868 ffffffff81a78a90 ffff8800b4641000 ffff88008543c000 ffff8800d1431800 ffff880085757e98 Call Trace: [] ? slot_store+0xaa/0x265 [] rdev_attr_store+0x89/0xa8 [] sysfs_write_file+0x108/0x144 [] vfs_write+0xb1/0x10d [] ? trace_hardirqs_on_caller+0x111/0x135 [] sys_write+0x4d/0x77 [] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP CR2: 0000000000000000 ---[ end trace ba5fc64319a826fb ]--- Signed-off-by: Namhyung Kim Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

dm table: reject devices without request fns

2011-08-01T20:54:53+00:00

commit f4808ca99a203f20b4475601748e44b25a65bdec upstream. This patch adds a check that a block device has a request function defined before it is used. Otherwise, misconfiguration can cause an oops. Because we are allowing devices with zero size e.g. an offline multipath device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3 ("dm: allow offline devices") there needs to be an additional check to ensure devices are initialised. Some block devices, like a loop device without a backing file, exist but have no request function. Reproducer is trivial: dm-mirror on unbound loop device (no backing file on loop devices) dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0" and mirror resync will immediatelly cause OOps. BUG: unable to handle kernel NULL pointer dereference at (null) ? generic_make_request+0x2bd/0x590 ? kmem_cache_alloc+0xad/0x190 submit_bio+0x53/0xe0 ? bio_add_page+0x3b/0x50 dispatch_io+0x1ca/0x210 [dm_mod] ? read_callback+0x0/0xd0 [dm_mirror] dm_io+0xbb/0x290 [dm_mod] do_mirror+0x1e0/0x748 [dm_mirror] Signed-off-by: Milan Broz Reported-by: Zdenek Kabelac Acked-by: Mike Snitzer Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

md: Fix - again - partition detection when array becomes active

2011-03-31T18:58:54+00:00

[ upstream commit f0b4f7e2f29af678bd9af43422c537dcb6008603 ] Revert b821eaa572fd737faaf6928ba046e571526c36c6 and f3b99be19ded511a1bf05a148276239d9f13eefa When I wrote the first of these I had a wrong idea about the lifetime of 'struct block_device'. It can disappear at any time that the block device is not open if it falls out of the inode cache. So relying on the 'size' recorded with it to detect when the device size has changed and so we need to revalidate, is wrong. Rather, we really do need the 'changed' attribute stored directly in the mddev and set/tested as appropriate. Without this patch, a sequence of: mknod / open / close / unlink (which can cause a block_device to be created and then destroyed) will result in a rescan of the partition table and consequence removal and addition of partitions. Several of these in a row can get udev racing to create and unlink and other code can get confused. With the patch, the rescan is only performed when needed and so there are no races. This is suitable for any stable kernel from 2.6.35. Reported-by: "Wojcik, Krzysztof" Signed-off-by: NeilBrown Signed-off-by: Andi Kleen Cc: stable@kernel.org

md: correctly handle probe of an 'mdp' device.

2011-03-31T18:58:13+00:00

commit 8f5f02c460b7ca74ce55ce126ce0c1e58a3f923d upstream. 'mdp' devices are md devices with preallocated device numbers for partitions. As such it is possible to mknod and open a partition before opening the whole device. this causes md_probe() to be called with a device number of a partition, which in-turn calls mddev_find with such a number. However mddev_find expects the number of a 'whole device' and does the wrong thing with partition numbers. So add code to mddev_find to remove the 'partition' part of a device number and just work with the 'whole device'. This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652 Reported-by: hkmaly@bigfoot.com Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

dm mpath: disable blk_abort_queue

2011-03-31T18:57:55+00:00

commit 09c9d4c9b6a2b5909ae3c6265e4cd3820b636863 upstream. Revert commit 224cb3e981f1b2f9f93dbd49eaef505d17d894c2 dm: Call blk_abort_queue on failed paths Multipath began to use blk_abort_queue() to allow for lower latency path deactivation. This was found to cause list corruption: the cmd gets blk_abort_queued/timedout run on it and the scsi eh somehow is able to complete and run scsi_queue_insert while scsi_request_fn is still trying to process the request. https://www.redhat.com/archives/dm-devel/2010-November/msg00085.html Signed-off-by: Mike Snitzer Signed-off-by: Alasdair G Kergon Signed-off-by: Andi Kleen Cc: Mike Anderson Cc: Mike Christie Signed-off-by: Greg Kroah-Hartman

dm: dont take i_mutex to change device size

2011-03-31T18:57:55+00:00

commit c217649bf2d60ac119afd71d938278cffd55962b upstream. No longer needlessly hold md->bdev->bd_inode->i_mutex when changing the size of a DM device. This additional locking is unnecessary because i_size_write() is already protected by the existing critical section in dm_swap_table(). DM already has a reference on md->bdev so the associated bd_inode may be changed without lifetime concerns. A negative side-effect of having held md->bdev->bd_inode->i_mutex was that a concurrent DM device resize and flush (via fsync) would deadlock. Dropping md->bdev->bd_inode->i_mutex eliminates this potential for deadlock. The following reproducer no longer deadlocks: https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html Signed-off-by: Mike Snitzer Signed-off-by: Mikulas Patocka Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

md: fix regression with re-adding devices to arrays with no metadata

2011-03-31T18:57:53+00:00

commit bf572541ab44240163eaa2d486b06f306a31d45a upstream. Commit 1a855a0606 (2.6.37-rc4) fixed a problem where devices were re-added when they shouldn't be but caused a regression in a less common case that means sometimes devices cannot be re-added when they should be. In particular, when re-adding a device to an array without metadata we should always access the device, but after the above commit we didn't. This patch sets the In_sync flag in that case so that the re-add succeeds. This patch is suitable for any -stable kernel to which 1a855a0606 was applied. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen

block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead

2011-02-06T19:03:52+00:00

commit e692cb668fdd5a712c6ed2a2d6f2a36ee83997b4 upstream. When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin Signed-off-by: Martin K. Petersen Acked-by: Mike Snitzer Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen