kernel/linux.git/drivers/md, branch v4.11.5

md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop

2017-05-25T13:46:12+00:00

commit 065e519e71b2c1f41936cce75b46b5ab34adb588 upstream. if called md_set_readonly and set MD_CLOSING bit, the mddev cannot be opened any more due to the MD_CLOING bit wasn't cleared. Thus it needs to be cleared in md_ioctl after any call to md_set_readonly() or do_md_stop(). Signed-off-by: NeilBrown Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag") Signed-off-by: Zhilong Liu Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md: update slab_cache before releasing new stripes when stripes resizing

2017-05-25T13:46:12+00:00

commit 583da48e388f472e8818d9bb60ef6a1d40ee9f9d upstream. When growing raid5 device on machine with small memory, there is chance that mdadm will be killed and the following bug report can be observed. The same bug could also be reproduced in linux-4.10.6. [57600.075774] BUG: unable to handle kernel NULL pointer dereference at (null) [57600.083796] IP: [] _raw_spin_lock+0x7/0x20 [57600.110378] PGD 421cf067 PUD 4442d067 PMD 0 [57600.114678] Oops: 0002 [#1] SMP [57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P O 4.2.8 #1 [57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013 [57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000 [57600.204963] RIP: 0010:[] [] _raw_spin_lock+0x7/0x20 [57600.213057] RSP: 0018:ffff880043073810 EFLAGS: 00010046 [57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0 [57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000 [57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282 [57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838 [57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00 [57600.253999] FS: 00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000 [57600.262078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0 [57600.274942] Stack: [57600.276949] ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f [57600.284383] ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98 [57600.291820] ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968 [57600.299254] Call Trace: [57600.301698] [] ? cache_flusharray+0x35/0xe0 [57600.307523] [] ? __page_cache_release+0x23/0x110 [57600.313779] [] kmem_cache_free+0x63/0xc0 [57600.319344] [] drop_one_stripe+0x62/0x90 [57600.324915] [] raid5_cache_scan+0x8b/0xb0 [57600.330563] [] shrink_slab.part.36+0x19a/0x250 [57600.336650] [] shrink_zone+0x23c/0x250 [57600.342039] [] do_try_to_free_pages+0x153/0x420 [57600.348210] [] try_to_free_pages+0x91/0xa0 [57600.353959] [] __alloc_pages_nodemask+0x4d1/0x8b0 [57600.360303] [] check_reshape+0x62b/0x770 [57600.365866] [] raid5_check_reshape+0x55/0xa0 [57600.371778] [] update_raid_disks+0xc7/0x110 [57600.377604] [] md_ioctl+0xd83/0x1b10 [57600.382827] [] blkdev_ioctl+0x170/0x690 [57600.388307] [] block_ioctl+0x38/0x40 [57600.393525] [] do_vfs_ioctl+0x2b5/0x480 [57600.399010] [] ? vfs_write+0x14b/0x1f0 [57600.404400] [] SyS_ioctl+0x3c/0x70 [57600.409447] [] entry_SYSCALL_64_fastpath+0x12/0x6a [57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d [57600.435460] RIP [] _raw_spin_lock+0x7/0x20 [57600.441208] RSP [57600.444690] CR2: 0000000000000000 [57600.448000] ---[ end trace cbc6b5cc4bf9831d ]--- The problem is that resize_stripes() releases new stripe_heads before assigning new slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called after resize_stripes() starting releasing new stripes but right before new slab cache being assigned, it is possible that these new stripe_heads will be freed with the old slab_cache which was already been destoryed and that triggers this bug. Signed-off-by: Dennis Yang Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Reviewed-by: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

dm space map disk: fix some book keeping in the disk space map

2017-05-25T13:46:12+00:00

commit 0377a07c7a035e0d033cd8b29f0cb15244c0916a upstream. When decrementing the reference count for a block, the free count wasn't being updated if the reference count went to zero. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm thin metadata: call precommit before saving the roots

2017-05-25T13:46:12+00:00

commit 91bcdb92d39711d1adb40c26b653b7978d93eb98 upstream. These calls were the wrong way round in __write_initial_superblock. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm bufio: make the parameter "retain_bytes" unsigned long

2017-05-25T13:46:12+00:00

commit 13840d38016203f0095cd547b90352812d24b787 upstream. Change the type of the parameter "retain_bytes" from unsigned to unsigned long, so that on 64-bit machines the user can set more than 4GiB of data to be retained. Also, change the type of the variable "count" in the function "__evict_old_buffers" to unsigned long. The assignment "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];" could result in unsigned long to unsigned overflow and that could result in buffers not being freed when they should. While at it, avoid division in get_retain_buffers(). Division is slow, we can change it to shift because we have precalculated the log2 of block size. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm cache metadata: fail operations if fail_io mode has been established

2017-05-25T13:46:12+00:00

commit 10add84e276432d9dd8044679a1028dd4084117e upstream. Otherwise it is possible to trigger crashes due to the metadata being inaccessible yet these methods don't safely account for that possibility without these checks. Reported-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm mpath: delay requeuing while path initialization is in progress

2017-05-25T13:46:12+00:00

commit c1d7ecf7ca11d0edd3085262c8597203440d056c upstream. Requeuing a request immediately while path initialization is ongoing causes high CPU usage, something that is undesired. Hence delay requeuing while path initialization is in progress. Signed-off-by: Bart Van Assche Reviewed-by: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm mpath: avoid that path removal can trigger an infinite loop

2017-05-25T13:46:12+00:00

commit 7083abbbfc4fa706ff72d27d33a5214881979336 upstream. If blk_get_request() fails, check whether the failure is due to a path being removed. If that is the case, fail the path by triggering a call to fail_path(). This avoids that the following scenario can be encountered while removing paths: * CPU usage of a kworker thread jumps to 100%. * Removing the DM device becomes impossible. Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK, and the queue is not dying, because in these cases immediate requeuing is inappropriate. Signed-off-by: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm mpath: split and rename activate_path() to prepare for its expanded use

2017-05-25T13:46:11+00:00

commit 89bfce763e43fa4897e0d3af6b29ed909df64cfd upstream. activate_path() is renamed to activate_path_work() which now calls activate_or_offline_path(). activate_or_offline_path() will be used by the next commit. Signed-off-by: Bart Van Assche Cc: Hannes Reinecke Cc: Christoph Hellwig Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm mpath: requeue after a small delay if blk_get_request() fails

2017-05-25T13:46:11+00:00

commit 06eb061f48594aa369f6e852b352410298b317a8 upstream. If blk_get_request() returns ENODEV then multipath_clone_and_map() causes a request to be requeued immediately. This can cause a kworker thread to spend 100% of the CPU time of a single core in __blk_mq_run_hw_queue() and also can cause device removal to never finish. Avoid this by only requeuing after a delay if blk_get_request() fails. Additionally, reduce the requeue delay. Signed-off-by: Bart Van Assche Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman