<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/md.h, branch v6.6.39</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.39</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.39'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-09-08T20:16:40+00:00</updated>
<entry>
<title>md: fix warning for holder mismatch from export_rdev()</title>
<updated>2023-09-08T20:16:40+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-08-25T02:55:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=99892147f028d711f9d40fefad4f33632593864c'/>
<id>urn:sha1:99892147f028d711f9d40fefad4f33632593864c</id>
<content type='text'>
Commit a1d767191096 ("md: use mddev-&gt;external to select holder in
export_rdev()") fix the problem that 'claim_rdev' is used for
blkdev_get_by_dev() while 'rdev' is used for blkdev_put().

However, if mddev-&gt;external is changed from 0 to 1, then 'rdev' is used
for blkdev_get_by_dev() while 'claim_rdev' is used for blkdev_put(). And
this problem can be reporduced reliably by following:

New file: mdadm/tests/23rdev-lifetime

devname=${dev0##*/}
devt=`cat /sys/block/$devname/dev`
pid=""
runtime=2

clean_up_test() {
        pill -9 $pid
        echo clear &gt; /sys/block/md0/md/array_state
}

trap 'clean_up_test' EXIT

add_by_sysfs() {
        while true; do
                echo $devt &gt; /sys/block/md0/md/new_dev
        done
}

remove_by_sysfs(){
        while true; do
                echo remove &gt; /sys/block/md0/md/dev-${devname}/state
        done
}

echo md0 &gt; /sys/module/md_mod/parameters/new_array || die "create md0 failed"

add_by_sysfs &amp;
pid="$pid $!"

remove_by_sysfs &amp;
pid="$pid $!"

sleep $runtime
exit 0

Test cmd:

./test --save-logs --logdir=/tmp/ --keep-going --dev=loop --tests=23rdev-lifetime

Test result:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 960 at block/bdev.c:618 blkdev_put+0x27c/0x330
Modules linked in: multipath md_mod loop
CPU: 0 PID: 960 Comm: test Not tainted 6.5.0-rc2-00121-g01e55c376936-dirty #50
RIP: 0010:blkdev_put+0x27c/0x330
Call Trace:
 &lt;TASK&gt;
 export_rdev.isra.23+0x50/0xa0 [md_mod]
 mddev_unlock+0x19d/0x300 [md_mod]
 rdev_attr_store+0xec/0x190 [md_mod]
 sysfs_kf_write+0x52/0x70
 kernfs_fop_write_iter+0x19a/0x2a0
 vfs_write+0x3b5/0x770
 ksys_write+0x74/0x150
 __x64_sys_write+0x22/0x30
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fix the problem by recording if 'rdev' is used as holder.

Fixes: a1d767191096 ("md: use mddev-&gt;external to select holder in export_rdev()")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230825025532.1523008-3-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: Hold mddev-&gt;reconfig_mutex when trying to get mddev-&gt;sync_thread</title>
<updated>2023-08-15T16:40:26+00:00</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2023-08-03T07:17:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7eb8ff02c1df279bf7f7f29b866beb655a9eebe9'/>
<id>urn:sha1:7eb8ff02c1df279bf7f7f29b866beb655a9eebe9</id>
<content type='text'>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: also clone new io if io accounting is disabled</title>
<updated>2023-07-27T07:13:29+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-21T16:51:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c687297b884507a4737b747957eda567063901df'/>
<id>urn:sha1:c687297b884507a4737b747957eda567063901df</id>
<content type='text'>
Currently, 'active_io' is grabbed before make_reqeust() is called, and
it's dropped immediately make_reqeust() returns. Hence 'active_io'
actually means io is dispatching, not io is inflight.

For raid0 and raid456 that io accounting is enabled, 'active_io' will
also be grabbed when bio is cloned for io accounting, and this 'active_io'
is dropped until io is done.

Always clone new bio so that 'active_io' will mean that io is inflight,
raid1 and raid10 will switch to use this method in later patches.

Now that bio will be cloned even if io accounting is disabled, also
rename related structure from '*_acct_*' to '*_clone_*'.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621165110.1498313-3-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: move initialization and destruction of 'io_acct_set' to md.c</title>
<updated>2023-07-27T07:13:29+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-21T16:51:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883'/>
<id>urn:sha1:c567c86b90d4715081adfe5eb812141a5b6b4883</id>
<content type='text'>
'io_acct_set' is only used for raid0 and raid456, prepare to use it for
raid1 and raid10, so that io accounting from different levels can be
consistent.

By the way, follow up patches will also use this io clone mechanism to
make sure 'active_io' represents in flight io, not io that is dispatching,
so that mddev_suspend will wait for io to be done as designed.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621165110.1498313-2-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: refactor idle/frozen_sync_thread() to fix deadlock</title>
<updated>2023-07-27T07:13:28+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-29T13:20:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=130443d60b1b8c7a609a2af3384dd8e60df97181'/>
<id>urn:sha1:130443d60b1b8c7a609a2af3384dd8e60df97181</id>
<content type='text'>
Our test found a following deadlock in raid10:

1) Issue a normal write, and such write failed:

  raid10_end_write_request
   set_bit(R10BIO_WriteError, &amp;r10_bio-&gt;state)
   one_write_done
    reschedule_retry

  // later from md thread
  raid10d
   handle_write_completed
    list_add(&amp;r10_bio-&gt;retry_list, &amp;conf-&gt;bio_end_io_list)

  // later from md thread
  raid10d
   if (!test_bit(MD_SB_CHANGE_PENDING, &amp;mddev-&gt;sb_flags))
    list_move(conf-&gt;bio_end_io_list.prev, &amp;tmp)
    r10_bio = list_first_entry(&amp;tmp, struct r10bio, retry_list)
    raid_end_bio_io(r10_bio)

Dependency chain 1: normal io is waiting for updating superblock

2) Trigger a recovery:

  raid10_sync_request
   raise_barrier

Dependency chain 2: sync thread is waiting for normal io

3) echo idle/frozen to sync_action:

  action_store
   mddev_lock
    md_unregister_thread
     kthread_stop

Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread

4) md thread can't update superblock:

  raid10d
   md_check_recovery
    if (mddev_trylock(mddev))
     md_update_sb

Dependency chain 4: update superblock is waiting for 'reconfig_mutex'

Hence cyclic dependency exist, in order to fix the problem, we must
break one of them. Dependency 1 and 2 can't be broken because they are
foundation design. Dependency 4 may be possible if it can be guaranteed
that no io can be inflight, however, this requires a new mechanism which
seems complex. Dependency 3 is a good choice, because idle/frozen only
requires sync thread to finish, which can be done asynchronously that is
already implemented, and 'reconfig_mutex' is not needed anymore.

This patch switch 'idle' and 'frozen' to wait sync thread to be done
asynchronously, and this patch also add a sequence counter to record how
many times sync thread is done, so that 'idle' won't keep waiting on new
started sync thread.

Noted that raid456 has similiar deadlock([1]), and it's verified[2] this
deadlock can be fixed by this patch as well.

[1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
[2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: add a mutex to synchronize idle and frozen in action_store()</title>
<updated>2023-07-27T07:13:28+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-29T13:20:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6f56f0c4f1241f1694a6a9438dd4f78d4513a917'/>
<id>urn:sha1:6f56f0c4f1241f1694a6a9438dd4f78d4513a917</id>
<content type='text'>
Currently, for idle and frozen, action_store will hold 'reconfig_mutex'
and call md_reap_sync_thread() to stop sync thread, however, this will
cause deadlock (explained in the next patch). In order to fix the
problem, following patch will release 'reconfig_mutex' and wait on
'resync_wait', like md_set_readonly() and do_md_stop() does.

Consider that action_store() will set/clear 'MD_RECOVERY_FROZEN'
unconditionally, which might cause unexpected problems, for example,
frozen just set 'MD_RECOVERY_FROZEN' and is still in progress, while
'idle' clear 'MD_RECOVERY_FROZEN' and new sync thread is started, which
might starve in progress frozen. A mutex is added to synchronize idle
and frozen from action_store().

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230529132037.2124527-4-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: fix 'delete_mutex' deadlock</title>
<updated>2023-06-23T16:41:47+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-21T14:29:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4934b6401a812f9fe368e7d2d091cd1d120ea262'/>
<id>urn:sha1:4934b6401a812f9fe368e7d2d091cd1d120ea262</id>
<content type='text'>
Commit 3ce94ce5d05a ("md: fix duplicate filename for rdev") introduce a
new lock 'delete_mutex', and trigger a new deadlock:

t1: remove rdev			t2: sysfs writer

rdev_attr_store			rdev_attr_store
 mddev_lock
 state_store
 md_kick_rdev_from_array
  lock delete_mutex
  list_add mddev-&gt;deleting
  unlock delete_mutex
 mddev_unlock
				 mddev_lock
				 ...
  lock delete_mutex
  kobject_del
  // wait for sysfs writers to be done
				 mddev_unlock
				 lock delete_mutex
				 // wait for delete_mutex, deadlock

'delete_mutex' is used to protect the list 'mddev-&gt;deleting', turns out
that this list can be protected by 'reconfig_mutex' directly, and this
lock can be removed.

Fix this problem by removing the lock, and use 'reconfig_mutex' to
protect the list. mddev_unlock() will move this list to a local list to
be handled after 'reconfig_mutex' is dropped.

Fixes: 3ce94ce5d05a ("md: fix duplicate filename for rdev")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621142933.1395629-1-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md/md-bitmap: add a new helper to unplug bitmap asynchrously</title>
<updated>2023-06-13T22:25:44+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-29T13:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a022325ab970cf04b66ca128a87345714aa44b99'/>
<id>urn:sha1:a022325ab970cf04b66ca128a87345714aa44b99</id>
<content type='text'>
If bitmap is enabled, bitmap must update before submitting write io, this
is why unplug callback must move these io to 'conf-&gt;pending_io_list' if
'current-&gt;bio_list' is not empty, which will suffer performance
degradation.

A new helper md_bitmap_unplug_async() is introduced to submit bitmap io
in a kworker, so that submit bitmap io in raid10_unplug() doesn't require
that 'current-&gt;bio_list' is empty.

This patch prepare to limit the number of plugged bio.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230529131106.2123367-6-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: protect md_thread with rcu</title>
<updated>2023-06-13T22:25:39+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-23T02:10:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4469315439827290923fce4f3f672599cabeb366'/>
<id>urn:sha1:4469315439827290923fce4f3f672599cabeb366</id>
<content type='text'>
Currently, there are many places that md_thread can be accessed without
protection, following are known scenarios that can cause
null-ptr-dereference or uaf:

1) sync_thread that is allocated and started from md_start_sync()
2) mddev-&gt;thread can be accessed directly from timeout_store() and
   md_bitmap_daemon_work()
3) md_unregister_thread() from action_store().

Currently, a global spinlock 'pers_lock' is borrowed to protect
'mddev-&gt;thread' in some places, this problem can be fixed likewise,
however, use a global lock for all the cases is not good.

Fix this problem by protecting all md_thread with rcu.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230523021017.3048783-6-yukuai1@huaweicloud.com
</content>
</entry>
<entry>
<title>md: fix duplicate filename for rdev</title>
<updated>2023-06-13T22:24:14+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-05-23T01:27:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ce94ce5d05ae89190a23f6187f64d8f4b2d3782'/>
<id>urn:sha1:3ce94ce5d05ae89190a23f6187f64d8f4b2d3782</id>
<content type='text'>
Commit 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device
from an md array via sysfs") delays the deletion of rdev, however, this
introduces a window that rdev can be added again while the deletion is
not done yet, and sysfs will complain about duplicate filename.

Follow up patches try to fix this problem by flushing workqueue, however,
flush_rdev_wq() is just dead code, the progress in
md_kick_rdev_from_array():

1) list_del_rcu(&amp;rdev-&gt;same_set);
2) synchronize_rcu();
3) queue_work(md_rdev_misc_wq, &amp;rdev-&gt;del_work);

So in flush_rdev_wq(), if rdev is found in the list, work_pending() can
never pass, in the meantime, if work is queued, then rdev can never be
found in the list.

flush_rdev_wq() can be replaced by flush_workqueue() directly, however,
this approach is not good:
- the workqueue is global, this synchronization for all raid disks is
  not necessary.
- flush_workqueue can't be called under 'reconfig_mutex', there is still
  a small window between flush_workqueue() and mddev_lock() that other
  contexts can queue new work, hence the problem is not solved completely.

sysfs already has apis to support delete itself through writer, and
these apis, specifically sysfs_break/unbreak_active_protection(), is used
to support deleting rdev synchronously. Therefore, the above commit can be
reverted, and sysfs duplicate filename can be avoided.

A new mdadm regression test is proposed as well([1]).

[1] https://lore.kernel.org/linux-raid/20230428062845.1975462-1-yukuai1@huaweicloud.com/

Fixes: 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device from an md array via sysfs")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230523012727.3042247-1-yukuai1@huaweicloud.com
</content>
</entry>
</feed>
