diff options
author | Qu Wenruo <wqu@suse.com> | 2021-06-07 12:02:58 +0300 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2021-06-21 16:19:10 +0300 |
commit | 3d078efae6f3854eadf9def9cbb4f30389c0c504 (patch) | |
tree | ad107df2ef923c9b8bb67043735063ebc1b82f8a /fs/btrfs/extent_io.c | |
parent | bcd77455d590eaa0422a5e84ae852007cfce574a (diff) | |
download | linux-3d078efae6f3854eadf9def9cbb4f30389c0c504.tar.xz |
btrfs: subpage: fix a rare race between metadata endio and eb freeing
[BUG]
There is a very rare ASSERT() triggering during full fstests run for
subpage rw support.
No other reproducer so far.
The ASSERT() gets triggered for metadata read in
btrfs_page_set_uptodate() inside end_page_read().
[CAUSE]
There is still a small race window for metadata only, the race could
happen like this:
T1 | T2
------------------------------------+-----------------------------
end_bio_extent_readpage() |
|- btrfs_validate_metadata_buffer() |
| |- free_extent_buffer() |
| Still have 2 refs |
|- end_page_read() |
|- if (unlikely(PagePrivate()) |
| The page still has Private |
| | free_extent_buffer()
| | | Only one ref 1, will be
| | | released
| | |- detach_extent_buffer_page()
| | |- btrfs_detach_subpage()
|- btrfs_set_page_uptodate() |
The page no longer has Private|
>>> ASSERT() triggered <<< |
This race window is super small, thus pretty hard to hit, even with so
many runs of fstests.
But the race window is still there, we have to go another way to solve
it other than relying on random PagePrivate() check.
Data path is not affected, as it will lock the page before reading,
while unlocking the page after the last read has finished, thus no race
window.
[FIX]
This patch will fix the bug by repurposing btrfs_subpage::readers.
Now btrfs_subpage::readers will be a member shared by both metadata and
data.
For metadata path, we don't do the page unlock as metadata only relies
on extent locking.
At the same time, teach page_range_has_eb() to take
btrfs_subpage::readers into consideration.
So that even if the last eb of a page gets freed, page::private won't be
detached as long as there still are pending end_page_read() calls.
By this we eliminate the race window, this will slight increase the
metadata memory usage, as the page may not be released as frequently as
usual. But it should not be a big deal.
The code got introduced in ("btrfs: submit read time repair only for
each corrupted sector"), but the fix is in a separate patch to keep the
problem description and the crash is rare so it should not hurt
bisectability.
Signed-off-by: Qu Wegruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs/btrfs/extent_io.c')
-rw-r--r-- | fs/btrfs/extent_io.c | 30 |
1 files changed, 9 insertions, 21 deletions
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1acbb7f1e6e3..9e81d25dea70 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2687,21 +2687,6 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len) ASSERT(page_offset(page) <= start && start + len <= page_offset(page) + PAGE_SIZE); - /* - * For subapge metadata case, all btrfs_page_* helpers need page to - * have page::private populated. - * But we can have rare case where the last eb in the page is only - * referred by the IO, and it gets released immedately after it's - * read and verified. - * - * This can detach the page private completely. - * In that case, we can just skip the page status update completely, - * as the page has no eb anymore. - */ - if (fs_info->sectorsize < PAGE_SIZE && unlikely(!PagePrivate(page))) { - ASSERT(!is_data_inode(page->mapping->host)); - return; - } if (uptodate) { btrfs_page_set_uptodate(fs_info, page, start, len); } else { @@ -2711,11 +2696,7 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len) if (fs_info->sectorsize == PAGE_SIZE) unlock_page(page); - else if (is_data_inode(page->mapping->host)) - /* - * For subpage data, unlock the page if we're the last reader. - * For subpage metadata, page lock is not utilized for read. - */ + else btrfs_subpage_end_reader(fs_info, page, start, len); } @@ -5603,6 +5584,12 @@ static bool page_range_has_eb(struct btrfs_fs_info *fs_info, struct page *page) subpage = (struct btrfs_subpage *)page->private; if (atomic_read(&subpage->eb_refs)) return true; + /* + * Even there is no eb refs here, we may still have + * end_page_read() call relying on page::private. + */ + if (atomic_read(&subpage->readers)) + return true; } return false; } @@ -5663,7 +5650,7 @@ static void detach_extent_buffer_page(struct extent_buffer *eb, struct page *pag /* * We can only detach the page private if there are no other ebs in the - * page range. + * page range and no unfinished IO. */ if (!page_range_has_eb(fs_info, page)) btrfs_detach_subpage(fs_info, page); @@ -6381,6 +6368,7 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait, check_buffer_tree_ref(eb); btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len); + btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len); ret = submit_extent_page(REQ_OP_READ | REQ_META, NULL, &bio_ctrl, page, eb->start, eb->len, eb->start - page_offset(page), |