<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/btrfs/extent_map.c, branch linux-6.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-03-14T12:13:50+00:00</updated>
<entry>
<title>btrfs: assert we have a write lock when removing and replacing extent maps</title>
<updated>2022-03-14T12:13:50+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2022-02-03T15:36:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6d3b050efa079f5bc4d5e2a8f37f0dc0345a2096'/>
<id>urn:sha1:6d3b050efa079f5bc4d5e2a8f37f0dc0345a2096</id>
<content type='text'>
Removing or replacing an extent map requires holding a write lock on the
extent map's tree. We currently do that everywhere, except in one of the
self tests, where it's harmless since there's no concurrency.

In order to catch possible races in the future, assert that we are holding
a write lock on the extent map tree before removing or replacing an extent
map in the tree, and update the self test to obtain a write lock before
removing extent maps.

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: defrag: don't use merged extent map for their generation check</title>
<updated>2022-02-23T16:43:13+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2022-02-11T06:46:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=199257a78bb01341c3ba6e85bdcf3a2e6e452c6d'/>
<id>urn:sha1:199257a78bb01341c3ba6e85bdcf3a2e6e452c6d</id>
<content type='text'>
For extent maps, if they are not compressed extents and are adjacent by
logical addresses and file offsets, they can be merged into one larger
extent map.

Such merged extent map will have the higher generation of all the
original ones.

But this brings a problem for autodefrag, as it relies on accurate
extent_map::generation to determine if one extent should be defragged.

For merged extent maps, their higher generation can mark some older
extents to be defragged while the original extent map doesn't meet the
minimal generation threshold.

Thus this will cause extra IO.

So solve the problem, here we introduce a new flag, EXTENT_FLAG_MERGED,
to indicate if the extent map is merged from one or more ems.

And for autodefrag, if we find a merged extent map, and its generation
meets the generation requirement, we just don't use this one, and go
back to defrag_get_extent() to read extent maps from subvolume trees.

This could cause more read IO, but should result less defrag data write,
so in the long run it should be a win for autodefrag.

Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: rename btrfs_bio to btrfs_io_context</title>
<updated>2021-10-26T17:08:02+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2021-09-15T07:17:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c6646117912397d026d70c04d92ec1599522e9f'/>
<id>urn:sha1:4c6646117912397d026d70c04d92ec1599522e9f</id>
<content type='text'>
The structure btrfs_bio is used by two different sites:

- bio-&gt;bi_private for mirror based profiles
  For those profiles (SINGLE/DUP/RAID1*/RAID10), this structures records
  how many mirrors are still pending, and save the original endio
  function of the bio.

- RAID56 code
  In that case, RAID56 only utilize the stripes info, and no long uses
  that to trace the pending mirrors.

So btrfs_bio is not always bind to a bio, and contains more info for IO
context, thus renaming it will make the naming less confusing.

Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: fix parameter description of btrfs_add_extent_mapping</title>
<updated>2021-02-08T21:58:53+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2021-01-22T09:57:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9ad37bb3ffc51fbd9c48ba4d85414b4aa3e21c6d'/>
<id>urn:sha1:9ad37bb3ffc51fbd9c48ba4d85414b4aa3e21c6d</id>
<content type='text'>
This fixes the following compiler warnings:

fs/btrfs/extent_map.c:601: warning: Function parameter or member 'fs_info' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'em_tree' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'em_in' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'start' not described in 'btrfs_add_extent_mapping'
fs/btrfs/extent_map.c:601: warning: Function parameter or member 'len' not described in 'btrfs_add_extent_mapping'

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: document modified parameter of add_extent_mapping</title>
<updated>2021-02-08T21:58:53+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2021-01-22T09:57:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=401bd2dd1299dd384849707c6577b2089ab9f615'/>
<id>urn:sha1:401bd2dd1299dd384849707c6577b2089ab9f615</id>
<content type='text'>
Fixes fs/btrfs/extent_map.c:399: warning: Function parameter or member
'modified' not described in 'add_extent_mapping'

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Btrfs: fix race between using extent maps and merging them</title>
<updated>2020-02-12T16:16:46+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2020-01-31T14:06:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ac05ca913e9f3871126d61da275bfe8516ff01ca'/>
<id>urn:sha1:ac05ca913e9f3871126d61da275bfe8516ff01ca</id>
<content type='text'>
We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as consequence see an inconsistent state of
the extent map - for example sees the new length but has seen the old start
offset.

With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():

  $ cat -n fs/btrfs/extent_io.c
  3061  static int __do_readpage(struct extent_io_tree *tree,
  3062                           struct page *page,
  (...)
  3127                  em = __get_extent_map(inode, page, pg_offset, cur,
  3128                                        end - cur + 1, get_extent, em_cached);
  3129                  if (IS_ERR_OR_NULL(em)) {
  3130                          SetPageError(page);
  3131                          unlock_extent(tree, cur, end);
  3132                          break;
  3133                  }
  3134                  extent_offset = cur - em-&gt;start;
  3135                  BUG_ON(extent_map_end(em) &lt;= cur);
  (...)

Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().

We have an inode with a size of 8KiB and 2 extent maps:

  extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
            a previous transaction

  extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
            persisted but writeback started for it already. The extent map
	    is pinned since there's writeback and an ordered extent in
	    progress, so it can not be merged with extent map A yet

The following sequence of steps leads to the BUG_ON():

1) The ordered extent for extent B completes, the respective page gets its
   writeback bit cleared and the extent map is unpinned, at that point it
   is not yet merged with extent map A because it's in the list of modified
   extents;

2) Due to memory pressure, or some other reason, the MM subsystem releases
   the page corresponding to extent B - btrfs_releasepage() is called and
   returns 1, meaning the page can be released as it's not dirty, not under
   writeback anymore and the extent range is not locked in the inode's
   iotree. However the extent map is not released, either because we are
   not in a context that allows memory allocations to block or because the
   inode's size is smaller than 16MiB - in this case our inode has a size
   of 8KiB;

3) Task B needs to read extent B and ends up __do_readpage() through the
   btrfs_readpage() callback. At __do_readpage() it gets a reference to
   extent map B;

4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
   while holding the write lock on the inode's extent map tree - this
   results in try_merge_map() being called and since it's possible to merge
   extent map B with extent map A now (the extent map B was removed from
   the list of modified extents), the merging begins - it sets extent map
   B's start offset to 0 (was 4KiB), but before it increments the map's
   length to 8KiB (4kb + 4KiB), task A is at:

   BUG_ON(extent_map_end(em) &lt;= cur);

   The call to extent_map_end() sees the extent map has a start of 0
   and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
   the BUG_ON() is triggered.

So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still using it, and
needs to see a consistent map while using it. Generally this is very rare
since most paths that lookup and use extent maps also have the file range
locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.

Fix this by not allowing an extent map do be merged if if it's being used
by tasks other then the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).

Reported-by: ryusuke1925 &lt;st13s20@gm.ibaraki-ct.ac.jp&gt;
Reported-by: Koki Mitani &lt;koki.mitani.xg@hco.ntt.co.jp&gt;
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: remove extent_map::bdev</title>
<updated>2019-11-18T22:43:44+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2019-08-30T13:40:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a019e9e197eaa68ffe2efeba00d685581b1a5416'/>
<id>urn:sha1:a019e9e197eaa68ffe2efeba00d685581b1a5416</id>
<content type='text'>
We can now remove the bdev from extent_map. Previous patches made sure
that bio_set_dev is correctly in all places and that we don't need to
grab it from latest_bdev or pass it around inside the extent map.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: assert extent_map bdevs and lookup_map and split</title>
<updated>2019-11-18T11:47:01+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2019-05-10T15:48:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c3e14909d3b399d49d8dfcf3edbcd5463a7eed6f'/>
<id>urn:sha1:c3e14909d3b399d49d8dfcf3edbcd5463a7eed6f</id>
<content type='text'>
This is a preparatory patch for removing extent_map::bdev. There's some
history behind the code so this is only precaution to catch if things
break before the actual removal happens.

Logically, comparing a raw low-level block device (bdev) does not make
sense for extent maps (high-level objects). This had no effect in
practice but was quite confusing in the code.  The lookup_map is set iff
EXTENT_FLAG_FS_MAPPING is set.

The two pointers were stored in the same bytes and used potentially in
two meanings. Now they're split, so the asserts are in place to check
that the condition will not change.

The lookup map pointer misused bdev, this has been changed in commit
95617d69326c ("btrfs: cleanup, stop casting for extent_map-&gt;lookup
everywhere") to the explicit type. But the semantics hasn't changed and
bdev was not actually used to decide if maps are mergeable.

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: assert extent map tree lock in add_extent_mapping</title>
<updated>2019-09-09T12:59:00+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2019-03-27T15:19:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d23ea3fa7dcb0d4a2c405de0879bc4ddcf521d7d'/>
<id>urn:sha1:d23ea3fa7dcb0d4a2c405de0879bc4ddcf521d7d</id>
<content type='text'>
As add_extent_mapping is called from several functions, let's add the
lock annotation. The tree is going to be modified so it must be the
exclusive lock.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: Optimize unallocated chunks discard</title>
<updated>2019-04-29T17:02:38+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2019-03-27T12:24:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8811133d8a982d3cef5d25eef54a8dca9e8e6ded'/>
<id>urn:sha1:8811133d8a982d3cef5d25eef54a8dca9e8e6ded</id>
<content type='text'>
Currently unallocated chunks are always trimmed. For example
2 consecutive trims on large storage would trim freespace twice
irrespective of whether the space was actually allocated or not between
those trims.

Optimise this behavior by exploiting the newly introduced alloc_state
tree of btrfs_device. A new CHUNK_TRIMMED bit is used to mark
those unallocated chunks which have been trimmed and have not been
allocated afterwards. On chunk allocation the respective underlying devices'
physical space will have its CHUNK_TRIMMED flag cleared. This avoids
submitting discards for space which hasn't been changed since the last
time discard was issued.

This applies to the single mount period of the filesystem as the
information is not stored permanently.

Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
</feed>
