kernel/linux.git/fs/btrfs/raid56.h, branch linux-7.1.y

btrfs: raid56: introduce a new parameter to locate a sector

2025-11-25T00:46:58+00:00

Since we cannot ensure that all bios from the higher layer are backed by large folios (e.g. direct IO, encoded read/write/send), we need the ability to locate sub-block (aka, a page) inside a full stripe. So the existing @stripe_nr + @sector_nr combination is not enough to locate such page for bs > ps cases. Introduce a new parameter, @step_nr, to locate the page of a larger fs block. The naming is following the conventions used inside btrfs elsewhere, where one step is min(sectorsize, PAGE_SIZE). It's still a preparation, only touching the following aspects: - btrfs_dump_rbio() To show the new @sector_nsteps member. - btrfs_raid_bio::sector_nsteps Recording how many steps there are inside a fs block. - Enlarge btrfs_raid_bio::*_paddrs[] size To take @sector_nsteps into consideration. - index_one_bio() - index_stripe_sectors() - memcpy_from_bio_to_stripe() - cache_rbio_pages() - need_read_stripe_sectors() Those functions are iterating *_paddrs[], which needs to take sector_nsteps into consideration. - Rename rbio_stripe_sector_index() to rbio_sector_index() The "stripe" part is not that helpful. And an extra ASSERT() before returning the result. - Add a new rbio_paddr_index() helper This will take the extra @step_nr into consideration. - The comments of btrfs_raid_bio Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: raid56: add an overview for the btrfs_raid_bio structure

2025-11-25T00:46:29+00:00

The structure needs to track both the pages from higher layer bio and internal pages, thus it can be a little complex to grasp. Add an overview of the structure, especially how we track different pages from higher layer bios and internal ones, to save some time for future developers. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: raid56: remove sector_ptr structure

2025-11-24T21:42:23+00:00

Since sector_ptr structure is now only containing a single paddr, there is no need to use that structure. Instead use phys_addr_t array for bio and stripe pointers. This means several helpers are also needed to accept a paddr instead of a sector_ptr pointer. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap

2025-11-24T21:42:22+00:00

The uptodate boolean member can be extracted into a bitmap, which will save us some space (1 bit in a byte vs 8 bits in a byte). Furthermore we do not need to record the uptodate bitmap for bio sectors, as if bio_sectors[].paddr is valid it means there is a bio and will be uptodate. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: prepare scrub to support bs > ps cases

2025-09-23T06:49:25+00:00

This involves: - Migrate scrub_stripe::pages[] to folios[] - Use btrfs_alloc_folio_array() and folio_put() to alloc above array. - Migrate scrub_stripe_get_kaddr() and scrub_stripe_get_paddr() to use folio interfaces - Migrate raid56_parity_cache_data_pages() to raid56_parity_cache_data_folios() Since scrub is the only caller still using pages. This helper will copy the folio array contents into rbio::stripe_pages, with sector uptodate flags updated. And a new ASSERT() to make sure bs > ps cases will not hit this path. Since most scrub code is based on kaddr/paddr, the migration itself is pretty straightforward. And since we're here, also move the loop to set the stripe_sectors[].uptodate out of the copy loop. As we always mark all the sectors as uptodate for the data stripe, it's easier to do in one go, other than doing it inside the copy loop. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: add forward declarations and headers, part 2

2024-03-04T15:24:49+00:00

Do a cleanup in more headers: - add forward declarations for types referenced by pointers - add includes when types need them This fixes potential compilation problems if the headers are reordered or the missing includes are not provided indirectly. Signed-off-by: David Sterba

btrfs: use a dedicated data structure for chunk maps

2023-12-15T19:27:02+00:00

Currently we abuse the extent_map structure for two purposes: 1) To actually represent extents for inodes; 2) To represent chunk mappings. This is odd and has several disadvantages: 1) To create a chunk map, we need to do two memory allocations: one for an extent_map structure and another one for a map_lookup structure, so more potential for an allocation failure and more complicated code to manage and link two structures; 2) For a chunk map we actually only use 3 fields (24 bytes) of the respective extent map structure: the 'start' field to have the logical start address of the chunk, the 'len' field to have the chunk's size, and the 'orig_block_len' field to contain the chunk's stripe size. Besides wasting a memory, it's also odd and not intuitive at all to have the stripe size in a field named 'orig_block_len'. We are also using 'block_len' of the extent_map structure to contain the chunk size, so we have 2 fields for the same value, 'len' and 'block_len', which is pointless; 3) When an extent map is associated to a chunk mapping, we set the bit EXTENT_FLAG_FS_MAPPING on its flags and then make its member named 'map_lookup' point to the associated map_lookup structure. This means that for an extent map associated to an inode extent, we are not using this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform); 4) Extent maps associated to a chunk mapping are never merged or split so it's pointless to use the existing extent map infrastructure. So add a dedicated data structure named 'btrfs_chunk_map' to represent chunk mappings, this is basically the existing map_lookup structure with some extra fields: 1) 'start' to contain the chunk logical address; 2) 'chunk_len' to contain the chunk's length; 3) 'stripe_size' for the stripe size; 4) 'rb_node' for insertion into a rb tree; 5) 'refs' for reference counting. This way we do a single memory allocation for chunk mappings and we don't waste memory for them with unused/unnecessary fields from an extent_map. We also save 8 bytes from the extent_map structure by removing the 'map_lookup' pointer, so the size of struct extent_map is reduced from 144 bytes down to 136 bytes, and we can now have 30 extents map per 4K page instead of 28. Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: raid56: remove unused BTRFS_RBIO_REBUILD_MISSING

2023-08-21T12:52:12+00:00

Commit aca43fe839e4 ("btrfs: remove unused raid56 functions which were dedicated for scrub") removed the special handling of RAID56 scrub for missing device. As scrub goes full mirror_num based recovery, that means if it hits a missing device in RAID56, it would just try the next mirror, which would go through the BTRFS_RBIO_READ_REBUILD operation. This means there is no longer any use of BTRFS_RBIO_REBUILD_MISSING operation and we can safely remove it. Reviewed-by: Christoph Hellwig Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba

btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read

2023-06-19T11:59:24+00:00

For P/Q stripe scrub, we have quite some duplicated read IO: - Data stripes read for verification This is triggered by the scrub_submit_initial_read() inside scrub_raid56_parity_stripe(). - Data stripes read (again) for P/Q stripe verification This is triggered by scrub_assemble_read_bios() from scrub_rbio(). Although we can have hit rbio cache and avoid unnecessary read, the chance is very low, as scrub would easily flush the whole rbio cache. This means, even we're just scrubbing a single P/Q stripe, we would read the data stripes twice for the best case scenario. If we need to recover some data stripes, it would cause more reads on the same data stripes, again and again. However before we call raid56_parity_submit_scrub_rbio() we already have all data stripes repaired and their contents ready to use. But RAID56 cache is unaware about the scrub cache, thus RAID56 layer itself still needs to re-read the data stripes. To avoid such cache miss, this patch would: - Introduce a new helper, raid56_parity_cache_data_pages() This function would grab the pages from an array, and copy the content to the rbio, marking all the involved sectors uptodate. The page copy is unavoidable because of the cache pages of rbio are all self managed, thus can not utilize outside pages without screwing up the lifespan. - Use the repaired data stripes as cache inside scrub_raid56_parity_stripe() By this, we ensure all the data sectors of the scrub rbio are already uptodate, and no need to read them again from disk. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba

btrfs: remove unused raid56 functions which were dedicated for scrub

2023-04-17T17:52:18+00:00

Since the scrub rework, the following RAID56 functions are no longer called: - raid56_add_scrub_pages() - raid56_alloc_missing_rbio() - raid56_submit_missing_rbio() Those functions are all utilized by scrub to handle missing device cases for RAID56. However the new scrub code handle them in a completely different way: - If it's data stripe, go recovery path through btrfs_submit_bio() - If it's P/Q stripe, it would be handled through raid56_parity_submit_scrub_rbio() And that function would handle dev-replace and repair properly. Thus we can safely remove those functions. Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba