<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/btrfs/async-thread.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-03-04T15:24:46+00:00</updated>
<entry>
<title>btrfs: remove unused included headers</title>
<updated>2024-03-04T15:24:46+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2024-01-25T16:44:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b712e3bb2c46165a3d35096f37bea6aa47f45d4'/>
<id>urn:sha1:2b712e3bb2c46165a3d35096f37bea6aa47f45d4</id>
<content type='text'>
With help of neovim, LSP and clangd we can identify header files that
are not actually needed to be included in the .c files. This is focused
only on removal (with minor fixups), further cleanups are possible but
will require doing the header files properly with forward declarations,
minimized includes and include-what-you-use care.

Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: merge ordered work callbacks in btrfs_work into one</title>
<updated>2023-10-12T14:44:10+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2023-09-19T16:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=078b8b90b8ffec54f7dc1e8ef6c1078d1e7d3dae'/>
<id>urn:sha1:078b8b90b8ffec54f7dc1e8ef6c1078d1e7d3dae</id>
<content type='text'>
There are two callbacks defined in btrfs_work but only two actually make
use of them, otherwise there are NULLs. We can get rid of the freeing
callback making it a special case of the normal work. This reduces the
size of btrfs_work by 8 bytes, final layout:

struct btrfs_work {
        btrfs_func_t               func;                 /*     0     8 */
        btrfs_ordered_func_t       ordered_func;         /*     8     8 */
        struct work_struct         normal_work;          /*    16    32 */
        struct list_head           ordered_list;         /*    48    16 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct btrfs_workqueue *   wq;                   /*    64     8 */
        long unsigned int          flags;                /*    72     8 */

        /* size: 80, cachelines: 2, members: 6 */
        /* last cacheline: 16 bytes */
};

This in turn reduces size of other structures (on a release config):

- async_chunk			 160 -&gt;  152
- async_submit_bio		 152 -&gt;  144
- btrfs_async_delayed_work	 104 -&gt;   96
- btrfs_caching_control		 176 -&gt;  168
- btrfs_delalloc_work		 144 -&gt;  136
- btrfs_fs_info			3608 -&gt; 3600
- btrfs_ordered_extent		 440 -&gt;  424
- btrfs_writepage_fixup		 104 -&gt;   96

Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: include trace header in where necessary</title>
<updated>2023-10-12T14:44:03+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2023-08-25T20:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5335f4376cdc3b19e2b36bf31ad6664092f0e356'/>
<id>urn:sha1:5335f4376cdc3b19e2b36bf31ad6664092f0e356</id>
<content type='text'>
If we no longer include the tracepoints from ctree.h we fail to compile
because we have the dependency in some of the header files and source
files.  Add the include where we have these dependencies to allow us to
remove the include from ctree.h.

Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: use alloc_ordered_workqueue() to create ordered workqueues</title>
<updated>2023-06-19T11:59:30+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2023-05-25T23:33:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=58e814fcacc1f652d2b794c82b7c9d96ee3c3bab'/>
<id>urn:sha1:58e814fcacc1f652d2b794c82b7c9d96ee3c3bab</id>
<content type='text'>
BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().

However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c74 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered"). Because there were users that depended on the ordered execution,
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
made workqueue allocation path to implicitly promote UNBOUND workqueues w/
@max_active==1 to ordered workqueues.

While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered and users that legitimately want a
min concurrency level wq unexpectedly gets an ordered one instead. With
planned UNBOUND workqueue updates to improve execution locality and more
prevalence of chiplet designs which can benefit from such improvements, this
isn't a state we wanna be in forever.

This patch series audits all call sites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.

BTRFS
=====

* fs_info-&gt;scrub_workers initialized in scrub_workers_get() was setting
  @max_active to 1 when @is_dev_replace is set and it seems that the
  workqueue actually needs to be ordered if @is_dev_replace. Update the code
  so that alloc_ordered_workqueue() is used if @is_dev_replace.

* fs_info-&gt;discard_ctl.discard_workers initialized in
  btrfs_init_workqueues() was directly using alloc_workqueue() w/
  @max_active==1. Converted to alloc_ordered_workqueue().

* fs_info-&gt;fixup_workers and fs_info-&gt;qgroup_rescan_workers initialized in
  btrfs_queue_work() use the btrfs's workqueue wrapper, btrfs_workqueue,
  which are allocated with btrfs_alloc_workqueue().

  btrfs_workqueue implements automatic @max_active adjustment which is
  disabled when the specified max limit is below a certain threshold, so
  calling btrfs_alloc_workqueue() with @limit_active==1 yields an ordered
  workqueue whose @max_active won't be changed as the auto-tuning is
  disabled.

  This is rather brittle in that nothing clearly indicates that the two
  workqueues should be ordered or btrfs_alloc_workqueue() must disable
  auto-tuning when @limit_active==1.

  This patch factors out the common btrfs_workqueue init code into
  btrfs_init_workqueue() and add explicit btrfs_alloc_ordered_workqueue().
  The two workqueues are converted to use the new ordered allocation
  interface.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: simplify WQ_HIGHPRI handling in struct btrfs_workqueue</title>
<updated>2022-05-16T15:03:15+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-04-18T04:43:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a31b4a4368d28c5e780f0906588fbd1dcfe4ad54'/>
<id>urn:sha1:a31b4a4368d28c5e780f0906588fbd1dcfe4ad54</id>
<content type='text'>
Just let the one caller that wants optional WQ_HIGHPRI handling allocate
a separate btrfs_workqueue for that.  This allows to rename struct
__btrfs_workqueue to btrfs_workqueue, remove a pointer indirection and
separate allocation for all btrfs_workqueue users and generally simplify
the code.

Reviewed-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: fix memory ordering between normal and ordered work functions</title>
<updated>2021-11-16T15:50:23+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2021-11-02T12:49:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45da9c1767ac31857df572f0a909fbe88fd5a7e9'/>
<id>urn:sha1:45da9c1767ac31857df572f0a909fbe88fd5a7e9</id>
<content type='text'>
Ordered work functions aren't guaranteed to be handled by the same thread
which executed the normal work functions. The only way execution between
normal/ordered functions is synchronized is via the WORK_DONE_BIT,
unfortunately the used bitops don't guarantee any ordering whatsoever.

This manifested as seemingly inexplicable crashes on ARM64, where
async_chunk::inode is seen as non-null in async_cow_submit which causes
submit_compressed_extents to be called and crash occurs because
async_chunk::inode suddenly became NULL. The call trace was similar to:

    pc : submit_compressed_extents+0x38/0x3d0
    lr : async_cow_submit+0x50/0xd0
    sp : ffff800015d4bc20

    &lt;registers omitted for brevity&gt;

    Call trace:
     submit_compressed_extents+0x38/0x3d0
     async_cow_submit+0x50/0xd0
     run_ordered_work+0xc8/0x280
     btrfs_work_helper+0x98/0x250
     process_one_work+0x1f0/0x4ac
     worker_thread+0x188/0x504
     kthread+0x110/0x114
     ret_from_fork+0x10/0x18

Fix this by adding respective barrier calls which ensure that all
accesses preceding setting of WORK_DONE_BIT are strictly ordered before
setting the flag. At the same time add a read barrier after reading of
WORK_DONE_BIT in run_ordered_work which ensures all subsequent loads
would be strictly ordered after reading the bit. This in turn ensures
are all accesses before WORK_DONE_BIT are going to be strictly ordered
before any access that can occur in ordered_func.

Reported-by: Chris Murphy &lt;lists@colorremedies.com&gt;
Fixes: 08a9ff326418 ("btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue")
CC: stable@vger.kernel.org # 4.4+
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2011928
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Tested-by: Chris Murphy &lt;chris@colorremedies.com&gt;
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>Btrfs: fix crash during unmount due to race with delayed inode workers</title>
<updated>2020-03-23T16:01:51+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2020-02-28T13:04:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f0cc2cd70164efe8f75c5d99560f0f69969c72e4'/>
<id>urn:sha1:f0cc2cd70164efe8f75c5d99560f0f69969c72e4</id>
<content type='text'>
During unmount we can have a job from the delayed inode items work queue
still running, that can lead to at least two bad things:

1) A crash, because the worker can try to create a transaction just
   after the fs roots were freed;

2) A transaction leak, because the worker can create a transaction
   before the fs roots are freed and just after we committed the last
   transaction and after we stopped the transaction kthread.

A stack trace example of the crash:

 [79011.691214] kernel BUG at lib/radix-tree.c:982!
 [79011.692056] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
 [79011.693180] CPU: 3 PID: 1394 Comm: kworker/u8:2 Tainted: G        W         5.6.0-rc2-btrfs-next-54 #2
 (...)
 [79011.696789] Workqueue: btrfs-delayed-meta btrfs_work_helper [btrfs]
 [79011.697904] RIP: 0010:radix_tree_tag_set+0xe7/0x170
 (...)
 [79011.702014] RSP: 0018:ffffb3c84a317ca0 EFLAGS: 00010293
 [79011.702949] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 [79011.704202] RDX: ffffb3c84a317cb0 RSI: ffffb3c84a317ca8 RDI: ffff8db3931340a0
 [79011.705463] RBP: 0000000000000005 R08: 0000000000000005 R09: ffffffff974629d0
 [79011.706756] R10: ffffb3c84a317bc0 R11: 0000000000000001 R12: ffff8db393134000
 [79011.708010] R13: ffff8db3931340a0 R14: ffff8db393134068 R15: 0000000000000001
 [79011.709270] FS:  0000000000000000(0000) GS:ffff8db3b6a00000(0000) knlGS:0000000000000000
 [79011.710699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [79011.711710] CR2: 00007f22c2a0a000 CR3: 0000000232ad4005 CR4: 00000000003606e0
 [79011.712958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [79011.714205] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [79011.715448] Call Trace:
 [79011.715925]  record_root_in_trans+0x72/0xf0 [btrfs]
 [79011.716819]  btrfs_record_root_in_trans+0x4b/0x70 [btrfs]
 [79011.717925]  start_transaction+0xdd/0x5c0 [btrfs]
 [79011.718829]  btrfs_async_run_delayed_root+0x17e/0x2b0 [btrfs]
 [79011.719915]  btrfs_work_helper+0xaa/0x720 [btrfs]
 [79011.720773]  process_one_work+0x26d/0x6a0
 [79011.721497]  worker_thread+0x4f/0x3e0
 [79011.722153]  ? process_one_work+0x6a0/0x6a0
 [79011.722901]  kthread+0x103/0x140
 [79011.723481]  ? kthread_create_worker_on_cpu+0x70/0x70
 [79011.724379]  ret_from_fork+0x3a/0x50
 (...)

The following diagram shows a sequence of steps that lead to the crash
during ummount of the filesystem:

        CPU 1                                             CPU 2                                CPU 3

 btrfs_punch_hole()
   btrfs_btree_balance_dirty()
     btrfs_balance_delayed_items()
       --&gt; sees
           fs_info-&gt;delayed_root-&gt;items
           with value 200, which is greater
           than
           BTRFS_DELAYED_BACKGROUND (128)
           and smaller than
           BTRFS_DELAYED_WRITEBACK (512)
       btrfs_wq_run_delayed_node()
         --&gt; queues a job for
             fs_info-&gt;delayed_workers to run
             btrfs_async_run_delayed_root()

                                                                                            btrfs_async_run_delayed_root()
                                                                                              --&gt; job queued by CPU 1

                                                                                              --&gt; starts picking and running
                                                                                                  delayed nodes from the
                                                                                                  prepare_list list

                                                 close_ctree()

                                                   btrfs_delete_unused_bgs()

                                                   btrfs_commit_super()

                                                     btrfs_join_transaction()
                                                       --&gt; gets transaction N

                                                     btrfs_commit_transaction(N)
                                                       --&gt; set transaction state
                                                        to TRANTS_STATE_COMMIT_START

                                                                                             btrfs_first_prepared_delayed_node()
                                                                                               --&gt; picks delayed node X through
                                                                                                   the prepared_list list

                                                       btrfs_run_delayed_items()

                                                         btrfs_first_delayed_node()
                                                           --&gt; also picks delayed node X
                                                               but through the node_list
                                                               list

                                                         __btrfs_commit_inode_delayed_items()
                                                            --&gt; runs all delayed items from
                                                                this node and drops the
                                                                node's item count to 0
                                                                through call to
                                                                btrfs_release_delayed_inode()

                                                         --&gt; finishes running any remaining
                                                             delayed nodes

                                                       --&gt; finishes transaction commit

                                                   --&gt; stops cleaner and transaction threads

                                                   btrfs_free_fs_roots()
                                                     --&gt; frees all roots and removes them
                                                         from the radix tree
                                                         fs_info-&gt;fs_roots_radix

                                                                                             btrfs_join_transaction()
                                                                                               start_transaction()
                                                                                                 btrfs_record_root_in_trans()
                                                                                                   record_root_in_trans()
                                                                                                     radix_tree_tag_set()
                                                                                                       --&gt; crashes because
                                                                                                           the root is not in
                                                                                                           the radix tree
                                                                                                           anymore

If the worker is able to call btrfs_join_transaction() before the unmount
task frees the fs roots, we end up leaking a transaction and all its
resources, since after the call to btrfs_commit_super() and stopping the
transaction kthread, we don't expect to have any transaction open anymore.

When this situation happens the worker has a delayed node that has no
more items to run, since the task calling btrfs_run_delayed_items(),
which is doing a transaction commit, picks the same node and runs all
its items first.

We can not wait for the worker to complete when running delayed items
through btrfs_run_delayed_items(), because we call that function in
several phases of a transaction commit, and that could cause a deadlock
because the worker calls btrfs_join_transaction() and the task doing the
transaction commit may have already set the transaction state to
TRANS_STATE_COMMIT_DOING.

Also it's not possible to get into a situation where only some of the
items of a delayed node are added to the fs/subvolume tree in the current
transaction and the remaining ones in the next transaction, because when
running the items of a delayed inode we lock its mutex, effectively
waiting for the worker if the worker is running the items of the delayed
node already.

Since this can only cause issues when unmounting a filesystem, fix it in
a simple way by waiting for any jobs on the delayed workers queue before
calling btrfs_commit_supper() at close_ctree(). This works because at this
point no one can call btrfs_btree_balance_dirty() or
btrfs_balance_delayed_items(), and if we end up waiting for any worker to
complete, btrfs_commit_super() will commit the transaction created by the
worker.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: add __pure attribute to functions</title>
<updated>2019-11-18T11:46:52+00:00</updated>
<author>
<name>David Sterba</name>
<email>dsterba@suse.com</email>
</author>
<published>2019-10-01T17:57:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e1f60a6580c04d0d2492bb6034e968b8c29c78cf'/>
<id>urn:sha1:e1f60a6580c04d0d2492bb6034e968b8c29c78cf</id>
<content type='text'>
The attribute is more relaxed than const and the functions could
dereference pointers, as long as the observable state is not changed. We
do have such functions, based on -Wsuggest-attribute=pure .

The visible effects of this patch are negligible, there are differences
in the assembly but hard to summarize.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: get rid of pointless wtag variable in async-thread.c</title>
<updated>2019-11-18T11:46:49+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2019-09-16T18:30:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c9eb55db8439057165f106164622c146cdd59468'/>
<id>urn:sha1:c9eb55db8439057165f106164622c146cdd59468</id>
<content type='text'>
Commit ac0c7cf8be00 ("btrfs: fix crash when tracepoint arguments are
freed by wq callbacks") added a void pointer, wtag, which is passed into
trace_btrfs_all_work_done() instead of the freed work item. This is
silly for a few reasons:

1. The freed work item still has the same address.
2. work is still in scope after it's freed, so assigning wtag doesn't
   stop anyone from using it.
3. The tracepoint has always taken a void * argument, so assigning wtag
   doesn't actually make things any more type-safe. (Note that the
   original bug in commit bc074524e123 ("btrfs: prefix fsid to all trace
   events") was that the void * was implicitly casted when it was passed
   to btrfs_work_owner() in the trace point itself).

Instead, let's add some clearer warnings as comments.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
<entry>
<title>btrfs: get rid of unique workqueue helper functions</title>
<updated>2019-11-18T11:46:48+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2019-09-16T18:30:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0cac0ec961f0d42828eeef196ac2246a2f07659'/>
<id>urn:sha1:a0cac0ec961f0d42828eeef196ac2246a2f07659</id>
<content type='text'>
Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.

However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.

While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
</entry>
</feed>
