<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs, branch v4.19.77</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.77</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.77'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-10-05T11:10:13+00:00</updated>
<entry>
<title>fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock</title>
<updated>2019-10-05T11:10:13+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2019-09-09T03:15:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5bead06b3443c784637d454c5c64a3bd05752cf4'/>
<id>urn:sha1:5bead06b3443c784637d454c5c64a3bd05752cf4</id>
<content type='text'>
[ Upstream commit 76e43c8ccaa35c30d5df853013561145a0f750a5 ]

When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.

This may have to wait for fuse_iqueue::waitq.lock to be released by one
of many places that take it with IRQs enabled.  Since the IRQ handler
may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.

Fix it by protecting the state of struct fuse_iqueue with a separate
spinlock, and only accessing fuse_iqueue::waitq using the versions of
the waitqueue functions which do IRQ-safe locking internally.

Reproducer:

	#include &lt;fcntl.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;sys/mount.h&gt;
	#include &lt;sys/stat.h&gt;
	#include &lt;sys/syscall.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;linux/aio_abi.h&gt;

	int main()
	{
		char opts[128];
		int fd = open("/dev/fuse", O_RDWR);
		aio_context_t ctx = 0;
		struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
		struct iocb *cbp = &amp;cb;

		sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
		mkdir("mnt", 0700);
		mount("foo",  "mnt", "fuse", 0, opts);
		syscall(__NR_io_setup, 1, &amp;ctx);
		syscall(__NR_io_submit, ctx, 1, &amp;cbp);
	}

Beginning of lockdep output:

	=====================================================
	WARNING: SOFTIRQ-safe -&gt; SOFTIRQ-unsafe lock order detected
	5.3.0-rc5 #9 Not tainted
	-----------------------------------------------------
	syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
	000000003590ceda (&amp;fiq-&gt;waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
	000000003590ceda (&amp;fiq-&gt;waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
	000000003590ceda (&amp;fiq-&gt;waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825

	and this task is already holding:
	0000000075037284 (&amp;(&amp;ctx-&gt;ctx_lock)-&gt;rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
	0000000075037284 (&amp;(&amp;ctx-&gt;ctx_lock)-&gt;rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
	0000000075037284 (&amp;(&amp;ctx-&gt;ctx_lock)-&gt;rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
	which would create a new lock dependency:
	 (&amp;(&amp;ctx-&gt;ctx_lock)-&gt;rlock){..-.} -&gt; (&amp;fiq-&gt;waitq){+.+.}

	but this new dependency connects a SOFTIRQ-irq-safe lock:
	 (&amp;(&amp;ctx-&gt;ctx_lock)-&gt;rlock){..-.}

	[...]

Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Cc: &lt;stable@vger.kernel.org&gt; # v4.19+
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>CIFS: Fix oplock handling for SMB 2.1+ protocols</title>
<updated>2019-10-05T11:10:12+00:00</updated>
<author>
<name>Pavel Shilovsky</name>
<email>pshilov@microsoft.com</email>
</author>
<published>2019-09-26T19:31:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4290a9e593235e0e706a7762152758f57869541f'/>
<id>urn:sha1:4290a9e593235e0e706a7762152758f57869541f</id>
<content type='text'>
commit a016e2794fc3a245a91946038dd8f34d65e53cc3 upstream.

There may be situations when a server negotiates SMB 2.1
protocol version or higher but responds to a CREATE request
with an oplock rather than a lease.

Currently the client doesn't handle such a case correctly:
when another CREATE comes in the server sends an oplock
break to the initial CREATE and the client doesn't send
an ack back due to a wrong caching level being set (READ
instead of RWH). Missing an oplock break ack makes the
server wait until the break times out which dramatically
increases the latency of the second CREATE.

Fix this by properly detecting oplocks when using SMB 2.1
protocol version and higher.

Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Pavel Shilovsky &lt;pshilov@microsoft.com&gt;
Signed-off-by: Steve French &lt;stfrench@microsoft.com&gt;
Reviewed-by: Ronnie Sahlberg &lt;lsahlber@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>CIFS: fix max ea value size</title>
<updated>2019-10-05T11:10:12+00:00</updated>
<author>
<name>Murphy Zhou</name>
<email>jencce.kernel@gmail.com</email>
</author>
<published>2019-09-21T11:26:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a3a150895b6f09896f3c3121b2e6cc927f21aba2'/>
<id>urn:sha1:a3a150895b6f09896f3c3121b2e6cc927f21aba2</id>
<content type='text'>
commit 63d37fb4ce5ae7bf1e58f906d1bf25f036fe79b2 upstream.

It should not be larger then the slab max buf size. If user
specifies a larger size, it passes this check and goes
straightly to SMB2_set_info_init performing an insecure memcpy.

Signed-off-by: Murphy Zhou &lt;jencce.kernel@gmail.com&gt;
Reviewed-by: Aurelien Aptel &lt;aaptel@suse.com&gt;
CC: Stable &lt;stable@vger.kernel.org&gt;
Signed-off-by: Steve French &lt;stfrench@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ext4: fix punch hole for inline_data file systems</title>
<updated>2019-10-05T11:10:11+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2019-08-24T02:38:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=091c754d5ca943fd3bb0cadd20140184911b695d'/>
<id>urn:sha1:091c754d5ca943fd3bb0cadd20140184911b695d</id>
<content type='text'>
commit c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream.

If a program attempts to punch a hole on an inline data file, we need
to convert it to a normal file first.

This was detected using ext4/032 using the adv configuration.  Simple
reproducer:

mke2fs -Fq -t ext4 -O inline_data /dev/vdc
mount /vdc
echo "" &gt; /vdc/testfile
xfs_io -c 'truncate 33554432' /vdc/testfile
xfs_io -c 'fpunch 0 1048576' /vdc/testfile
umount /vdc
e2fsck -fy /dev/vdc

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ext4: fix warning inside ext4_convert_unwritten_extents_endio</title>
<updated>2019-10-05T11:10:11+00:00</updated>
<author>
<name>Rakesh Pandit</name>
<email>rakesh@tuxera.com</email>
</author>
<published>2019-08-23T02:53:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=775e3e734bd35fa4d11bc0d9abb91a44815bacaa'/>
<id>urn:sha1:775e3e734bd35fa4d11bc0d9abb91a44815bacaa</id>
<content type='text'>
commit e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream.

Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing
first argument.  This was introduced in commit ff95ec22cd7f ("ext4:
add warning to ext4_convert_unwritten_extents_endio") and splitting
extents inside endio would trigger it.

Fixes: ff95ec22cd7f ("ext4: add warning to ext4_convert_unwritten_extents_endio")
Signed-off-by: Rakesh Pandit &lt;rakesh@tuxera.com&gt;
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix race setting up and completing qgroup rescan workers</title>
<updated>2019-10-05T11:10:09+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2019-09-24T09:49:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bacff03bb28632d7d29259580b69c6c8e5ac4dbd'/>
<id>urn:sha1:bacff03bb28632d7d29259580b69c6c8e5ac4dbd</id>
<content type='text'>
commit 13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream.

There is a race between setting up a qgroup rescan worker and completing
a qgroup rescan worker that can lead to callers of the qgroup rescan wait
ioctl to either not wait for the rescan worker to complete or to hang
forever due to missing wake ups. The following diagram shows a sequence
of steps that illustrates the race.

        CPU 1                                                         CPU 2                                  CPU 3

 btrfs_ioctl_quota_rescan()
  btrfs_qgroup_rescan()
   qgroup_rescan_init()
    mutex_lock(&amp;fs_info-&gt;qgroup_rescan_lock)
    spin_lock(&amp;fs_info-&gt;qgroup_lock)

    fs_info-&gt;qgroup_flags |=
      BTRFS_QGROUP_STATUS_FLAG_RESCAN

    init_completion(
      &amp;fs_info-&gt;qgroup_rescan_completion)

    fs_info-&gt;qgroup_rescan_running = true

    mutex_unlock(&amp;fs_info-&gt;qgroup_rescan_lock)
    spin_unlock(&amp;fs_info-&gt;qgroup_lock)

    btrfs_init_work()
     --&gt; starts the worker

                                                        btrfs_qgroup_rescan_worker()
                                                         mutex_lock(&amp;fs_info-&gt;qgroup_rescan_lock)

                                                         fs_info-&gt;qgroup_flags &amp;=
                                                           ~BTRFS_QGROUP_STATUS_FLAG_RESCAN

                                                         mutex_unlock(&amp;fs_info-&gt;qgroup_rescan_lock)

                                                         starts transaction, updates qgroup status
                                                         item, etc

                                                                                                           btrfs_ioctl_quota_rescan()
                                                                                                            btrfs_qgroup_rescan()
                                                                                                             qgroup_rescan_init()
                                                                                                              mutex_lock(&amp;fs_info-&gt;qgroup_rescan_lock)
                                                                                                              spin_lock(&amp;fs_info-&gt;qgroup_lock)

                                                                                                              fs_info-&gt;qgroup_flags |=
                                                                                                                BTRFS_QGROUP_STATUS_FLAG_RESCAN

                                                                                                              init_completion(
                                                                                                                &amp;fs_info-&gt;qgroup_rescan_completion)

                                                                                                              fs_info-&gt;qgroup_rescan_running = true

                                                                                                              mutex_unlock(&amp;fs_info-&gt;qgroup_rescan_lock)
                                                                                                              spin_unlock(&amp;fs_info-&gt;qgroup_lock)

                                                                                                              btrfs_init_work()
                                                                                                               --&gt; starts another worker

                                                         mutex_lock(&amp;fs_info-&gt;qgroup_rescan_lock)

                                                         fs_info-&gt;qgroup_rescan_running = false

                                                         mutex_unlock(&amp;fs_info-&gt;qgroup_rescan_lock)

							 complete_all(&amp;fs_info-&gt;qgroup_rescan_completion)

Before the rescan worker started by the task at CPU 3 completes, if
another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
fs_info-&gt;qgroup_flags, which is expected and correct behaviour.

However if other task calls btrfs_ioctl_quota_rescan_wait() before the
rescan worker started by the task at CPU 3 completes, it will return
immediately without waiting for the new rescan worker to complete,
because fs_info-&gt;qgroup_rescan_running is set to false by CPU 2.

This race is making test case btrfs/171 (from fstests) to fail often:

  btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
#      --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
#      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
#      @@ -1,2 +1,3 @@
#       QA output created by 171
#      +ERROR: quota rescan failed: Operation now in progress
#       Silence is golden
#      ...
#      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)

That is because the test calls the btrfs-progs commands "qgroup quota
rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
commands 'qgroup assign' and 'qgroup remove' often call the rescan start
ioctl after calling the qgroup assign ioctl,
btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
for a rescan worker to complete.

Another problem the race can cause is missing wake ups for waiters,
since the call to complete_all() happens outside a critical section and
after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
diagram above, if we have a waiter for the first rescan task (executed
by CPU 2), then fs_info-&gt;qgroup_rescan_completion.wait is not empty, and
if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
before it calls complete_all() against
fs_info-&gt;qgroup_rescan_completion, the task at CPU 3 calls
init_completion() against fs_info-&gt;qgroup_rescan_completion which
re-initilizes its wait queue to an empty queue, therefore causing the
rescan worker at CPU 2 to call complete_all() against an empty queue,
never waking up the task waiting for that rescan worker.

Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
fs_info-&gt;qgroup_rescan_running to false in the same critical section,
delimited by the mutex fs_info-&gt;qgroup_rescan_lock, as well as doing the
call to complete_all() in that same critical section. This gives the
protection needed to avoid rescan wait ioctl callers not waiting for a
running rescan worker and the lost wake ups problem, since setting that
rescan flag and boolean as well as initializing the wait queue is done
already in a critical section delimited by that mutex (at
qgroup_rescan_init()).

Fixes: 57254b6ebce4ce ("Btrfs: add ioctl to wait for qgroup rescan completion")
Fixes: d2c609b834d62f ("btrfs: properly track when rescan worker is running")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls</title>
<updated>2019-10-05T11:10:09+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2019-09-16T12:02:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5c42ef09df33110ffb7f2625f919934f65b218a'/>
<id>urn:sha1:b5c42ef09df33110ffb7f2625f919934f65b218a</id>
<content type='text'>
commit d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 upstream.

[BUG]
The following script can cause btrfs qgroup data space leak:

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  btrfs subv create $mnt/subv
  btrfs quota en $mnt
  btrfs quota rescan -w $mnt
  btrfs qgroup limit 128m $mnt/subv

  for (( i = 0; i &lt; 3; i++)); do
          # Create 3 64M holes for latter fallocate to fail
          truncate -s 192m $mnt/subv/file
          xfs_io -c "pwrite 64m 4k" $mnt/subv/file &gt; /dev/null
          xfs_io -c "pwrite 128m 4k" $mnt/subv/file &gt; /dev/null
          sync

          # it's supposed to fail, and each failure will leak at least 64M
          # data space
          xfs_io -f -c "falloc 0 192m" $mnt/subv/file &amp;&gt; /dev/null
          rm $mnt/subv/file
          sync
  done

  # Shouldn't fail after we removed the file
  xfs_io -f -c "falloc 0 64m" $mnt/subv/file

[CAUSE]
Btrfs qgroup data reserve code allow multiple reservations to happen on
a single extent_changeset:
E.g:
	btrfs_qgroup_reserve_data(inode, &amp;data_reserved, 0, SZ_1M);
	btrfs_qgroup_reserve_data(inode, &amp;data_reserved, SZ_1M, SZ_2M);
	btrfs_qgroup_reserve_data(inode, &amp;data_reserved, 0, SZ_4M);

Btrfs qgroup code has its internal tracking to make sure we don't
double-reserve in above example.

The only pattern utilizing this feature is in the main while loop of
btrfs_fallocate() function.

However btrfs_qgroup_reserve_data()'s error handling has a bug in that
on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
flag but doesn't free previously reserved bytes.

This bug has a two fold effect:
- Clearing EXTENT_QGROUP_RESERVED ranges
  This is the correct behavior, but it prevents
  btrfs_qgroup_check_reserved_leak() to catch the leakage as the
  detector is purely EXTENT_QGROUP_RESERVED flag based.

- Leak the previously reserved data bytes.

The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space reserved in the previous ones.

[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.

Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space</title>
<updated>2019-10-05T11:10:09+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2019-09-16T12:02:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c521bfa8e2dc3d91ba0d921d7cfc788497b303d0'/>
<id>urn:sha1:c521bfa8e2dc3d91ba0d921d7cfc788497b303d0</id>
<content type='text'>
commit bab32fc069ce8829c416e8737c119f62a57970f9 upstream.

[BUG]
Under the following case with qgroup enabled, if some error happened
after we have reserved delalloc space, then in error handling path, we
could cause qgroup data space leakage:

From btrfs_truncate_block() in inode.c:

	ret = btrfs_delalloc_reserve_space(inode, &amp;data_reserved,
					   block_start, blocksize);
	if (ret)
		goto out;

 again:
	page = find_or_create_page(mapping, index, mask);
	if (!page) {
		btrfs_delalloc_release_space(inode, data_reserved,
					     block_start, blocksize, true);
		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
		ret = -ENOMEM;
		goto out;
	}

[CAUSE]
In the above case, btrfs_delalloc_reserve_space() will call
btrfs_qgroup_reserve_data() and mark the io_tree range with
EXTENT_QGROUP_RESERVED flag.

In the error handling path, we have the following call stack:
btrfs_delalloc_release_space()
|- btrfs_free_reserved_data_space()
   |- btrsf_qgroup_free_data()
      |- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
         |- qgroup_free_reserved_data(reserved=@reserved)
            |- clear_record_extent_bits();
            |- freed += changeset.bytes_changed;

However due to a completion bug, qgroup_free_reserved_data() will clear
EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)-&gt;io_failure_tree, other
than the correct BTRFS_I(inode)-&gt;io_tree.
Since io_failure_tree is never marked with that flag,
btrfs_qgroup_free_data() will not free any data reserved space at all,
causing a leakage.

This type of error handling can only be triggered by errors outside of
qgroup code. So EDQUOT error from qgroup can't trigger it.

[FIX]
Fix the wrong target io_tree.

Reported-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Fixes: bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>btrfs: Relinquish CPUs in btrfs_compare_trees</title>
<updated>2019-10-05T11:10:09+00:00</updated>
<author>
<name>Nikolay Borisov</name>
<email>nborisov@suse.com</email>
</author>
<published>2019-09-04T16:33:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=067f82a05c2f6f150245890002d2e787b754c7d3'/>
<id>urn:sha1:067f82a05c2f6f150245890002d2e787b754c7d3</id>
<content type='text'>
commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e upstream.

When doing any form of incremental send the parent and the child trees
need to be compared via btrfs_compare_trees. This  can result in long
loop chains without ever relinquishing the CPU. This causes softlockup
detector to trigger when comparing trees with a lot of items. Example
report:

watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153]
CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased)
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : __ll_sc_arch_atomic_sub_return+0x14/0x20
lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs]
sp : ffff00001273b7e0
Call trace:
 __ll_sc_arch_atomic_sub_return+0x14/0x20
 release_extent_buffer+0xdc/0x120 [btrfs]
 free_extent_buffer.part.0+0xb0/0x118 [btrfs]
 free_extent_buffer+0x24/0x30 [btrfs]
 btrfs_release_path+0x4c/0xa0 [btrfs]
 btrfs_free_path.part.0+0x20/0x40 [btrfs]
 btrfs_free_path+0x24/0x30 [btrfs]
 get_inode_info+0xa8/0xf8 [btrfs]
 finish_inode_if_needed+0xe0/0x6d8 [btrfs]
 changed_cb+0x9c/0x410 [btrfs]
 btrfs_compare_trees+0x284/0x648 [btrfs]
 send_subvol+0x33c/0x520 [btrfs]
 btrfs_ioctl_send+0x8a0/0xaf0 [btrfs]
 btrfs_ioctl+0x199c/0x2288 [btrfs]
 do_vfs_ioctl+0x4b0/0x820
 ksys_ioctl+0x84/0xb8
 __arm64_sys_ioctl+0x28/0x38
 el0_svc_common.constprop.0+0x7c/0x188
 el0_svc_handler+0x34/0x90
 el0_svc+0x8/0xc

Fix this by adding a call to cond_resched at the beginning of the main
loop in btrfs_compare_trees.

Fixes: 7069830a9e38 ("Btrfs: add btrfs_compare_trees function")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Signed-off-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Btrfs: fix use-after-free when using the tree modification log</title>
<updated>2019-10-05T11:10:09+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@suse.com</email>
</author>
<published>2019-08-12T18:14:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b08344be3546ff71b395a1adac3c075f887edd9f'/>
<id>urn:sha1:b08344be3546ff71b395a1adac3c075f887edd9f</id>
<content type='text'>
commit efad8a853ad2057f96664328a0d327a05ce39c76 upstream.

At ctree.c:get_old_root(), we are accessing a root's header owner field
after we have freed the respective extent buffer. This results in an
use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
is set, results in a stack trace like the following:

  [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
  [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
  (...)
  [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
  [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
  [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
  [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
  [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
  [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
  [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
  [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
  [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 3876.799723] Call Trace:
  [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
  [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
  [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
  [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
  [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
  [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
  [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
  [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
  [ 3876.799926]  ksys_ioctl+0x70/0x80
  [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
  [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
  [ 3876.799965]  do_syscall_64+0x62/0x220
  [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [ 3876.799993] RIP: 0033:0x7f23e0013dd7
  (...)
  [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
  [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
  [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
  [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
  [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
  (...)
  [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---

Fix that by saving the root's header owner field into a local variable
before freeing the root's extent buffer, and then use that local variable
when needed.

Fixes: 30b0463a9394d9 ("Btrfs: fix accessing the root pointer in tree mod log functions")
CC: stable@vger.kernel.org # 3.10+
Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Signed-off-by: Filipe Manana &lt;fdmanana@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
