<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/md/bcache, branch v3.18.100</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.18.100</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.18.100'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-02-07T19:07:55+00:00</updated>
<entry>
<title>bcache: check return value of register_shrinker</title>
<updated>2018-02-07T19:07:55+00:00</updated>
<author>
<name>Michael Lyle</name>
<email>mlyle@lyle.org</email>
</author>
<published>2017-11-24T23:14:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed996209f0209d8863e135252a5c392aa5e775fe'/>
<id>urn:sha1:ed996209f0209d8863e135252a5c392aa5e775fe</id>
<content type='text'>
[ Upstream commit 6c4ca1e36cdc1a0a7a84797804b87920ccbebf51 ]

register_shrinker is now __must_check, so check it to kill a warning.
Caller of bch_btree_cache_alloc in super.c appropriately checks return
value so this is fully plumbed through.

This V2 fixes checkpatch warnings and improves the commit description,
as I was too hasty getting the previous version out.

Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Vojtech Pavlik &lt;vojtech@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix wrong cache_misses statistics</title>
<updated>2017-12-20T09:01:33+00:00</updated>
<author>
<name>tang.junhui</name>
<email>tang.junhui@zte.com.cn</email>
</author>
<published>2017-10-30T21:46:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4ec687b9721e56dedf152041c1ed19ffe34020c'/>
<id>urn:sha1:d4ec687b9721e56dedf152041c1ed19ffe34020c</id>
<content type='text'>
[ Upstream commit c157313791a999646901b3e3c6888514ebc36d62 ]

Currently, Cache missed IOs are identified by s-&gt;cache_miss, but actually,
there are many situations that missed IOs are not assigned a value for
s-&gt;cache_miss in cached_dev_cache_miss(), for example, a bypassed IO
(s-&gt;iop.bypass = 1), or the cache_bio allocate failed. In these situations,
it will go to out_put or out_submit, and s-&gt;cache_miss is null, which leads
bch_mark_cache_accounting() to treat this IO as a hit IO.

[ML: applied by 3-way merge]

Signed-off-by: tang.junhui &lt;tang.junhui@zte.com.cn&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: explicitly destroy mutex while exiting</title>
<updated>2017-12-20T09:01:33+00:00</updated>
<author>
<name>Liang Chen</name>
<email>liangchen.linux@gmail.com</email>
</author>
<published>2017-10-30T21:46:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5df0ce4b14be6bd7ecbebc72bee58b2059333228'/>
<id>urn:sha1:5df0ce4b14be6bd7ecbebc72bee58b2059333228</id>
<content type='text'>
[ Upstream commit 330a4db89d39a6b43f36da16824eaa7a7509d34d ]

mutex_destroy does nothing most of time, but it's better to call
it to make the code future proof and it also has some meaning
for like mutex debug.

As Coly pointed out in a previous review, bcache_exit() may not be
able to handle all the references properly if userspace registers
cache and backing devices right before bch_debug_init runs and
bch_debug_init failes later. So not exposing userspace interface
until everything is ready to avoid that issue.

Signed-off-by: Liang Chen &lt;liangchen.linux@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Eric Wheeler &lt;bcache@linux.ewheeler.net&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: recover data from backing when data is clean</title>
<updated>2017-12-09T17:29:46+00:00</updated>
<author>
<name>Rui Hua</name>
<email>huarui.dev@gmail.com</email>
</author>
<published>2017-11-24T23:14:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa541bf05f1750c04579e70f4737967da65a611b'/>
<id>urn:sha1:fa541bf05f1750c04579e70f4737967da65a611b</id>
<content type='text'>
commit e393aa2446150536929140739f09c6ecbcbea7f0 upstream.

When we send a read request and hit the clean data in cache device, there
is a situation called cache read race in bcache(see the commit in the tail
of cache_look_up(), the following explaination just copy from there):
The bucket we're reading from might be reused while our bio is in flight,
and we could then end up reading the wrong data. We guard against this
by checking (in bch_cache_read_endio()) if the pointer is stale again;
if so, we treat it as an error (s-&gt;iop.error = -EINTR) and reread from
the backing device (but we don't pass that error up anywhere)

It should be noted that cache read race happened under normal
circumstances, not the circumstance when SSD failed, it was counted
and shown in  /sys/fs/bcache/XXX/internal/cache_read_races.

Without this patch, when we use writeback mode, we will never reread from
the backing device when cache read race happened, until the whole cache
device is clean, because the condition
(s-&gt;recoverable &amp;&amp; (dc &amp;&amp; !atomic_read(&amp;dc-&gt;has_dirty))) is false in
cached_dev_read_error(). In this situation, the s-&gt;iop.error(= -EINTR)
will be passed up, at last, user will receive -EINTR when it's bio end,
this is not suitable, and wield to up-application.

In this patch, we use s-&gt;read_dirty_data to judge whether the read
request hit dirty data in cache device, it is safe to reread data from
the backing device when the read request hit clean data. This can not
only handle cache read race, but also recover data when failed read
request from cache device.

[edited by mlyle to fix up whitespace, commit log title, comment
spelling]

Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean")
Signed-off-by: Hua Rui &lt;huarui.dev@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: only permit to recovery read error when cache device is clean</title>
<updated>2017-12-09T17:29:46+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2017-10-30T21:46:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa129cb1754de24f56ed60faec18fdae2e17e262'/>
<id>urn:sha1:aa129cb1754de24f56ed60faec18fdae2e17e262</id>
<content type='text'>
commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream.

When bcache does read I/Os, for example in writeback or writethrough mode,
if a read request on cache device is failed, bcache will try to recovery
the request by reading from cached device. If the data on cached device is
not synced with cache device, then requester will get a stale data.

For critical storage system like database, providing stale data from
recovery may result an application level data corruption, which is
unacceptible.

With this patch, for a failed read request in writeback or writethrough
mode, recovery a recoverable read request only happens when cache device
is clean. That is to say, all data on cached device is up to update.

For other cache modes in bcache, read request will never hit
cached_dev_read_error(), they don't need this patch.

Please note, because cache mode can be switched arbitrarily in run time, a
writethrough mode might be switched from a writeback mode. Therefore
checking dc-&gt;has_data in writethrough mode still makes sense.

Changelog:
V4: Fix parens error pointed by Michael Lyle.
v3: By response from Kent Oversteet, he thinks recovering stale data is a
    bug to fix, and option to permit it is unnecessary. So this version
    the sysfs file is removed.
v2: rename sysfs entry from allow_stale_data_on_failure  to
    allow_stale_data_on_failure, and fix the confusing commit log.
v1: initial patch posted.

[small change to patch comment spelling by mlyle]

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reported-by: Arne Wolf &lt;awolf@lenovo.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Cc: Nix &lt;nix@esperi.org.uk&gt;
Cc: Kai Krakow &lt;hurikhan77@gmail.com&gt;
Cc: Eric Wheeler &lt;bcache@lists.ewheeler.net&gt;
Cc: Junhui Tang &lt;tang.junhui@zte.com.cn&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: check ca-&gt;alloc_thread initialized before wake up it</title>
<updated>2017-11-30T08:35:50+00:00</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2017-10-13T23:35:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=001cefee48affd8751808d38de16e6a32eb4b730'/>
<id>urn:sha1:001cefee48affd8751808d38de16e6a32eb4b730</id>
<content type='text'>
commit 91af8300d9c1d7c6b6a2fd754109e08d4798b8d8 upstream.

In bcache code, sysfs entries are created before all resources get
allocated, e.g. allocation thread of a cache set.

There is posibility for NULL pointer deference if a resource is accessed
but which is not initialized yet. Indeed Jorg Bornschein catches one on
cache set allocation thread and gets a kernel oops.

The reason for this bug is, when bch_bucket_alloc() is called during
cache set registration and attaching, ca-&gt;alloc_thread is not properly
allocated and initialized yet, call wake_up_process() on ca-&gt;alloc_thread
triggers NULL pointer deference failure. A simple and fast fix is, before
waking up ca-&gt;alloc_thread, checking whether it is allocated, and only
wake up ca-&gt;alloc_thread when it is not NULL.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reported-by: Jorg Bornschein &lt;jb@capsec.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Reviewed-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: fix bch_hprint crash and improve output</title>
<updated>2017-09-27T08:57:21+00:00</updated>
<author>
<name>Michael Lyle</name>
<email>mlyle@lyle.org</email>
</author>
<published>2017-09-06T06:26:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0aad447e99a025ae3a3833c18f94f9700b575367'/>
<id>urn:sha1:0aad447e99a025ae3a3833c18f94f9700b575367</id>
<content type='text'>
commit 9276717b9e297a62d1151a43d1cd286213f68eb7 upstream.

Most importantly, solve a crash where %llu was used to format signed
numbers.  This would cause a buffer overflow when reading sysfs
writeback_rate_debug, as only 20 bytes were allocated for this and
%llu writes 20 characters plus a null.

Always use the units mechanism rather than having different output
paths for simplicity.

Also, correct problems with display output where 1.10 was a larger
number than 1.09, by multiplying by 10 and then dividing by 1024 instead
of dividing by 100.  (Remainders of &gt;= 1000 would print as .10).

Minor changes: Always display the decimal point instead of trying to
omit it based on number of digits shown.  Decide what units to use
based on 1000 as a threshold, not 1024 (in other words, always print
at most 3 digits before the decimal point).

Signed-off-by: Michael Lyle &lt;mlyle@lyle.org&gt;
Reported-by: Dmitry Yu Okunev &lt;dyokunev@ut.mephi.ru&gt;
Acked-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: fix for gc and write-back race</title>
<updated>2017-09-27T08:57:21+00:00</updated>
<author>
<name>Tang Junhui</name>
<email>tang.junhui@zte.com.cn</email>
</author>
<published>2017-09-06T06:25:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=50aee960b0d1f55c2fb682ab54262bcceaf558aa'/>
<id>urn:sha1:50aee960b0d1f55c2fb682ab54262bcceaf558aa</id>
<content type='text'>
commit 9baf30972b5568d8b5bc8b3c46a6ec5b58100463 upstream.

gc and write-back get raced (see the email "bcache get stucked" I sended
before):
gc thread                               write-back thread
|                                       |bch_writeback_thread()
|bch_gc_thread()                        |
|                                       |==&gt;read_dirty()
|==&gt;bch_btree_gc()                      |
|==&gt;btree_root() //get btree root       |
|                //node write locker    |
|==&gt;bch_btree_gc_root()                 |
|                                       |==&gt;read_dirty_submit()
|                                       |==&gt;write_dirty()
|                                       |==&gt;continue_at(cl,
|                                       |               write_dirty_finish,
|                                       |               system_wq);
|                                       |==&gt;write_dirty_finish()//excute
|                                       |               //in system_wq
|                                       |==&gt;bch_btree_insert()
|                                       |==&gt;bch_btree_map_leaf_nodes()
|                                       |==&gt;__bch_btree_map_nodes()
|                                       |==&gt;btree_root //try to get btree
|                                       |              //root node read
|                                       |              //lock
|                                       |-----stuck here
|==&gt;bch_btree_set_root()
|==&gt;bch_journal_meta()
|==&gt;bch_journal()
|==&gt;journal_try_write()
|==&gt;journal_write_unlocked() //journal_full(&amp;c-&gt;journal)
|                            //condition satisfied
|==&gt;continue_at(cl, journal_write, system_wq); //try to excute
|                               //journal_write in system_wq
|                               //but work queue is excuting
|                               //write_dirty_finish()
|==&gt;closure_sync(); //wait journal_write execute
|                   //over and wake up gc,
|-------------stuck here
|==&gt;release root node write locker

This patch alloc a separate work-queue for write-back thread to avoid such
race.

(Commit log re-organized by Coly Li to pass checkpatch.pl checking)

Signed-off-by: Tang Junhui &lt;tang.junhui@zte.com.cn&gt;
Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: Correct return value for sysfs attach errors</title>
<updated>2017-09-27T08:57:21+00:00</updated>
<author>
<name>Tony Asleson</name>
<email>tasleson@redhat.com</email>
</author>
<published>2017-09-06T06:25:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c04dc907e590c85d84f252c1de390da1cfd5e3dc'/>
<id>urn:sha1:c04dc907e590c85d84f252c1de390da1cfd5e3dc</id>
<content type='text'>
commit 77fa100f27475d08a569b9d51c17722130f089e7 upstream.

If you encounter any errors in bch_cached_dev_attach it will return
a negative error code.  The variable 'v' which stores the result is
unsigned, thus user space sees a very large value returned for bytes
written which can cause incorrect user space behavior.  Utilize 1
signed variable to use throughout the function to preserve error return
capability.

Signed-off-by: Tony Asleson &lt;tasleson@redhat.com&gt;
Acked-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bcache: correct cache_dirty_target in __update_writeback_rate()</title>
<updated>2017-09-27T08:57:21+00:00</updated>
<author>
<name>Tang Junhui</name>
<email>tang.junhui@zte.com.cn</email>
</author>
<published>2017-09-06T06:25:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b312f81db414f2ec82034c317bd2dae58c03fb1'/>
<id>urn:sha1:0b312f81db414f2ec82034c317bd2dae58c03fb1</id>
<content type='text'>
commit a8394090a9129b40f9d90dcb7f4a49d60c727ca6 upstream.

__update_write_rate() uses a Proportion-Differentiation Controller
algorithm to control writeback rate. A dirty target number is used in
this PD controller to control writeback rate. A larger target number
will make the writeback rate smaller, on the versus, a smaller target
number will make the writeback rate larger.

bcache uses the following steps to calculate the target number,
1) cache_sectors = all-buckets-of-cache-set * buckets-size
2) cache_dirty_target = cache_sectors * cached-device-writeback_percent
3) target = cache_dirty_target *
(sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set)

The calculation at step 1) for cache_sectors is incorrect, which does
not consider dirty blocks occupied by flash only volume.

A flash only volume can be took as a bcache device without cached
device. All data sectors allocated for it are persistent on cache device
and marked dirty, they are not touched by bcache writeback and garbage
collection code. So data blocks of flash only volume should be ignore
when calculating cache_sectors of cache set.

Current code does not subtract dirty sectors of flash only volume, which
results a larger target number from the above 3 steps. And in sequence
the cache device's writeback rate is smaller then a correct value,
writeback speed is slower on all cached devices.

This patch fixes the incorrect slower writeback rate by subtracting
dirty sectors of flash only volumes in __update_writeback_rate().

(Commit log composed by Coly Li to pass checkpatch.pl checking)

Signed-off-by: Tang Junhui &lt;tang.junhui@zte.com.cn&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
