<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/blkdev.h, branch v4.3.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.3.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.3.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2015-09-29T17:32:45+00:00</updated>
<entry>
<title>blk-mq: fix sysfs registration/unregistration race</title>
<updated>2015-09-29T17:32:45+00:00</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2015-09-26T17:09:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4593fdbe7a2f44d5e64c627c715dd0bcec9bdf14'/>
<id>urn:sha1:4593fdbe7a2f44d5e64c627c715dd0bcec9bdf14</id>
<content type='text'>
There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.

null_add_dev
    --&gt; blk_mq_init_queue
        --&gt; blk_mq_init_allocated_queue
            --&gt; add to 'all_q_list' (*)
    --&gt; add_disk
        --&gt; blk_register_queue
            --&gt; blk_mq_register_disk (++)

null_del_dev
    --&gt; del_gendisk
        --&gt; blk_unregister_queue
            --&gt; blk_mq_unregister_disk (--)
    --&gt; blk_cleanup_queue
        --&gt; blk_mq_free_queue
            --&gt; del from 'all_q_list' (*)

blk_mq_queue_reinit
    --&gt; blk_mq_sysfs_unregister (-)
    --&gt; blk_mq_sysfs_register (+)

While the request queue is added to 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue anytime by CPU
hotplug callback.  But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--)
is finished.  Because '/sys/block/*/mq/' is not exists.

There has already been BLK_MQ_F_SYSFS_UP flag in hctx-&gt;flags which can
be used to track these sysfs stuff, but it is only fixing this issue
partially.

In order to fix it completely, we just need per-queue flag instead of
per-hctx flag with appropriate locking.  So this introduces
q-&gt;mq_sysfs_init_done which is properly protected with all_q_mutex.

Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex is held.  Since hctx-&gt;nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), so we should avoid
blk_mq_register_hctx() seeing the temporary hctx-&gt;nr_ctx value
in CPU hotplug handling or adding/deleting gendisk .

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Reviewed-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Cc: Ming Lei &lt;tom.leiming@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-block</title>
<updated>2015-09-20T01:57:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-20T01:57:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=133bb59585140747fd3938002670cb395f40dc76'/>
<id>urn:sha1:133bb59585140747fd3938002670cb395f40dc76</id>
<content type='text'>
Pull block updates from Jens Axboe:
 "This is a bit bigger than it should be, but I could (did) not want to
  send it off last week due to both wanting extra testing, and expecting
  a fix for the bounce regression as well.  In any case, this contains:

   - Fix for the blk-merge.c compilation warning on gcc 5.x from me.

   - A set of back/front SG gap merge fixes, from me and from Sagi.
     This ensures that we honor SG gapping for integrity payloads as
     well.

   - Two small fixes for null_blk from Matias, fixing a leak and a
     capacity propagation issue.

   - A blkcg fix from Tejun, fixing a NULL dereference.

   - A fast clone optimization from Ming, fixing a performance
     regression since the arbitrarily sized bio's were introduced.

   - Also from Ming, a regression fix for bouncing IOs"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix bounce_end_io
  block: blk-merge: fast-clone bio when splitting rw bios
  block: blkg_destroy_all() should clear q-&gt;root_blkg and -&gt;root_rl.blkg
  block: Copy a user iovec if it includes gaps
  block: Refuse adding appending a gapped integrity page to a bio
  block: Refuse request/bio merges with gaps in the integrity payload
  block: Check for gaps on front and back merges
  null_blk: fix wrong capacity when bs is not 512 bytes
  null_blk: fix memory leak on cleanup
  block: fix bogus compiler warnings in blk-merge.c
</content>
</entry>
<entry>
<title>blk: rq_data_dir() should not return a boolean</title>
<updated>2015-09-12T19:03:30+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-05-27T22:32:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10fbd36e362a0f367e34a7cd876a81295d8fc5ca'/>
<id>urn:sha1:10fbd36e362a0f367e34a7cd876a81295d8fc5ca</id>
<content type='text'>
rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not
a boolean value.

Now, admittedly the "!= 0" doesn't really change the value (0 stays as
zero, 1 stays as one), but it's not only redundant, it confuses gcc, and
causes gcc to warn about the construct

    switch (rq_data_dir(req)) {
        case READ:
            ...
        case WRITE:
            ...

that we have in a few drivers.

Now, the gcc warning is silly and stupid (it seems to warn not about the
switch value having a different type from the case statements, but about
_any_ boolean switch value), but in this case the code itself is silly
and stupid too, so let's just change it, and get rid of warnings like
this:

  drivers/block/hd.c: In function ‘hd_request’:
  drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool]
     switch (rq_data_dir(req)) {

The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in
commit 5953316dbf90 ("block: make rq-&gt;cmd_flags be 64-bit") and is
presumably because the old code (that just did a logical 'and' with 1)
would then end up making the type of rq_data_dir() be u64 too.

But if we want to retain the old regular integer type, let's just cast
the result to 'int' rather than use that rather odd '!= 0'.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: Refuse request/bio merges with gaps in the integrity payload</title>
<updated>2015-09-11T15:03:04+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagig@mellanox.com</email>
</author>
<published>2015-09-11T15:03:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7f39add3b08cbbdb99abe50e6d7c342e6800d684'/>
<id>urn:sha1:7f39add3b08cbbdb99abe50e6d7c342e6800d684</id>
<content type='text'>
If a driver sets the block queue virtual boundary mask, it means that
it cannot handle gaps so we must not allow those in the integrity
payload as well.

Signed-off-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;

Fixed up by me to have duplicate integrity merge functions, depending
on whether block integrity is enabled or not. Fixes a compilations
issue with CONFIG_BLK_DEV_INTEGRITY unset.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2015-09-08T21:35:59+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-08T21:35:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=12f03ee606914317e7e6a0815e53a48205c31dae'/>
<id>urn:sha1:12f03ee606914317e7e6a0815e53a48205c31dae</id>
<content type='text'>
Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to -&gt;direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
</content>
</entry>
<entry>
<title>block: Check for gaps on front and back merges</title>
<updated>2015-09-03T16:33:09+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2015-09-03T16:28:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e7c4274a70aa2d6f485996d0ca1dad52d0039ca'/>
<id>urn:sha1:5e7c4274a70aa2d6f485996d0ca1dad52d0039ca</id>
<content type='text'>
We are checking for gaps to previous bio_vec, which can
only detect back merges gaps. Moreover, at the point where
we check for a gap, we don't know if we will attempt a back
or a front merge. Thus, check for gap to prev in a back merge
attempt and check for a gap to next in a front merge attempt.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
[sagig: Minor rename change]
Signed-off-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
</content>
</entry>
<entry>
<title>dax: drop size parameter to -&gt;direct_access()</title>
<updated>2015-08-27T23:40:58+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2015-08-07T21:41:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb389b9c0e00c30c9daf20287f7d91e2466edbb1'/>
<id>urn:sha1:cb389b9c0e00c30c9daf20287f7d91e2466edbb1</id>
<content type='text'>
None of the implementations currently use it.  The common
bdev_direct_access() entry point handles all the size checks before
calling -&gt;direct_access().

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>pmem, dax: have direct_access use __pmem annotation</title>
<updated>2015-08-20T18:07:24+00:00</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2015-08-18T19:55:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2e05394e4a3420dab96f728df4531893494e15d'/>
<id>urn:sha1:e2e05394e4a3420dab96f728df4531893494e15d</id>
<content type='text'>
Update the annotation for the kaddr pointer returned by direct_access()
so that it is a __pmem pointer.  This is consistent with the PMEM driver
and with how this direct_access() pointer is used in the DAX code.

Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>block: Replace SG_GAPS with new queue limits mask</title>
<updated>2015-08-19T21:26:02+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2015-08-19T21:24:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03100aada96f0645bbcb89aea24c01f02d0ef1fa'/>
<id>urn:sha1:03100aada96f0645bbcb89aea24c01f02d0ef1fa</id>
<content type='text'>
The SG_GAPS queue flag caused checks for bio vector alignment against
PAGE_SIZE, but the device may have different constraints. This patch
adds a queue limits so a driver with such constraints can set to allow
requests that would have been unnecessarily split. The new gaps check
takes the request_queue as a parameter to simplify the logic around
invoking this function.

This new limit makes the queue flag redundant, so removing it and
all usage. Device-mappers will inherit the correct settings through
blk_stack_limits().

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: bump BLK_DEF_MAX_SECTORS to 2560</title>
<updated>2015-08-18T20:21:15+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2015-08-13T18:57:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2be537c3ba3568acd79cd178327b842e60d035e'/>
<id>urn:sha1:d2be537c3ba3568acd79cd178327b842e60d035e</id>
<content type='text'>
A value of 2560 (1280k) will accommodate a 10-data-disk stripe
write with chunk size 128k.  In the testing I've done using
iozone, fio, and aio-stress across a number of different storage
devices, a value of 1280 does not show a big performance
difference from 512, but will hopefully help software RAID
setups using SATA disks, as reported by Christoph.

NOTE: drivers/block/aoe/aoeblk.c sets its own max_hw_sectors_kb to
BLK_DEF_MAX_SECTORS.  So, this patch essentially changes aeoblk to
Use a larger maximum sector size, and I did not test this.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
