<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/genhd.c, branch v2.6.38</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v2.6.38</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v2.6.38'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2011-02-24T06:25:47+00:00</updated>
<entry>
<title>Fix over-zealous flush_disk when changing device size.</title>
<updated>2011-02-24T06:25:47+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2011-02-24T06:25:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=93b270f76e7ef3b81001576860c2701931cdc78b'/>
<id>urn:sha1:93b270f76e7ef3b81001576860c2701931cdc78b</id>
<content type='text'>
There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Cc: Andrew Patterson &lt;andrew.patterson@hp.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core</title>
<updated>2011-01-13T13:47:54+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-01-13T13:47:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81c5e2ae33c4b19e53966b427e33646bf6811830'/>
<id>urn:sha1:81c5e2ae33c4b19e53966b427e33646bf6811830</id>
<content type='text'>
</content>
</entry>
<entry>
<title>block: add internal hd part table references</title>
<updated>2011-01-07T07:43:37+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-01-07T07:43:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c23a9681c0fe7fb7dd331b39dda11926f43746e'/>
<id>urn:sha1:6c23a9681c0fe7fb7dd331b39dda11926f43746e</id>
<content type='text'>
We can't use krefs since it's apparently restricted to very basic
reference counting.

This reverts commit e4a683c8.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: fix accounting bug on cross partition merges</title>
<updated>2011-01-05T15:57:38+00:00</updated>
<author>
<name>Jerome Marchand</name>
<email>jmarchan@redhat.com</email>
</author>
<published>2011-01-05T15:57:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=09e099d4bafea3b15be003d548bdf94b4b6e0e17'/>
<id>urn:sha1:09e099d4bafea3b15be003d548bdf94b4b6e0e17</id>
<content type='text'>
/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct-&gt;in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct-&gt;in_flight
   is 0 and sda2's one is 1.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct-&gt;in_flight are not changed.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct-&gt;in_flight, not a sda1's one, is decremented.

        | hd_struct-&gt;in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.

Signed-off-by: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>fs/block: type signature of major_to_index(int) to major_to_index(unsigned)</title>
<updated>2010-12-17T08:00:18+00:00</updated>
<author>
<name>Yang Zhang</name>
<email>kthreadd@gmail.com</email>
</author>
<published>2010-12-17T08:00:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e61eb2e93fe86931d46831752a82dab25a5335ca'/>
<id>urn:sha1:e61eb2e93fe86931d46831752a82dab25a5335ca</id>
<content type='text'>
The major/minor device numbers are always defined and used as `unsigned'.

Signed-off-by: Yang Zhang &lt;kthreadd@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: convert !IS_ERR(p) &amp;&amp; p to !IS_ERR_NOR_NULL(p)</title>
<updated>2010-12-17T07:58:36+00:00</updated>
<author>
<name>Yang Zhang</name>
<email>kthreadd@gmail.com</email>
</author>
<published>2010-12-17T07:58:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b9f985b6e05ebd7af2aaef0eb3ae369390ef191f'/>
<id>urn:sha1:b9f985b6e05ebd7af2aaef0eb3ae369390ef191f</id>
<content type='text'>
Signed-off-by: Yang Zhang &lt;kthreadd@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>implement in-kernel gendisk events handling</title>
<updated>2010-12-16T16:53:38+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-12-08T19:57:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77ea887e433ad8389d416826936c110fa7910f80'/>
<id>urn:sha1:77ea887e433ad8389d416826936c110fa7910f80</id>
<content type='text'>
Currently, media presence polling for removeable block devices is done
from userland.  There are several issues with this.

* Polling is done by periodically opening the device.  For SCSI
  devices, the command sequence generated by such action involves a
  few different commands including TEST_UNIT_READY.  This behavior,
  while perfectly legal, is different from Windows which only issues
  single command, GET_EVENT_STATUS_NOTIFICATION.  Unfortunately, some
  ATAPI devices lock up after being periodically queried such command
  sequences.

* There is no reliable and unintrusive way for a userland program to
  tell whether the target device is safe for media presence polling.
  For example, polling for media presence during an on-going burning
  session can make it fail.  The polling program can avoid this by
  opening the device with O_EXCL but then it risks making a valid
  exclusive user of the device fail w/ -EBUSY.

* Userland polling is unnecessarily heavy and in-kernel implementation
  is lighter and better coordinated (workqueue, timer slack).

This patch implements framework for in-kernel disk event handling,
which includes media presence polling.

* bdops-&gt;check_events() is added, which supercedes -&gt;media_changed().
  It should check whether there's any pending event and return if so.
  Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
  DISK_EVENT_EJECT_REQUEST.  -&gt;check_events() is guaranteed not to be
  called parallelly.

* gendisk-&gt;events and -&gt;async_events are added.  These should be
  initialized by block driver before passing the device to add_disk().
  The former contains the mask of all supported events and the latter
  the mask of all events which the device can report without polling.
  /sys/block/*/events[_async] export these to userland.

* Kernel parameter block.events_dfl_poll_msecs controls the system
  polling interval (default is 0 which means disable) and
  /sys/block/*/events_poll_msecs control polling intervals for
  individual devices (default is -1 meaning use system setting).  Note
  that if a device can report all supported events asynchronously and
  its polling interval isn't explicitly set, the device won't be
  polled regardless of the system polling interval.

* If a device is opened exclusively with write access, event checking
  is automatically disabled until all write exclusive accesses are
  released.

* There are event 'clearing' events.  For example, both of currently
  defined events are cleared after the device has been successfully
  opened.  This information is passed to -&gt;check_events() callback
  using @clearing argument as a hint.

* Event checking is always performed from system_nrt_wq and timer
  slack is set to 25% for polling.

* Nothing changes for drivers which implement -&gt;media_changed() but
  not -&gt;check_events().  Going forward, all drivers will be converted
  to -&gt;check_events() and -&gt;media_change() will be dropped.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: move register_disk() and del_gendisk() to block/genhd.c</title>
<updated>2010-12-16T16:53:38+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-12-08T19:57:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2bf1b6723ed0eab378363649d15b7893bf14e91'/>
<id>urn:sha1:d2bf1b6723ed0eab378363649d15b7893bf14e91</id>
<content type='text'>
There's no reason for register_disk() and del_gendisk() to be in
fs/partitions/check.c.  Move both to genhd.c.  While at it, collapse
unlink_gendisk(), which was artificially in a separate function due to
genhd.c / check.c split, into del_gendisk().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>block: kill genhd_media_change_notify()</title>
<updated>2010-12-16T16:53:38+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2010-12-08T19:57:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dddd9dc340ae1a41d90e084529ca979c77c4ecfe'/>
<id>urn:sha1:dddd9dc340ae1a41d90e084529ca979c77c4ecfe</id>
<content type='text'>
There's no user of the facility.  Kill it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
<entry>
<title>Revert "block: fix accounting bug on cross partition merges"</title>
<updated>2010-10-24T20:06:02+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2010-10-24T20:06:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f253b86b4ad1b3220544e75880510fd455ebd23f'/>
<id>urn:sha1:f253b86b4ad1b3220544e75880510fd455ebd23f</id>
<content type='text'>
This reverts commit 7681bfeeccff5efa9eb29bf09249a3c400b15327.

Conflicts:

	include/linux/genhd.h

It has numerous issues with the cleanup path and non-elevator
devices. Revert it for now so we can come up with a clean
version without rushing things.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
</entry>
</feed>
