<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git, branch v3.0.52</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.0.52</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.0.52'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2012-11-17T21:14:48+00:00</updated>
<entry>
<title>Linux 3.0.52</title>
<updated>2012-11-17T21:14:48+00:00</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2012-11-17T21:14:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=89d2d133c6947c04a8ab539b997f266535beaafe'/>
<id>urn:sha1:89d2d133c6947c04a8ab539b997f266535beaafe</id>
<content type='text'>
</content>
</entry>
<entry>
<title>ALSA: usb-audio: Fix mutex deadlock at disconnection</title>
<updated>2012-11-17T21:14:26+00:00</updated>
<author>
<name>Takashi Iwai</name>
<email>tiwai@suse.de</email>
</author>
<published>2012-11-13T10:22:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=093371484195393366f3407465b6f6c53eefdc57'/>
<id>urn:sha1:093371484195393366f3407465b6f6c53eefdc57</id>
<content type='text'>
commit 10e44239f67d0b6fb74006e61a7e883b8075247a upstream.

The recent change for USB-audio disconnection race fixes introduced a
mutex deadlock again.  There is a circular dependency between
chip-&gt;shutdown_rwsem and pcm-&gt;open_mutex, depicted like below, when a
device is opened during the disconnection operation:

A. snd_usb_audio_disconnect() -&gt;
     card.c::register_mutex -&gt;
       chip-&gt;shutdown_rwsem (write) -&gt;
         snd_card_disconnect() -&gt;
           pcm.c::register_mutex -&gt;
             pcm-&gt;open_mutex

B. snd_pcm_open() -&gt;
     pcm-&gt;open_mutex -&gt;
       snd_usb_pcm_open() -&gt;
         chip-&gt;shutdown_rwsem (read)

Since the chip-&gt;shutdown_rwsem protection in case A is required
only for turning on the chip-&gt;shutdown flag and it doesn't have to be
taken for the whole operation, we can reduce its window in
snd_usb_audio_disconnect().
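
The cycle can be found mechanically, in the spirit of lockdep: record which lock was taken while which other locks were held, then search for a loop. The sketch below is a hypothetical toy model, not lockdep's implementation. The enum, the order[][] matrix and all helpers are made up, and the read and write sides of chip-&gt;shutdown_rwsem are treated as one lock.

```c
/*
 * Toy lock-order checker in the spirit of lockdep (not its actual
 * implementation). All names are hypothetical; the read and write
 * sides of chip->shutdown_rwsem are modelled as a single lock.
 */
enum lock_id {
	CARD_REGISTER_MUTEX,	/* card.c::register_mutex */
	SHUTDOWN_RWSEM,		/* chip->shutdown_rwsem */
	PCM_REGISTER_MUTEX,	/* pcm.c::register_mutex */
	PCM_OPEN_MUTEX,		/* pcm->open_mutex */
	NR_LOCKS
};

/* order[a][b] == 1: lock b was acquired while lock a was held. */
static int order[NR_LOCKS][NR_LOCKS];

/* Each lock in 'chain' is taken while all earlier ones are still held. */
static void record_chain(const int *chain, int len)
{
	for (int i = 0; i != len; i++)
		for (int j = i + 1; j != len; j++)
			order[chain[i]][chain[j]] = 1;
}

/* Depth-limited DFS: can 'target' be reached from 'from'? */
static int reaches(int from, int target, int depth)
{
	if (depth == NR_LOCKS)
		return 0;
	for (int next = 0; next != NR_LOCKS; next++) {
		if (!order[from][next])
			continue;
		if (next == target || reaches(next, target, depth + 1))
			return 1;
	}
	return 0;
}

/* Returns 1 if the recorded orderings contain a cycle (possible deadlock). */
static int deadlock_possible(int with_fix)
{
	static const int path_b[] = { PCM_OPEN_MUTEX, SHUTDOWN_RWSEM };
	static const int path_a_old[] = { CARD_REGISTER_MUTEX, SHUTDOWN_RWSEM,
					  PCM_REGISTER_MUTEX, PCM_OPEN_MUTEX };
	/* After the fix, shutdown_rwsem only covers setting chip->shutdown. */
	static const int path_a_flag[] = { CARD_REGISTER_MUTEX, SHUTDOWN_RWSEM };
	static const int path_a_disc[] = { CARD_REGISTER_MUTEX,
					   PCM_REGISTER_MUTEX, PCM_OPEN_MUTEX };

	for (int a = 0; a != NR_LOCKS; a++)
		for (int b = 0; b != NR_LOCKS; b++)
			order[a][b] = 0;

	record_chain(path_b, 2);
	if (with_fix) {
		record_chain(path_a_flag, 2);
		record_chain(path_a_disc, 3);
	} else {
		record_chain(path_a_old, 4);
	}

	for (int l = 0; l != NR_LOCKS; l++)
		if (reaches(l, l, 0))
			return 1;
	return 0;
}
```

Here deadlock_possible(0) reports the pcm-&gt;open_mutex -&gt; chip-&gt;shutdown_rwsem -&gt; pcm-&gt;open_mutex cycle. deadlock_possible(1) reports none, because nothing is acquired while chip-&gt;shutdown_rwsem is held any more.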

Reported-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Takashi Iwai &lt;tiwai@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ALSA: Fix card refcount unbalance</title>
<updated>2012-11-17T21:14:26+00:00</updated>
<author>
<name>Takashi Iwai</name>
<email>tiwai@suse.de</email>
</author>
<published>2012-11-08T13:36:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aaf238baf31e14cb1c111815193c3a78770b1873'/>
<id>urn:sha1:aaf238baf31e14cb1c111815193c3a78770b1873</id>
<content type='text'>
commit 8bb4d9ce08b0a92ca174e41d92c180328f86173f upstream.

There are uncovered cases where the card refcount introduced by
commit a0830dbd isn't properly increased or decreased:
- OSS PCM and mixer success paths
- When the lookup function returns NULL

This patch fixes these places.
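
The invariant being restored is simple. A lookup takes a card reference only when it returns usable data, and a caller that received data drops that reference on every exit, the success path included. A minimal sketch with hypothetical types and helpers (toy_lookup(), toy_unref()), not the ALSA API:

```c
/*
 * Toy model of a card reference count. All types and helpers are
 * hypothetical; they only illustrate the get/put pairing.
 */
struct toy_card {
	int refcount;
};

struct toy_minor {
	struct toy_card *card;
	void *private_data;	/* may be NULL (0) */
};

/* Take a reference only when returning data the caller will use. */
static void *toy_lookup(struct toy_minor *m)
{
	void *data = m->private_data;

	if (data)
		m->card->refcount++;
	return data;	/* NULL return: no reference was taken */
}

static void toy_unref(struct toy_card *card)
{
	card->refcount--;
}

/* Open path: 0 on success, -1 if the lookup found nothing. */
static int toy_open(struct toy_minor *m)
{
	void *data = toy_lookup(m);

	if (!data)
		return -1;	/* nothing was taken, nothing to drop */
	/* ... finish the open using data ... */
	toy_unref(m->card);	/* the success path drops it too */
	return 0;
}
```

Both a successful open and a lookup that returns NULL leave refcount where it started.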

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50251

Signed-off-by: Takashi Iwai &lt;tiwai@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>intel-iommu: Fix AB-BA lockdep report</title>
<updated>2012-11-17T21:14:26+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>roland@purestorage.com</email>
</author>
<published>2011-07-20T13:22:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=919609d3cba52431b16501a63e6a5d45266a25c6'/>
<id>urn:sha1:919609d3cba52431b16501a63e6a5d45266a25c6</id>
<content type='text'>
commit 3e7abe2556b583e87dabda3e0e6178a67b20d06f upstream.

When unbinding a device so that I could pass it through to a KVM VM, I
got the lockdep report below.  It looks like a legitimate lock
ordering problem:

 - domain_context_mapping_one() takes iommu-&gt;lock and calls
   iommu_support_dev_iotlb(), which takes device_domain_lock (inside
   iommu-&gt;lock).

 - domain_remove_one_dev_info() starts by taking device_domain_lock
   then takes iommu-&gt;lock inside it (near the end of the function).

So this is the classic AB-BA deadlock.  It looks like a safe fix is to
simply release device_domain_lock a bit earlier, since as far as I can
tell, it doesn't protect any of the stuff accessed at the end of
domain_remove_one_dev_info() anyway.
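
One way to make such AB-BA orderings mechanically visible (not something intel-iommu or this patch uses) is to give each lock a fixed rank and flag any acquisition whose rank is not higher than every lock already held. In this hypothetical single-threaded sketch the ranks follow the order domain_context_mapping_one() establishes, iommu-&gt;lock before device_domain_lock; the helpers only model the two functions.

```c
/*
 * Toy lock-ranking checker. Ranks, helpers and the modelled paths are
 * hypothetical; they mirror only the lock order described above.
 */
enum { IOMMU_LOCK = 1, DEVICE_DOMAIN_LOCK = 2 };

#define MAX_HELD 8

static int held[MAX_HELD];	/* ranks of currently held locks */
static int depth;		/* number of held locks (up to MAX_HELD) */
static int violations;		/* out-of-order acquisitions seen */

static void toy_lock(int rank)
{
	for (int i = 0; i != depth; i++)
		if (held[i] >= rank)
			violations++;
	held[depth++] = rank;
}

static void toy_unlock(void)
{
	depth--;		/* locks are released in LIFO order here */
}

/* domain_context_mapping_one(): iommu->lock, then device_domain_lock. */
static int context_mapping_one(void)
{
	depth = violations = 0;
	toy_lock(IOMMU_LOCK);
	toy_lock(DEVICE_DOMAIN_LOCK);	/* via iommu_support_dev_iotlb() */
	toy_unlock();
	toy_unlock();
	return violations;
}

/* Before the fix: iommu->lock taken inside device_domain_lock. */
static int remove_one_dev_info_old(void)
{
	depth = violations = 0;
	toy_lock(DEVICE_DOMAIN_LOCK);
	toy_lock(IOMMU_LOCK);
	toy_unlock();
	toy_unlock();
	return violations;
}

/* After the fix: device_domain_lock is released first. */
static int remove_one_dev_info_fixed(void)
{
	depth = violations = 0;
	toy_lock(DEVICE_DOMAIN_LOCK);
	toy_unlock();
	toy_lock(IOMMU_LOCK);
	toy_unlock();
	return violations;
}
```

context_mapping_one() and remove_one_dev_info_fixed() report no violations, while remove_one_dev_info_old() reports one: the same inversion lockdep flagged below.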

BTW, the use of device_domain_lock looks a bit unsafe to me... it's
at least not obvious to me why we aren't vulnerable to the race below:

  iommu_support_dev_iotlb()
                                          domain_remove_dev_info()

  lock device_domain_lock
    find info
  unlock device_domain_lock

                                          lock device_domain_lock
                                            find same info
                                          unlock device_domain_lock

                                          free_devinfo_mem(info)

  do stuff with info after it's free

However I don't understand the locking here well enough to know if
this is a real problem, let alone what the best fix is.

Anyway here's the full lockdep output that prompted all of this:

     =======================================================
     [ INFO: possible circular locking dependency detected ]
     2.6.39.1+ #1
     -------------------------------------------------------
     bash/13954 is trying to acquire lock:
      (&amp;(&amp;iommu-&gt;lock)-&gt;rlock){......}, at: [&lt;ffffffff812f6421&gt;] domain_remove_one_dev_info+0x121/0x230

     but task is already holding lock:
      (device_domain_lock){-.-...}, at: [&lt;ffffffff812f6508&gt;] domain_remove_one_dev_info+0x208/0x230

     which lock already depends on the new lock.

     the existing dependency chain (in reverse order) is:

     -&gt; #1 (device_domain_lock){-.-...}:
            [&lt;ffffffff8109ca9d&gt;] lock_acquire+0x9d/0x130
            [&lt;ffffffff81571475&gt;] _raw_spin_lock_irqsave+0x55/0xa0
            [&lt;ffffffff812f8350&gt;] domain_context_mapping_one+0x600/0x750
            [&lt;ffffffff812f84df&gt;] domain_context_mapping+0x3f/0x120
            [&lt;ffffffff812f9175&gt;] iommu_prepare_identity_map+0x1c5/0x1e0
            [&lt;ffffffff81ccf1ca&gt;] intel_iommu_init+0x88e/0xb5e
            [&lt;ffffffff81cab204&gt;] pci_iommu_init+0x16/0x41
            [&lt;ffffffff81002165&gt;] do_one_initcall+0x45/0x190
            [&lt;ffffffff81ca3d3f&gt;] kernel_init+0xe3/0x168
            [&lt;ffffffff8157ac24&gt;] kernel_thread_helper+0x4/0x10

     -&gt; #0 (&amp;(&amp;iommu-&gt;lock)-&gt;rlock){......}:
            [&lt;ffffffff8109bf3e&gt;] __lock_acquire+0x195e/0x1e10
            [&lt;ffffffff8109ca9d&gt;] lock_acquire+0x9d/0x130
            [&lt;ffffffff81571475&gt;] _raw_spin_lock_irqsave+0x55/0xa0
            [&lt;ffffffff812f6421&gt;] domain_remove_one_dev_info+0x121/0x230
            [&lt;ffffffff812f8b42&gt;] device_notifier+0x72/0x90
            [&lt;ffffffff8157555c&gt;] notifier_call_chain+0x8c/0xc0
            [&lt;ffffffff81089768&gt;] __blocking_notifier_call_chain+0x78/0xb0
            [&lt;ffffffff810897b6&gt;] blocking_notifier_call_chain+0x16/0x20
            [&lt;ffffffff81373a5c&gt;] __device_release_driver+0xbc/0xe0
            [&lt;ffffffff81373ccf&gt;] device_release_driver+0x2f/0x50
            [&lt;ffffffff81372ee3&gt;] driver_unbind+0xa3/0xc0
            [&lt;ffffffff813724ac&gt;] drv_attr_store+0x2c/0x30
            [&lt;ffffffff811e4506&gt;] sysfs_write_file+0xe6/0x170
            [&lt;ffffffff8117569e&gt;] vfs_write+0xce/0x190
            [&lt;ffffffff811759e4&gt;] sys_write+0x54/0xa0
            [&lt;ffffffff81579a82&gt;] system_call_fastpath+0x16/0x1b

     other info that might help us debug this:

     6 locks held by bash/13954:
      #0:  (&amp;buffer-&gt;mutex){+.+.+.}, at: [&lt;ffffffff811e4464&gt;] sysfs_write_file+0x44/0x170
      #1:  (s_active#3){++++.+}, at: [&lt;ffffffff811e44ed&gt;] sysfs_write_file+0xcd/0x170
      #2:  (&amp;__lockdep_no_validate__){+.+.+.}, at: [&lt;ffffffff81372edb&gt;] driver_unbind+0x9b/0xc0
      #3:  (&amp;__lockdep_no_validate__){+.+.+.}, at: [&lt;ffffffff81373cc7&gt;] device_release_driver+0x27/0x50
      #4:  (&amp;(&amp;priv-&gt;bus_notifier)-&gt;rwsem){.+.+.+}, at: [&lt;ffffffff8108974f&gt;] __blocking_notifier_call_chain+0x5f/0xb0
      #5:  (device_domain_lock){-.-...}, at: [&lt;ffffffff812f6508&gt;] domain_remove_one_dev_info+0x208/0x230

     stack backtrace:
     Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
     Call Trace:
      [&lt;ffffffff810993a7&gt;] print_circular_bug+0xf7/0x100
      [&lt;ffffffff8109bf3e&gt;] __lock_acquire+0x195e/0x1e10
      [&lt;ffffffff810972bd&gt;] ? trace_hardirqs_off+0xd/0x10
      [&lt;ffffffff8109d57d&gt;] ? trace_hardirqs_on_caller+0x13d/0x180
      [&lt;ffffffff8109ca9d&gt;] lock_acquire+0x9d/0x130
      [&lt;ffffffff812f6421&gt;] ? domain_remove_one_dev_info+0x121/0x230
      [&lt;ffffffff81571475&gt;] _raw_spin_lock_irqsave+0x55/0xa0
      [&lt;ffffffff812f6421&gt;] ? domain_remove_one_dev_info+0x121/0x230
      [&lt;ffffffff810972bd&gt;] ? trace_hardirqs_off+0xd/0x10
      [&lt;ffffffff812f6421&gt;] domain_remove_one_dev_info+0x121/0x230
      [&lt;ffffffff812f8b42&gt;] device_notifier+0x72/0x90
      [&lt;ffffffff8157555c&gt;] notifier_call_chain+0x8c/0xc0
      [&lt;ffffffff81089768&gt;] __blocking_notifier_call_chain+0x78/0xb0
      [&lt;ffffffff810897b6&gt;] blocking_notifier_call_chain+0x16/0x20
      [&lt;ffffffff81373a5c&gt;] __device_release_driver+0xbc/0xe0
      [&lt;ffffffff81373ccf&gt;] device_release_driver+0x2f/0x50
      [&lt;ffffffff81372ee3&gt;] driver_unbind+0xa3/0xc0
      [&lt;ffffffff813724ac&gt;] drv_attr_store+0x2c/0x30
      [&lt;ffffffff811e4506&gt;] sysfs_write_file+0xe6/0x170
      [&lt;ffffffff8117569e&gt;] vfs_write+0xce/0x190
      [&lt;ffffffff811759e4&gt;] sys_write+0x54/0xa0
      [&lt;ffffffff81579a82&gt;] system_call_fastpath+0x16/0x1b

Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
Signed-off-by: David Woodhouse &lt;David.Woodhouse@intel.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>xfs: fix reading of wrapped log data</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2012-11-02T00:38:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=11b34bb13aebb075ba47e3a292443ab3cd9638f7'/>
<id>urn:sha1:11b34bb13aebb075ba47e3a292443ab3cd9638f7</id>
<content type='text'>
commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream.

Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of the log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.
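
A wrapped record has to be read in two pieces, the tail of the physical log and then the remainder from its start, and both pieces must land back to back in the same data buffer. A minimal sketch over a hypothetical in-memory byte ring; real log recovery reads disk blocks, and every name here is made up:

```c
/* Hypothetical in-memory circular log of RING_SIZE bytes. */
#define RING_SIZE 16

static void copy_bytes(char *dst, const char *src, int n)
{
	while (n-- > 0)
		*dst++ = *src++;
}

/*
 * Read 'len' bytes starting at offset 'start' into 'data'. When the
 * record wraps, the second piece (from offset 0) is appended to the
 * same data buffer right after the first piece. Reading it into a
 * separate header buffer is the kind of mistake the patch fixes.
 */
static void read_wrapped(const char *ring, int start, int len, char *data)
{
	int first = RING_SIZE - start;	/* bytes before the physical end */

	if (first >= len) {		/* record does not wrap */
		copy_bytes(data, ring + start, len);
		return;
	}
	copy_bytes(data, ring + start, first);
	copy_bytes(data + first, ring, len - first);
}
```

Reading 8 bytes from offset 12 of a 16-byte ring yields the last 4 bytes followed by the first 4, in one contiguous buffer.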

Reported-by: Torsten Kaiser &lt;just.for.lkml@googlemail.com&gt;
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Mark Tinguely &lt;tinguely@sgi.com&gt;
Signed-off-by: Ben Myers &lt;bpm@sgi.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>USB: mos7840: remove unused variable</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Johan Hovold</name>
<email>jhovold@gmail.com</email>
</author>
<published>2012-11-08T17:28:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=87725c35384baf9b6fbea6342807aab7071b0462'/>
<id>urn:sha1:87725c35384baf9b6fbea6342807aab7071b0462</id>
<content type='text'>
Fix warning about unused variable introduced by commit e681b66f2e19fa
("USB: mos7840: remove invalid disconnect handling") upstream.

A subsequent fix which removed the disconnect function got rid of the
warning but that one was only backported to v3.6.

Reported-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Johan Hovold &lt;jhovold@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>drm/i915: clear the entire sdvo infoframe buffer</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Daniel Vetter</name>
<email>daniel.vetter@ffwll.ch</email>
</author>
<published>2012-10-21T10:52:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b7832d49e5e0f4be7328efb271f097d39f61804a'/>
<id>urn:sha1:b7832d49e5e0f4be7328efb271f097d39f61804a</id>
<content type='text'>
commit b6e0e543f75729f207b9c72b0162ae61170635b2 upstream.

Like in the case of native hdmi, which is fixed already in

commit adf00b26d18e1b3570451296e03bcb20e4798cdd
Author: Paulo Zanoni &lt;paulo.r.zanoni@intel.com&gt;
Date:   Tue Sep 25 13:23:34 2012 -0300

    drm/i915: make sure we write all the DIP data bytes

we need to clear the entire sdvo buffer to avoid upsetting the
display.

Since infoframe buffer writing is now a bit more elaborate, extract it
into its own function. This will be useful if we ever get around to
properly update the ELD for sdvo. Also #define proper names for the
two buffer indexes with fixed usage.
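
Writing the entire buffer, with everything past the infoframe zeroed, can be sketched as a chunked loop. This assumes, hypothetically, that the device takes data in 8-byte chunks and reports its buffer size 0-based; the function and the byte-array stand-in for the device are made up, not the i915 code. The check before length - i keeps the unsigned subtraction from wrapping once i passes the end of the data.

```c
/*
 * Hypothetical sketch: fill the whole device buffer in 8-byte chunks,
 * zero-padding past 'length'. 'hw' stands in for the device and must
 * hold the buffer size rounded up to a multiple of HBUF_CHUNK.
 */
#define HBUF_CHUNK 8

static void write_infoframe(unsigned char *hw, unsigned hbuf_size_0based,
			    const unsigned char *data, unsigned length)
{
	unsigned hbuf_size = hbuf_size_0based + 1;	/* reported 0-based */

	for (unsigned i = 0; hbuf_size > i; i += HBUF_CHUNK) {
		unsigned char tmp[HBUF_CHUNK] = { 0 };
		unsigned n = 0;

		/* Check first: with unsigned math, length - i wraps if i > length. */
		if (length > i)
			n = (length - i > HBUF_CHUNK) ? HBUF_CHUNK : length - i;
		for (unsigned j = 0; j != n; j++)
			tmp[j] = data[i + j];
		for (unsigned j = 0; j != HBUF_CHUNK; j++)
			hw[i + j] = tmp[j];	/* stand-in for sending one chunk */
	}
}
```

For a 17-byte frame and a buffer reported as 23 (24 bytes), three chunks are written and bytes 17 to 23 end up zero instead of holding stale data.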

v2: Cite the right commit above, spotted by Paulo Zanoni.

v3: I'm too stupid to paste the right commit.

v4: Ben Hutchings noticed that I've failed to handle an underflow in
my loop logic, breaking it for i &gt;= length + 8. Since I've just lost my
C programmer license, use his solution. Also, make the frustrating
0-based buffer size a notch more clear.

Reported-and-tested-by: Jürg Billeter &lt;j@bitron.ch&gt;
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Cc: Paulo Zanoni &lt;przanoni@gmail.com&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@gmail.com&gt;
Signed-off-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>drm/i915: fixup infoframe support for sdvo</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Daniel Vetter</name>
<email>daniel.vetter@ffwll.ch</email>
</author>
<published>2012-05-12T18:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9615dee441f72d194234455ed30660c145260973'/>
<id>urn:sha1:9615dee441f72d194234455ed30660c145260973</id>
<content type='text'>
commit 81014b9d0b55fb0b48f26cd2a943359750d532db upstream.

At least the worst offenders:
- SDVO specifies that the encoder should compute the ecc. Testing also
  shows that we must not send the ecc field, so copy the dip_infoframe
  struct to a temporary place and avoid the ecc field. This way the avi
  infoframe is exactly 17 bytes long, which agrees with what the spec
  mandates as a minimal storage capacity (with the ecc field it would
  be 18 bytes).
- Only send 17 bytes for the avi infoframe. The SDVO spec explicitly
  says that sending more data than what the device announces results
  in undefined behaviour.
- Add __attribute__((packed)) to the avi and spd infoframes, for
  otherwise they're wrongly aligned. Noticed because the avi infoframe
  ended up being 18 bytes large instead of 17. We haven't noticed this
  yet because we don't use the uint16_t fields yet (which are the only
  ones that would be wrongly aligned).
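
The size arithmetic behind these points can be checked with a stand-in layout: three header bytes, the ecc byte, a checksum, then five single-byte fields and four 16-bit fields. The struct and field names below are hypothetical rather than i915's dip_infoframe, and the unpacked size assumes the usual 2-byte alignment of unsigned short.

```c
/* Hypothetical stand-ins; only the sizes and the ecc position matter. */
struct avi_body_unpacked {
	unsigned char pb1_to_pb5[5];
	unsigned short bars[4];	/* 1 padding byte inserted before this */
};

struct avi_body_packed {
	unsigned char pb1_to_pb5[5];
	unsigned short bars[4];
} __attribute__((packed));

struct avi_frame {
	unsigned char type, ver, len;
	unsigned char ecc;	/* computed by the SDVO encoder, never sent */
	unsigned char checksum;
	struct avi_body_packed body;
} __attribute__((packed));

#define AVI_ECC_OFFSET 3	/* offset of 'ecc' in struct avi_frame */
#define AVI_SDVO_LEN 17		/* sizeof(struct avi_frame) minus the ecc byte */

/* Copy the frame into out[AVI_SDVO_LEN], skipping the ecc byte. */
static void avi_to_sdvo(const struct avi_frame *f, unsigned char *out)
{
	const unsigned char *p = (const unsigned char *)f;
	int n = 0;

	for (int i = 0; i != (int)sizeof(*f); i++)
		if (i != AVI_ECC_OFFSET)
			out[n++] = p[i];
}
```

On common ABIs the unpacked body is 14 bytes because of the padding byte and the packed one is 13; dropping ecc from the 18-byte packed frame leaves the 17 bytes discussed above.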

This regression has been introduced by

3c17fe4b8f40a112a85758a9ab2aebf772bdd647 is the first bad commit
commit 3c17fe4b8f40a112a85758a9ab2aebf772bdd647
Author: David Härdeman &lt;david@hardeman.nu&gt;
Date:   Fri Sep 24 21:44:32 2010 +0200

    i915: enable AVI infoframe for intel_hdmi.c [v4]

Patch tested on my g33 with a sdvo hdmi adaptor.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Tested-by: Peter Ross &lt;pross@xvid.org&gt; (G35 SDVO-HDMI)
Reviewed-by: Eugeni Dodonov &lt;eugeni.dodonov@intel.com&gt;
Signed-Off-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>drm/vmwgfx: Fix hibernation device reset</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Thomas Hellstrom</name>
<email>thellstrom@vmware.com</email>
</author>
<published>2012-11-09T09:05:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d18edec7ace955fad9948eb3ee71d34f79d377ca'/>
<id>urn:sha1:d18edec7ace955fad9948eb3ee71d34f79d377ca</id>
<content type='text'>
commit 95e8f6a21996c4cc2c4574b231c6e858b749dce3 upstream.

The device would not reset properly when resuming from hibernation.

Signed-off-by: Thomas Hellstrom &lt;thellstrom@vmware.com&gt;
Reviewed-by: Brian Paul &lt;brianp@vmware.com&gt;
Reviewed-by: Dmitry Torokhov &lt;dtor@vmware.com&gt;
Cc: linux-graphics-maintainer@vmware.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>futex: Handle futex_pi OWNER_DIED take over correctly</title>
<updated>2012-11-17T21:14:25+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2012-10-23T20:29:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f56580d924c9283292d10567fb7744d9c0fd2b2'/>
<id>urn:sha1:2f56580d924c9283292d10567fb7744d9c0fd2b2</id>
<content type='text'>
commit 59fa6245192159ab5e1e17b8e31f15afa9cff4bf upstream.

Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076

The detailed problem analysis shows:

Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --&gt; T2 blocks on the futex and creates pi_state which is associated
       with T1.

T1 exits
   --&gt; exit_robust_list() runs
       --&gt; Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --&gt; Succeeds due to the check for F's userspace TID field == 0
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

T1 --&gt; exit_pi_state_list()
       --&gt; Transfers pi_state to waiter T2 and wakes T2 via
           rt_mutex_unlock(&amp;pi_state-&gt;mutex)

T2 --&gt; acquires pi_state-&gt;mutex and gains real ownership of the
       pi_state
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

T3 --&gt; observes inconsistent state

This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.

So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:

	if (unlikely(ownerdied || !(curval &amp; FUTEX_TID_MASK))) {

We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.

AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.

Now the proposed change

-	if (unlikely(ownerdied || !(curval &amp; FUTEX_TID_MASK))) {
+       if (unlikely(ownerdied ||
+                       !(curval &amp; (FUTEX_TID_MASK | FUTEX_WAITERS)))) {

solves the problem, but it's not obvious why, and it breaks the
"stale WAITERS bit" case.

What happens is that, due to the WAITERS bit being set (T2 is blocked
on that futex), T3 has to go through lookup_pi_state(), which in the
above case returns an existing pi_state and therefore forces T3 to
legitimately fight with T2 over the ownership of the pi_state (via
pi_state-&gt;mutex). Problem solved!

Though that does not work for the "WAITERS bit is stale" problem,
because if lookup_pi_state() does not find an existing pi_state it
returns -ESRCH (due to TID == 0), which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.
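
The difference between the original check and the proposed one can be reduced to a small decision model. This is a hypothetical simplification, not kernel code: the user space value is split into fields instead of the real FUTEX_TID_MASK, FUTEX_WAITERS and FUTEX_OWNER_DIED bits, and pi_state_exists stands for what lookup_pi_state() would find under the hash bucket lock.

```c
/* Hypothetical model of the take-over decision; not kernel code. */
enum outcome { TAKEN, BLOCK_ON_PI_STATE, ERR_ESRCH };

struct futex_model {
	int tid;		/* TID field of the user space value */
	int waiters;		/* FUTEX_WAITERS set? */
	int owner_died;		/* FUTEX_OWNER_DIED set? */
	int pi_state_exists;	/* would lookup_pi_state() find one? */
};

/* Original check: a TID of 0 means take over; pi_state is never consulted. */
static enum outcome lock_pi_original(struct futex_model f)
{
	if (f.tid == 0)
		return TAKEN;
	return BLOCK_ON_PI_STATE;	/* lookup finds or creates pi_state */
}

/* Proposed check: a set WAITERS bit forces the lookup_pi_state() path. */
static enum outcome lock_pi_proposed(struct futex_model f)
{
	if (f.tid != 0)
		return BLOCK_ON_PI_STATE;
	if (!f.waiters)
		return TAKEN;
	if (f.pi_state_exists)
		return BLOCK_ON_PI_STATE;
	/* -ESRCH from lookup_pi_state(); only retried if OWNER_DIED is set */
	return f.owner_died ? TAKEN : ERR_ESRCH;
}
```

For the race above (TID 0, WAITERS and OWNER_DIED set, pi_state present) the original model lets T3 take over while the proposed one blocks it. For a stale WAITERS bit (TID 0, no pi_state, OWNER_DIED clear) the original takes over correctly and the proposed one returns -ESRCH.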

Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefore the proper ownership of the
user space futex.

lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails does it check for a futex TID value of 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.
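
A hypothetical decision model of this ordering, where the pi_state lookup comes first and a TID of 0 is trusted only when no pi_state exists. It is a simplification, not kernel code: the user space value is split into fields instead of the real FUTEX_* bits, pi_state_exists stands for what lookup_pi_state() finds under the hash bucket lock, and the take-over that follows a TID of 0 is folded into a single step.

```c
/* Hypothetical model of the take-over decision; not kernel code. */
enum outcome { TAKEN, BLOCK_ON_PI_STATE };

struct futex_model {
	int tid;		/* TID field of the user space value */
	int waiters;		/* FUTEX_WAITERS set? */
	int owner_died;		/* FUTEX_OWNER_DIED set? */
	int pi_state_exists;	/* would lookup_pi_state() find one? */
};

static enum outcome lock_pi_fixed(struct futex_model f)
{
	/* A value of 0 is the uncontended case, taken without any lookup. */
	if (!(f.tid || f.waiters || f.owner_died))
		return TAKEN;
	/* Look for pi_state first; if found, compete for its rt_mutex. */
	if (f.pi_state_exists)
		return BLOCK_ON_PI_STATE;
	/* No pi_state and TID 0: owner died or WAITERS is stale, take over. */
	if (f.tid == 0)
		return TAKEN;
	/* Live owner without pi_state: create one and block on it. */
	return BLOCK_ON_PI_STATE;
}
```

In this model the race scenario blocks T3 on the existing pi_state, while a stale WAITERS bit with no pi_state still ends in a take-over instead of -ESRCH.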

So the above scenario changes to:

T1 lock_futex_pi(F);

T2 lock_futex_pi(F);
   --&gt; T2 blocks on the futex and creates pi_state which is associated
       with T1.

T1 exits
   --&gt; exit_robust_list() runs
       --&gt; Futex F userspace value TID field is set to 0 and
           FUTEX_OWNER_DIED bit is set.

T3 lock_futex_pi(F);
   --&gt; Finds pi_state and blocks on pi_state-&gt;rt_mutex

T1 --&gt; exit_pi_state_list()
       --&gt; Transfers pi_state to waiter T2 and wakes it via
           rt_mutex_unlock(&amp;pi_state-&gt;mutex)

T2 --&gt; acquires pi_state-&gt;mutex and gains ownership of the pi_state
   --&gt; Claims ownership of the futex and sets its own TID into the
       userspace TID field of futex F
   --&gt; returns to user space

This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.

Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefore minimizes the
possible wreckage which might be inflicted.

As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)

Reported-and-tested-by: Siddhesh Poyarekar &lt;siddhesh.poyarekar@gmail.com&gt;
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
