kernel/linux.git/drivers/infiniband, branch linux-2.6.18.y

IB/mad: Fix race between cancel and receive completion

2007-02-23T23:49:51+00:00

When ib_cancel_mad() is called, it puts the canceled send on a list and schedules a "flushed" callback from process context. However, this leaves a window where a receive completion could be processed before the send is fully flushed. This is fine, except that ib_find_send_mad() will find the MAD and return it to the receive processing, which results in the sender getting both a successful receive and a "flushed" send completion for the same request. Understandably, this confuses the sender, which is expecting only one of these two callbacks, and leads to grief such as a use-after-free in IPoIB. Fix this by changing ib_find_send_mad() to return a send struct only if the status is still successful (and not "flushed"). The search of the send_list already had this check, so this patch just adds the same check to the search of the wait_list. Signed-off-by: Roland Dreier Signed-off-by: Chris Wright Signed-off-by: Greg Kroah-Hartman

IB/srp: Fix FMR mapping for 32-bit kernels and addresses above 4G

2007-02-23T23:49:51+00:00

struct srp_device.fmr_page_mask was unsigned long, which means that the top part of addresses above 4G was being chopped off on 32-bit architectures. Of course nothing good happens when data from SRP targets is DMAed to the wrong place. Fix this by changing fmr_page_mask to u64, to match the addresses actually used by IB devices. Thanks to Brian Cain and David McMillen for help diagnosing the bug and testing the fix. Signed-off-by: Roland Dreier Signed-off-by: Chris Wright Signed-off-by: Greg Kroah-Hartman

[PATCH] IB/mthca: Use mmiowb after doorbell ring

2006-11-04T01:33:47+00:00

We discovered a problem when running IPoIB applications on multiple CPUs on an Altix system. Many messages such as: ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq) appear in syslog, and the driver wedges up. Apparently this is because writes to the doorbells from different CPUs reach the device out of order. The following patch adds mmiowb() calls after doorbell rings to ensure the doorbell writes are ordered. Signed-off-by: Arthur Kepner Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman Signed-off-by: Chris Wright

[PATCH] IPoIB: Rejoin all multicast groups after a port event

2006-11-04T01:33:47+00:00

When ipoib_ib_dev_flush() is called because of a port event, the driver needs to rejoin all multicast groups, since the flush will call ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()). Otherwise no (non-broadcast) multicast groups will be rejoined until the networking core calls ->set_multicast_list again, and so multicast reception will be broken for potentially a long time. Signed-off-by: Eli Cohen Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman Signed-off-by: Chris Wright

IB/mthca: Fix lid used for sending traps

2006-10-13T20:23:21+00:00

The SM LID used to send traps to is incorrectly set to port LID. This is a regression from 2.6.17 -- after a PortInfo MAD is received, no traps are sent to the SM LID. The traps go to the loopback interface instead, and are dropped there. The SM LID should be taken from the sm_lid of the PortInfo response. The bug was introduced by commit 12bbb2b7be7f5564952ebe0196623e97464b8ac5: IB/mthca: Add client reregister event generation Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman

RDMA/cma: Increase the IB CM retry count in CMA

2006-09-14T20:55:30+00:00

3 seems like a low number of IB Communication Manager retries to set; we see connections failing under stress, and in any case 3 just looks like an arbitrary number. 15 is the max value allowed by the InfiniBand spec. Signed-off-by: Michael S. Tsirkin Acked-by: Sean Hefty Signed-off-by: Roland Dreier

IPoIB: Retry failed send-only multicast group joins

2006-09-14T20:51:41+00:00

When a send-only multicast group join fails, mcast->query must be set to NULL. Otherwise, IPoIB will never retry the join and the multicast group will never be reachable. Signed-off-by: Eli Cohen Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier

IB/srp: Don't schedule reconnect from srp

2006-09-14T20:51:40+00:00

If there is a problem in the connection, the SCSI mid-layer will eventually call srp_reset_host(), which will call srp_reconnect(), so we do not need to schedule a call to srp_reconnect_work() from srp_completion(). Removing this prevents srp_reset_host() from failing if a reconnect scheduled from srp_completion() is already in progress, which in turn was causing crashes as both SCSI midlayer and srp_reconnect() were cancelling commands. Signed-off-by: Ishai Rabinovitz Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier

IB/mthca: Use IRQ safe locks to protect allocation bitmaps

2006-09-01T00:25:56+00:00

It is supposed to be OK to call mthca_create_ah() and mthca_destroy_ah() from any context. However, for mem-full HCAs, these functions use the mthca_alloc() and mthca_free() bitmap helpers, and those helpers use non-IRQ-safe spin_lock() internally. Lockdep correctly warns that this could lead to a deadlock. Fix this by changing mthca_alloc() and mthca_free() to use spin_lock_irqsave(). Signed-off-by: Roland Dreier

Merge gregkh@master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

2006-08-26T20:04:23+00:00