Age | Commit message (Collapse) | Author | Files | Lines |
|
mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:
@mmiowb@
@@
- mmiowb();
and invoked as:
$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done
NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Now when we have the udata passed to all the ib_xxx object creation APIs
and the additional macro 'rdma_udata_to_drv_context' to get the
ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally
start to remove the dependency of the drivers in the
ib_xxx->uobject->context.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Drivers should be using udata to determine if a method is invoked from
user space or kernel space. A pd does not necessarily say a different
objects is kernel or user.
Transforming the tests to use udata eliminates a large number of uobject
references from the drivers.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:
kmalloc(a * b, gfp)
with:
kmalloc_array(a * b, gfp)
as well as handling cases of:
kmalloc(a * b * c, gfp)
with:
kmalloc(array3_size(a, b, c), gfp)
as it's slightly less ugly than:
kmalloc_array(array_size(a, b), c, gfp)
This does, however, attempt to ignore constant size factors like:
kmalloc(4 * 1024, gfp)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.
The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().
The Coccinelle script used for this was:
// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
kmalloc(
- (sizeof(TYPE)) * E
+ sizeof(TYPE) * E
, ...)
|
kmalloc(
- (sizeof(THING)) * E
+ sizeof(THING) * E
, ...)
)
// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
kmalloc(
- sizeof(u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * (COUNT)
+ COUNT
, ...)
|
kmalloc(
- sizeof(u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(__u8) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(char) * COUNT
+ COUNT
, ...)
|
kmalloc(
- sizeof(unsigned char) * COUNT
+ COUNT
, ...)
)
// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_ID)
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_ID
+ COUNT_ID, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (COUNT_CONST)
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * COUNT_CONST
+ COUNT_CONST, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_ID)
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_ID
+ COUNT_ID, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (COUNT_CONST)
+ COUNT_CONST, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * COUNT_CONST
+ COUNT_CONST, sizeof(THING)
, ...)
)
// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kmalloc
+ kmalloc_array
(
- SIZE * COUNT
+ COUNT, SIZE
, ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
kmalloc(
- sizeof(TYPE) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(TYPE) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(TYPE))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * (COUNT) * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * (STRIDE)
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
|
kmalloc(
- sizeof(THING) * COUNT * STRIDE
+ array3_size(COUNT, STRIDE, sizeof(THING))
, ...)
)
// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
kmalloc(
- sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(THING1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(THING1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * COUNT
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
|
kmalloc(
- sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+ array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
, ...)
)
// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
kmalloc(
- (COUNT) * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * STRIDE * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- (COUNT) * (STRIDE) * (SIZE)
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
|
kmalloc(
- COUNT * STRIDE * SIZE
+ array3_size(COUNT, STRIDE, SIZE)
, ...)
)
// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(
- (E1) * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * E3
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- (E1) * (E2) * (E3)
+ array3_size(E1, E2, E3)
, ...)
|
kmalloc(
- E1 * E2 * E3
+ array3_size(E1, E2, E3)
, ...)
)
// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
kmalloc(sizeof(THING) * C2, ...)
|
kmalloc(sizeof(TYPE) * C2, ...)
|
kmalloc(C1 * C2 * C3, ...)
|
kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * (E2)
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(TYPE) * E2
+ E2, sizeof(TYPE)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * (E2)
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- sizeof(THING) * E2
+ E2, sizeof(THING)
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * E2
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- (E1) * (E2)
+ E1, E2
, ...)
|
- kmalloc
+ kmalloc_array
(
- E1 * E2
+ E1, E2
, ...)
)
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Instead of having firmware command functions return an error and also
a status, leading to code like:
err = mthca_FW_COMMAND(..., &status);
if (err)
goto out;
if (status) {
err = -E...;
goto out;
}
all over the place, just handle the FW status inside the FW command
handling code (the way mlx4 does it), so we can simply write:
err = mthca_FW_COMMAND(...);
if (err)
goto out;
In addition to simplifying the source code, this also saves a healthy
chunk of text:
add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
function old new delta
static.trans_table 324 584 +260
mthca_cmd_poll 352 477 +125
mthca_cmd_wait 511 567 +56
mthca_table_put 213 240 +27
mthca_cleanup_db_tab 372 387 +15
__mthca_remove_one 314 323 +9
mthca_cleanup_user_db_tab 275 283 +8
__mthca_init_one 1738 1746 +8
mthca_cleanup 20 21 +1
mthca_MAD_IFC 1081 1082 +1
mthca_MGID_HASH 43 40 -3
mthca_MAP_ICM_AUX 23 20 -3
mthca_MAP_ICM 19 16 -3
mthca_MAP_FA 23 20 -3
mthca_READ_MGM 43 38 -5
mthca_QUERY_SRQ 43 38 -5
mthca_QUERY_QP 59 54 -5
mthca_HW2SW_SRQ 43 38 -5
mthca_HW2SW_MPT 60 55 -5
mthca_HW2SW_EQ 43 38 -5
mthca_HW2SW_CQ 43 38 -5
mthca_free_icm_table 120 114 -6
mthca_query_srq 214 206 -8
mthca_free_qp 662 654 -8
mthca_cmd 38 28 -10
mthca_alloc_db 1321 1311 -10
mthca_setup_hca 1067 1055 -12
mthca_WRITE_MTT 35 22 -13
mthca_WRITE_MGM 40 27 -13
mthca_UNMAP_ICM_AUX 36 23 -13
mthca_UNMAP_FA 36 23 -13
mthca_SYS_DIS 36 23 -13
mthca_SYNC_TPT 36 23 -13
mthca_SW2HW_SRQ 35 22 -13
mthca_SW2HW_MPT 35 22 -13
mthca_SW2HW_EQ 35 22 -13
mthca_SW2HW_CQ 35 22 -13
mthca_RUN_FW 36 23 -13
mthca_DISABLE_LAM 36 23 -13
mthca_CLOSE_IB 36 23 -13
mthca_CLOSE_HCA 38 25 -13
mthca_ARM_SRQ 39 26 -13
mthca_free_icms 178 164 -14
mthca_QUERY_DDR 389 375 -14
mthca_resize_cq 1063 1048 -15
mthca_unmap_eq_icm 123 107 -16
mthca_map_eq_icm 396 380 -16
mthca_cmd_box 90 74 -16
mthca_SET_IB 433 417 -16
mthca_RESIZE_CQ 369 353 -16
mthca_MAP_ICM_page 240 224 -16
mthca_MAP_EQ 183 167 -16
mthca_INIT_IB 473 457 -16
mthca_INIT_HCA 745 729 -16
mthca_map_user_db 816 798 -18
mthca_SYS_EN 157 139 -18
mthca_cleanup_qp_table 78 59 -19
mthca_cleanup_eq_table 168 149 -19
mthca_UNMAP_ICM 143 121 -22
mthca_modify_srq 172 149 -23
mthca_unmap_fmr 198 174 -24
mthca_query_qp 814 790 -24
mthca_query_pkey 343 319 -24
mthca_SET_ICM_SIZE 34 10 -24
mthca_QUERY_DEV_LIM 1870 1846 -24
mthca_map_cmd 1130 1105 -25
mthca_ENABLE_LAM 401 375 -26
mthca_modify_port 247 220 -27
mthca_query_device 884 850 -34
mthca_NOP 75 41 -34
mthca_table_get 287 249 -38
mthca_init_qp_table 333 293 -40
mthca_MODIFY_QP 348 308 -40
mthca_close_hca 131 89 -42
mthca_free_eq 435 390 -45
mthca_query_port 755 705 -50
mthca_free_cq 581 528 -53
mthca_alloc_icm_table 578 524 -54
mthca_multicast_attach 1041 986 -55
mthca_init_hca 326 271 -55
mthca_query_gid 487 431 -56
mthca_free_srq 524 468 -56
mthca_free_mr 168 111 -57
mthca_create_eq 1560 1501 -59
mthca_multicast_detach 790 728 -62
mthca_write_mtt 918 854 -64
mthca_register_device 1406 1342 -64
mthca_fmr_alloc 947 883 -64
mthca_mr_alloc 652 582 -70
mthca_process_mad 1242 1164 -78
mthca_dev_lim 910 830 -80
find_mgm 482 400 -82
mthca_modify_qp 3852 3753 -99
mthca_init_cq 1281 1181 -100
mthca_alloc_srq 1719 1610 -109
mthca_init_eq_table 1807 1679 -128
mthca_init_tavor 761 491 -270
mthca_init_arbel 2617 2098 -519
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
|
|
They don't get updated by git and so they're worse than useless.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
We have recently discovered that Tavor mode requires each WQE in a
posted list of receive WQEs to have a valid NDA field at all times.
This requirement holds true for regular QPs as well as for SRQs. This
patch prelinks the receive queue in a regular QP and keeps the free
list in SRQ always properly linked.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The SRQ receive posting functions make sure that srq->first_free never
becomes negative, so we can remove tests of whether it is negative.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Architectures such as ia64 see alignment traps when doing a 64-bit
read from __be32 doorbell[2] arrays to do doorbell writes in
mthca_write64(). Fix this by just passing the two halves of the
doorbell value into mthca_write64(). This actually improves the
generated code by allowing the compiler to see what's going on better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Factor code to set data segment entries out of the work request
posting functions into inline functions mthca_set_data_seg() and
mthca_set_data_seg_inval(). This makes the code more readable and
also allows the compiler to do a better job -- on x86_64:
add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69)
function old new delta
mthca_arbel_post_srq_recv 373 369 -4
mthca_arbel_post_receive 570 562 -8
mthca_tavor_post_srq_recv 520 508 -12
mthca_tavor_post_send 1344 1330 -14
mthca_arbel_post_send 1481 1467 -14
mthca_tavor_post_receive 792 775 -17
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For some reason gcc-3.4.5 on sparc64 does:
WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined!
Points to note:
(1) The asm volatile flush/flushw are just markers for viewing what comes out
in the assembly; removing them has no effect on the result.
(2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or
dwh__mthca_alloc_srq() causes the problem to go away.
The compiler command line issued by the kernel build is:
/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c
This can be reduced to this whilst still retaining the problem:
/opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os
Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem.
This patch to the kernel code fixes the problem:
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
This facility provides three entry points:
ilog2() Log base 2 of unsigned long
ilog2_u32() Log base 2 of u32
ilog2_u64() Log base 2 of u64
These facilities can either be used inside functions on dynamic data:
int do_something(long q)
{
...;
y = ilog2(x)
...;
}
Or can be used to statically initialise global variables with constant values:
unsigned n = ilog2(27);
When performing static initialisation, the compiler will report "error:
initializer element is not constant" if asked to take a log of zero or of
something not reducible to a constant. They treat negative numbers as
unsigned.
When not dealing with a constant, they fall back to using fls() which permits
them to use arch-specific log calculation instructions - such as BSR on
x86/x86_64 or SCAN on FRV - if available.
[akpm@osdl.org: MMC fix]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Wojtek Kaniewski <wojtekka@toxygen.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When initializing an mthca SRQ, the log_srq_size field should be the
log of the number of SRQ WQEs, not the log of the number of bytes in
the SRQ.
This affects only mthca drivers for memfree HCAs which set the initial
srq wqe counter (in the SW2HW transition) to a non-zero value.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Commit b3b30f5e ("IB/mthca: Recover from catastrophic errors")
introduced some section mismatch breakage, because the error recovery
code tears down and reinitializes the device, which calls into lots of
code originally marked __devinit and __devexit from regular .text.
Fix this by getting rid of these now-incorrect section markers.
Reported by Randy Dunlap <randy.dunlap@oracle.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
We discovered a problem when running IPoIB applications on multiple
CPUs on an Altix system. Many messages such as:
ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq)
appear in syslog, and the driver wedges up.
Apparently this is because writes to the doorbells from different CPUs
reach the device out of order. The following patch adds mmiowb() calls
after doorbell rings to ensure the doorbell writes are ordered.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
All HCAs (not just mem-free) need a spare SRQ entry, so bump srq->max
by 1 in all cases.
Noted by Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot
be set beyond srq->max - 1.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Documentation/infiniband/core_locking.txt says:
All of the methods in struct ib_device exported by a low-level
driver must be fully reentrant. The low-level driver is required to
perform all synchronization necessary to maintain consistency, even
if multiple function calls using the same object are run
simultaneously.
However, mthca's modify_qp, modify_srq and resize_cq methods are
currently not reentrant. Add a mutex to the QP, SRQ and CQ structures
so that these calls can be properly serialized.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
If we post a list of length exactly a multiple of 256, nreq in
doorbell gets set to 256 which is wrong: it should be encoded by 0.
This is because we only zero it out on the next WR, which may not be
there. The solution is to ring the doorbell after posting a WQE, not
before posting the next one.
This is the same bug that we just fixed for QPs with non-shared RQ.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix races in in destroying various objects. If a destroy routine
waits for an object to become free by doing
wait_event(&obj->wait, !atomic_read(&obj->refcount));
/* now clean up and destroy the object */
and another place drops a reference to the object by doing
if (atomic_dec_and_test(&obj->refcount))
wake_up(&obj->wait);
then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up(). And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.
Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock. This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.
While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it. So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The driver allocates SRQ WQEs size with a power of 2 size both for
Tavor and for memfree. For Tavor, however, the hardware only requires
the WQE size to be a multiple of 16, not a power of 2, and the max
number of scatter-gather allowed is reported accordingly by the
firmware (and this is the value currently returned by
ib_query_device() and ibv_query_device()).
If the max number of scatter/gather entries reported by the FW is used
when creating an SRQ, the creation will fail for Tavor, since the
required WQE size will be increased to the next power of 2, which
turns out to be larger than the device permitted max WQE size (which
is not a power of 2).
This patch reduces the reported SRQ max wqe size so that it can be used
successfully in creating an SRQ on Tavor HCAs.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Quite a few cleanup functions in mthca were marked as __devexit.
However, they could also be called from error paths during
initialization, so they cannot be marked that way. Just delete all of
the incorrect annotations.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The previous patch for Tavor broke MemFree logic.
The driver should perform limit check only for Tavor. For MemFree,
the check is incorrect, since ds (WQE stride) is always a power-of-2
(although the max_desc_size may not be).
In Tavor, however, WQE stride and desc_size are the same, and are not
necessarily power-of-2. The check was really for the WQE stride (and
it Tavor, we use max_desc_size for the stride).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
When setting the shared receive queue (SRQ) watermark in a modify SRQ
operation, make sure that the supplied value is not larger than the
full size of the SRQ.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Guarantee the calculated work queue entry size does not exceed the max
allowable WQE size when creating an SRQ. This is a problem with Arbel
in Tavor-compatibility mode because the current WQE size computation
method rounds up to next power of 2.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix endianness handling of srq_limit: it is big-endian in the context
structure, so we need to swab it before returning it.
Also add support for srq_limit query for Tavor (non-MemFree) HCAs.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
MemFree devices need to reserve one shared receive queue (SRQ) work
request for internal use, so the capacity returned from the create_srq
and query_srq methods should be srq->max - 1.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Have mthca's create_srq method return the actual capacity of the SRQ
that gets created. Also update comments in <rdma/ib_verbs.h> to
clarify that this is what is expected from ib_create_srq().
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Implement the query_qp and query_srq methods in mthca.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Remove trailing whitespace and fix indentation that with spaces
instead of tabs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Thinko: 64 bytes is the minimum SRQ WQE size (not the maximum).
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
|
|
In Tavor mode, when posting a long list of receive work requests, a
doorbell must be rung every 256 requests. Add code to do this when
required.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix wqe_to_link() to use a structure field that we know is definitely
always unused for receive work requests, so that it really avoids the
free list corruption bug that the comment claims it does.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Check the sizes of CQs, QPs and SRQs when creating objects, and fail
instead of creating too-big queues. Also return real limits instead
of just plausible-sounding values from mthca_query_device().
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The hardware relies on us keeping one extra work request that never
gets used in SRQs. Add checks to the SRQ work request posting
functions so that they fail when someone is about to use up that extra
work request, rather than when someone uses the very last work request.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Our hardware supports generating an event when the number of receives
posted to a shared receive queue (SRQ) falls below a user-specified
limit. Implement mthca_modify_srq() to arm the limit, and add code to
handle dispatching SRQ events when they occur.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Userspace SRQs don't have a buffer allocated for them in the kernel, so
it doesn't make sense to set srq->last during initialization. In fact,
this can crash trying to follow a nonexistent buffer pointer.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
The error handling paths in mthca_tavor_post_srq_recv() and
mthca_arbel_post_srq_recv() are quite bogus, the result of a
screwed up merge. Fix them so they work as intended.
Pointed out by Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Fix posting first WQE for mem-free HCAs: we need to link to previous
WQE even in that case. While we're at it, simplify code for
Tavor-mode HCAs. We don't really need the conditional test there
either; we can similarly always link to the previous WQE.
Based on Michael S. Tsirkin's analogous fix for userspace libmthca.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
Add mthca support for shared receive queues (SRQs),
including userspace SRQs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|