summaryrefslogtreecommitdiff
path: root/drivers/ntb/ntb_transport.c
AgeCommit message (Collapse)AuthorFilesLines
2018-06-13Merge tag 'overflow-v4.18-rc1-part2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...
2018-06-13treewide: kzalloc_node() -> kcalloc_node()Kees Cook1-2/+2
The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This patch replaces cases of: kzalloc_node(a * b, gfp, node) with: kcalloc_node(a * b, gfp, node) as well as handling cases of: kzalloc_node(a * b * c, gfp, node) with: kzalloc_node(array3_size(a, b, c), gfp, node) as it's slightly less ugly than: kcalloc_node(array_size(a, b), c, gfp, node) This does, however, attempt to ignore constant size factors like: kzalloc_node(4 * 1024, gfp, node) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc_node( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc_node( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc_node( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc_node( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc_node( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc_node( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc_node( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc_node( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc_node( - sizeof(char) * COUNT + COUNT , ...) | kzalloc_node( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc_node + kcalloc_node ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc_node( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc_node( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc_node( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc_node( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc_node( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc_node( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc_node( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc_node( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc_node( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc_node( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc_node( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc_node( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc_node( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc_node(C1 * C2 * C3, ...) | kzalloc_node( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc_node( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc_node( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc_node( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc_node(sizeof(THING) * C2, ...) | kzalloc_node(sizeof(TYPE) * C2, ...) | kzalloc_node(C1 * C2 * C3, ...) | kzalloc_node(C1 * C2, ...) | - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc_node + kcalloc_node ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc_node + kcalloc_node ( - (E1) * E2 + E1, E2 , ...) | - kzalloc_node + kcalloc_node ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc_node + kcalloc_node ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-11ntb: ntb_transport: Replace GFP_ATOMIC with GFP_KERNEL in ↵Jia-Ju Bai1-2/+2
ntb_transport_create_queue ntb_transport_create_queue() is never called in atomic context. ntb_transport_create_queue() is only called by ntb_netdev_probe(), which is set as ".probe" in struct ntb_transport_client. Despite never getting called from atomic context, ntb_transport_create_queue() calls kzalloc_node() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-06-11ntb: ntb_transport: Replace GFP_ATOMIC with GFP_KERNEL in ↵Jia-Ju Bai1-1/+1
ntb_transport_setup_qp_mw ntb_transport_setup_qp_mw() is never called in atomic context. ntb_transport_setup_qp_mw() is only called by ntb_transport_link_work(), which is set as a parameter of INIT_DELAYED_WORK() in ntb_transport_probe(). Despite never getting called from atomic context, ntb_transport_setup_qp_mw() calls kzalloc_node() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2018-01-29ntb_transport: Fix bug with max_mw_size parameterLogan Gunthorpe1-0/+3
When using the max_mw_size parameter of ntb_transport to limit the size of the Memory windows, communication cannot be established and the queues freeze. This is because the mw_size that's reported to the peer is correctly limited but the size used locally is not. So the MW is initialized with a buffer smaller than the window but the TX side is using the full window. This means the TX side will be writing to a region of the window that points nowhere. This is easily fixed by applying the same limit to tx_size in ntb_transport_init_queue(). Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-11-19NTB: Ensure ntb_mw_get_align() is only called when the link is upLogan Gunthorpe1-10/+10
With Switchtec hardware it's impossible to get the alignment parameters for a peer's memory window until the peer's driver has configured its windows. Strictly speaking, the link doesn't have to be up for this, but the link being up is the only way the client can tell that the other side has been configured. This patch converts ntb_transport and ntb_perf to use this function after the link goes up. This simplifies these clients slightly because they no longer have to store the alignment parameters. It also tweaks ntb_tool so that peer_mw_trans will print zero if it is run before the link goes up. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-08-01ntb: transport shouldn't disable link due to bogus values in SPADsDave Jiang1-3/+1
It seems that under certain scenarios the SPAD can have bogus values caused by an agent (i.e. BIOS or other software) that is not the kernel driver, and that causes memory window setup failure. This should not cause the link to be disabled because if we do that, the driver will never recover again. We have verified in testing that this issue happens and prevents proper link recovery. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Fixes: 84f766855f61 ("ntb: stop link work when we do not have memory")
2017-07-17ntb: use correct mw_count function in ntb_tool and ntb_transportLogan Gunthorpe1-1/+1
After converting to the new API, both ntb_tool and ntb_transport are using ntb_mw_count to iterate through ntb_peer_get_addr when they should be using ntb_peer_mw_count. This probably isn't an issue with the Intel and AMD drivers but this will matter for any future driver with asymetric memory window counts. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Fixes: 443b9a14ecbe ("NTB: Alter MW API to support multi-ports devices")
2017-07-06NTB: Alter Scratchpads API to support multi-ports devicesSerge Semin1-8/+7
Even though there is no any real NTB hardware, which would have both more than two ports and Scratchpad registers, it is logically correct to have Scratchpad API accepting a peer port index as well. Intel/AMD drivers utilize Primary and Secondary topology to split Scratchpad between connected root devices. Since port-index API introduced, Intel/AMD NTB hardware drivers can use device port to determine which Scratchpad registers actually belong to local and peer devices. The same approach can be used if some potential hardware in future will be multi-port and have some set of Scratchpads. Here are the brief of changes in the API: ntb_spad_count() - return number of Scratchpads per each port ntb_peer_spad_addr(pidx, sidx) - address of Scratchpad register of the peer device with pidx-index ntb_peer_spad_read(pidx, sidx) - read specified Scratchpad register of the peer with pidx-index ntb_peer_spad_write(pidx, sidx) - write data to Scratchpad register of the peer with pidx-index Since there is hardware which doesn't support Scratchpad registers, the corresponding API methods are now made optional. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-07-06NTB: Alter MW API to support multi-ports devicesSerge Semin1-5/+16
Multi-port NTB devices permit to share a memory between all accessible peers. Memory Windows API is altered to correspondingly initialize and map memory windows for such devices: ntb_mw_count(pidx); - number of inbound memory windows, which can be allocated for shared buffer with specified peer device. ntb_mw_get_align(pidx, widx); - get alignment and size restriction parameters to properly allocate inbound memory region. ntb_peer_mw_count(); - get number of outbound memory windows. ntb_peer_mw_get_addr(widx); - get mapping address of an outbound memory window If hardware supports inbound translation configured on the local ntb port: ntb_mw_set_trans(pidx, widx); - set translation address of allocated inbound memory window so a peer device could access it. ntb_mw_clear_trans(pidx, widx); - clear the translation address of an inbound memory window. If hardware supports outbound translation configured on the peer ntb port: ntb_peer_mw_set_trans(pidx, widx); - set translation address of a memory window retrieved from a peer device ntb_peer_mw_clear_trans(pidx, widx); - clear the translation address of an outbound memory window Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-07-06NTB: Add indexed ports NTB APISerge Semin1-0/+6
There is some NTB hardware, which can combine more than just two domains over NTB. For instance, some IDT PCIe-switches can have NTB-functions activated on more than two-ports. The different domains are distinguished by ports they are connected to. So the new port-related methods are added to the NTB API: ntb_port_number() - return local port ntb_peer_port_count() - return number of peers local port can connect to ntb_peer_port_number(pdix) - return port number by it index ntb_peer_port_idx(port) - return port index by it number Current test-drivers aren't changed much. They still support two-ports devices for the time being while multi-ports hardware drivers aren't added. By default port-related API is declared for two-ports hardware. So corresponding hardware drivers won't need to implement it. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-06-19ntb: no sleep in ntb_async_tx_submitAllen Hubbe1-43/+7
Do not sleep in ntb_async_tx_submit, which could deadlock. This reverts commit "8c874cc140d667f84ae4642bb5b5e0d6396d2ca4" Fixes: 8c874cc140d6 ("NTB: Address out of DMA descriptor issue with NTB") Reported-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-06-19ntb_transport: fix bug calculating num_qps_mwLogan Gunthorpe1-2/+2
A divide by zero error occurs if qp_count is less than mw_count because num_qps_mw is calculated to be zero. The calculation appears to be incorrect. The requirement is for num_qps_mw to be set to qp_count / mw_count with any remainder divided among the earlier mws. For example, if mw_count is 5 and qp_count is 12 then mws 0 and 1 will have 3 qps per window and mws 2 through 4 will have 2 qps per window. Thus, when mw_num < qp_count % mw_count, num_qps_mw is 1 higher than when mw_num >= qp_count. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-06-19ntb_transport: fix qp count bugLogan Gunthorpe1-2/+2
In cases where there are more mw's than spads/2-2, the mw count gets reduced to match the limitation. ntb_transport also tries to ensure that there are fewer qps than mws but uses the full mw count instead of the reduced one. When this happens, the math in 'ntb_transport_setup_qp_mw' will get confused and result in a kernel paging request bug. This patch fixes the bug by reducing qp_count to the reduced mw count instead of the full mw count. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-02-17ntb_transport: Pick an unused queueThomas VanSelus1-1/+1
Fix typo causing ntb_transport_create_queue to select the first queue every time, instead of using the next free queue. Signed-off-by: Thomas VanSelus <tvanselus@xes-inc.com> Signed-off-by: Aaron Sierra <asierra@xes-inc.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Fixes: fce8a7bb5 ("PCI-Express Non-Transparent Bridge Support") Signed-off-by: Jon Mason <jdmason@kudzu.us>
2017-02-17NTB: ntb_transport: fix debugfs_remove_recursiveAllen Hubbe1-2/+1
The call to debugfs_remove_recursive(qp->debugfs_dir) of the sub-level directory must not be later than debugfs_remove_recursive(nt_debugfs_dir) of the top-level directory. Otherwise, the sub-level directory will not exist, and it would be invalid (panic) to attempt to remove it. This removes the top-level directory last, after sub-level directories have been cleaned up. Signed-off-by: Allen Hubbe <Allen.Hubbe@dell.com> Fixes: e26a5843f ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-12-24ntb_transport: Remove unnecessary call to ntb_peer_spad_readSteve Wahl1-1/+0
The results were previously ignored, anyway. Signed-off-by: Steve Wahl <Steve.Wahl@dell.com> Fixes: e26a5843f7f5014ae4460030ca4de029a3ac35d3 Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-12-24ntb_transport: Limit memory windows based on available, scratchpadsShyam Sundar S K1-12/+16
When the underlying NTB H/W driver advertises more memory windows than the number of scratchpads available to setup MW's, it is likely that we may end up filling the remaining memory windows with garbage. So to avoid that, lets limit the memory windows that transport driver can setup based on the available scratchpads. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-11-14ntb_transport: make DMA_OUT_RESOURCE_TO HZ independentNicholas Mc Guire1-1/+1
schedule_timeout_* takes a timeout in jiffies but the code currently is passing in a constant which makes this timeout HZ dependent, so pass it through msecs_to_jiffies() to fix this up. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-08ntb: add DMA error handling for RX DMADave Jiang1-16/+67
Adding support on the rx DMA path to allow recovery of errors when DMA responds with error status and abort all the subsequent ops. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: linux-ntb@googlegroups.com Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-08-08ntb: add DMA error handling for TX DMADave Jiang1-27/+83
Adding support on the tx DMA path to allow recovery of errors when DMA responds with error status and abort all the subsequent ops. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@emc.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: linux-ntb@googlegroups.com Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-08-05ntb_transport: Check the number of spads the hardware supportsLogan Gunthorpe1-2/+7
I'm working on hardware that currently has a limited number of scratchpad registers and ntb_ndev fails with no clue as to why. I feel it is better to fail early and provide a reasonable error message then to fail later on. The same is done to ntb_perf, but it doesn't currently require enough spads to actually fail. I've also removed the unused SPAD_MSG and SPAD_ACK enums so that MAX_SPAD accurately reflects the number of spads used. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-08-05NTB: allocate number transport entries depending on size of ring sizeDave Jiang1-2/+27
Currently we only allocate a fixed default number of descriptors for the tx and rx side. We should dynamically resize it to the number of descriptors resides in the transport rings. We should know the number of transmit descriptors at initializaiton. We will allocate the default number of descriptors for receive side and allocate additional ones when we know the actual max entries for receive. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <allen.hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-03-18ntb: stop link work when we do not have memoryDave Jiang1-1/+8
Instead of keep trying to go through the init routine when we aren't able to allocate memory, we should just stop and go down. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-03-18ntb: stop tasklet from spinning forever during shutdown.Dave Jiang1-6/+16
We can leave tasklet spinning forever if we disable the tasklet during qp shutdown and the tasklets are still being kicked off. This hopefully should avoid that race condition. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Alex Depoutovitch <alex@pernixdata.com> Tested-by: Alex Depoutovitch <alex@pernixdata.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-01-11NTB: Address out of DMA descriptor issue with NTBDave Jiang1-7/+41
The transport right now does not handle the case where we run out of DMA descriptors. We just fail when we do not succeed. Adding code to retry for a bit attempting to use the DMA engine instead of instantly fail to CPU copy. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2016-01-11NTB: ntb_process_tx error path bugJon Mason1-1/+1
The transmit overrun avoidance error path in ntb_process_tx accidentally swapped the first two values being passed to the tx_handler client. This could result in crashes in the ntb_netdev (or other out-of-tree NTB clients). Reported-by: Alex Depoutovitch <alex@pernixdata.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-11-09NTB: fix 32-bit compiler warningArnd Bergmann1-2/+2
resource_size_t may be 32-bit wide on some architectures, which causes this warning when building the NTB code: drivers/ntb/ntb_transport.c: In function 'ntb_transport_link_work': drivers/ntb/ntb_transport.c:828:46: warning: right shift count >= width of type [-Wshift-count-overflow] The warning is harmless but can be avoided by using the upper_32_bits() macro. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers") Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-11-09NTB: invalid buf pointer in multi-MW setupsJon Mason1-2/+2
Order of operations issue with the QP Num and MW count, which would result in the receive buffer pointer being invalid if there are more than 1 MW. Corrected with parenthesis to enforce the proper order of operations. Reported-by: John I. Kading <John.Kading@gd-ms.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-11-09NTB: remove unused variableSudip Mukherjee1-4/+0
These variables were not used anywhere. So remove them. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-11-09NTB: fix access of free-ed pointerSudip Mukherjee1-8/+7
We were accessing nt->mw_vec after freeing it. Fix the error path so that we free nt->mw_vec after we have finished using it. Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-11-09NTB: Fix issue where we may be accessing NULL ptrDave Jiang1-8/+9
smatch detected an issue in the function ntb_transport_max_size() where we could be dereferencing a dma channel pointer when it is NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-09-07NTB: Use unique DMA channels for TX and RXDave Jiang1-19/+58
Allocate two DMA channels, one for TX operation and one for RX operation, instead of having one DMA channel for everything. This provides slightly better performance, and also will make error handling cleaner later on. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-09-07NTB: Remove dma_sync_wait from ntb_async_rxAllen Hubbe1-9/+3
The dma_sync_wait can hurt the performance of workloads mixed with both large and small frames. Large frames will be copied using the dma engine. Small frames will be copied by the cpu. The dma_sync_wait prevents the cpu and dma engine copying in parallel. In the period where the cpu is copying, the dma engine is stopped. The dma engine is not doing any useful work to copy large frames during that time, and the additional time to restart the dma engine for the next large frame. This will decrease the throughput for the portion of a workload with large frames. In the period where the dma engine is copying, the cpu is held up waiting for dma to complete. The small frames processing will be delayed until the dma is complete. The RX frames are completed in-order, and the processing of small frames takes very little time, so dma_sync_wait may have an insignificant impact on the respose time of frames. The more significant impact is to the system, because the delay in dma_sync_wait is implemented as busy non-blocking wait. This can prevent the delayed core from doing any useful work, even if it could be processing work for other drivers, unrelated to transport RX processing. After applying the earlier patch to fix out-of-order RX acknoledgement, the dma_sync_wait is no longer necessary. Remove it, so that cpu memcpy will proceed immediately for small frames, in parallel with ongoing dma for large frames. Do not hold up the cpu from doing work while dma is in progress. The prior fix will continue to ensure in-order completion of the RX frames to the upper layer, and in-order delivery of the RX acknoledgement. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-09-07NTB: Clean up QP stats infoDave Jiang1-9/+16
Make QP stats info more readable for debugging purposes. Also add an entry to indicate whether DMA is being used. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-09-07NTB: Make the transport list in order of discoveryDave Jiang1-1/+1
The list should be added from the bottom and not the top in order to ensure the transport is provided in the same order to clients as ntb devices are discovered. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-09-07NTB: Add flow control to the ntb_netdevDave Jiang1-1/+17
Right now if we push the NTB really hard, we start dropping packets due to not able to process the packets fast enough. We need to st:qop the upper layer from flooding us when that happens. A timer is necessary in order to restart the queue once the resource has been processed on the receive side. Due to the way NTB is setup, the resources on the tx side are tied to the processing of the rx side and there's no async way to know when the rx side has released those resources. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Fix dereference before checkAllen Hubbe1-2/+1
Remove early dereference of a pointer that is checked later in the code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Fix zero size or integer overflow in ntb_set_mwAllen Hubbe1-3/+6
A plain 32 bit integer will overflow for values over 4GiB. Change the plain integer size to the appropriate size type in ntb_set_mw. Change the type of the size parameter and two local variables used for size. Even if there is no overflow, a size of zero is invalid here. Reported-by: Juyoung Jung <jjung@micron.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Schedule to receive on QP link upAllen Hubbe1-0/+2
Schedule to receive on QP link up, to make sure that the doorbell is properly cleared for interrupts. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Fix oops in debugfs when transport is half-upDave Jiang1-1/+5
When the remote side is not up, we do not have all the context for the transport, and that causes NULL ptr access. Have the debugfs reads check to see if transport is up before we make access. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Fix transport stats for multiple devicesDave Jiang1-2/+10
Currently the debugfs does not have files for all NTB transport queue pairs. When there are multiple NTBs present in a system, the QP names of the last transport clobber the names of previously added transport QPs. Only the last added QPs can be observed via debugfs. Create a directory per NTB transport to associate the QPs with that transport. Name the directory the same as the PCI device. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-08-09NTB: Fix ntb_transport out-of-order RX updateAllen Hubbe1-69/+100
It was possible for a synchronous update of the RX index in the error case to get ahead of the asynchronous RX index update in the normal case. Change the RX processing to preserve an RX completion order. There were two error cases. First, if a buffer is not present to receive data, there would be no queue entry to preserve the RX completion order. Instead of dropping the RX frame, leave the RX frame in the ring. Schedule RX processing when RX entries are enqueued, in case there are RX frames waiting in the ring to be received. Second, if a buffer is too small to receive data, drop the frame in the ring, mark the RX entry as done, and indicate the error in the RX entry length. Check for a negative length in the receive callback in ntb_netdev, and count occurrences as rx_length_errors. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Print driver name and version in module initDave Jiang1-0/+2
Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Increase transport MTU to 64k from 16kDave Jiang1-1/+1
Benchmarking showed a significant performance increase with the MTU size to 64k instead of 16k. Change the driver default to 64k. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Default to CPU memcpy for performanceDave Jiang1-4/+13
Disable DMA usage by default, since the CPU provides much better performance with write combining. Provide a module parameter to enable DMA usage when offloading the memcpy is preferred. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Improve performance with write combiningDave Jiang1-1/+10
Changing the memory window BAR mappings to write combining significantly boosts the performance. We will also use memcpy that uses non-temporal store, which showed performance improvement when doing non-cached memcpys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory and DMA chan in transportAllen Hubbe1-14/+32
Allocate memory and request the DMA channel for the same NUMA node as the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rate limit ntb_qp_link_workAllen Hubbe1-1/+1
When the ntb transport is connecting and waiting for the peer, the debug console receives lots of debug level messages about the remote qp link status being down. Rate limit those messages. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Reset transport QP link stats on downAllen Hubbe1-8/+28
Reset the link stats when the link goes down. In particular, the TX and RX index and count must be reset, or else the TX side will be sending packets to the RX side where the RX side is not expecting them. Reset all the stats, to be consistent. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>