Age | Commit message (Collapse) | Author | Files | Lines |
|
This symbol is not used outside of net.c, so marks it static.
Fix the following sparse warning:
drivers/net/mhi/net.c:385:23: warning: symbol 'mhi_wwan_ops' was not
declared. Should it be static?
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently rx page will be reused to receive future packet when
the stack releases the previous skb quickly. If the old page
can not be reused, a new page will be allocated and mapped,
which comsumes a lot of cpu when IOMMU is in the strict mode,
especially when the application and irq/NAPI happens to run on
the same cpu.
So allocate a new frag to memcpy the data to avoid the costly
IOMMU unmapping/mapping operation, and add "frag_alloc_err"
and "frag_alloc" stats in "ethtool -S ethX" cmd.
The throughput improves above 50% when running single thread of
iperf using TCP when IOMMU is in strict mode and iperf shares the
same cpu with irq/NAPI(rx_copybreak = 2048 and mtu = 1500).
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current rx page offset only reset to zero when all the below
conditions are satisfied:
1. rx page is only owned by driver.
2. rx page is reusable.
3. the page offset that is above to be given to the stack has
reached the end of the page.
If the page offset is over the hns3_buf_size(), it means the
buffer below the offset of the page is usable when the above
condition 1 & 2 are satisfied, so page offset can be reset to
zero instead of increasing the offset. We may be able to always
reuse the first 4K buffer of a 64K page, which means we can
limit the hot buffer size as much as possible.
The above optimization is a side effect when refacting the
rx page reuse handling in order to support the rx copybreak.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using the queue based tx buffer, it is also possible to allocate a
sgl buffer, and use skb_to_sgvec() to convert the skb to the sgvec
in order to support the dma_map_sg() to decreases the overhead of
IOMMU mapping and unmapping.
Firstly, it reduces the number of buffers. For example, a tcp skb
may have a 66-byte header and 3 fragments of 4328, 32768, and 28064
bytes. With this patch, dma_map_sg() will combine them into two
buffers, 66-bytes header and one 65160-bytes fragment by using IOMMU.
Secondly, it reduces the number of dma mapping and unmapping. All the
original 4 buffers are mapped only once rather than 4 times.
The throughput improves above 10% when running single thread of iperf
using TCP when IOMMU is in strict mode.
Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to query tx spare buffer size from configuration
file, and use this info to do spare buffer initialization when
the module parameter 'tx_spare_buf_size' is not specified.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
when the packet or frag size is small, it causes both security and
performance issue. As dma can't map sub-page, this means some extra
kernel data is visible to devices. On the other hand, the overhead
of dma map and unmap is huge when IOMMU is on.
So add a queue based tx shared bounce buffer to memcpy the small
packet when the len of the xmitted skb is below tx_copybreak.
Add tx_spare_buf_size module param to set the size of tx spare
buffer, and add set/get_tunable to set or query the tx_copybreak.
The throughtput improves from 30 Gbps to 90+ Gbps when running 16
netperf threads with 32KB UDP message size when IOMMU is in the
strict mode(tx_copybreak = 2000 and mtu = 1500).
Suggested-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Factor out hns3_fill_desc() so that it can be reused in the
tx bounce supporting.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
desc_cb is used to store mapping and freeing info for the
corresponding desc, which is used in the cleaning process.
There will be more desc_cb type coming up when supporting the
tx bounce buffer, change desc_cb type to bit-wise value in order
to reduce the desc_cb type checking operation in the data path.
Also move the desc_cb type definition to hns3_enet.h because it
is only used in hns3_enet.c, and declare a local variable desc_cb
in hns3_clear_desc() to reduce lines of code.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As already done for mvneta and mvpp2, enable skb recycling for ti
ethernet drivers
ti driver on net-next:
----------------------
[perf top]
47.15% [kernel] [k] _raw_spin_unlock_irqrestore
11.77% [kernel] [k] __cpdma_chan_free
3.16% [kernel] [k] ___bpf_prog_run
2.52% [kernel] [k] cpsw_rx_vlan_encap
2.34% [kernel] [k] __netif_receive_skb_core
2.27% [kernel] [k] free_unref_page
2.26% [kernel] [k] kmem_cache_free
2.24% [kernel] [k] kmem_cache_alloc
1.69% [kernel] [k] __softirqentry_text_start
1.61% [kernel] [k] cpsw_rx_handler
1.19% [kernel] [k] page_pool_release_page
1.19% [kernel] [k] clear_bits_ll
1.15% [kernel] [k] page_frag_free
1.06% [kernel] [k] __dma_page_dev_to_cpu
0.99% [kernel] [k] memset
0.94% [kernel] [k] __alloc_pages_bulk
0.92% [kernel] [k] kfree_skb
0.85% [kernel] [k] packet_rcv
0.78% [kernel] [k] page_address
0.75% [kernel] [k] v7_dma_inv_range
0.71% [kernel] [k] __lock_text_start
[iperf3 tcp]
[ 5] 0.00-10.00 sec 873 MBytes 732 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 866 MBytes 726 Mbits/sec receiver
ti + skb recycling:
-------------------
[perf top]
40.58% [kernel] [k] _raw_spin_unlock_irqrestore
16.18% [kernel] [k] __softirqentry_text_start
10.33% [kernel] [k] __cpdma_chan_free
2.62% [kernel] [k] ___bpf_prog_run
2.05% [kernel] [k] cpsw_rx_vlan_encap
2.00% [kernel] [k] kmem_cache_alloc
1.86% [kernel] [k] __netif_receive_skb_core
1.80% [kernel] [k] kmem_cache_free
1.63% [kernel] [k] cpsw_rx_handler
1.12% [kernel] [k] cpsw_rx_mq_poll
1.11% [kernel] [k] page_pool_put_page
1.04% [kernel] [k] _raw_spin_unlock
0.97% [kernel] [k] clear_bits_ll
0.90% [kernel] [k] packet_rcv
0.88% [kernel] [k] __dma_page_dev_to_cpu
0.85% [kernel] [k] kfree_skb
0.80% [kernel] [k] memset
0.71% [kernel] [k] __lock_text_start
0.66% [kernel] [k] v7_dma_inv_range
0.64% [kernel] [k] gen_pool_free_owner
[iperf3 tcp]
[ 5] 0.00-10.00 sec 884 MBytes 742 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 878 MBytes 735 Mbits/sec receiver
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pointer dev can never be null, the null check is redundant
and can be removed. Cleans up a static analysis warning that
pointer priv is dereferencing dev before dev is being null
checked.
Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The round-robin rr_tx_counter was shared across CPUs leading to
significant cache thrashing at high packet rates. This patch switches
the round-robin packet counter to use a per-cpu variable to decide
the destination slave.
On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh
(-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after
this patch.
"perf top -e cache_misses" before:
12.31% [bonding] [k] bond_xmit_roundrobin_slave_get
10.59% [sch_fq_codel] [k] fq_codel_dequeue
9.34% [kernel] [k] skb_release_data
after:
15.42% [sch_fq_codel] [k] fq_codel_dequeue
10.06% [kernel] [k] __memset
9.12% [kernel] [k] skb_release_data
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-06-14
1) Trivial Lag refactroing in preparation for upcomming Single FDB lag feature
- First 3 patches
2) Scalable IRQ distriburion for Sub-functions
A subfunction (SF) is a lightweight function that has a parent PCI
function (PF) on which it is deployed.
Currently, mlx5 subfunction is sharing the IRQs (MSI-X) with their
parent PCI function.
Before this series the PF allocates enough IRQs to cover
all the cores in a system, Newly created SFs will re-use all the IRQs
that the PF has allocated for itself.
Hence, the more SFs are created, there are more EQs per IRQs. Therefore,
whenever we handle an interrupt, we need to pull all SFs EQs and PF EQs
instead of PF EQs without SFs on the system. This leads to a hard impact
on the performance of SFs and PF.
For example, on machine with:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.
test case: iperf TX BW single CPU, affinity of app and IRQ are the same.
PF only: no SFs on the system, 56 IRQs.
SF (before), 250 SFs Sharing the same 56 IRQs .
SF (now), 250 SFs + 255 avaiable IRQs for the NIC. (please see IRQ spread scheme below).
application SF-IRQ channel BW(Gb/sec) interrupts/sec
iperf TX affinity
PF only cpu={0} cpu={0} cpu={0} 79 8200
SF (before) cpu={0} cpu={0} cpu={0} 51.3 (-35%) 9500
SF (now) cpu={0} cpu={0} cpu={0} 78 (-2%) 8200
command:
$ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM
The different between the SF examples is that before this series we
allocate num_cpus (56) IRQs, and all of them were shared among the PF
and the SFs. And after this series, we allocate 255 IRQs, and we spread
the SFs among the above IRQs. This have significantly decreased the load
on each IRQ and the number of EQs per IRQ is down by 95% (251->11).
In this patchset the solution proposed is to have a dedicated IRQ pool
for SFs to use. the pool will allocate a large number of IRQs
for SFs to grab from in order to minimize irq sharing between the
different SFs.
IRQs will not be requested from the OS until they are 1st requested by
an SF consumer, and will be eventually released when the last SF consumer
releases them.
For the detailed IRQ spread and allocation scheme please see last patch:
("net/mlx5: Round-Robin EQs over IRQs")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added police action for ingress TC flower
hardware offload. With this rate limiting can be
done per flow. Since rate limiting is tied to
RQs in hardware the number of TC flower filters
with action as police is limited to number
of receive queues of the interface. Both bps
and pps modes are supported.
Examples to rate limit a flow:
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 ingress
$ tc filter add dev eth0 parent ffff: protocol ip \
flower ip_proto udp dst_port 80 action \
police rate 100Mbit burst 32Kbit
$ tc filter add dev eth0 parent ffff: \
protocol ip flower dst_mac 5e:b2:34:ee:29:49 \
action police pkts_rate 5000 pkts_burst 2048
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch modifies all netdev_err messages in
tc code to NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD
does not support format specifiers yet hence
netdev_err messages with only strings are modified.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TC_MATCHALL ingress ratelimiting offload support with POLICE
action for entire traffic coming into the interface.
Eg: To ratelimit ingress traffic to 100Mbps
$ ethtool -K eth0 hw-tc-offload on
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action police rate 100Mbit burst 32Kbit
To support this, a leaf level bandwidth profile is allocated and all
RQs' contexts used by this interface are updated to point to it.
And the leaf level bandwidth profile is configured with user specified
rate and burst sizes.
Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added support for dumping current resource status of bandwidth
profiles and contexts of allocated profiles via debugfs.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CN10K silicons supports hierarchial ingress packet ratelimiting.
There are 3 levels of profilers supported leaf, mid and top.
Ratelimiting is done after packet forwarding decision is taken
and a NIXLF's RQ is identified to DMA the packet. RQ's context
points to a leaf bandwidth profile which can be configured
to achieve desired ratelimit.
This patch adds logic for management of these bandwidth profiles
ie profile alloc, free, context update etc.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Networking block comments don't use an empty /* line,
use /* Comment...
This patch fixes the comments style issues.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Macro argument 'card' may be better as '(card)' to
avoid precedence issues.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add spaces required after that close brace '}'.
Add spaces required before the open parenthesis '('.
Add spaces required after that ','.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the chackpatch.pl, comparison to NULL could
be written "!card".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the checkpatch error about missing a blank line
after declarations.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes unnecessary out of memory message,
to fix the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the code style issue according to checkpatch.pl error:
"ERROR: that open brace { should be on the previous line".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add space required before the open parenthesis '(' and '{'.
Add space required after that close brace '}' and ','
Add spaces required around that '=' , '&', '*', '|', '+', '/' and '-'.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes trailing whitespaces.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the chackpatch.pl, else should follow close brace '}',
braces {} should be used on all arms of this statement.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Networking block comments don't use an empty /* line,
use /* Comment...
Block comments use * on subsequent lines.
Block comments use a trailing */ on a separate line.
This patch fixes the comments style issues.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the chackpatch.pl, comparison to NULL could
be written "!skb".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the chackpatch.pl,
EXPORT_SYMBOL(foo); should immediately follow its function/variable.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the checkpatch error about missing a blank line
after declarations.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On RX an SKB is allocated and the received buffer is copied into it.
But on some architectures, the memcpy() needs the source and destination
buffers to have the same alignment to be efficient.
This is not our case, because SKB data pointer is misaligned by two bytes
to compensate the ethernet header.
Align the RX buffer the same way as the SKB one, so the copy is faster.
An iperf3 RX test gives a decent improvement on a RISC-V machine:
before:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 733 MBytes 615 Mbits/sec 88 sender
[ 5] 0.00-10.01 sec 730 MBytes 612 Mbits/sec receiver
after:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver
And the memcpy() overhead during the RX drops dramatically.
before:
Overhead Shared O Symbol
43.35% [kernel] [k] memcpy
33.77% [kernel] [k] __asm_copy_to_user
3.64% [kernel] [k] sifive_l2_flush64_range
after:
Overhead Shared O Symbol
45.40% [kernel] [k] __asm_copy_to_user
28.09% [kernel] [k] memcpy
4.27% [kernel] [k] sifive_l2_flush64_range
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Whenever users provided affinity for an EQ creation request, map the
EQ to a matching IRQ.
Matching IRQ=IRQ with the same affinity and type (completion/control) of
the EQ created.
This mapping is being done in agressive dedicated IRQ allocation scheme,
which described bellow.
First, we check whether there is a matching IRQ that his min threshold
is not exhausted.
- min_eqs_threshold = 3 for control EQ.
- min_eqs_threshold = 1 for completion EQ.
In case no matching IRQ was found, try to request a new IRQ.
In case we can't request a new IRQ, reuse least-used matching IRQ.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Move mlx5_sf_max_functions() and friends from the privete sf/sf.h
to the public lib/sf.h. This is done in order to have one direction
include paths.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
SFs (Sub Functions) currently use IRQs from the global IRQ table their
parent Physical Function have. In order to better scale, we need to
allocate more IRQs and share them between different SFs.
Driver will maintain 3 separated irq pools:
1. A pool that serve the PF consumer (PF's netdev, rdma stacks), similar
to what the driver had before this patch. i.e, this pool will share irqs
between rdma and netev, and will keep the irq indexes and allocation
order. The last is important for PF netdev rmap (aRFS).
2. A pool of control IRQs for SFs. The size of this pool is the number
of SFs that can be created divided by SFS_PER_IRQ. This pool will serve
the control path EQs of the SFs.
3. A pool of completion data path IRQs for SFs transport queues. The
size of this pool is:
num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size.
This pool will served netdev and rdma stacks. Moreover, rmap is not
supported on SFs.
Sharing methodology of the SFs pools is explained in the next patch.
Important note: rmap is not supported on SFs because rmap mapping cannot
function correctly for IRQs that are shared for different core/netdev RX
rings.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Store newly created IRQs in the xarray DB instead of a static array,
so we will be able to store only IRQs which are being used.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
IRQs are being simplified in order to ease their sharing and any feature
specific object will be moved to upper layer.
Hence we move rmap object into eq_table.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Extend mlx5_irq_request so that IRQs will be requested upon EQ creation,
and not on driver boot.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In next patches, IRQs will be requested according to demand, instead of
statically on driver boot.
Also, currently, rmap is managed by the IRQ layer. rmap management will
move out from the IRQ layer in future patches.
Therefore, we want to remove the IRQ from the rmap, when IRQ is destroyed,
instead of removing all the IRQs from the rmap when irq_table is destroyed.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The eq.[c|h] files are under major rewrite. so use this opportunity and
update their copyright and license texts.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Introduce new API that will allow IRQs users to hold a pointer to
mlx5_irq.
In the end of this series, IRQs will be allocated on demand. Hence,
this will allow us to properly manage and use IRQs.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Shared IRQ are consumed by multiple EQ users and in order to properly
initialize and later release such IRQs, we add kref counting of IRQ
structure.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Lag is used to combine two PCI functions of the same HCA into a single
logical unit. This is a core functionality and as such should be managed by
the core driver. Currently this isn't the case. While we store the lag
software structure inside the lower device, its lifetime (creation /
destruction) is dictated by the mlx5e part. Change the ownership model so
lag is tied to the lifetime of the lower level driver instead to the
mlx5e part.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
If MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV is set it means the device is going
down and mlx5_rescan_drivers_locked() shouldn't be called.
With this patch and the previous one in the series, unbinding a PCI
function when its netdev is part of a bond works and leaves the system in a
working state.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a net device is removed (can happen if the PCI function is unbound
from the system) it's not enough to destroy the hardware lag. The system
should recreate the original devices that were present before the lag.
As the same flow is done when a net device is removed from the bond
refactor and reuse the code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is not strong reason to have both WWAN and WWAN_CORE symbols,
Let's build the WWAN core framework when WWAN is selected, in the
same way as for other subsystems.
This fixes issue with mhi_net selecting WWAN_CORE without WWAN and
reported by kernel test robot:
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for WWAN_CORE
Depends on NETDEVICES && WWAN
Selected by
- MHI_NET && NETDEVICES && NET_CORE && MHI_BUS
Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|