|
In a nested VM environment, we have to refuse to assign to a nested
guest the same CID assigned to our guest->host transport.
In this way, the user can use the local CID for loopback.
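For illustration only, a minimal user-space sketch of the rule described above (the function and parameter names are made up, not the actual af_vsock code):
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: refuse a nested-guest CID that collides with the
 * CID already used by our own guest->host transport, so that CID stays
 * usable for loopback. */
static bool nested_cid_allowed(uint32_t requested_cid, uint32_t local_g2h_cid)
{
        return requested_cid != local_g2h_cid;
}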
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a 'module' member to 'struct vsock_transport'
in order to get/put the transport module. This prevents the
module from being unloaded while sockets are still assigned to it.
We increase the module refcnt when a socket is assigned to a
transport, and we decrease the module refcnt when the socket
is destructed.
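A hedged sketch of the get/put pattern this describes (the struct and helper names below are illustrative, not necessarily the final code):
#include <linux/module.h>

struct vsock_transport_sketch {
        struct module *module;  /* owner module, e.g. THIS_MODULE */
        /* ... transport callbacks ... */
};

/* Called when a socket gets assigned to a transport: pin the module. */
static bool transport_get(struct vsock_transport_sketch *t)
{
        return try_module_get(t->module);
}

/* Called when the socket is destructed: allow the module to unload again. */
static void transport_put(struct vsock_transport_sketch *t)
{
        module_put(t->module);
}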
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To allow other transports to be loaded with vmci_transport,
we register the vmci_transport as G2H or H2G only when a VMCI guest
or host is active.
To do that, this patch adds a callback registered in the vmci driver
that will be called when the host or guest becomes active.
This callback will register the vmci_transport in the VSOCK core.
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for multiple transports in the
VSOCK core.
With multi-transport support, we can use vsock with nested VMs
(also using different hypervisors) by loading both the guest->host and
host->guest transports at the same time.
Major changes:
- the vsock core module can be loaded regardless of the transports
- vsock_core_init() and vsock_core_exit() are renamed to
vsock_core_register() and vsock_core_unregister()
- vsock_core_register() has a features parameter (H2G, G2H, DGRAM)
to identify which directions the transport can handle and whether it
supports DGRAM (only vmci)
- each stream socket is assigned to a transport when the remote CID
is set (during connect() or when we receive a connection request
on a listener socket).
The remote CID is used to decide which transport to use (see the
sketch after this list):
- remote CID <= VMADDR_CID_HOST will use the guest->host transport;
- remote CID == local_cid (guest->host transport) will use the guest->host
transport for loopback (host->guest transports don't support loopback);
- remote CID > VMADDR_CID_HOST will use the host->guest transport;
- listener sockets are not bound to any transport since no transport
operations are done on them. This way we can create a listener
socket even if the transports are not loaded, or with VMADDR_CID_ANY
to listen on all transports.
- DGRAM sockets are handled as before, since only the vmci_transport
provides this feature.
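The CID-based selection rule can be summarised with the following self-contained sketch (VMADDR_CID_HOST is 2 as in <linux/vm_sockets.h>; the enum and function names are illustrative only):
#include <stdint.h>

#define VMADDR_CID_HOST 2u

enum transport_kind { TRANSPORT_G2H, TRANSPORT_H2G };

/* Illustrative version of the remote-CID based transport choice:
 * CIDs up to the host CID (and the local CID, for loopback) go through
 * the guest->host transport, anything above goes through host->guest. */
static enum transport_kind pick_transport(uint32_t remote_cid, uint32_t local_cid)
{
        if (remote_cid <= VMADDR_CID_HOST || remote_cid == local_cid)
                return TRANSPORT_G2H;
        return TRANSPORT_H2G;
}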
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
virtio_transport and vmci_transport handle the buffer_size
sockopts in a very similar way.
In order to support multiple transports, this patch moves this
handling into the core, allowing the user to change the options
even if the socket is not yet assigned to any transport.
This patch also adds the '.notify_buffer_size' callback in the
'struct virtio_transport' in order to inform the transport
when the buffer_size is changed by the user. It is also useful
for limiting the requested 'buffer_size' (e.g. in the virtio transports).
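One possible shape of such a callback, sketched for illustration (the cap value and names are invented; the real signature and limit may differ):
#include <linux/types.h>

#define SKETCH_MAX_BUF_SIZE (1024u * 1024u) /* illustrative cap only */

/* Sketch of a .notify_buffer_size-style hook: the core passes the value
 * the user asked for, and the transport may lower it to what it can
 * actually handle. */
static void sketch_notify_buffer_size(u64 *val)
{
        if (*val > SKETCH_MAX_BUF_SIZE)
                *val = SKETCH_MAX_BUF_SIZE;
}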
Acked-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to add a 'struct vsock_sock *' parameter to
virtio_transport_get_ops().
In some cases, like in virtio_transport_reset_no_sock(),
we don't have any socket assigned to the received packet,
so we can't use virtio_transport_get_ops().
In order to allow virtio_transport_reset_no_sock() to use the
'.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
we add a 'struct virtio_transport *' parameter to it and to its caller,
virtio_transport_recv_pkt().
We moved the 'vhost_transport' and 'virtio_transport' definitions
in order to pass their addresses to virtio_transport_recv_pkt().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Traffic for a CGX mapped NIXLF can be stopped by disabling entries
in the NPC MCAM or by configuring CGX, and mailbox messages exist for
both options. If traffic is stopped at CGX then the VFs of that PF are
also affected, hence CGX traffic should be started/stopped by
tracking all of its users. This patch implements that CGX user
tracking. CGX is also configured along with NPC if required.
Also removed a check which mandates an even number of LBK VFs.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A config option is added to disable caching of dynamic entries
like SQEs and stack pages. It also locks down all HW contexts in NDC,
preventing them from being evicted.
This option is useful when the queue count is large and there are
huge NDC cache misses. It's a trade-off between SQ context misses and
caching of dynamically changing entries like SQEs and stack page pointers.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each of the NIX/NPA LFs can choose which ways of their respective
NDC caches should be used to cache their contexts. This enables
flexible configurations like disabling caching for an LF, limiting
its contexts to a certain set of ways, etc. A separate way_mask
for NIX-TX and NIX-RX is not supported.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ingress packet replication support has been added to 96xx B0
silicon. This patch enables using that feature to replicate
ingress broadcast packets to a PF and its VFs.
Also fixed the below issues:
- VFs can also install an NPC MCAM entry to forward broadcast pkts.
Otherwise, unless the PF's interface is UP, VFs will not receive
bcast packets.
- The NPC MCAM entry is disabled when the PF and all of its VFs are down.
- A few corner cases in installing the multicast entry list.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CN96xx initial silicon doesn't support all features pertaining to
NIX transmit scheduling and shaping.
- It supports a fixed topology of 1:1 mapped transmit
limiters at all levels.
- It supports DWRR only at SMQ/MDQ and TL1.
- It doesn't support shaping and coloring.
This patch adds a HW capability structure by which each variant
and skew of silicon can be differentiated by its supported
features, and adds support for A0 silicon's transmit scheduler
capabilities, or rather limitations.
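For illustration, a possible shape of such a capability structure (the field names below are invented and do not match the actual layout; an A0-like entry would clear the shaping and DWRR flags):
#include <stdbool.h>

/* Illustrative per-silicon capability flags for NIX transmit scheduling. */
struct nix_hw_cap_sketch {
        bool fixed_txschq_mapping;  /* 1:1 mapped transmit limiters      */
        bool shaping;               /* shaping and coloring support       */
        bool dwrr_all_levels;       /* DWRR beyond just SMQ/MDQ and TL1   */
};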
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for a few more RSS key types for the flow key
algorithm to compute the RSS hash index.
The following flow key types have been added:
- Tunnel types like NVGRE, VXLAN, GENEVE.
- L2 offload type ETH_DMAC; here only the 6-byte DMAC is considered.
- Extension header IPV6_EXT (1 byte followed by the IPv6 header).
- Hashing of inner protocol fields for inner DMAC, IPv4/v6, TCP, UDP, SCTP.
Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Writes into the NPC MCAM1 and MCAM0 registers are suppressed if
they happen to form a reserved combination. Hence,
clear and disable MCAM entries before updating them.
From the HRM:
[CAM(1)]<n>=1, [CAM(0)]<n>=1: Reserved.
The reserved combination is not allowed. Hardware suppresses any
write to CAM(0) or CAM(1) that would result in the reserved combination for
any CAM bit.
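A rough, self-contained sketch of the ordering only (the struct stands in for the real MCAM registers and is not the driver's code):
#include <stdint.h>

/* Clear both CAM words while the entry is disabled before writing the new
 * values, so no intermediate write forms the reserved
 * CAM(1)<n>=1 / CAM(0)<n>=1 combination. */
struct mcam_entry_sketch {
        uint64_t cam0;
        uint64_t cam1;
        int enabled;
};

static void mcam_entry_update_sketch(struct mcam_entry_sketch *e,
                                     uint64_t new_cam0, uint64_t new_cam1)
{
        e->enabled = 0;      /* disable entry before touching CAM words */
        e->cam0 = 0;         /* clearing first avoids the reserved combo */
        e->cam1 = 0;
        e->cam0 = new_cam0;
        e->cam1 = new_cam1;
        e->enabled = 1;
}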
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Updated NPC KPU packet parsing profile with support for following
- Fragmentation support for IPv4 IPv6 outer header
- NIX instruction header support
- QinQ with a TPID of 0x8100 as the non-innermost VLAN tag, as legacy
network equipment still generates QinQ packets with this configuration.
- To better support RSS for tunnelled packets, UDP-based tunnel
protocols such as vxlan, vxlan-gpe, geneve and gtpu are now
captured into a separate layer LE. Consequently, the inner
packet headers are pushed one layer down to LF, LG, and LH
accordingly.
- Support for RFC 7510 MPLS-in-UDP. Up to 4 MPLS labels can be parsed
and captured in one layer, LE.
- Parser support for DSA, extended DSA and eDSA tags right after the
ethernet header, as used by Marvell SOHO and Falcon switches. For extended
DSA and eDSA tags, a special PKIND of 62 is used, as these tags don't
contain a tpid field.
- Higig2 protocol header parsing support: added an NPC_LT_LA_HIGIG2_ETHER
for a combined header of HIGIG2 and Ethernet, and an
NPC_LT_LA_IH_NIX_HIGIG2_ETHER for a combined header of nix_ih,
HIGIG2 and Ethernet on the egress side. Also added 2 upper flags in LA to
indicate the presence of nix_ih and HIGIG2.
Other changes include
- IPv4.TTL==0 and IPv6.HLIM==0 checks
- Per RFC 1858, mark fragment offset == 1 as an error
- TCP invalid flags check
- Separate error codes for outer and inner IPv4 checksum errors.
- Fix a parser error when the KPU parses incoming IPsec ESP and AH packets.
- The NPC vtag capture/strip hardware expects the tag pointer to point to the
tpid/ethertype instead of the tci, so move lb_ptr to point to the tpid/ethertype.
- Fix an NPC parser error when parsing UDP packets that don't have any payload.
- For a single MCAM entry to match on packets with one or more stacked VLAN tags,
combine NPC_LT_LB_STAG and NPC_LT_LB_QINQ into NPC_LT_LB_STAG_QINQ.
- NVGRE now has a separate ltype, LD_NVGRE, instead of being combined with LD_GRE.
- Reserve top LD/LTYPEs to support custom KPU profile fields.
Signed-off-by: Hao Zheng <haoz@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For every mailbox handler added to rvu, we are adding a function
declaration in the rvu header file. Clean this up by adding a macro
to generate these declarations automatically.
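A self-contained sketch of the X-macro pattern used for this kind of cleanup (the message list and handler names below are made up, not the real rvu mbox list):
/* Each entry describes one mailbox message; redefining M() lets the same
 * list generate declarations, ID enums, and so on. */
#define SKETCH_MBOX_MESSAGES                          \
M(READY,      0x001, ready)                           \
M(ATTACH_LF,  0x002, attach_lf)                       \
M(DETACH_LF,  0x003, detach_lf)

/* Generate "int rvu_mbox_handler_<fn>(void);" for every message. */
#define M(_name, _id, _fn_name) int rvu_mbox_handler_ ## _fn_name(void);
SKETCH_MBOX_MESSAGES
#undef M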
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the mailbox client has a bounce buffer or an intermediate buffer where
mbox messages are framed, then copy them from there to the HW buffer.
If 'mbase' and 'hw_mbase' are not the same, then assume 'mbase' points to the
bounce buffer.
This patch also adds a msg_size field to the mbox header so that only valid
data is copied instead of the whole buffer.
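A rough sketch of the copy, assuming a header field like the msg_size just described (struct and function names are illustrative):
#include <stddef.h>
#include <string.h>

struct mbox_hdr_sketch {
        size_t msg_size;  /* valid bytes framed in the region */
        /* ... other header fields ... */
};

/* If mbase is a bounce buffer (i.e. differs from hw_mbase), copy only the
 * valid bytes into the HW mailbox region instead of the whole buffer. */
static void mbox_sync_to_hw_sketch(void *hw_mbase, const void *mbase,
                                   const struct mbox_hdr_sketch *hdr)
{
        if (hw_mbase != mbase)
                memcpy(hw_mbase, mbase, sizeof(*hdr) + hdr->msg_size);
}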
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added a new mailbox API which goes through all responses
to check their IDs and response codes.
Also added logic to prevent queuing multiple works to
process the same mailbox message. This scenario happens
when the AF is processing a PF's request and meanwhile the PF
sends an ACK to an AF-sent UP message; then mbox_hdr->num_msgs
in the PF->AF DOWN mbox region will be nonzero and the AF
will end up processing the PF's request again. This is fixed
by taking a backup of the num_msgs counter and clearing it
in the mbox region before scheduling work.
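The backup-and-clear idea, sketched with placeholder types for illustration (a real handler would also queue the work item; this only shows the counter handling):
/* Take a snapshot of num_msgs and clear it in the shared mbox region
 * before scheduling work, so a PF ACK arriving in the meantime cannot
 * make the same request get processed twice. */
struct mbox_region_sketch {
        int num_msgs;  /* written by the PF, read by the AF */
};

struct mbox_work_sketch {
        int num_msgs;  /* snapshot used by the worker */
};

static int mbox_snapshot_sketch(struct mbox_region_sketch *region,
                                struct mbox_work_sketch *work)
{
        work->num_msgs = region->num_msgs;  /* backup for the worker */
        region->num_msgs = 0;               /* clear before queueing work */
        return work->num_msgs;              /* 0 means nothing to schedule */
}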
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added support to display the current NPC MCAM entry and counter allocation
status in debugfs.
'cat /sys/kernel/debug/octeontx2/npc/mcam_info' will dump the following info:
- MCAM Rx and Tx keysize
- Total MCAM entries and counters
- Current available count
- Count of MCAM entries and counters allocated
by an RVU PF/VF device.
Also, one NPC MCAM counter (the last one) is reserved and mapped to the
NPC RX_INTF's MISS_ACTION to count packets dropped due to no MCAM
entry match. This pkt drop counter can be checked via debugfs.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A CGX port is shared by an RVU PF and its VFs. These per-CGX-port
NIX Rx/Tx counters are cumulative stats of
all the NIXLFs sharing this port. These stats, when compared
to the CGX Rx/Tx stats, help in identifying pkts dropped within
the system, if any.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds CGX LMAC physical interface or serdes Rx/Tx
packet stats to debugfs.
'cat cgx<idx>/lmac<idx>/stats' dumps the current interface link
status and Rx/Tx stats. Stats include pkts received/transmitted,
dropped, pause frames, etc.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NDC is a data cache unit which caches the NPA and NIX blocks'
aura/pool/RQ/SQ/CQ/etc contexts to reduce the number of costly
DRAM accesses.
This patch adds support to dump the cache's performance stats
like cache line hit/miss counters and average cycles taken for
accessing cached and non-cached data. This will help in
checking whether NPA/NIX context reads/writes are causing NDC cache
misses, which in turn might affect performance.
Also changed the NDC enums to reflect the correct NDC hardware instance.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To aid in debugging NIX block related issues, added support to dump the
NIX block LF's RQ, SQ and CQ hardware contexts in debugfs. The user can
check which contexts are currently enabled and dump their current HW
contexts.
Four new files 'qsize', 'rq_ctx', 'sq_ctx' and 'cq_ctx' are added to the
debugfs at '/sys/kernel/debug/octeontx2/nix/'.
'echo <nixlf index> > qsize' will display the currently enabled CQ/SQ/RQs.
'echo <nixlf> [rq number/all] > rq_ctx',
'echo <nixlf> [sq number/all] > sq_ctx' &
'echo <nixlf> [cq number/all] > cq_ctx' will dump the RQ/SQ/CQ's current
hardware context.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To aid in debugging NPA related issues, added support to dump the
NPA (pool allocator) block LF's aura and pool hardware contexts in
debugfs. The user can check which contexts are currently enabled and dump
their current HW contexts.
Three new files 'qsize', 'aura_ctx' and 'pool_ctx' are added to the
debugfs at '/sys/kernel/debug/octeontx2/npa/'.
'echo <npalf index> > qsize' will display the currently enabled Auras/Pools.
'echo <npalf> [aura number/all] > aura_ctx' &
'echo <npalf> [aura number/all] > pool_ctx' will dump the Aura/Pool
context info.
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added support to dump the current resource provisioning status
of all resource virtualization unit (RVU) blocks'
(i.e. NPA, NIX, SSO, SSOW, CPT, TIM) local functions attached
to a PF_FUNC into a debugfs file.
'cat /sys/kernel/debug/octeontx2/rsrc_alloc'
will show the current block LF's allocation status.
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix build_skb for bm capable devices when they fall back to the swbm path
(e.g. when bm properties are configured in the device tree but
CONFIG_MVNETA_BM_ENABLE is not set). In this case rx_offset_correction is
overwritten, so we need to use it when building the skb instead of
MVNETA_SKB_HEADROOM directly.
Fixes: 8dc9a0888f4c ("net: mvneta: rely on build_skb in mvneta_rx_swbm poll routine")
Fixes: 0db51da7a8e9 ("net: mvneta: add basic XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reported-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-11-12
1) Merge mlx5-next for devlink reload and flowtable offloads dependencies
2) Devlink reload support
3) TC Flowtable offloads
4) Misc cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use r8168d_modify_extpage() also in rtl8168f_config_eee_phy() to
simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when ibmveth receives a loopback packet, it reports an
ambiguous error message, "tx: h_send_logical_lan failed with rc=-4",
because the hypervisor rejects those types of packets. This fix
detects loopback packets and ensures the source packet's MAC address
matches the driver's MAC address before transmitting to the
hypervisor.
Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the TI DP83869 Gigabit Ethernet PHY
device.
The DP83869 is a robust, low power, fully featured
Physical Layer transceiver with integrated PMD
sublayers to support 10BASE-T, 100BASE-TX and
1000BASE-T Ethernet protocols.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enable the GDM GDMA_DROP_ALL mode to drop all packets during the
stop operation. This is recommended by the mt762x HW design
to drop all packets from the GMAC before stopping PDMA.
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Refine the timing of the GDM/PSE setup, moving it from mtk_hw_init
to mtk_open. This is recommended by the mt762x HW design: do the
GDM/PSE setup only after PDMA has been started.
We exclude mt7628 in the mtk_gdm_config function since it is an old IP
and there is no GDM/PSE block on it.
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Integrate the GDM/PSE setup operations into a single function, "mtk_gdm_config".
Signed-off-by: MarkLee <Mark-MC.Lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't really need 10k species of reset. Remove everything except cold
reset which is what is actually used. Too bad the hardware designers
couldn't agree to use the same bit field for rev 1 and rev 2, so the
(*reset_cmd) function pointer is there to stay.
However, let's simplify the prototype and give it a struct dsa_switch (we
want to avoid forward-declarations of structures, in this case struct
sja1105_private, wherever we can).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tested using the following bash script and the tc from iproute2-next:
#!/bin/bash
set -e -u -o pipefail
NSEC_PER_SEC="1000000000"
gatemask() {
local tc_list="$1"
local mask=0
for tc in ${tc_list}; do
mask=$((${mask} | (1 << ${tc})))
done
printf "%02x" ${mask}
}
if ! systemctl is-active --quiet ptp4l; then
echo "Please start the ptp4l service"
exit
fi
now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
# Phase-align the base time to the start of the next second.
sec=$(echo "${now}" | gawk -F. '{ print $1; }')
base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
tc qdisc add dev swp5 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S $(gatemask 7) 100000 \
sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
clockid CLOCK_TAI flags 2
The "state machine" is a workqueue invoked after each manipulation
command on the PTP clock (reset, adjust time, set time, adjust
frequency) which checks over the state of the time-aware scheduler.
So it is not monitored periodically, only in reaction to a PTP command
typically triggered from a userspace daemon (linuxptp). Otherwise there
is no reason for things to go wrong.
Now that the timecounter/cyclecounter has been replaced with hardware
operations on the PTP clock, the TAS Kconfig now depends upon PTP and
the standalone clocksource operating mode has been removed.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
whether the time-aware scheduler is running or not. We will be using
that for monitoring the scheduler in the next patch, so refactor the PTP
command API in order to allow that.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"ret" is zero or possibly uninitialized on this error path. It
should be a negative error code instead.
Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
irqreturn_t type is an enum and in this context it's unsigned, so "err"
can't be irqreturn_t or it breaks the error handling. In fact the "err"
variable is only used to store integers (never irqreturn_t) so it should
be declared as int.
I removed the initialization because it's not required. Using a bogus
initializer turns off GCC's uninitialized variable warnings. Secondly,
there is a GCC warning about unused assignments and we would like to
enable that feature eventually so we have been trying to remove these
unnecessary initializers.
Fixes: 7b0c342f1f67 ("net: atlantic: code style cleanup")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the array overrun while keeping the eth_addr and eth_addr_mask
pointers as u16 to avoid unaligned u16 access. These were overlooked
when modifying the code to use u16 pointer for proper alignment.
Fixes: 90f906243bf6 ("bnxt_en: Add support for L2 rewrite")
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since both tc rules and flow table rules are of the same format,
we can re-use the tc parsing for them, and move the flow table rules
to their own steering domain - in this case, the next chain after the
max tc chain.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Implement devlink reload for mlx5.
Usage example:
devlink dev reload pci/0000:06:00.0
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use the devlink instance namespace to set the netdev net namespace.
Preparation patch for devlink reload implementation.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Be sure to release the neighbour in case of failures after successful
route lookup.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Neighbour initializations to NULL are not necessary, as the pointers are
not used if an error is returned, and if success is returned, the pointers
are initialized.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5_device_disable_sriov() currently reads num_vfs from the PCI core.
However, when mlx5_device_disable_sriov() is executed, SR-IOV is
already disabled at the PCI level.
Due to this, the disable_hca() cleanup is not done during the SR-IOV disable
flow:
mlx5_sriov_disable()
pci_disable_sriov()
mlx5_device_disable_sriov() <- num_vfs is zero here.
When SR-IOV enablement fails during mlx5_sriov_enable(), HCAs are left
in an enabled state because mlx5_device_disable_sriov() relies on num_vfs
from the PCI core:
mlx5_sriov_enable()
mlx5_device_enable_sriov()
pci_enable_sriov() <- Fails
Hence, to overcome the above issues:
(a) Read num_vfs before disabling SR-IOV and use it.
(b) Use the num_vfs given when enabling SR-IOV in the error unwinding path.
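A hedged sketch of the ordering fix on the disable path (the struct and callback below only model the behaviour and are not the mlx5 internals):
/* Capture num_vfs while SR-IOV is still enabled at the PCI level, then
 * use that saved value for per-VF cleanup instead of re-reading it after
 * SR-IOV has been disabled. */
struct sriov_sketch {
        int pci_num_vfs;  /* what the PCI core would report */
};

static void sriov_disable_sketch(struct sriov_sketch *dev,
                                 void (*cleanup_vf)(int vf))
{
        int num_vfs = dev->pci_num_vfs;  /* read before disabling */
        int vf;

        dev->pci_num_vfs = 0;            /* models pci_disable_sriov() */

        for (vf = 0; vf < num_vfs; vf++) /* cleanup still sees all VFs */
                cleanup_vf(vf);
}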
Fixes: d886aba677a0 ("net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When selecting a matcher, ste_builder_arr will always evaluate
as true; instead, check num_of_builders for validity.
Fixes: 667f264676c7 ("net/mlx5: DR, Support IPv4 and IPv6 mixed matcher")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
1) New generic devlink param "enable_roce", for downstream devlink
reload support
2) Do vport ACL configuration on a per-vport basis when
enabling/disabling a vport. This makes it possible to enable/disable vports
outside of eswitch config in the future
3) Split the code for legacy vs offloads mode and make it clear
4) Tidy up vport locking and workqueue usage
5) Fix metadata enablement for ECPF
6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION
7) E-Switch and flow steering core low level support and refactoring for
netfilter flowtables offload
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Netfilter tables (nftables) implements a software datapath that
comes after the tc ingress datapath. The datapath supports offloading
such rules via the flow table offload API.
This API is currently only used by NFT and it doesn't provide the
global priority with regard to tc offload, so we assume offloading such
rules must come after tc. It does provide a flow table priority
parameter, so we need to provide some supported priority range.
For that, split the fastpath prio in two: flow table offload and tc offload,
with one dedicated priority chain for flow table offload.
The next patch will re-use the multi-chain API to allow access to this
chain via the fdb_sub_namespace.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The next patch will re-use this to add a new chain, but in a
different prio.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Tc chains are implemented by creating a chained prio steering type, and
inside it there is a namespace for each chain (FDB_TC_MAX_CHAINS). Each
of those has a list of priorities.
Currently, all namespaces in a prio start at the parent prio level.
But since we can jump from one chain (namespace) to another chain in the
same prio, the levels of the higher chains need to be higher as well.
So unused prios were created to account for the levels of the previous
namespaces.
Fix that by accumulating the namespace levels if we are inside a chained
type prio, and removing the unused prios.
Fixes: 328edb499f99 ('net/mlx5: Split FDB fast path prio to multiple namespaces')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Define FDB_TC_LEVELS_PER_PRIO instead of the magic number 2.
This is the number of levels used by each tc prio table in the fdb.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|