summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-11-20octeontx2-af: Support to enable/disable default MCAM entriesSunil Goutham4-28/+122
For a PF/VF with a NIXLF attached has default/reserved MCAM entries for receiving Ucast/Bcast/Promisc traffic. Ideally traffic should be forwarded to NIXLF only after it's contexts are initialized. This patch keeps these default entries disabled and adds mbox messages for a PF/VF to enable these once NPA/NIXLF initialization is done. Likewise while PF/VF is being teared down, it can send the disable mailbox message to stop receiving traffic. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Add MKEX default profileSantosh Shukla3-25/+151
Added basic default MKEX profile. This profile tells hardware what data to extract from packet and where to place it (bit offset) in final KEY generated for the parsed packet. Based on the bit placement of the packet data, MCAM entries have to programmed for matching. Also added a msg to retrieve this MKEX profile from PF/VF which inturn can process it to determine how MCAM entry has to be populated. Signed-off-by: Santosh Shukla <sshukla@marvell.com> Signed-off-by: Yuri Tolstov <ytolstov@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Alloc and config NPC MCAM entry at a timeSunil Goutham3-0/+94
A new mailbox message is added to support allocating a MCAM entry along with a counter and configuring it in one go. This reduces the amount of mailbox communication involved in installing a new MCAM rule. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Map or unmap NPC MCAM entry and counterSunil Goutham3-4/+192
Alloc memory to save MCAM 'entry to counter' mapping and since multiple entries can map to same counter, added counter's reference count tracking. Do 'entry to counter' mapping when a entry is being installed and mbox msg sender requested to configure a counter as well. Mapping is removed when a entry or counter is being freed or a explicit mbox msg is received to unmap them. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Support for NPC MCAM countersSunil Goutham3-0/+190
NPC HW has counters which can be mapped to MCAM entries to gather entry match statistics. This patch adds support to allocate, free, clear and retrieve stats of NPC MCAM counters. New mailbox messages have been added for this. Similar to MCAM entries both contiguous and non-contiguous counter allocation is supported. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: MCAM entry installation supportSunil Goutham3-8/+228
Add support for a RVU PF/VF to enable, disable, configure and shuffle MCAM entries via mbox commands. This patch adds mailbox message formats and handling of these commands. As of now otherthan validating MCAM entry index, info like channel number e.t.c in MCAM config data sent by PF/VF are not validated. Also a max of 64 MCAM entries can be shuffled with a single mbox command. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: NPC MCAM entry alloc/free supportSunil Goutham4-5/+588
This patch adds NPC MCAM entry management and support for allocating and freeing them via mailbox. Both contiguous and non-contiguous allocations are supported. Incase of contiguous, if request cannot be met then max contiguous number of available entries are allocated. High or low priority index allocation w.r.t a reference MCAM index is also supported. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Relax resource lock into mutexStanislaw Kardach4-25/+30
Mailbox message handling is done in a workqueue context scheduled from interrupt handler. So resource locks does not need to be a spinlock. Therefore relax them into a mutex so that later on we may use them in routines that might sleep. Signed-off-by: Stanislaw Kardach <skardach@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Support to get NIX HW constants from AFKiran Kumar2-0/+12
This patch adds reading HW limits like number of Rx/Tx stats, number of queue IRQs supported per NIX LF from AF registers and sync them to PF/VF. Signed-off-by: Kiran Kumar <kirankumark@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Support to modify min/max allowed packet lengthsSunil Goutham5-1/+220
This patch adds support for RVU PF/VFs to modify min/max packet lengths allowed by HW. For VFs on PF0, settings will be automatically applied on LBK link. RX link's min/maxlen is configured to min/max of PF and it's all VFs. On the TX side if requested all SMQs attached to the requesting NIXLF will be updated with new min/max lengths. Also updates transmit credits for Tx links based on new maxlen. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20octeontx2-af: Convert mbox handlers APIs to lowercaseSunil Goutham7-102/+107
This patch converts all mailbox message handler API names to lowercase. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20Merge branch 'r8169-series-with-further-smaller-improvements'David S. Miller1-171/+112
Heiner Kallweit says: ==================== r8169: series with further smaller improvements Again nothing exciting, just smaller improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: improve chip version identificationHeiner Kallweit1-60/+59
Only the upper 12 bits are used for chip identification, this helps to reduce the size of array mac_info. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: simplify ocp functionsHeiner Kallweit1-51/+17
rtl8168_oob_notify is used in rtl8168dp_driver_start and rtl8168dp_driver_stop only, so we can rename it to r8168dp_oob_notify. The same applies to condition rtl_ocp_read_cond which can be renamed to rtl_dp_ocp_read_cond. This allows to simplify the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: remove workaround for ancient gcc bugHeiner Kallweit1-3/+3
The kernel can't be built any longer with this ancient GCC version. Eventually it becomes clear what this statement actually does. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: remove manual padding in struct ring_infoHeiner Kallweit1-1/+0
The compiler takes care of alignment and padding, I see no need to bother him with manual hints. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: remove "not PCI Express" messageHeiner Kallweit1-3/+0
The ones who want to know can easily identify whether chip is PCI or PCIe based on the chip name. I doubt there's any benefit in this message, so remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: remove print_mac_versionHeiner Kallweit1-9/+0
The syslog message printed on driver load allows to easily identify the mac version number (based on chip name and XID). So we don't need this extra debug message which is wrong anyway because e.g. RTL_GIGA_MAC_VER_01 has value 0. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: use PCI_VDEVICE macroHeiner Kallweit1-14/+14
Using macro PCI_VDEVICE helps to simplify the PCI ID table. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: replace event_slow with irq_maskHeiner Kallweit1-11/+11
Recently the "slow event" handler was removed, therefore the member name isn't appropriate any longer. In addition store the full mask, including the RTL_EVENT_NAPI interrupt source bits. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: remove unused interrupt sourcesHeiner Kallweit1-4/+3
Setting PCSTimeout interrupt source was copied from the vendor driver which uses the chip programmable timer interrupt. The mainline driver doesn't use this timer interrupt. SYSErr indicates a PCI error and isn't defined on the PCIe models. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: use dev_get_drvdata where possibleHeiner Kallweit1-10/+5
Using dev_get_drvdata directly is simpler here. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20r8169: merge rtl_irq_enable and rtl_irq_enable_allHeiner Kallweit1-9/+4
After the recent changes to the interrupt handler rtl_irq_enable and rtl_irq_enable_all can be merged. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20NFSv4: Fix a NFSv4 state manager deadlockTrond Myklebust2-5/+13
Fix a deadlock whereby the NFSv4 state manager can get stuck in the delegation return code, waiting for a layout return to complete in another thread. If the server reboots before that other thread completes, then we need to be able to start a second state manager thread in order to perform recovery. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-20Merge branch 'bpf-zero-hash-seed'Daniel Borkmann4-17/+82
Lorenz Bauer says: ==================== Allow forcing the seed of a hash table to zero, for deterministic execution during benchmarking and testing. Changes from v2: * Change ordering of BPF_F_ZERO_SEED in linux/bpf.h Comments adressed from v1: * Add comment to discourage production use to linux/bpf.h * Require CAP_SYS_ADMIN ==================== Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20tools: add selftest for BPF_F_ZERO_SEEDLorenz Bauer1-9/+55
Check that iterating two separate hash maps produces the same order of keys if BPF_F_ZERO_SEED is used. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20tools: sync linux/bpf.hLorenz Bauer1-3/+10
Synchronize changes to linux/bpf.h from * "bpf: allow zero-initializing hash map seed" * "bpf: move BPF_F_QUERY_EFFECTIVE after map flags" Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20bpf: move BPF_F_QUERY_EFFECTIVE after map flagsLorenz Bauer1-3/+3
BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid for BPF_MAP_CREATE. Move it to its own section to reduce confusion. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20bpf: allow zero-initializing hash map seedLorenz Bauer2-2/+14
Add a new flag BPF_F_ZERO_SEED, which forces a hash map to initialize the seed to zero. This is useful when doing performance analysis both on individual BPF programs, as well as the kernel's hash table implementation. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20bpf: libbpf: retry map creation without the nameStanislav Fomichev1-1/+10
Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name for maps. Pre v4.14 kernels don't know about map names and return an error about unexpected non-zero data. Retry sys_bpf without a map name to cover older kernels. v2 changes: * check for errno == EINVAL as suggested by Daniel Borkmann Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20net/mlx5e: Fix failing ethtool query on FEC query errorShay Agroskin1-2/+1
If FEC caps query fails when executing 'ethtool <interface>' the whole callback fails unnecessarily, fixed that by replacing the error return code with debug logging only. Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Removed unnecessary warnings in FEC caps queryShay Agroskin2-4/+4
Querying interface FEC caps with 'ethtool [int]' after link reset throws warning regading link speed. This warning is not needed as there is already an indication in user space that the link is not up. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Fix wrong field name in FEC related functionsShay Agroskin1-2/+2
This bug would result in reading wrong FEC capabilities for 10G/40G. Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Fix a bug in turning off FEC policy in unsupported speedsShay Agroskin1-16/+12
Some speeds don't support turning FEC policy off. In case a requested FEC policy is not supported for a speed (including current speed), its new FEC policy would be: no FEC - if disabling FEC is supported for that speed unchanged - else Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20Merge branch 'ena-hibernation-and-rmmod-bug-fixes'David S. Miller2-13/+12
Arthur Kiyanovski says: ==================== net: ena: hibernation and rmmod bug fixes This patchset includes 2 bug fixes: 1. A fix to a crash during resume from hibernation. 2. A fix to an illegal memory access during driver removal (e.g. during rmmod) which might cause a crash in certain systems. The subminor number in the driver version is also promoted to indicate driver was changed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: ena: update driver version from 2.0.1 to 2.0.2Arthur Kiyanovski1-1/+1
Update driver version due to critical bug fixes. Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: ena: fix crash during ena_remove()Arthur Kiyanovski1-11/+10
In ena_remove() we have the following stack call: ena_remove() unregister_netdev() ena_destroy_device() netif_carrier_off() Calling netif_carrier_off() causes linkwatch to try to handle the link change event on the already unregistered netdev, which leads to a read from an unreadable memory address. This patch switches the order of the two functions, so that netif_carrier_off() is called on a regiestered netdev. To accomplish this fix we also had to: 1. Remove the set bit ENA_FLAG_TRIGGER_RESET 2. Add a sanitiy check in ena_close() both to prevent double device reset (when calling unregister_netdev() ena_close is called, but the device was already deleted in ena_destroy_device()). 3. Set the admin_queue running state to false to avoid using it after device was reset (for example when calling ena_destroy_all_io_queues() right after ena_com_dev_reset() in ena_down) Fixes: 944b28aa2982 ("net: ena: fix missing lock during device destruction") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net: ena: fix crash during failed resume from hibernationArthur Kiyanovski1-1/+1
During resume from hibernation if ena_restore_device fails, ena_com_dev_reset() is called, and uses the readless read mechanism, which was already destroyed by the call to ena_com_mmio_reg_read_request_destroy(). This causes a NULL pointer reference. In this commit we switch the call order of the above two functions to avoid this crash. Fixes: d7703ddbd7c9 ("net: ena: fix rare bug when failed restart/resume is followed by driver removal") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20sctp: not increase stream's incnt before sending addstrm_in requestXin Long1-1/+0
Different from processing the addstrm_out request, The receiver handles an addstrm_in request by sending back an addstrm_out request to the sender who will increase its stream's in and incnt later. Now stream->incnt has been increased since it sent out the addstrm_in request in sctp_send_add_streams(), with the wrong stream->incnt will even cause crash when copying stream info from the old stream's in to the new one's in sctp_process_strreset_addstrm_out(). This patch is to fix it by simply removing the stream->incnt change from sctp_send_add_streams(). Fixes: 242bd2d519d7 ("sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-20net/mlx5e: Fix selftest for small MTUsValentine Fatiev1-16/+10
Loopback test had fixed packet size, which can be bigger than configured MTU. Shorten the loopback packet size to be bigger than minimal MTU allowed by the device. Text field removed from struct 'mlx5ehdr' as redundant to allow send small packets as minimal allowed MTU. Fixes: d605d66 ("net/mlx5e: Add support for ethtool self diagnostics test") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: RX, verify received packet size in Linear Striding RQMoshe Shemesh5-1/+15
In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means that WQE (RX descriptor) can be used for many packets and so the WQE is much bigger than MTU. In virtualization setups where the port mtu can be larger than the vf mtu, if received packet is bigger than MTU, it won't be dropped by HW on too small receive WQE. If we use linear SKB in striding RQ, since each stride has room for mtu size payload and skb info, an oversized packet can lead to crash for crossing allocated page boundary upon the call to build_skb. So driver needs to check packet size and drop it. Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the number of packets dropped by the driver for being too large. As a new field is added to the RQ struct, re-open the channels whenever this field is being used in datapath (i.e., in the case of linear Striding RQ). Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Apply the correct check for supporting TC esw rules splitRoi Dayan1-1/+1
The mirror and not the output count is the one denoting a split. Fix to condition the offload attempt on the mirror count being > 0 along the firmware to have the related capability. Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yossi Kuperman <yossiku@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Adjust to max number of channles when re-attachingYuval Avnery1-5/+22
When core driver enters deattach/attach flow after pci reset, Number of logical CPUs may have changed. As a result we need to update the cpu affiliated resource tables. 1. indirect rqt list 2. eq table Reproduction (PowerPC): echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes ppc64_cpu --smt=on # Restart driver modprobe -r ... ; modprobe ... # Link up ifconfig ... # Only physical CPUs ppc64_cpu --smt=off # Inject PCI errors so PCI will reset - calling the pci error handler echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA Call trace when trying to add non-existing rqs to an indirect rqt: mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable) mlx5e_redirect_rqts+0x188/0x190 [mlx5_core] mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core] mlx5e_open_locked+0xbc/0x140 [mlx5_core] mlx5e_open+0x50/0x130 [mlx5_core] mlx5e_nic_enable+0x174/0x1b0 [mlx5_core] mlx5e_attach_netdev+0x154/0x290 [mlx5_core] mlx5e_attach+0x88/0xd0 [mlx5_core] mlx5_attach_device+0x168/0x1e0 [mlx5_core] mlx5_load_one+0x1140/0x1210 [mlx5_core] mlx5_pci_resume+0x6c/0xf0 [mlx5_core] Create cq will fail when trying to use non-existing EQ. Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Always use the match level enum when parsing TC rule matchOr Gerlitz1-2/+2
We get the match level (none, l2, l3, l4) while going over the match dissectors of an offloaded tc rule. When doing this, the match level enum and the not min inline enum values should be used, fix that. This worked accidentally b/c both enums have the same numerical values. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Claim TC hw offloads support only under a proper build configOr Gerlitz1-0/+6
Currently, we are only supporting tc hw offloads when the eswitch support is compiled in, but we are not gating the adevertizment of the NETIF_F_HW_TC feature on this config being set. Fix it, and while doing that, also avoid dealing with the feature on ethtool when the config is not set. Fixes: e8f887ac6a45 ('net/mlx5e: Introduce tc offload support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: Don't match on vlan non-existence if ethertype is wildcardedOr Gerlitz1-31/+32
For the "all" ethertype we should not care whether the packet has vlans. Besides being wrong, the way we did it caused FW error for rules such as: tc filter add dev eth0 protocol all parent ffff: \ prio 1 flower skip_sw action drop b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec) wasn't set. Fix that by matching on vlan non-existence only if we were also told to match on the ethertype. Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5e: IPoIB, Reset QP after channels are closedDenis Drozdov1-1/+1
The mlx5e channels should be closed before mlx5i_uninit_underlay_qp puts the QP into RST (reset) state during mlx5i_close. Currently QP state incorrectly set to RST before channels got deactivated and closed, since mlx5_post_send request expects QP in RTS (Ready To Send) state. The fix is to keep QP in RTS state until mlx5e channels get closed and to reset QP afterwards. Also this fix is simply correct in order to keep the open/close flow symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open, the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing at close, which is exactly what this patch is doing. Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5: IPSec, Fix the SA context hash keyRaed Salem1-2/+8
The commit "net/mlx5: Refactor accel IPSec code" introduced a bug where asynchronous short time change in hash key value by create/release SA context might happen during an asynchronous hash resize operation this could cause a subsequent remove SA context operation to fail as the key value used during resize is not the same key value used when remove SA context operation is invoked. This commit fixes the bug by defining the SA context hash key such that it includes only fields that never change during the lifetime of the SA context object. Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20xfs: make xfs_file_remap_range() staticEric Biggers1-1/+1
xfs_file_remap_range() is only used in fs/xfs/xfs_file.c, so make it static. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-20xfs: fix shared extent data corruption due to missing cow reservationBrian Foster1-0/+1
Page writeback indirectly handles shared extents via the existence of overlapping COW fork blocks. If COW fork blocks exist, writeback always performs the associated copy-on-write regardless if the underlying blocks are actually shared. If the blocks are shared, then overlapping COW fork blocks must always exist. fstests shared/010 reproduces a case where a buffered write occurs over a shared block without performing the requisite COW fork reservation. This ultimately causes writeback to the shared extent and data corruption that is detected across md5 checks of the filesystem across a mount cycle. The problem occurs when a buffered write lands over a shared extent that crosses an extent size hint boundary and that also happens to have a partial COW reservation that doesn't cover the start and end blocks of the data fork extent. For example, a buffered write occurs across the file offset (in FSB units) range of [29, 57]. A shared extent exists at blocks [29, 35] and COW reservation already exists at blocks [32, 34]. After accommodating a COW extent size hint of 32 blocks and the existing reservation at offset 32, xfs_reflink_reserve_cow() allocates 32 blocks of reservation at offset 0 and returns with COW reservation across the range of [0, 34]. The associated data fork extent is still [29, 35], however, which isn't fully covered by the COW reservation. This leads to a buffered write at file offset 35 over a shared extent without associated COW reservation. Writeback eventually kicks in, performs an overwrite of the underlying shared block and causes the associated data corruption. Update xfs_reflink_reserve_cow() to accommodate the fact that a delalloc allocation request may not fully cover the extent in the data fork. Trim the data fork extent appropriately, just as is done for shared extent boundaries and/or existing COW reservations that happen to overlap the start of the data fork extent. This prevents shared/010 failures due to data corruption on reflink enabled filesystems. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>