summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-02-04mac80211: vht: use HE macros for parsing HE capabilitiesMordechay Goodstein1-2/+2
IEEE80211_VHT_MCS_NOT_SUPPORTED and IEEE80211_HE_MCS_NOT_SUPPORTED have the same value so no real bug, but for code integrity use the HE macros for parsing HE capabilities. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.e974b7b3b217.I732cc7f770c7fa06e4840adb5d45d7ee99ac8eb5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04ieee80211: radiotap: fix -Wcast-qual warningsJohannes Berg1-2/+2
When enabling -Wcast-qual e.g. via W=3, we get a lot of warnings from this file, whenever it's included. Since the fixes are simple, just do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.cc733aeb1a18.I03396e1bf7a1af364cbd0916037f65d800035039@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04cfg80211: fix -Wcast-qual warningsJohannes Berg1-5/+5
When enabling -Wcast-qual e.g. via W=3, we get a lot of warnings from this file, whenever it's included. Since the fixes are simple, just do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.6a1d52213019.I92d82e7251cf712faa43fd09db3142327a3bce3d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04ieee80211: fix -Wcast-qual warningsJohannes Berg1-4/+4
When enabling -Wcast-qual e.g. via W=3, we get a lot of warnings from this file, whenever it's included. Since the fixes are simple, just do that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.79ec4a4bab29.I8177a0c79d656c552e22c88931d8da06f2977896@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04cfg80211: don't add non transmitted BSS to 6GHz scanned channelsAvraham Stern1-1/+8
When adding 6GHz channels to scan request based on reported co-located APs, don't add channels that have only APs with "non-transmitted" BSSes if they only match the wildcard SSID since they will be found by probing the "transmitted" BSS. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.f6ddf099f934.I231e55885d3644f292d00dfe0f42653269f2559e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04cfg80211/mac80211: assume CHECKSUM_COMPLETE includes SNAPJohannes Berg3-3/+10
There's currently only one driver that reports CHECKSUM_COMPLETE, that is iwlwifi. The current hardware there calculates checksum after the SNAP header, but only RFC 1042 (and some other cases, but replicating the exact hardware logic for corner cases in the driver seemed awkward.) Newer generations of hardware will checksum _including_ the SNAP, which makes things easier. To handle that, simply always assume the checksum _includes_ the SNAP header, which this patch does, requiring to first add it for older iwlwifi hardware, and then remove it again later on conversion. Alternatively, we could have: 1) Always assumed the checksum starts _after_ the SNAP header; the problem with this is that we'd have to replace the exact "what is the SNAP" check in iwlwifi that cfg80211 has. 2) Made it configurable with some flag, but that seemed like too much complexity. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.230736e19e0e.I3e6745873585ad943c152fab9e23b5221f17a95f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04mac80211: consider RX NSS in UHB connectionMordechay Goodstein1-3/+58
In UHB connection we don't have any HT/VHT elemens so in order to calculate the max RX-NSS we need also to look at HE capa element, this causes to limit us to max rx nss in UHB to 1. Also anyway we need to look at HE max rx NSS and not only at HT/VHT capa to determine the max rx nss over the connection. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.3713e0dea5dd.I3b9a15b4c53465c3f86f35459e9dc15ae4ea2abd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04mac80211: limit bandwidth in HE capabilitiesJohannes Berg4-10/+32
If we're limiting bandwidth for some reason such as regulatory restrictions, then advertise that limitation just like we do for VHT today, so the AP is aware we cannot use the higher BW it might be using. Fixes: 41cbb0f5a295 ("mac80211: add support for HE") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20220202104617.70c8e3e7ee76.If317630de69ff1146bec7d47f5b83038695eb71d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-02-04iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()Joerg Roedel1-0/+2
The polling loop for the register change in iommu_ga_log_enable() needs to have a udelay() in it. Otherwise the CPU might be faster than the IOMMU hardware and wrongly trigger the WARN_ON() further down the code stream. Use a 10us for udelay(), has there is some hardware where activation of the GA log can take more than a 100ms. A future optimization should move the activation check of the GA log to the point where it gets used for the first time. But that is a bigger change and not suitable for a fix. Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20220204115537.3894-1-joro@8bytes.org
2022-02-04mt76: redefine mt76_for_each_q_rx to adapt mt7986 changesBo Jiao1-2/+2
Check ndesc of q_rx to avoid potential hole in iteration. This is an intermediate patch to add mt7986 support. Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Bo Jiao <Bo.Jiao@mediatek.com> Reviewed-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-02-04ixgbevf: Require large buffers for build_skb on 82599VFSamuel Mendoza-Jonas1-6/+7
From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb around new data in the page buffer shared with the ixgbe PF. This uses either a 2K or 3K buffer, and offsets the DMA mapping by NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to ensure the PF does not write a full 2K bytes into the buffer, which is actually 2K minus the offset. However on the 82599 virtual function, the RXDCTL mechanism is not available. The driver attempts to work around this by using the SET_LPE mailbox method to lower the maximm frame size, but the ixgbe PF driver ignores this in order to keep the PF and all VFs in sync[0]. This means the PF will write up to the full 2K set in SRRCTL, causing it to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer. With 4K pages split into two buffers, this means it either writes NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA mapping. Avoid this by only enabling build_skb when using "large" buffers (3K). These are placed in each half of an order-1 page, preventing the PF from writing past the end of the mapping. [0]: Technically it only ever raises the max frame size, see ixgbe_set_vf_lpe() in ixgbe_sriov.c Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames") Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04Merge branch 'ipa-RX-replenish'David S. Miller4-81/+60
Alex Elder says: ==================== net: ipa: improve RX buffer replenishing This series revises the algorithm used for replenishing receive buffers on RX endpoints. Currently there are two atomic variables that track how many receive buffers can be sent to the hardware. The new algorithm obviates the need for those, by just assuming we always want to provide the hardware with buffers until it can hold no more. The first patch eliminates an atomic variable that's not required. The next moves some code into the main replenish function's caller, making one of the called function's arguments unnecessary. The next six refactor things a bit more, adding a new helper function that allows us to eliminate an additional atomic variable. And the final two implement two more minor improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: determine replenish doorbell differentlyAlex Elder2-6/+10
Rather than tracking the number of receive buffer transactions that have been submitted without a doorbell, just track the total number of transactions that have been issued. Then ring the doorbell when that number modulo the replenish batch size is 0. The effect is roughly the same, but the new count is slightly more interesting, and this approach will someday allow the replenish batch size to be tuned at runtime. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: replenish after delivering payloadAlex Elder1-3/+3
Replenishing is now solely driven by whether transactions are available for a channel, and it doesn't really matter whether we replenish before or after we deliver received packets to the network stack. Replenishing before delivering the payload adds a little latency. Eliminate that by requesting a replenish after the payload is delivered. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: kill replenish_backlogAlex Elder2-9/+0
We no longer use the replenish_backlog atomic variable to decide when we've got work to do providing receive buffers to hardware. Basically, we try to keep the hardware as full as possible, all the time. We keep supplying buffers until the hardware has no more space for them. As a result, we can get rid of the replenish_backlog field and the atomic operations performed on it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: introduce gsi_channel_trans_idle()Alex Elder3-12/+26
Create a new function that returns true if all transactions for a channel are available for use. Use it in ipa_endpoint_replenish_enable() to see whether to start replenishing, and in ipa_endpoint_replenish() to determine whether it's necessary after a failure to schedule delayed work to ensure a future replenish attempt occurs. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: don't use replenish_backlogAlex Elder1-5/+2
Rather than determining when to stop replenishing using the replenish backlog, just stop when we have exhausted all available transactions. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: allocate transaction in replenish loopAlex Elder1-24/+16
When replenishing, have ipa_endpoint_replenish() allocate a transaction, and pass that to ipa_endpoint_replenish_one() to fill. Then, if that produces no error, commit the transaction within the replenish loop as well. In this way we can distinguish between transaction failures and buffer allocation/mapping failures. Failure to allocate a transaction simply means the hardware already has as many receive buffers as it can hold. In that case we can break out of the replenish loop because there's nothing more to do. If we fail to allocate or map pages for the receive buffer, just try again later. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: decide on doorbell in replenish loopAlex Elder1-9/+12
Decide whether the doorbell should be signaled when committing a replenish transaction in the main replenish loop, rather than in ipa_endpoint_replenish_one(). This is a step to facilitate the next patch. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: increment backlog in replenish callerAlex Elder1-20/+9
Three spots call ipa_endpoint_replenish(), and just one of those requests that the backlog be incremented after completing the replenish operation. Instead, have the caller increment the backlog, and get rid of the add_one argument to ipa_endpoint_replenish(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: allocate transaction before pages when replenishingAlex Elder1-8/+8
A transaction failure only occurs if no more transactions are available for an endpoint. It's a very cheap test. When replenishing an RX endpoint buffer, there's no point in allocating pages if transactions are exhausted. So don't bother doing so unless the transaction allocation succeeds. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: ipa: kill replenish_savedAlex Elder2-15/+4
The replenish_saved field keeps track of the number of times a new buffer is added to the backlog when replenishing is disabled. We don't really use it though, so there's no need for us to track it separately. Whether replenishing is enabled or not, we can simply increment the backlog. Get rid of replenish_saved, and initialize and increment the backlog where it would have otherwise been used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04tls: cap the output scatter list to something reasonableJakub Kicinski2-1/+19
TLS recvmsg() passes user pages as destination for decrypt. The decrypt operation is repeated record by record, each record being 16kB, max. TLS allocates an sg_table and uses iov_iter_get_pages() to populate it with enough pages to fit the decrypted record. Even though we decrypt a single message at a time we size the sg_table based on the entire length of the iovec. This leads to unnecessarily large allocations, risking triggering OOM conditions. Use iov_iter_truncate() / iov_iter_reexpand() to construct a "capped" version of iov_iter_npages(). Alternatively we could parametrize iov_iter_npages() to take the size as arg instead of using i->count, or do something else.. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04net: dsa: realtek: convert to phylink_generic_validate()Russell King (Oracle)1-34/+12
Populate the supported interfaces and MAC capabilities for the Realtek rtl8365 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04Merge branch '40GbE' of ↵David S. Miller7-60/+255
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2022-02-03 This series contains updates to the i40e client header file and driver. Mateusz disables HW TC offload by default. Joe Damato removes a no longer used statistic. Jakub Kicinski removes an unused enum from the client header file. Jedrzej changes some admin queue commands to occur under atomic context and adds new functions for admin queue MAC VLAN filters to avoid a potential race that could occur due storing results in a structure that could be overwritten by the next admin queue call. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-04PCI/MSI: Remove bogus warning in pci_irq_get_affinity()Thomas Gleixner1-1/+2
The recent overhaul of pci_irq_get_affinity() introduced a regression when pci_irq_get_affinity() is called for an MSI-X interrupt which was not allocated with affinity descriptor information. The original code just returned a NULL pointer in that case, but the rework added a WARN_ON() under the assumption that the corresponding WARN_ON() in the MSI case can be applied to MSI-X as well. In fact the MSI warning in the original code does not make sense either because it's legitimate to invoke pci_irq_get_affinity() for a MSI interrupt which was not allocated with affinity descriptor information. Remove it and just return NULL as the original code did. Fixes: f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/87ee4n38sm.ffs@tglx
2022-02-04KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointerSean Christopherson1-3/+3
Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr(), sparse complains about implicitly casting the kernel pointer from ERR_PTR() into a __user pointer. >> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression (different address spaces) @@ expected void [noderef] __user * @@ got void * @@ arch/x86/kvm/x86.c:4342:31: sparse: expected void [noderef] __user * arch/x86/kvm/x86.c:4342:31: sparse: got void * >> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression (different address spaces) @@ expected void [noderef] __user * @@ got void * @@ arch/x86/kvm/x86.c:4342:31: sparse: expected void [noderef] __user * arch/x86/kvm/x86.c:4342:31: sparse: got void * No functional change intended. Fixes: 56f289a8d23a ("KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220202005157.2545816-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-04KVM: x86: Report deprecated x87 features in supported CPUIDJim Mattson1-6/+7
CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature" bits. Unlike most of the other CPUID feature bits, these bits are clear if the features are present and set if the features are not present. These bits should be reported in KVM_GET_SUPPORTED_CPUID, because if these bits are set on hardware, they cannot be cleared in the guest CPUID. Doing so would claim guest support for a feature that the hardware doesn't support and that can't be efficiently emulated. Of course, any software (e.g WIN87EM.DLL) expecting these features to be present likely predates these CPUID feature bits and therefore doesn't know to check for them anyway. Aaron Lewis added the corresponding X86_FEATURE macros in commit cbb99c0f5887 ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and ZERO_FCS_FDS"), with the intention of reporting these bits in KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on the kvm list. Opportunistically reordered the CPUID_7_0_EBX capability bits from least to most significant. Cc: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220204001348.2844660-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-04ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkageAnton Lundin2-0/+11
06f6c4c6c3e8 ("ata: libata: add missing ata_identify_page_supported() calls") introduced additional calls to ata_identify_page_supported(), thus also adding indirectly accesses to the device log directory log page through ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices to lock up. Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to the log directory in ata_log_supported() and add a blacklist entry with this flag for "SATADOM-ML 3ME" devices. Fixes: 636f6e2af4fb ("libata: add horkage for missing Identify Device log") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Anton Lundin <glance@acc.umu.se> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-04MAINTAINERS: add myself as Renesas R-Car SATA driver reviewerSergey Shtylyov1-0/+8
Add myself as a reviewer for the Renesas R-Car SATA driver -- I don't have the hardware anymore (Geert Uytterhoeven does have a lot of hardware!) but I do have the manuals still! :-) Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-02-04ksmbd: add support for key exchangeNamjae Jeon2-2/+29
When mounting cifs client, can see the following warning message. CIFS: decode_ntlmssp_challenge: authentication has been weakened as server does not support key exchange To remove this warning message, Add support for key exchange feature to ksmbd. This patch decrypts 16-byte ciphertext value sent by the client using RC4 with session key. The decrypted value is the recovered secondary key that will use instead of the session key for signing and sealing. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: reduce smb direct max read/write sizeNamjae Jeon1-1/+1
ksmbd does not support more than one Buffer Descriptor V1 element in an smbdirect protocol request. Reducing the maximum read/write size to about 512KB allows interoperability with Windows over a wider variety of RDMA NICs, as an interim workaround. Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: don't align last entry offset in smb2 query directoryNamjae Jeon2-3/+5
When checking smb2 query directory packets from other servers, OutputBufferLength is different with ksmbd. Other servers add an unaligned next offset to OutputBufferLength for the last entry. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: fix same UniqueId for dot and dotdot entriesNamjae Jeon1-1/+4
ksmbd sets the inode number to UniqueId. However, the same UniqueId for dot and dotdot entry is set to the inode number of the parent inode. This patch set them using the current inode and parent inode. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: smbd: validate buffer descriptor structuresHyunchul Lee1-6/+30
Check ChannelInfoOffset and ChannelInfoLength to validate buffer descriptor structures. And add a debug log to print the structures' content. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04Merge tag 'drm-intel-fixes-2022-02-03' of ↵Dave Airlie7-22/+117
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Fix GitLab issue #4698: DP monitor through Type-C dock(Dell DA310) doesn't work. Fixes for inconsistent engine busyness value and read timeout with GuC. Fix to use ALLOW_FAIL for error capture buffer allocation. Don't use interruptible lock on error path. Smatch fix to reject zero sized overlays. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YfuiG8SKMKP5V/Dm@jlahtine-mobl.ger.corp.intel.com
2022-02-04netfilter: nft_compat: suppress comment matchFlorian Westphal1-0/+9
No need to have the datapath call the always-true comment match stub. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: exthdr: add support for tcp option removalFlorian Westphal1-1/+95
This allows to replace a tcp option with nop padding to selectively disable a particular tcp option. Optstrip mode is chosen when userspace passes the exthdr expression with neither a source nor a destination register attribute. This is identical to xtables TCPOPTSTRIP extension. The only difference is that TCPOPTSTRIP allows to pass in a bitmap of options to remove rather than a single number. Unlike TCPOPTSTRIP this expression can be used multiple times in the same rule to get the same effect. We could add a new nested attribute later on in case there is a use case for single-expression-multi-remove. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: pptp: use single option structureFlorian Westphal3-77/+45
Instead of exposing the four hooks individually use a sinle hook ops structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: remove extension register apiFlorian Westphal19-292/+7
These no longer register/unregister a meaningful structure so remove it. Cc: Paul Blakey <paulb@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: handle ->destroy hook via nat_ops insteadFlorian Westphal5-36/+16
The nat module already exposes a few functions to the conntrack core. Move the nat extension destroy hook to it. After this, no conntrack extension needs a destroy hook. 'struct nf_ct_ext_type' and the register/unregister api can be removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: move extension sizes into coreFlorian Westphal13-58/+76
No need to specify this in the registration modules, we already collect all sizes for build-time checks on the maximum combined size. After this change, all extensions except nat have no meaningful content in their nf_ct_ext_type struct definition. Next patch handles nat, this will then allow to remove the dynamic register api completely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: make all extensions 8-byte alignnedFlorian Westphal12-15/+2
All extensions except one need 8 byte alignment, so just make that the default. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: nfqueue: enable to get skb->priorityNicolas Dichtel2-0/+6
This info could be useful to improve traffic analysis. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARYKevin Mitchell1-1/+3
The udp_error function verifies the checksum of incoming UDP packets if one is set. This has the desirable side effect of setting skb->ip_summed to CHECKSUM_COMPLETE, signalling that this verification need not be repeated further up the stack. Conversely, when the UDP checksum is empty, which is perfectly legal (at least inside IPv4), udp_error previously left no trace that the checksum had been deemed acceptable. This was a problem in particular for nf_reject_ipv4, which verifies the checksum in nf_send_unreach() before sending ICMP_DEST_UNREACH. It makes no accommodation for zero UDP checksums unless they are already marked as CHECKSUM_UNNECESSARY. This commit ensures packets with empty UDP checksum are marked as CHECKSUM_UNNECESSARY, which is explicitly recommended in skbuff.h. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04Merge tag 'drm-misc-fixes-2022-02-03' of ↵Dave Airlie16-76/+761
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes * dma-buf/heaps: Fix potential spectre v1 gadget * drm/kmb: Fix potential out-of-bounds access * drm/mxsfb: Fix NULL-pointer dereference * drm/nouveau: Fix potential out-of-bounds access in BIOS decoding * fbdev: Re-add support for fbcon hardware acceleration Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/Yfu8mTZQUNt1RwZd@linux-uq9g
2022-02-04netfilter: ctnetlink: disable helper autoassignFlorian Westphal2-2/+3
When userspace, e.g. conntrackd, inserts an entry with a specified helper, its possible that the helper is lost immediately after its added: ctnetlink_create_conntrack -> nf_ct_helper_ext_add + assign helper -> ctnetlink_setup_nat -> ctnetlink_parse_nat_setup -> parse_nat_setup -> nfnetlink_parse_nat_setup -> nf_nat_setup_info -> nf_conntrack_alter_reply -> __nf_ct_try_assign_helper ... and __nf_ct_try_assign_helper will zero the helper again. Set IPS_HELPER bit to bypass auto-assign logic, its unwanted, just like when helper is assigned via ruleset. Dropped old 'not strictly necessary' comment, it referred to use of rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER(). NB: Fixes tag intentionally incorrect, this extends the referenced commit, but this change won't build without IPS_HELPER introduced there. Fixes: 6714cf5465d280 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT") Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04MAINTAINERS: netfilter: update git linksFlorian Westphal1-2/+2
nf and nf-next have a new location. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: re-init state for retransmitted syn-ackFlorian Westphal1-0/+12
TCP conntrack assumes that a syn-ack retransmit is identical to the previous syn-ack. This isn't correct and causes stuck 3whs in some more esoteric scenarios. tcpdump to illustrate the problem: client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083035583 ecr 0,wscale 7] server > client: Flags [S.] seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215367629 ecr 2082921663] Note the invalid/outdated synack ack number. Conntrack marks this syn-ack as out-of-window/invalid, but it did initialize the reply direction parameters based on this packets content. client > server: Flags [S] seq 1365731894, win 29200, [mss 1460,sackOK,TS val 2083036623 ecr 0,wscale 7] ... retransmit... server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215368644 ecr 2082921663] and another bogus synack. This repeats, then client re-uses for a new attempt: client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083100223 ecr 0,wscale 7] server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215430754 ecr 2082921663] ... but still gets a invalid syn-ack. This repeats until: server > client: Flags [S.], seq 145824453, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215437785 ecr 2082921663] server > client: Flags [R.], seq 145824454, ack 643160523, win 65535, [mss 8952,wscale 5,TS val 3215443451 ecr 2082921663] client > server: Flags [S], seq 2375731741, win 29200, [mss 1460,sackOK,TS val 2083115583 ecr 0,wscale 7] server > client: Flags [S.], seq 162602410, ack 2375731742, win 65535, [mss 8952,wscale 5,TS val 3215445754 ecr 2083115583] This syn-ack has the correct ack number, but conntrack flags it as invalid: The internal state was created from the first syn-ack seen so the sequence number of the syn-ack is treated as being outside of the announced window. Don't assume that retransmitted syn-ack is identical to previous one. Treat it like the first syn-ack and reinit state. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-04netfilter: conntrack: move synack init code to helperFlorian Westphal1-18/+29
It seems more readable to use a common helper in the followup fix rather than copypaste or goto. No functional change intended. The function is only called for syn-ack or syn in repy direction in case of simultaneous open. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>